What Cortex does
Arvexi Cortex is the autonomous investigation engine at the center of the Arvexi platform. It reviews account reconciliations, identifies variances that need attention, investigates root causes using a suite of seven specialized data tools, and generates audit-ready work papers that document every step of its reasoning, all without human intervention.
The core principle is simple: Cortex does the investigative work, humans review exceptions. For a typical 200-account portfolio, 80-85% of accounts reconcile cleanly and require only a cursory review. The remaining 15-20% have variances, open items, or anomalies that require genuine accounting judgment. Cortex ensures your team spends its time on the accounts that matter, not the ones that are routine.
Cortex operates through seven data tools, each designed for a specific type of analysis:
- A variance analyzer that decomposes differences by component
- A transaction tracer that follows items from the GL back to source documents
- A match engine that resolves reconciling items using eight condition types
- A trend detector that compares current-period data against historical patterns
- A materiality calculator that evaluates variances against your defined thresholds
- A narrative generator that drafts investigation work papers with full citations
- A flux analyzer that computes period-over-period movements

Each tool produces structured output that feeds into the confidence score and the final work paper.
Every investigation surfaces exceptions: the specific items Cortex cannot resolve on its own. These exceptions are routed to the appropriate reviewer with full context: the confidence score, the investigation narrative, the data each tool produced, and Cortex's recommendation. The reviewer makes the final call. This division of labor is what makes autonomous reconciliation practical at scale.
Confidence scoring formula
Every account in your reconciliation portfolio receives a confidence score between 0 and 1. The score represents Cortex's assessment of how likely the account is to be correctly reconciled. Higher scores mean lower risk. Lower scores mean the account needs investigation. The formula is fully deterministic and published: your auditors can verify every calculation independently.
The five-factor formula
The confidence score is a weighted sum of five factors, each producing a value between 0 and 1:
Confidence = (Variance × 0.35) + (AutoRecon × 0.20) + (Matching × 0.20) + (Materiality × 0.15) + (Historical × 0.10)
The weights reflect the relative importance of each factor for reconciliation assurance. Variance analysis carries the heaviest weight because the core question (does the GL balance agree with the supporting detail?) is the most direct measure of reconciliation quality. Historical accuracy carries the lightest weight because past performance is informative but not deterministic.
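Assuming the five factor values have already been computed (each in [0, 1], as the next sections describe), the published weighted sum is straightforward to express. The dictionary keys and function name below are illustrative; the weights are the documented ones.

```python
# The five published weights from the confidence formula.
WEIGHTS = {
    "variance": 0.35,
    "auto_recon": 0.20,
    "matching": 0.20,
    "materiality": 0.15,
    "historical": 0.10,
}

def confidence_score(factors: dict[str, float]) -> float:
    """Weighted sum of the five factors, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

# Example: a clean account with a slightly imperfect auto-recon rate
# and a modest history factor.
score = confidence_score({
    "variance": 1.0, "auto_recon": 0.95, "matching": 1.0,
    "materiality": 1.0, "historical": 0.9,
})  # 0.35 + 0.19 + 0.20 + 0.15 + 0.09 = 0.98
```

Because the weights sum to 1.0, a portfolio of perfect factors yields exactly 1.0, and each factor's maximum possible contribution equals its weight.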
Variance analysis (35%)
The largest weight goes to the fundamental reconciliation question. The variance factor scores 1.0 when the variance is zero and declines proportionally as the absolute variance increases relative to the account balance. A $50 variance on a $1M account scores higher than a $50 variance on a $5,000 account. This proportionality ensures that Cortex evaluates variances in context, not as raw dollar amounts.
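One plausible shape for this proportional decline is sketched below. This is an illustration of the behavior described above, not the published curve itself; the function name and the specific linear form are assumptions.

```python
def variance_factor(variance: float, balance: float) -> float:
    """Illustrative variance factor: 1.0 at zero variance, declining as
    |variance| grows relative to the account balance (assumed linear form)."""
    if balance == 0:
        return 0.0 if variance else 1.0
    return max(0.0, 1.0 - abs(variance) / abs(balance))

big = variance_factor(50, 1_000_000)  # $50 variance on a $1M account
small = variance_factor(50, 5_000)    # $50 variance on a $5,000 account
# big > small: the same dollar variance scores higher on the larger account.
```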
Auto-reconciliation rate (20%)
This factor measures how much of the reconciliation was completed automatically. If 95% of line items matched without human intervention, the auto-reconciliation factor is 0.95. If only 40% matched, the factor is 0.40. A high auto-reconciliation rate indicates clean, well-structured data from the source systems, which correlates strongly with reconciliation accuracy.
Matching completeness (20%)
The matching factor evaluates whether all reconciling items have been resolved. Open items (unmatched transactions, unexplained reconciling items, and stale items carried forward from prior periods) reduce the score. An account with zero open items scores 1.0 on this factor. An account with aged items that have been open for multiple periods scores significantly lower.
Materiality threshold (15%)
The materiality factor evaluates the variance against your organization's defined materiality threshold. If the variance is well below materiality, the factor scores high even if the variance is not zero. This prevents Cortex from over-investigating immaterial variances while ensuring material differences always receive attention. The threshold is configurable at the entity, account group, and individual account level.
Historical accuracy (10%)
The historical factor examines the account's track record across prior close cycles. An account that has been correctly reconciled for twelve consecutive periods with no audit adjustments scores higher than an account that required manual corrections in three of the last four periods. History is a strong predictor of future reliability, and this factor ensures that consistently clean accounts are not over-investigated.
Interpreting scores
Cortex maps the raw confidence score to three tiers that drive investigation behavior:
- GREEN (≥ 0.85): High confidence. The account is likely correctly reconciled. Suitable for streamlined review or auto-certification if your policy allows it. Cortex generates a brief summary narrative but does not run the full investigation pipeline.
- AMBER (≥ 0.50): Moderate confidence. Minor open items, small variances, or data quality concerns are present. Cortex runs the full investigation pipeline, generates a detailed narrative, and flags specific items for reviewer attention.
- RED (< 0.50): Low confidence. Significant reconciliation issues detected. Cortex runs an extended investigation, generates a comprehensive work paper, and routes the account to the senior reviewer queue with urgent priority.
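The tier mapping is a simple threshold function over the documented 0.85 and 0.50 cutoffs. A minimal sketch (function name is illustrative):

```python
def tier(score: float) -> str:
    """Map a raw confidence score to its investigation tier."""
    if score >= 0.85:
        return "GREEN"  # streamlined review / auto-certification eligible
    if score >= 0.50:
        return "AMBER"  # full investigation pipeline
    return "RED"        # extended investigation, senior reviewer queue
```

Note the boundaries are inclusive on the lower edge of each tier: a score of exactly 0.85 is GREEN, and exactly 0.50 is AMBER.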
Sweep modes
A sweep is a single run of Cortex across a set of accounts. Arvexi supports three sweep modes, each designed for a different operational context. The choice between them is a tradeoff between coverage, depth, and cost.
FULL sweep
A FULL sweep investigates every account in the portfolio. Cortex computes confidence scores for all accounts, runs the full data tool pipeline on accounts below the investigation threshold, generates work papers, and flags accounts that need human review. A FULL sweep on a 500-account portfolio typically completes in 15-30 minutes depending on the complexity of the reconciliations.
Use FULL sweeps at the end of each close cycle when you need comprehensive, auditable coverage across the entire portfolio. Most teams run one FULL sweep per close, timed for the day after all journal entries have posted and sub-ledger feeds are complete.
TOP_N sweep
A TOP_N sweep investigates only the N lowest-confidence accounts. You configure N (default: 20). Cortex scores all accounts to determine the ranking but only runs the full investigation pipeline on the bottom N. This mode is cost-effective for daily monitoring between close cycles. You get updated confidence scores across the portfolio and deep investigations on the accounts that need them most.
TOP_N is the recommended mode for daily or weekly sweeps. It catches emerging issues (a new unmatched transaction, a variance that appeared after a late journal entry) without incurring the cost of investigating accounts that remain clean.
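The selection step is just a sort over the freshly computed scores. A minimal sketch, assuming `scores` maps account IDs to confidence values (names are illustrative, the default N of 20 is the documented one):

```python
def select_top_n(scores: dict[str, float], n: int = 20) -> list[str]:
    """Return the n account IDs with the lowest confidence scores,
    i.e. the accounts that get the full investigation pipeline."""
    return sorted(scores, key=scores.get)[:n]

# Example: only the two weakest accounts are investigated deeply.
to_investigate = select_top_n({"1010": 0.92, "2450": 0.41, "3300": 0.67}, n=2)
```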
NARRATIVE_ONLY sweep
A NARRATIVE_ONLY sweep scores all accounts and generates investigation narratives, but does not run the full data tool pipeline. This mode produces work papers based on the scoring data alone, without the deep transaction-level analysis that the seven tools provide. Use it for mid-cycle check-ins where you want updated confidence scores and high-level observations without incurring full investigation cost.
Scheduling sweeps
Sweeps can be triggered manually from the Cortex dashboard or scheduled to run automatically on a recurring cadence. Most teams configure two recurring schedules that work in tandem: a monthly FULL sweep aligned with the close calendar, and a daily or weekly TOP_N sweep that catches issues between closes.
Schedules are configured in the Cortex settings page. You select the sweep mode, the cadence (daily, weekly, monthly, or custom cron expression), the specific time, and which account groups to include. Cron-based scheduling gives you precise control. For example, you can schedule a FULL sweep for the third business day of each month at 6:00 AM, after your ERP has finished posting period-end journals.
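The tandem setup described above might look like the following sketch. The field names are illustrative, not the actual Arvexi configuration schema, and note that standard cron has no business-day field, so the FULL sweep here is pinned to a calendar day as an approximation.

```python
# Hypothetical schedule definitions for the two recommended cadences.
schedules = [
    {
        "mode": "FULL",
        "cron": "0 6 3 * *",    # 6:00 AM on the 3rd of each month
        "account_groups": ["all"],
    },
    {
        "mode": "TOP_N",
        "n": 20,                 # documented default
        "cron": "0 6 * * 1-5",  # 6:00 AM every weekday
        "account_groups": ["all"],
    },
]
```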
Stale run timeout
Cortex enforces a configurable stale run timeout (default: 10 minutes) that terminates any investigation that has not progressed within the timeout window. This prevents a single problematic account from blocking the entire sweep. If an investigation times out, Cortex logs the failure, assigns the account a RED score, and continues to the next account. Timed-out accounts appear in the sweep summary with a clear "investigation timeout" flag so your team knows to investigate them manually.
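The timeout behavior can be sketched with a bounded wait per account: on timeout, the account is marked RED with the timeout flag and the sweep moves on. This is an illustrative sketch (function names and result shape are assumptions), not the actual engine; the 600-second default mirrors the documented 10 minutes.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_sweep(accounts, investigate, timeout_s=600):
    """Run investigate(account) for each account with a stale-run timeout.
    A timed-out account is recorded as RED and does not block the sweep."""
    results = {}
    pool = ThreadPoolExecutor()
    for account in accounts:
        future = pool.submit(investigate, account)
        try:
            results[account] = future.result(timeout=timeout_s)
        except FuturesTimeout:
            # Log the failure, assign RED, continue to the next account.
            results[account] = {"tier": "RED", "flag": "investigation timeout"}
    pool.shutdown(wait=False)
    return results
```

A production version would also forcibly reclaim the stuck worker; this sketch simply abandons it and proceeds.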
Sweep windows and quiet hours
To avoid running sweeps during active data loading, configure a quiet window. If your ERP posts journal entries between midnight and 4:00 AM, schedule the daily sweep for 6:00 AM to ensure Cortex operates on the final posted data. Sweep results are available in the dashboard within minutes of completion, and a summary notification is sent to the configured Slack channel or email distribution list.
Cost control
Each investigation involves one or more LLM calls to analyze data, generate narratives, and produce work papers. Controlling cost is a matter of controlling how many accounts are investigated and how deeply. Arvexi provides three primary levers, and most teams find that the defaults produce the right balance between coverage and spend.
Investigation cost breakdown
A typical account investigation costs between $0.05 and $0.15, depending on the number of data tools invoked and the complexity of the narrative. Simple accounts with clean matches might use two or three tool calls ($0.05). Complex accounts with multiple reconciling items, historical anomalies, and long narratives might use eight to ten tool calls ($0.15). Averaged across a full sweep of a 210-account portfolio, the cost works out to roughly $0.08 per account, because the straightforward cases that Cortex resolves without deep investigation cost only a fraction of a cent each.
For context, the labor cost of manual investigation typically runs $15-$40 per account when performed by a staff accountant. Even a conservative estimate of five minutes per account at $60/hour yields $5 per account. Cortex reduces investigation cost by 95-99% while documenting its work more thoroughly than most manual processes achieve.
Max AI calls
The max AI calls setting caps the total number of LLM invocations per sweep. If Cortex reaches the cap before completing all investigations, it stops and reports which accounts were not reached. Set this limit based on your budget and the number of accounts. A reasonable starting point is ten calls per account times the number of accounts you expect to investigate.
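The suggested starting point above is simple arithmetic, sketched here with illustrative names:

```python
def suggested_call_cap(expected_investigations: int, calls_per_account: int = 10) -> int:
    """Starting-point cap: ten calls per account times the number of
    accounts you expect to need investigation."""
    return expected_investigations * calls_per_account

# Example: ~20% of a 200-account portfolio needs investigation.
cap = suggested_call_cap(40)  # 400 calls
```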
NARRATIVE_ONLY for cost reduction
If cost is a primary concern, consider running NARRATIVE_ONLY sweeps for routine monitoring and reserving FULL sweeps for the close cycle. Narrative-only investigations skip the expensive data tool pipeline and produce work papers from scoring data alone. The trade-off is less depth: you get confidence scores and high-level observations but not the transaction-level tracing that the full pipeline provides.
Calibration feedback
The calibration feedback loop is how Cortex gets smarter over time. After each sweep, controllers review Cortex conclusions and either confirm or correct them. These corrections are not discarded. They are stored as calibration data and applied in future sweeps. Over successive close cycles, Cortex learns your team's judgment patterns without any manual tuning.
How corrections flow back
When a controller corrects a confidence score (for example, overriding an AMBER score to GREEN because a particular variance type is expected for that account), the correction pattern is recorded with full context: the account type, the variance characteristics, the scoring factors at the time of correction, and the controller's rationale. In subsequent sweeps, Cortex checks incoming accounts against the library of stored corrections. When it finds a match, it adjusts its scoring accordingly.
This is not a black-box learning process. Every adjustment Cortex makes based on calibration data is logged and visible in the investigation narrative. The work paper will note, for example, "Confidence adjusted from 0.72 to 0.88 based on calibration pattern #47: timing variances under $2,000 on prepaid expense accounts are consistently marked acceptable by reviewers." Auditors can review the full calibration library and the specific patterns that influenced any given score.
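The lookup-and-adjust step can be sketched as below. The pattern shape, matching logic, and names are assumptions for illustration; the example data mirrors the pattern #47 narrative quoted above.

```python
from dataclasses import dataclass

@dataclass
class CalibrationPattern:
    """A stored reviewer correction: scope, trigger, and adjustment."""
    pattern_id: int
    account_type: str
    max_variance: float   # pattern applies below this absolute variance
    adjusted_score: float
    rationale: str

def apply_calibration(account_type, variance, score, patterns):
    """Return (possibly adjusted score, log line for the work paper)."""
    for p in patterns:
        if p.account_type == account_type and abs(variance) < p.max_variance:
            log = (f"Confidence adjusted from {score:.2f} to "
                   f"{p.adjusted_score:.2f} based on calibration pattern "
                   f"#{p.pattern_id}: {p.rationale}")
            return max(score, p.adjusted_score), log
    return score, None  # no pattern matched; score unchanged
```

The key property is transparency: every adjustment carries a human-readable log line that lands in the investigation narrative.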
Calibration metrics
The Cortex dashboard tracks calibration performance over time: confirmation rate (what percentage of conclusions were confirmed without changes), override rate, rejection rate, and average confidence accuracy by account group. These metrics tell you whether Cortex is improving and where it still needs the most human oversight. Teams typically see measurable accuracy improvements after three to four close cycles of consistent feedback.
Cortex rules
Cortex rules let you codify specific accounting policies and judgment calls that Cortex should follow during investigations. Rules are conditional instructions that override or augment Cortex's default behavior for specific situations. Think of them as standing orders: the institutional knowledge that lives in your team's heads, made explicit and enforceable.
Rule types
- Threshold overrides. Adjust the investigation threshold for specific accounts or account groups. For example, you might set a tighter threshold for intercompany accounts (investigate anything below 0.95) and a looser threshold for low-risk prepaid accounts (investigate only below 0.60).
- Skip rules. Tell Cortex to skip investigation for accounts matching specific criteria, such as "skip all cash accounts with a variance under $100."
- Escalation rules. Force investigation and flag for senior review when specific conditions are met, such as "always escalate intercompany accounts with a variance above $10,000."
- Classification rules. Tell Cortex how to classify specific variance types, such as "timing differences on revenue accounts are expected if the variance reverses within five business days."
- Narrative rules. Require specific disclosures in the investigation narrative, such as "always include the prior period balance for goodwill accounts."
Configuring rules by account type
Rules are written in plain English on the Cortex Rules page. Cortex interprets each rule using its language model and applies it during investigations. Each rule has a scope (all accounts, a specific account group, or a specific account) and a priority (rules with higher priority override conflicting lower-priority rules). Start with five to ten rules that encode your most common judgment calls: the decisions your senior accountants make automatically but have never documented. Add more rules as you encounter situations where Cortex's default behavior does not match your team's expectations.
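The scope-and-priority resolution can be sketched as follows for threshold-override rules. The rule shape and field names are assumptions for illustration; the thresholds mirror the intercompany/prepaid example given earlier.

```python
def effective_threshold(account, account_group, rules, default=0.85):
    """Pick the investigation threshold from the highest-priority rule
    whose scope matches this account; fall back to the default."""
    matching = [
        r for r in rules
        if r["scope"] in ("all", account_group, account)
    ]
    if not matching:
        return default
    return max(matching, key=lambda r: r["priority"])["threshold"]

rules = [
    {"scope": "intercompany", "priority": 10, "threshold": 0.95},
    {"scope": "prepaid", "priority": 10, "threshold": 0.60},
]
```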
Learned patterns from the calibration feedback loop can also be promoted to formal rules. If Cortex has learned through calibration that timing variances under $2,000 on prepaid accounts are always acceptable, you can codify that as an explicit rule to make the behavior transparent and auditable.
Event-driven mode
Event-driven mode transforms Cortex from a scheduled batch processor into a real-time monitor. Instead of waiting for the next scheduled sweep, Cortex triggers an investigation immediately when a qualifying event occurs. This is particularly valuable during IN_CLOSE periods, when data is changing rapidly and waiting for the next batch sweep means working with stale confidence scores.
Supported event triggers
Cortex listens for two primary event types that signal a change in reconciliation state:
- emitGlBalanceChanged. Fired when the GL balance for a monitored account changes. This captures journal entries, adjustments, and any other postings that affect the account balance. During IN_CLOSE periods, this event triggers immediate re-scoring so the confidence dashboard always reflects the latest data.
- emitReconciliationCreated. Fired when a new reconciliation is initiated for a monitored account. This captures both manual reconciliation starts and automated feeds, ensuring Cortex evaluates the new reconciliation data as soon as it arrives.
You configure which event types trigger investigations and which account groups are eligible. For each event type, set the scope (which accounts), the materiality threshold (minimum dollar change to trigger), and the investigation depth (FULL or NARRATIVE_ONLY).
IN_CLOSE behavior
During IN_CLOSE periods (the active close window when your team is processing reconciliations), event-driven mode is especially powerful. Every GL balance change triggers an immediate re-score of the affected account. If the new score drops below the investigation threshold, Cortex runs a full investigation and updates the work paper in real time. Your team sees confidence scores that reflect the latest data, not yesterday's batch sweep. This eliminates the common close-time problem of reviewing accounts based on stale information.
BLOCKED_ACTION_TYPES
As a safety mechanism, Cortex never takes certain actions automatically, even in event-driven mode. The BLOCKED_ACTION_TYPES configuration explicitly prevents Cortex from:
- Posting journal entries. Cortex can draft adjusting entries and include them in the work paper, but posting to the GL requires human approval
- Certifying reconciliations. Cortex can recommend that an account is ready for certification, but the sign-off is a human action
- Approving accounts. Cortex can flag accounts as "recommended for approval" but cannot mark them as approved in the workflow
- Modifying records. Cortex can suggest changes to lease records, schedules, or master data, but cannot alter them directly
These guardrails are intentional and reflect a fundamental design principle: Cortex handles the investigative and analytical work that consumes 80% of your team's time, while humans retain authority over every action that changes financial data. The blocked action types are configurable. If your organization's controls allow it, you can narrow the list. But most teams keep the defaults.
Event-driven cost considerations
Because event-driven mode can trigger investigations at any time, it has the potential to generate higher costs than scheduled sweeps if many events fire in a short period. Use the daily cost cap setting to limit event-driven spend. When the cap is reached, new events are queued and processed during the next scheduled sweep instead of immediately. This ensures cost predictability without sacrificing coverage. The events are still processed, just on a batch schedule rather than in real time.
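The cap-and-queue behavior can be sketched as below. All names are illustrative, and the $0.10 per-event cost estimate is an assumption drawn from the documented $0.05-$0.15 range.

```python
def handle_event(event, spent_today, daily_cap, queue, investigate, est_cost=0.10):
    """Investigate immediately if within budget; otherwise queue the
    event for the next scheduled sweep. Returns updated spend."""
    if spent_today + est_cost <= daily_cap:
        investigate(event)
        return spent_today + est_cost
    queue.append(event)  # still processed, just on the batch schedule
    return spent_today
```

With a $0.15 daily cap, the first event triggers an immediate investigation and the second is queued: coverage is preserved, only the latency changes.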