Arvexi Documentation

Understanding confidence scores

The five-factor formula

The confidence score is a weighted average of five independent factors. Each factor produces a sub-score between 0.00 and 1.00, and the final score is the weighted sum:

Score = (Variance * 0.35) + (AutoRecon * 0.20) + (Matching * 0.20) + (Materiality * 0.15) + (Historical * 0.10)

The weights reflect the relative importance of each signal. Variance is weighted highest because an unexplained difference between GL and subledger is the strongest indicator of reconciliation quality. Historical consistency is weighted lowest because past performance, while informative, does not guarantee current accuracy.
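The weighted sum above can be sketched in a few lines. This is an illustration of the published formula only; the function name and dictionary layout are ours, not Arvexi's internal API.

```python
# Sub-score weights from the five-factor formula above.
WEIGHTS = {
    "variance": 0.35,
    "auto_recon": 0.20,
    "matching": 0.20,
    "materiality": 0.15,
    "historical": 0.10,
}

def confidence_score(factors):
    """Weighted sum of the five sub-scores (each 0.00 to 1.00)."""
    return sum(weight * factors[name] for name, weight in WEIGHTS.items())

# A perfect reconciliation scores 1.00 on every factor, so the final
# score is the sum of the weights, i.e. 1.00.
perfect = {name: 1.00 for name in WEIGHTS}
print(confidence_score(perfect))
```

Because the weights sum to 1.00, the final score always stays within the same 0.00 to 1.00 range as the sub-scores.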

Color thresholds

The numeric score maps to three color bands:

  • GREEN: score ≥ 0.85. The reconciliation looks solid. Cortex found no significant issues. A reviewer can focus their attention elsewhere unless the account is high-risk.
  • AMBER: score ≥ 0.50 and < 0.85. The reconciliation has some issues that warrant attention. One or more factors scored below threshold. The reviewer should examine Cortex's findings and determine whether action is needed.
  • RED: score < 0.50. The reconciliation has significant issues. Cortex identified material variances, stale items, or patterns that suggest potential errors. This account requires immediate attention before certification.

These thresholds are the system defaults. They can be adjusted per entity or per account group in Settings > Cortex > Scoring Thresholds. Some organizations set a higher GREEN threshold (e.g., 0.90) for balance sheet accounts while keeping 0.85 for P&L accounts.
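The banding logic can be sketched as a simple comparison against two cutoffs. The 0.85 and 0.50 values are the documented system defaults; the ability to pass different cutoffs mirrors the per-entity overrides described above.

```python
# Default color-band mapping for a confidence score.
def color_band(score, green=0.85, red=0.50):
    if score >= green:
        return "GREEN"
    if score >= red:
        return "AMBER"
    return "RED"

print(color_band(0.91))              # GREEN
print(color_band(0.70))              # AMBER
print(color_band(0.42))              # RED
print(color_band(0.87, green=0.90))  # AMBER under a stricter GREEN cutoff
```

Note that the boundaries are inclusive at the top of each band: a score of exactly 0.85 is GREEN and a score of exactly 0.50 is AMBER, matching the "≥" conditions above.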

Factor 1: Variance (35%)

The variance factor measures the unexplained difference between the GL balance and the subledger or expected balance, relative to the account's materiality threshold.

  • 1.00: the variance is zero. The GL and subledger are in perfect agreement, or all differences are fully explained by documented reconciling items.
  • 0.01 - 0.99: an unexplained variance exists but is below the materiality threshold. The sub-score decreases linearly toward zero as the variance approaches the threshold.
  • 0.00: the unexplained variance meets or exceeds the materiality threshold. The sub-score bottoms out, because a material unexplained variance is the single most important red flag in a reconciliation.

The formula for this factor:

variance_factor = max(0, 1 - (unexplained_variance / materiality_threshold))

If no materiality threshold is configured for the account, Cortex uses the entity-level default threshold. If neither is set, a system default of 1% of the account's average trailing-twelve-month balance is applied.
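Putting the formula and the threshold fallback together, the variance factor can be sketched as follows. The parameter names are illustrative; the fallback order (account threshold, then entity default, then 1% of the trailing-twelve-month average balance) is as documented above.

```python
# Variance sub-score: max(0, 1 - unexplained_variance / threshold),
# with the documented fallback chain for the materiality threshold.
def variance_factor(unexplained_variance, account_threshold=None,
                    entity_threshold=None, avg_ttm_balance=0.0):
    # Fallback: account-level threshold -> entity-level default ->
    # system default of 1% of the average TTM balance.
    threshold = account_threshold or entity_threshold or 0.01 * avg_ttm_balance
    if threshold <= 0:
        raise ValueError("no usable materiality threshold")
    return max(0.0, 1.0 - abs(unexplained_variance) / threshold)

print(variance_factor(0, account_threshold=10_000))       # 1.0 (fully explained)
print(variance_factor(5_000, account_threshold=10_000))   # 0.5 (halfway to threshold)
print(variance_factor(25_000, account_threshold=10_000))  # 0.0 (material variance)
```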

Factor 2: Auto-Reconciliation (20%)

The auto-reconciliation factor measures what percentage of the account's transaction volume was automatically matched by Arvexi's auto-reconciliation rules.

  • 1.00: 100% of transactions were auto-matched. Every entry in the GL had a corresponding entry in the subledger, and the match was unambiguous.
  • 0.01 - 0.99: a portion of transactions required manual matching or remain unmatched. The sub-score is the ratio of auto-matched transactions to total transactions.
  • 0.00: no transactions were auto-matched. This typically indicates a data quality issue (mismatched reference numbers, inconsistent formats) or an account type that does not support auto-matching.

Accounts that do not use transaction matching (e.g., balance-only reconciliations) receive a neutral score of 0.80 for this factor, reflecting that the absence of matching is expected, not a deficiency.
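This factor reduces to a simple ratio plus the documented 0.80 neutral case. One assumption in the sketch below: a period with zero transactions is treated like a balance-only account, which the docs do not specify.

```python
# Auto-reconciliation sub-score: ratio of auto-matched transactions to
# total transactions, with the documented 0.80 neutral score for
# accounts that don't use matching.
def auto_recon_factor(auto_matched, total, uses_matching=True):
    if not uses_matching or total == 0:
        # Neutral: absence of matching is expected, not a deficiency.
        # (Zero-transaction handling is our assumption.)
        return 0.80
    return auto_matched / total

print(auto_recon_factor(100, 100))                   # 1.0
print(auto_recon_factor(73, 100))                    # 0.73
print(auto_recon_factor(0, 0, uses_matching=False))  # 0.8 (balance-only)
```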

Factor 3: Matching (20%)

The matching factor evaluates the overall matching quality across all transactions in the period, both auto-matched and manually matched.

  • Unmatched item count: how many transactions on either side (GL or subledger) remain unmatched at the time of the investigation.
  • Unmatched item amount: the total dollar value of unmatched items, relative to the account balance.
  • Match age: how long unmatched items have been outstanding. Recently posted items (within the current period) are expected to be unmatched and are penalized less than items from prior periods.

The sub-score combines these three signals. An account with a handful of unmatched items from the current period that total less than 1% of the balance will score near 0.90. An account with 50 unmatched items spanning three periods will score much lower.
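The documentation names the three signals but not how they combine, so the sketch below is an illustrative assumption only: each signal becomes a 0-to-1 penalty score and the three are averaged equally. The cutoffs (50 items, 5% of balance, 10 prior-period items) are invented for illustration, not Arvexi's actual values.

```python
# Illustrative combination of the three matching signals. All cutoffs
# and the equal averaging are assumptions, not the published formula.
def matching_factor(unmatched_count, unmatched_amount, balance,
                    prior_period_items):
    count_score = max(0.0, 1.0 - unmatched_count / 50)                   # 50+ items -> 0
    amount_score = max(0.0, 1.0 - (unmatched_amount / abs(balance)) / 0.05)  # 5% of balance -> 0
    age_score = max(0.0, 1.0 - prior_period_items / 10)                  # 10+ aged items -> 0
    return (count_score + amount_score + age_score) / 3
```

With these assumed cutoffs, a handful of current-period items totaling under 1% of the balance lands near 0.90, consistent with the example above, while 50 unmatched items spanning several periods drive the score toward zero.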

Factor 4: Materiality (15%)

The materiality factor assesses whether the reconciling items and variances are proportionate to the account's size and risk profile.

  • Reconciling items vs. threshold: are individual reconciling items within the account's materiality threshold? A single reconciling item that exceeds the threshold drags this factor down significantly.
  • Aggregate reconciling items vs. balance: what is the total of all reconciling items as a percentage of the account balance? A large accumulation of small items can be as concerning as a single large item.
  • Item documentation: do the reconciling items have explanations and expected resolution dates? Undocumented items receive a heavier penalty.

This factor complements the variance factor. An account can have zero unexplained variance (variance factor = 1.00) but still score low on materiality if the reconciling items that explain the variance are individually material, poorly documented, or excessively aged.
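The docs list the three materiality signals but not the exact penalty sizes, so everything numeric in this sketch (the 0.30, 0.10, and 0.20 penalties and the 5% aggregate cutoff) is an illustrative assumption.

```python
# Illustrative materiality sub-score. Penalty sizes and the aggregate
# cutoff are assumptions, not Arvexi's published values.
def materiality_factor(reconciling_items, threshold, balance):
    """Each item is a dict: {"amount": float, "explanation": str or None}."""
    score = 1.0
    for item in reconciling_items:
        if abs(item["amount"]) > threshold:
            score -= 0.30  # a single material item drags the factor down
        if not item.get("explanation"):
            score -= 0.10  # undocumented items take a heavier penalty
    aggregate = sum(abs(i["amount"]) for i in reconciling_items)
    if balance and aggregate / abs(balance) > 0.05:
        score -= 0.20  # an accumulation of small items is also a concern
    return max(0.0, score)
```

The shape, not the numbers, is the point: a fully explained variance can still lose materiality points when the items explaining it are individually large or undocumented, exactly the complement to the variance factor described above.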

Factor 5: Historical (10%)

The historical factor compares the current period's reconciliation to the prior three periods for the same account.

  • Score trend: is the confidence score improving, stable, or deteriorating? An account that scored 0.92, 0.91, 0.90, 0.88 over four periods shows a deteriorating trend that drags this factor below 1.00.
  • Balance volatility: how much does the account balance swing between periods? High volatility is not inherently bad (some accounts are naturally volatile), but unexplained volatility reduces the sub-score.
  • Recurring findings: if Cortex flagged the same issue in prior periods and it remains unresolved, the historical factor penalizes the account. This is the “you were told about this before” signal.

For new accounts with fewer than three periods of history, this factor defaults to 0.75, a conservative-neutral starting point that neither rewards nor punishes the absence of historical data.
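Two of the documented behaviors, the 0.75 default for thin history and the penalty for a deteriorating trend, can be sketched as below. The 0.05-per-drop penalty is an illustrative assumption; the docs only say a deteriorating trend drags the factor below 1.00.

```python
# Sketch of the historical sub-score's documented default and an
# assumed (illustrative) penalty for a deteriorating score trend.
def historical_factor(prior_scores):
    if len(prior_scores) < 3:
        return 0.75  # conservative-neutral default for new accounts
    # Count consecutive period-over-period declines.
    drops = sum(1 for a, b in zip(prior_scores, prior_scores[1:]) if b < a)
    return max(0.0, 1.0 - 0.05 * drops)

print(historical_factor([0.92, 0.91]))              # 0.75 (too little history)
print(historical_factor([0.92, 0.91, 0.90, 0.88]))  # below 1.00 (deteriorating)
```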

Interpreting scores

The confidence score is a prioritization tool, not a final verdict. Use it to allocate reviewer time efficiently:

  • RED accounts first: these have material issues that must be resolved before certification. Open the Cortex findings, examine the flagged items, and work with the preparer to resolve them.
  • AMBER accounts next: these have issues worth examining, but they may or may not require action. Read the findings. If the issue is a known timing difference or an expected pattern, document it and move on.
  • GREEN accounts last: these are the low-risk accounts. A quick spot-check is sufficient. Focus your detailed review time on RED and AMBER.

The score does not replace the reviewer's judgment. An account can score GREEN but still warrant investigation if the reviewer has context that Cortex does not, such as knowledge of a pending restatement or a known process change that affected the period.

How to verify independently

Cortex shows its work. For any scored account, you can verify the score by:

  1. Open the investigation log: navigate to the account's reconciliation, click Cortex > Investigation Log. Every tool call, response, and reasoning step is recorded.
  2. Check each factor: the score breakdown panel shows the sub-score for each of the five factors. Click any factor to see the underlying data points that drove that sub-score.
  3. Review the data: the tool responses in the investigation log show you exactly what data Cortex saw. If the variance factor scored low, look at the Balance Details response to see the exact unexplained amount.
  4. Compare to prior periods: the historical tab shows the account's confidence scores over time. Unexpected changes in score indicate something changed in the underlying data or reconciliation quality.
  5. Provide feedback: if the score does not match your assessment, use the calibration workflow to correct it. Cortex learns from corrections and adjusts future scoring.

Score persistence and recalculation

Confidence scores are calculated once per investigation and persist until the account is re-investigated. Scores are not updated automatically when the underlying data changes (e.g., when the preparer adds a reconciling item or clears a variance).

To get an updated score after making corrections, run a targeted investigation on the account. The new investigation will see the corrected data and recalculate all five factors.

During a full close sweep, Cortex investigates every assigned account in sequence. The sweep results show a dashboard with score distribution (how many GREEN, AMBER, RED), the top findings across all accounts, and any accounts whose scores changed significantly from the prior period.
