AI in Accounting
The future of the financial close: AI agents that do the work
Four levels of AI in accounting, from commentator to controller. Where the industry is today, where Arvexi is, and the trust architecture that makes AI autonomy possible.
The financial close has not fundamentally changed in 30 years. The tools have improved (spreadsheets gave way to ERP modules, ERP modules gave way to cloud platforms) but the process is the same. Pull data. Match transactions. Investigate variances. Document findings. Get approval. Repeat for every account, every entity, every period.
AI agents are about to change the process itself. Not incrementally. Structurally.
Four levels of AI in accounting
It helps to think about AI in accounting as a maturity model with four distinct levels. Each level represents a step change in what the AI does and what the human does.
Level 1: The Commentator
The AI observes your financial data and comments on it. "This account has a larger variance than last month." "This transaction matches a known pattern." "This balance is outside the expected range."
Level 1 AI is essentially reporting with intelligence. It tells you where to look, but it does not look for you. The investigation, the judgment, and the documentation are still your responsibility.
Most accounting AI products on the market today operate at Level 1. They flag, they score, they suggest. They do not act.
Level 2: The Investigator
The AI does not just flag a problem. It investigates it. It pulls data from multiple sources, cross-references evidence, identifies the root cause, and presents a structured finding with supporting documentation.
At Level 2, the AI does the work that previously required a senior accountant to sit in front of a screen for hours. The human's role shifts from investigator to reviewer. You read the finding, assess whether the conclusion is sound, and approve or override.
This is where Arvexi Cortex operates today. The investigation agent has access to seven data tools, follows structured investigative methodology, and produces audit-ready work papers. It handles the 80% of investigation work that is systematic, freeing your team to focus on the 20% that requires genuine professional judgment.
Level 3: The Preparer
The AI does not just investigate variances. It prepares complete reconciliations and produces draft financial statements. It pulls data, matches transactions, investigates exceptions, documents findings, generates journal entries for adjustments, and assembles the reconciliation package.
At Level 3, the human's role is quality assurance. You review a completed work product, not a list of items to investigate. It is the difference between editing a draft and writing from scratch.
Level 3 requires something Level 2 does not: the ability to take action on financial records, not just analyze them. Posting journal entries, updating reconciliation status, certifying balances: these are write operations, not read operations. The trust bar is higher.
Level 4: The Controller
The AI manages the close end-to-end. It assigns work, tracks deadlines, escalates issues, makes judgment calls within defined parameters, and involves humans only for decisions that exceed its authority or confidence threshold.
Level 4 is not a near-term reality for most organizations. It requires not just capable AI, but organizational trust, regulatory acceptance, and a track record of reliable autonomous operation at Levels 2 and 3.
But it is the direction. And the organizations that build the trust infrastructure now, at Level 2, will be the ones that reach Level 4 first.
The four levels at a glance:
- Level 1, Commentator: AI observes data and flags anomalies. Investigation and documentation remain manual.
- Level 2, Investigator: AI pulls data, cross-references evidence, and produces structured findings. Humans review.
- Level 3, Preparer: AI prepares complete reconciliations and draft financial statements. Humans do QA.
- Level 4, Controller: AI manages the close end-to-end, involving humans only for decisions exceeding its authority.
Where the industry is today
Let's be honest about the landscape. The vast majority of accounting teams operate below Level 1. They reconcile in spreadsheets, investigate by opening multiple screens and copying data between them, and document in Word templates. Their "automation" is copy-paste with keyboard shortcuts.
The teams that use modern reconciliation platforms (BlackLine, Trintech, FloQast, and others) operate at Level 1. The platform standardizes the process, provides dashboards and status tracking, and applies rules-based automation to transaction matching. AI features flag anomalies and suggest matches. But investigation is manual. Documentation is manual. The human is still the engine.
A small number of organizations are beginning to experiment with Level 2. They use AI to investigate specific account types (bank reconciliation, intercompany, high-volume transaction matching) and find that the AI handles the investigation as well as or better than manual work, in a fraction of the time.
The gap between Level 1 and Level 2 is not a technology gap. The technology exists. It is a trust gap.
The trust architecture
AI autonomy in accounting requires a trust architecture: a system of controls, transparency, and oversight that gives organizations confidence that AI-produced work meets their standards.
This is not a philosophical problem. It is an engineering problem. And it has specific components:
Full observability. Every action the AI takes is logged and reviewable. Every data source queried, every tool called, every reasoning step, every conclusion drawn. If an auditor asks "how did you arrive at this finding," the answer is a complete, step-by-step trace, not "the AI said so."
Cortex logs every investigation at the tool-call level. You can replay any investigation and see exactly what data the agent retrieved, what it compared, and how it reached its conclusion.
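To make "full observability" concrete, here is a minimal sketch of a tool-call-level audit trace. This is an illustrative data structure, not Arvexi's actual implementation; the tool names (`fetch_gl_balance`, `match_transactions`, etc.) and account numbers are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToolCall:
    """One logged step of an investigation: which tool ran, with what
    inputs, and a summary of what came back."""
    tool: str
    params: dict
    result_summary: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class InvestigationTrace:
    """Append-only trace of an investigation that can be replayed,
    step by step, for a reviewer or an external auditor."""
    account: str
    steps: list = field(default_factory=list)

    def log(self, tool: str, params: dict, result_summary: str) -> None:
        self.steps.append(ToolCall(tool, params, result_summary))

    def replay(self) -> list:
        """Return the full numbered trace, in the order the agent acted."""
        return [
            f"{i + 1}. {s.tool}({s.params}) -> {s.result_summary}"
            for i, s in enumerate(self.steps)
        ]


# Usage: a three-step bank reconciliation investigation (hypothetical data).
trace = InvestigationTrace(account="1010-Cash")
trace.log("fetch_gl_balance", {"account": "1010"}, "balance=102,450.00")
trace.log("fetch_bank_statement", {"account": "1010"}, "balance=101,950.00")
trace.log("match_transactions", {"tolerance": 0.01},
          "1 unmatched deposit of 500.00")
```

The key property is that the trace is append-only and complete: every data retrieval and comparison the agent performed is recoverable after the fact, which is what turns "the AI said so" into an auditable answer.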
Confidence scoring. Not all AI outputs are equally reliable. A reconciliation where every transaction matched exactly and the balance ties to the penny is high-confidence. A reconciliation where the AI had to make assumptions about a timing difference with incomplete data is lower-confidence.
Confidence scoring quantifies this distinction. Every reconciliation gets a score. Your team sets the threshold for auto-certification versus human review. The score is not a black box. It is derived from specific, observable factors that your team can audit.
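A toy version of such a score makes the "observable factors" idea concrete. The factors and weights below are invented for illustration; a real system would derive and audit its own.

```python
def confidence_score(matched_pct: float, balance_tied: bool,
                     assumptions_made: int, data_completeness: float) -> float:
    """Illustrative confidence score in [0, 1] built from observable factors.

    matched_pct        -- fraction of transactions matched exactly
    balance_tied       -- whether the reconciled balance ties to the penny
    assumptions_made   -- count of judgment calls the agent had to make
    data_completeness  -- fraction of expected source data available
    Weights are placeholders, not a vendor's calibrated values.
    """
    score = (
        0.40 * matched_pct
        + 0.25 * (1.0 if balance_tied else 0.0)
        + 0.20 * data_completeness
        + 0.15 * max(0.0, 1.0 - 0.25 * assumptions_made)
    )
    return round(min(score, 1.0), 3)


# Threshold is set by the accounting team, not hard-coded by the vendor.
AUTO_CERTIFY_THRESHOLD = 0.95


def route(score: float) -> str:
    """Route a reconciliation to auto-certification or human review."""
    return "auto_certify" if score >= AUTO_CERTIFY_THRESHOLD else "human_review"
```

A perfect match with a tied balance scores 1.0 and auto-certifies; a reconciliation resting on assumptions and incomplete data scores well below the threshold and goes to a human. Because every input is observable, the team can audit why any score came out the way it did.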
Calibration and learning. When your team overrides an AI conclusion, the system learns. Not by blindly adjusting, but by incorporating the feedback into its reasoning for future similar situations. Over time, the confidence scores become more accurate and the override rate decreases.
This feedback loop is the mechanism by which trust builds. Your team starts skeptical, reviews everything, overrides frequently. As the AI calibrates to your data, your policies, and your team's judgment patterns, the override rate drops and the auto-certification threshold rises.
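One simple way to picture the feedback loop: track how often reviewers override each category of finding, and discount future confidence for categories with a poor track record. This is a deliberately simplified sketch of the idea, not the platform's learning mechanism.

```python
class Calibrator:
    """Toy calibration loop: per-category override tracking that shrinks
    reported confidence for finding categories the team overrides often."""

    def __init__(self):
        self.reviews = {}  # category -> (approvals, overrides)

    def record(self, category: str, overridden: bool) -> None:
        """Record one human review outcome for a category of finding."""
        approvals, overrides = self.reviews.get(category, (0, 0))
        self.reviews[category] = (
            approvals + (0 if overridden else 1),
            overrides + (1 if overridden else 0),
        )

    def override_rate(self, category: str) -> float:
        approvals, overrides = self.reviews.get(category, (0, 0))
        total = approvals + overrides
        return overrides / total if total else 0.0

    def adjusted_confidence(self, category: str, raw_score: float) -> float:
        """Discount raw confidence by the category's historical override rate."""
        return round(raw_score * (1.0 - self.override_rate(category)), 3)


# Usage: the team overrode 1 of 4 timing-difference findings (hypothetical).
cal = Calibrator()
for overridden in (False, False, True, False):
    cal.record("timing_difference", overridden)
```

With a 25% override rate, a raw score of 0.9 is reported as 0.675, keeping that category below a 0.95 auto-certify threshold until the track record improves. This is the mechanical sense in which override rates falling lets the auto-certification threshold do more work.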
Authority boundaries. Level 2 AI reads data and produces findings. It does not post journal entries or change balances. The human reviews and acts. As trust builds and the organization moves toward Level 3, authority boundaries expand, but always under explicit organizational control.
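Authority boundaries are straightforward to enforce in software: every action the agent attempts passes through a gate that checks the organization's authorized level. The sketch below is illustrative; the action names and level sets are assumptions, not Arvexi's actual permission model.

```python
# Actions the organization has authorized at each maturity level.
# Level 2 is read-and-report only; Level 3 adds write operations.
LEVEL_PERMISSIONS = {
    2: {"read_ledger", "read_bank", "produce_finding"},
    3: {"read_ledger", "read_bank", "produce_finding",
        "post_journal_entry", "update_recon_status"},
}


class AuthorityError(PermissionError):
    """Raised when the agent attempts an action beyond its authorized level."""


def execute(action: str, level: int, handler):
    """Run `handler` only if `action` is within the authorized level."""
    if action not in LEVEL_PERMISSIONS.get(level, set()):
        raise AuthorityError(f"{action!r} exceeds Level {level} authority")
    return handler()
```

The point of expressing boundaries this way is that expanding from Level 2 to Level 3 is an explicit configuration change the organization makes, not a behavior the AI drifts into.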
This is the critical difference between responsible AI and autonomous AI. Responsible AI operates within boundaries set by the organization. Autonomous AI operates without them. Arvexi's architecture is designed for the former.
What is coming
The trajectory from Level 2 to Level 3 is not a technology leap. The technology to prepare reconciliations end-to-end exists now. What changes is organizational readiness:
- Audit firm acceptance. External auditors need to develop frameworks for relying on AI-prepared work papers. This is happening (the Big Four are all investing heavily in AI-audit methodology) but it is not yet standardized.
- Regulatory clarity. SOX and IFRS do not prohibit AI-prepared reconciliations, but they do not explicitly address them either. Regulatory guidance on AI in financial reporting controls will emerge over the next 2-3 years.
- Track record. Organizations need evidence that Level 2 AI produces reliable results consistently over multiple close cycles before they will authorize Level 3 operations. This evidence is being built now, one close at a time.
The strategic implication
Organizations that adopt Level 2 AI now gain two advantages. The obvious one is operational: faster close, lower cost, fewer errors. The less obvious one is strategic: they are building the calibration data, the trust infrastructure, and the organizational muscle memory that Level 3 requires.
An organization that starts at Level 2 in 2026 will have 12-18 months of calibration data by the time Level 3 capabilities are widely available. Their AI will be tuned to their data. Their team will understand the trust model. Their auditors will have reviewed AI-produced work papers across multiple periods.
An organization that waits until Level 3 is "ready" will start the calibration process from zero. They will be 12-18 months behind, not in technology adoption, but in trust maturity. Arvexi's Financial Close platform is designed for this progression, ultimately enabling a continuous close in which reconciliation and investigation happen throughout the month. Cortex operates at Level 2 today, with the architecture to support Level 3 as organizational readiness builds.
Cortex is not just a product feature. It is an investment in your organization's readiness for the next generation of the financial close.
See Cortex in action or explore the investigation agent and confidence scoring capabilities.