What 1,000 leases and six months of data entry reveal about extraction costs
The lease abstraction market is $2.5 billion, 40% larger than the lease accounting software market itself. Here is what the numbers actually look like, and why AI changes the equation.
The lease abstraction market, the business of reading lease PDFs and entering data into accounting systems, is a $2.5 billion industry. That is 40 percent larger than the lease accounting software market itself. Getting data into the systems costs more than the systems themselves.
Most of that spend goes to manual labor: offshore BPO teams, Big 4 consultants, or internal staff dedicating months to what is fundamentally data entry. The numbers are striking when laid out plainly.
The real cost of manual extraction
Offshore BPO teams in India charge $90 to $250 per lease. A portfolio of 500 or more leases takes four to eight weeks to process. The price is low, but the turnaround is slow and the quality is inconsistent: missed escalation clauses, incorrect commencement dates, payment schedules that do not account for CPI adjustments.
Big 4 firms (KPMG, PwC, Deloitte, EY) charge $500 to $4,000 per lease depending on complexity. A 1,000-lease engagement runs six to eight months. KPMG contractors bill $100 to $150 per hour for what is, at its core, data entry. The premium buys a recognized name on the SOC report and a layer of professional liability insurance. It does not buy faster or more accurate extraction.
Internal staff extraction looks cheaper on paper but rarely is in practice. The loaded cost per lease runs $120 to $500 when you account for salary, benefits, and the opportunity cost of pulling accountants off their actual work. A team of four dedicating three to six months to extraction is not doing monthly close, not managing audits, not handling the work they were hired for.
Platform vendor extraction sits in the middle. A vendor's implementation team charges $70 to $80 per lease as a baseline for bulk upload processing of 1,000 leases, roughly $54,000 to $70,000 for the engagement. Full human extraction with all clauses captured runs $150 per lease, pushing the total to $115,000 to $200,000 for the same portfolio. These are published implementation fees, not estimates.
Why enterprises pay for extraction and call it a bargain
The math is straightforward. A lease team of four people (two staff accountants and two seniors) costs roughly $500,000 per year in fully loaded compensation. Dedicating that team to manual extraction for two months represents about $83,000 in direct labor cost, plus the opportunity cost of pulling them off close, audit prep, and every other responsibility. An extraction engagement at $70,000 to $150,000 replaces those two months and lets the team keep doing their actual work.
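The direct-labor side of that comparison can be sketched in a few lines of Python. The dollar figures come from the paragraph above; the function name and the even twelve-month spread of annual compensation are assumptions made for illustration, and opportunity cost is deliberately left out.

```python
def direct_labor_cost(annual_team_cost: float, months: float) -> float:
    """Direct labor cost of dedicating the whole team for `months` months,
    assuming compensation is spread evenly over twelve months."""
    return annual_team_cost * months / 12


# Figures quoted in the text: a $500,000 team for two months,
# against a $70,000 to $150,000 extraction engagement.
team_cost = direct_labor_cost(500_000, 2)
vendor_fee_low, vendor_fee_high = 70_000, 150_000

print(f"Internal direct labor for two months: ${team_cost:,.0f}")
print(f"Vendor engagement: ${vendor_fee_low:,} to ${vendor_fee_high:,}")
```

On direct labor alone, the internal route lands inside the vendor fee range. The case for outsourcing rests on the opportunity cost that this sketch does not attempt to price.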
The enterprise is not paying for data entry. It is paying to keep its team focused on close. Two months of four accountants' time devoted to reading PDFs and typing numbers into templates is time they are not spending on journal entries, reconciliations, quarterly disclosures, and audit requests. The implementation fee looks expensive until you calculate what it costs to have that team unavailable for two months.
Controllers who have been through the exercise once understand this immediately. The second time a portfolio needs re-abstraction (after an ASC 842 amendment, a system migration, or an acquisition that doubles the lease count), the decision is not even close. The team's time is worth more than the vendor's fee.
There is also the error cost that rarely shows up in the initial calculation. Manual extraction by staff accountants under time pressure produces data quality issues that surface during audit. A missed escalation clause on a material lease can misstate the right-of-use asset by hundreds of thousands of dollars. The rework cycle to correct it (re-reading the original document, recalculating the amortization schedule, restating prior periods) costs more than getting it right the first time.
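To make the size of that error concrete, here is a rough sketch comparing the present value of lease payments with and without a fixed annual escalator. Every input (a $500,000 first-year rent, a 10-year term, a 3 percent escalator, a 6 percent discount rate, payments at year end) is a hypothetical chosen for illustration, not a figure from any real lease. The calculation also ignores initial direct costs, incentives, and prepaid rent that a full right-of-use asset measurement would include.

```python
def present_value(payments: list[float], discount_rate: float) -> float:
    """Present value of annual payments made at the end of each year."""
    return sum(p / (1 + discount_rate) ** t
               for t, p in enumerate(payments, start=1))


def annual_payments(base_rent: float, years: int, escalator: float) -> list[float]:
    """Rent for each year, growing by `escalator` after the first year."""
    return [base_rent * (1 + escalator) ** year for year in range(years)]


# Hypothetical inputs, for illustration only.
BASE_RENT = 500_000
TERM_YEARS = 10
DISCOUNT_RATE = 0.06

pv_with_escalator = present_value(
    annual_payments(BASE_RENT, TERM_YEARS, 0.03), DISCOUNT_RATE)
pv_missed_escalator = present_value(
    annual_payments(BASE_RENT, TERM_YEARS, 0.0), DISCOUNT_RATE)

# Gap between the correct measurement and one that missed the escalator.
understatement = pv_with_escalator - pv_missed_escalator
print(f"Understatement from the missed escalator: ${understatement:,.0f}")
```

Under these assumed inputs, the gap between the two present values falls in the hundreds of thousands of dollars, which is the order of magnitude the paragraph above describes.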
Why AI-only extraction falls short
The current generation of AI extraction tools solves part of the problem. Prophia charges roughly $20 per document and focuses on commercial real estate. It extracts text competently but has no ASC 842 accounting intelligence: no payment escalation parsing, no amendment chain handling, no classification logic. It reads leases. It does not understand lease accounting.
LeaseLens runs about $25 per lease for basic extraction. No classification output. No confidence scoring. No multi-format export. The output requires substantial manual post-processing before it is usable in any accounting system.
Trullion bundles extraction with its platform, starting at $3,000 per year. There is no standalone extraction offering: you buy the full platform or you get nothing. User reviews consistently flag weakness on pro-rated payments, complex amendment structures, and documents that deviate from standard templates.
Generic document AI platforms (Hyperscience, Instabase, and their peers) handle structured documents well. Invoices, tax forms, bank statements. But they do not understand lease accounting. They cannot distinguish an operating lease from a finance lease. They cannot parse a payment escalation structure with CPI indexing. They cannot follow an amendment chain that modifies base terms across three successive documents. They extract text from PDFs. They do not extract lease accounting data.
The gap is specific and consequential. None of these tools understand ASC 842 classification tests. None handle payment escalation structures where base rent increases annually by the greater of 3 percent or CPI. None trace amendment chains that modify commencement dates, extend terms, or add expansion space with blended payment schedules. None separate lease components from non-lease components in a way that feeds directly into a right-of-use asset calculation.
They solve the OCR problem. The OCR problem was never the hard part.
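The "greater of 3 percent or CPI" escalation mentioned above is a small example of the logic that sits beyond OCR. The sketch below builds the contractual rent schedule for such a clause. The function name, the base rent, and the CPI series are hypothetical, and real leases also differ on which index is used, when it is measured, and whether increases are capped.

```python
def escalated_rent_schedule(base_rent: float,
                            cpi_changes: list[float],
                            floor: float = 0.03) -> list[float]:
    """Annual rent where each year's increase is max(floor, CPI change).

    Year 1 is the base rent; each entry in `cpi_changes` drives the
    increase applied going into the following year.
    """
    schedule = [base_rent]
    for cpi in cpi_changes:
        increase = max(floor, cpi)
        schedule.append(schedule[-1] * (1 + increase))
    return schedule


# Hypothetical CPI changes, for illustration only.
schedule = escalated_rent_schedule(100_000, [0.02, 0.05, 0.031])
for year, rent in enumerate(schedule, start=1):
    print(f"Year {year}: ${rent:,.2f}")
```

In the first increase the 3 percent floor applies because CPI is lower; in the next two, CPI exceeds the floor and drives the rent. A tool that captures only the string "3%" from the clause would produce a different schedule from year three onward.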
What changes when extraction understands accounting
When the extraction engine understands lease accounting natively (ASC 842 classification criteria, payment escalation structures with variable indexing, amendment chains that modify base terms, lease and non-lease component separation), the output is not extracted text. It is structured data ready for import into any lease accounting system. Commencement dates, payment schedules, escalation rates, renewal options, termination penalties, and classification determinations arrive as validated fields, not raw strings that someone has to interpret.
Combined with confidence-based QA routing, the workflow changes fundamentally. High-confidence fields are auto-verified. Low-confidence fields are routed to human reviewers with the relevant clause highlighted and the extracted value presented for confirmation. The reviewer spends seconds per field instead of minutes per page. Native output to platform-specific formats (LeaseQuery, CoStar, Visual Lease, ProLease, and others) eliminates the manual mapping step that typically adds another week to the process.
The economics shift by an order of magnitude. The cost per lease drops. The timeline compresses from months to weeks. And the accuracy is higher than pure human extraction, because the engine never gets fatigued at lease 847 of 1,000 and the human reviewers focus only on what needs their judgment: the ambiguous clauses, the non-standard structures, the edge cases that actually require accounting expertise.
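The routing step can be illustrated with a minimal sketch. The 0.95 threshold, the field names, and the `ExtractedField` structure are assumptions made for illustration, not a description of any particular product's implementation.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float   # model confidence in [0, 1]
    source_clause: str  # clause text shown to the reviewer


def route_fields(fields: list[ExtractedField],
                 threshold: float = 0.95
                 ) -> tuple[list[ExtractedField], list[ExtractedField]]:
    """Split fields into (auto_verified, needs_review) by confidence."""
    auto_verified = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return auto_verified, needs_review


# Hypothetical extraction results, for illustration only.
fields = [
    ExtractedField("commencement_date", "2024-01-01", 0.99,
                   "The Term shall commence on January 1, 2024."),
    ExtractedField("escalation_rate", "greater of 3% or CPI", 0.71,
                   "Base Rent shall increase annually by the greater of 3% or CPI."),
]
auto_verified, needs_review = route_fields(fields)
for f in needs_review:
    print(f"Review {f.name}: {f.value!r} (clause: {f.source_clause})")
```

Only the escalation field reaches a reviewer here, and it arrives with its source clause attached, which is what lets the reviewer confirm a value in seconds rather than rereading the page.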
Where the industry goes from here
The best lease accounting teams will not manage large portfolios by hiring more people to read PDFs. They will deploy intelligence where it creates the most value (parsing documents, structuring data, running classification tests, generating amortization schedules) and reserve human judgment for where it matters most. The accountant's role shifts from data entry to data oversight. The portfolio scales without the team scaling with it. That is not a future state. For the teams that have already made the shift, it is how the work gets done today.