Dojo Labs Whitepaper
Your AI Is Miscalculating Client ROI
And They'll Find Out Eventually
Why AI-Generated Financial Models Are the Biggest Unmanaged Risk in Advisory
Table of Contents
- Executive Summary
- 1. The AI-Powered Advisory Firm
- 2. Anatomy of AI-Hallucinated Financial Models
- 3. Why LLMs Cannot Do Financial Analysis
- 4. The Discovery Timeline
- 5. The Reputation Multiplier
- 6. Why Current Safeguards Are Insufficient
- 7. Liability Exposure
- 8. Case Scenarios
- 9. The Computation Layer Solution
- 10. Practical Framework
- 11. Recommendations
- 12. Conclusion
Executive Summary
Advisory firms are deploying AI to generate ROI projections, business cases, and financial models at unprecedented speed. The efficiency gains are real. The accuracy is not.
For boutique advisory firms and fractional leaders, 60-80% of new business comes from referrals. Your reputation is your pipeline. When a client reconciles your projected 34% ROI against their actual 12%, they do not blame the AI tool you used. They blame you.
This paper documents how AI-generated financial models fail in advisory contexts, why those failures are architecturally inevitable with current LLM approaches, and what the three-year revenue impact looks like when a single wrong model reaches the wrong boardroom.
The solution is not abandoning AI. It is a computation layer architecture that separates what AI is good at (narrative, synthesis, pattern recognition) from what it cannot do (math, financial modeling, deterministic calculation).
- 78%: advisory professionals using AI for financial deliverables
- < 20%: share who validate AI outputs before sending to clients
- 66 pts: largest usage-validation gap (ROI projections)
- $1.2M+: 3-year revenue impact of one wrong financial model
The AI-Powered Advisory Firm
Up to 78% of advisory professionals now use AI tools for financial deliverables. The pressure is coming from every direction: clients expect faster turnaround, competitors are quoting lower fees with AI-assisted workflows, and the tools themselves are remarkably convincing in their output quality.
A fractional CFO who once spent three weeks building a synergy model can now generate a first draft in hours. A strategy consultant who spent days researching market entry costs can have a complete financial model by morning. The output looks polished, detailed, and precise.
But there is a critical gap between AI usage and AI validation. The same professionals who rely on AI for their most important deliverables are not systematically verifying the financial outputs.
The Adoption-ROI Paradox
Only 14% of CFOs have seen clear, measurable ROI from AI investments (RGP CFO AI Survey, Dec 2025, n=200).
71% of CFOs are NOT currently using GenAI in finance/accounting despite 94% indicating it could benefit their function (Bain Capital Ventures, late 2024, n=50).
Gartner AI in Finance Survey (Nov 2025, n=183 CFOs): 58% of finance functions using AI; 91% report low or moderate initial impact (Gartner, 2025).
The gap between adoption enthusiasm and measurable ROI underscores the verification problem. High usage without rigorous validation produces activity, not outcomes.
Exhibit 1
AI Usage vs. Validation by Task Type
Percentage of advisory professionals using AI for each task vs. percentage who systematically validate outputs
Source: Dojo Labs analysis of advisory AI adoption patterns, Q1 2026. Cross-validated against McKinsey 2025 (79% regular GenAI use), RGP 2025 (14% CFO ROI), and Gartner 2025 (58% finance AI adoption) survey data.
66 pts
The gap between AI usage (78%) and systematic validation (12%) for ROI projections — the highest-stakes deliverable in advisory
Six out of seven AI-generated ROI projections are delivered without rigorous financial verification
Exhibit 1B
The Validation Gap
Average AI usage vs. systematic validation across all six advisory task types
12%
Validated
Of the 75% average AI usage rate, only 12% includes systematic validation. The remaining 63% of AI-generated financial outputs reach clients without rigorous verification.
Anatomy of AI-Hallucinated Financial Models
Financial model hallucinations are not random noise. They follow predictable patterns that map to how large language models process numerical information. Understanding the taxonomy helps you know where your models are most vulnerable.
Fabricated ROI
The AI generates a return figure that has no basis in any underlying data. A 34% projected ROI appears in a deliverable when no model or source supports it.
Invented Savings
Cost reduction projections are manufactured from training data patterns rather than actual operational analysis. The savings figure looks reasonable but maps to nothing real.
Phantom Benchmarks
Industry comparisons and benchmark figures are pulled from the model's training data rather than verified sources. The benchmarks may be outdated, fabricated, or from entirely different industries.
Misapplied Formulas
The AI applies the wrong financial formula for the context. A DCF model uses inappropriate discount rates, or an IRR calculation ignores terminal value entirely.
Compounding Assumption Errors
Each assumption in a multi-step model carries error. When five assumptions each carry 10-15% error, the final output can be off by 50% or more. The AI does not flag this compounding risk.
False Precision
The AI presents outputs to two decimal places, implying a level of accuracy that does not exist. Clients see $2,147,832.41 and believe it was calculated, not generated.
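The compounding-assumption effect described above is simple arithmetic to verify. A minimal sketch (the 10-15% per-assumption error and five-assumption depth come from the taxonomy above; the assumption that errors are multiplicative and point the same direction is ours, as a worst case):

```python
def compounded_error(per_assumption_error: float, n_assumptions: int) -> float:
    """Worst-case relative error when n multiplicative assumptions
    each carry the same directional error."""
    return (1 + per_assumption_error) ** n_assumptions - 1

# Five assumptions at 10% error each, then at 15% each.
low = compounded_error(0.10, 5)   # ~61% overstatement
high = compounded_error(0.15, 5)  # ~101% overstatement
print(f"{low:.0%} to {high:.0%}")
```

Five stacked assumptions at 10-15% error each can roughly double the final output, which is why "off by 50% or more" is a floor, not a ceiling.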
Exhibit 2
What AI Generated vs. What Was Real
Composite data from anonymized advisory engagements. Cross-validated against McKinsey 2025 (79% regular GenAI use), RGP 2025 (14% CFO ROI), and Gartner 2025 (58% finance AI adoption) survey data.
| Deliverable | AI Projected | Actual Result | Variance |
|---|---|---|---|
| M&A Synergy Model | $2.1M / 22% IRR | $600K / 7% IRR | -71% / -68% |
| Market Expansion | Break-even Month 14 | Month 31+ | +121% |
| Headcount Reduction | $900K savings | -$215K (net loss) | -124% |
| SaaS Pricing Model | +34% revenue lift | +11% | -68% |
| AP Automation ROI | $430K savings | $140K | -67% |
The headcount reduction variance of -124% means the AI projected savings that turned into a net loss. Forty-seven positions were eliminated based on this model.
Exhibit 2B
AI Projections vs. Actual Outcomes
Side-by-side comparison showing the gap between AI-generated projections and verified actual results
Source: Composite data from anonymized advisory engagements, Dojo Labs Q1 2026. Cross-validated against McKinsey 2025 (79% regular GenAI use), RGP 2025 (14% CFO ROI), and Gartner 2025 (58% finance AI adoption) survey data.
Why LLMs Cannot Do Financial Analysis
Financial modeling requires two capabilities that large language models fundamentally lack: deterministic computation and calibrated uncertainty. An LLM predicts the next token in a sequence. It does not execute formulas, and it has no mechanism to signal when its outputs are unreliable.
When you ask an LLM to project ROI for a $15M acquisition, it does not discount cash flows. It generates tokens that look like a DCF model based on patterns in its training data. The Excel formulas it produces may look syntactically correct but compute to wrong results. This is the “spreadsheet illusion” — a model that appears rigorous but is structurally hollow.
The Core Problem
A financial calculator computes NPV of $1.2M using a 12% discount rate over 5 years. Every time. Without exception.
An LLM might generate $1.47M or $980K for the same inputs. There is no internal calculator. There is no uncertainty marker. The wrong number looks exactly like a right one.
The AI does not know the difference between its correct outputs and its hallucinated ones. It presents both with identical confidence and formatting precision.
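The contrast is easy to demonstrate. A deterministic NPV routine returns the identical figure on every invocation, for any caller; there is no run where it drifts to a different number. A minimal sketch (the cash flows and the 12% rate are illustrative, not drawn from the engagements in this paper):

```python
def npv(rate: float, cashflows: list[float]) -> float:
    """Discounted cash flow NPV. cashflows[0] is the time-0 flow
    (typically the negative initial investment)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# Illustrative 5-year project at a 12% discount rate.
flows = [-1_000_000, 350_000, 350_000, 350_000, 350_000, 350_000]
print(round(npv(0.12, flows), 2))  # same output on every run, without exception
```

An LLM asked for the same projection is sampling tokens, not evaluating this expression, which is why the same prompt can yield $1.47M one day and $980K the next.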
This is not a bug in current models. It is an architectural reality of transformer-based systems. These models were designed to generate plausible language, not to guarantee numerical accuracy. The “spreadsheet illusion” is particularly dangerous because it produces outputs that pass visual inspection — properly formatted cells, reasonable-looking formulas, professional presentation — while the underlying computations are wrong.
Some AI tools use code execution to offload calculations. This is the right architectural direction, but most commercially available advisory tools have not implemented this consistently across all financial outputs. The model may compute some metrics deterministically while hallucinating others in the same deliverable.
Documented AI Financial Modeling Failures
ChatGPT-4o produced inconsistent NPV calculations and incorrectly applied the WACC of the acquired firm instead of the acquiring firm in M&A analysis (International Journal of Financial Studies, 2024).
Microsoft Copilot generated errors in income statements starting from Depreciation, creating knock-on errors in EBIT, Net Income, and tax calculations (FM Magazine, IMA/CIMA, May 2025).
On the CPA exam, ChatGPT-4 scored only 67.8% without tools but reached 85.1% with a calculator — the difference between failing and passing all four sections (Review of Accounting Studies, 2024).
BYU study (327 co-authors, 186 institutions): ChatGPT 3.5 scored 47.4% vs. students' 76.7% on accounting assessments. Key finding: “ChatGPT doesn't always recognize when it is doing math” (Issues in Accounting Education).
Exhibit 3B
The Spreadsheet Illusion
What advisory professionals see vs. what is actually happening inside the AI
- ✓Clean cell references
- ✓Professional formatting
- ✓Formulas that look correct
- ✓Precise decimal outputs
- Input tokens: "Build an ROI model for..."
- Pattern matching: "34% looks right for this industry"
- Token prediction: output 34.2% (no computation occurred)
- ✗No actual computation
- ✗No formula execution
- ✗No uncertainty signal
- ✗Pattern recall, not math
The AI-generated model passes visual inspection. The formulas look syntactically correct. The numbers appear precise. But no actual calculation was performed. The 34.2% is a prediction, not a computation.
The Discovery Timeline
Financial model errors in advisory do not detonate immediately. They follow a predictable timeline that makes them more damaging than errors in real-time reporting. By the time the client discovers the problem, they have already made irreversible decisions based on your numbers.
Client is impressed by the depth and speed of your financial analysis. Engagement is won. The AI-generated model looks comprehensive, detailed, and precise.
Initial results are ambiguous. Some numbers are tracking, some are not. The client attributes variance to normal execution drift. Benefit of the doubt remains.
The client's finance team runs their own numbers. Gaps between your projections and reality begin to surface. Questions are raised internally but not yet escalated.
A formal reconciliation meeting is requested. The client is now auditing your model assumptions, not collaborating on strategy. The relationship has fundamentally shifted.
Engagement is terminated. The stated reason is “strategic realignment.” The actual reason is that your projected 34% ROI turned out to be 12%. Trust is gone.
No referral comes. The client does not publicly criticize you. They simply never mention your name again. Your referral pipeline goes quiet and you cannot pinpoint why.
The Silent Cascade Is the Expensive Part
Most advisory firms never learn the real reason a client left or why referrals dried up. The client does not announce “your model was wrong.” They simply stop referring you. The damage compounds silently over months and years.
The Reputation Multiplier
For boutique advisory firms and fractional executives, 60-80% of new business comes from referrals and reputation. Unlike large consulting firms with institutional brands, your personal credibility is your business development engine.
A single wrong financial model does not just cost you one client. It costs you the entire downstream network that client would have generated over three years.
The Referral & Retention Economics
Professional services average annual churn: 27% (CustomerGauge).
Improving retention by 5% increases profits 25-95% (Bain & Company / Reichheld).
85% of professional services firm new business comes from referrals (Meetanshi).
Referred customers have 16% higher lifetime value and 18% lower churn (Harvard Business Review).
When one wrong model costs a referral, the downstream revenue impact compounds far beyond the lost engagement itself.
Exhibit 5B
The Reputation Multiplier Cascade
How one wrong financial model compounds into $1.2M+ in losses over three years
| Cascade stage | Detail | Impact |
|---|---|---|
| One wrong financial model | Projected 34% ROI, actual 12% | -$120K lost engagement |
| Lost referrals | 3 potential referrals that never come | -$360K (3 x $120K engagements) |
| Premium erosion | Pricing power declines across remaining pipeline | -$120K rate compression |
| Remediation & recovery | Rebuilding trust, marketing, repositioning | -$370K recovery costs |
| 3-year total impact | | >$1.2M cumulative damage |
Each step compounds. The client does not just leave — they take your entire downstream referral network with them.
Exhibit 6
Three-Year Revenue Impact of One Wrong Model
Cumulative cost cascade for a boutique advisory firm
- Lost engagement revenue
- Lost renewal
- Referral damage (3 lost referrals)
- Premium erosion
- Remediation costs
- Reputation recovery
- Three-year total
From one wrong financial model delivered to one client
“I could have done this myself with ChatGPT”
This is the sentence that kills advisory firms. When a client realizes your AI-generated model contains the same errors they would get from a consumer AI tool, your premium positioning collapses. You are no longer the expert who brings proprietary methodology. You are a middleman with a markup. The moment a client says this, your engagement is over and your referral value drops to zero.
Why Current Safeguards Are Insufficient
Every advisory professional we interviewed believes they have adequate safeguards against AI errors. Every one of them was wrong. Here are the six most common claims and why they fail.
“I review everything before it goes to clients”
You review narrative quality, not mathematical accuracy. When the AI says projected ROI is 34%, you evaluate whether 34% sounds reasonable for the industry. You do not rebuild the model from source data to verify 34% is correct. You are reviewing the story, not the math.
“I use AI as a starting point only”
Anchoring bias makes this dangerous. If the AI generates a 34% ROI projection, your “independent” review will anchor to that number. You might adjust to 28% or 30%. The actual figure might be 12%. The AI's starting point distorts your professional judgment.
“I validate against industry benchmarks”
Circular validation. If the AI generated both the projection and the benchmarks, you are validating a hallucination against another hallucination. Unless your benchmarks come from a verified, independent source, this step provides false confidence.
“My clients are sophisticated enough to catch errors”
Sophisticated clients catching your errors is not a safeguard. It is a delayed detonation. When your client's finance team discovers the discrepancy, the damage to your credibility is worse than if you had caught it yourself.
“I always provide ranges, not point estimates”
A hallucinated range is still hallucinated. If the AI generates a range of 28-40% ROI when the actual outcome is 12%, the range provides no protection. It merely creates a wider band of wrong.
“This is an economics problem, not a technology problem”
Thorough verification of AI-generated financial models takes nearly as long as building the model manually. If you are truly verifying every assumption, formula, and output, the efficiency gain from AI disappears. The economics of AI-generated financial models only work if you skip verification.
Exhibit 6B
Actual Effectiveness of Common Safeguards
Estimated error-detection rate for each safeguard claim, based on advisory engagement analysis
- Catches formatting issues; misses computational errors
- Anchoring bias distorts independent judgment
- Circular validation when AI generates both projection and benchmark
- A hallucinated range is still hallucinated
- Delayed detection; decisions already made
No commonly cited safeguard exceeds 20% effectiveness. The gap between perceived and actual error detection is the core risk in AI-assisted financial advisory.
Liability Exposure
Advisory firms face legal exposure that most have not considered. When clients make investment decisions, restructure workforces, or commit capital based on your AI-generated financial models, the liability trail leads directly to you.
Professional Liability and Standard of Care
Advisory professionals are held to a standard of care that requires reasonable competence. Delivering AI-generated financial models without verification may fall below this standard. The fact that you used a tool does not eliminate your duty to ensure the output is accurate.
E&O Insurance Gaps
Most errors-and-omissions policies were written before AI-generated deliverables existed. Your carrier may argue that relying on unverified AI output constitutes a failure to exercise professional judgment, voiding coverage precisely when you need it most.
Fiduciary Obligations for Fractional CFOs
Fractional CFOs often serve in fiduciary capacity. Using AI to generate financial projections that inform capital allocation, M&A decisions, or board recommendations creates heightened personal liability. A fiduciary cannot delegate judgment to a tool that guesses.
The Discovery Problem
AI chat logs, prompt histories, and tool usage records are discoverable in litigation. If a client sues over a bad financial model, opposing counsel can subpoena your AI prompts and demonstrate that the model was generated, not built. Your chat log becomes Exhibit A.
Case Scenarios
The following scenarios are composites drawn from real advisory engagements. Names and specific details have been changed, but the failure patterns and financial outcomes are representative of what happens when AI-generated models go unchecked.
Scenario 1: Fractional CFO — M&A Synergy Model
A fractional CFO uses AI to build a synergy model for a $15M acquisition target. The model projects $2.1M in annual synergies with a 22% IRR.
AI Projected
$2.1M synergies / 22% IRR
Actual Result
$600K synergies / 7% IRR
Variance
-71% / -68%
What happened: The acquiring company paid a premium based on the synergy projections. Eighteen months post-close, the CFO-of-record faces board scrutiny. The AI fabricated synergy estimates by averaging training-data M&A outcomes rather than modeling the specific operational overlap.
Scenario 2: Strategy Consultant — Geographic Expansion
A boutique strategy firm uses AI to model a client's expansion into three new metropolitan markets. The model projects break-even at Month 14.
AI Projected
Break-even Month 14
Actual Result
Month 31+ (and counting)
Variance
+121%
What happened: The client committed $3.4M in lease obligations and hiring based on the Month 14 break-even. By Month 18, cash reserves are depleted. The AI underestimated market entry costs by applying national averages instead of metro-specific data it did not have.
Scenario 3: Operations Advisor — Headcount Reduction
An operations advisor uses AI to model a workforce restructuring for a 200-person manufacturing firm. The model projects $900K in annual savings.
AI Projected
$900K annual savings
Actual Result
-$215K (net loss)
Variance
-124%
What happened: Forty-seven positions were eliminated based on the model. Institutional knowledge loss, overtime costs, quality defects, and rehiring expenses turned projected savings into a net loss. The AI modeled headcount as a simple cost line without accounting for operational interdependencies.
The Computation Layer Solution
The solution is not abandoning AI in advisory. The solution is re-architecting the pipeline so that AI does what it excels at (narrative generation, pattern synthesis, research summarization) while deterministic systems handle what AI cannot do (financial computation, formula execution, sensitivity analysis).
This requires a five-layer architecture where financial outputs are never generated by the language model. They are computed by deterministic engines and injected into the AI's context as verified facts.
Exhibit 8
Five-Layer Computation Architecture for Advisory
Source-Grounded Financial Computation
All financial inputs pulled from verified sources: client financials, market data feeds, regulatory filings. No training-data assumptions.
Sandboxed Code Execution
Every formula executes in a deterministic compute environment. DCF, IRR, NPV, and sensitivity models run as auditable code, not token predictions.
Assumption Validation Layer
Every assumption is tagged, sourced, and bounded. If an assumption exceeds historical ranges, it is flagged before the model runs.
Real-Time Output Validation
Model outputs are cross-checked against independent calculations, industry benchmarks from verified databases, and internal consistency checks.
Sensitivity & Confidence Scoring
Every output carries a confidence score and sensitivity range. Clients see not just the number but how much it changes if key assumptions shift.
Key principle: The LLM never performs financial calculations. It receives pre-computed, pre-validated numbers and generates narrative, insights, and recommendations around them. Every number in the deliverable has an auditable computation path.
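One way to realize this key principle can be sketched in a few lines. Everything below is illustrative: the function names, the discount-rate shock, and the cash flows are our assumptions, not Dojo Labs' implementation. The pattern is what matters: a deterministic engine computes the figure and its sensitivity range, and only those verified facts are injected into the LLM's context.

```python
def npv(rate: float, cashflows: list[float]) -> float:
    """Deterministic DCF: cashflows[0] is the time-0 flow."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def compute_metrics(rate: float, cashflows: list[float], shock: float = 0.15) -> dict:
    """Layers 2 and 5: sandboxed computation plus a sensitivity range
    from shocking the discount rate by +/- `shock` (relative)."""
    return {
        "rate": rate,
        "npv": npv(rate, cashflows),
        "npv_low": npv(rate * (1 + shock), cashflows),   # higher rate -> lower NPV
        "npv_high": npv(rate * (1 - shock), cashflows),  # lower rate -> higher NPV
    }

def build_context(m: dict) -> str:
    """The only numbers the LLM ever sees are these pre-computed facts;
    it generates narrative around them, never the figures themselves."""
    return (
        f"VERIFIED FACTS (computed, not generated):\n"
        f"NPV at {m['rate']:.0%} discount rate: ${m['npv']:,.0f} "
        f"(sensitivity range ${m['npv_low']:,.0f} to ${m['npv_high']:,.0f})\n"
        f"Write the narrative around these figures; do not alter them."
    )

m = compute_metrics(0.12, [-1_000_000, 350_000, 350_000, 350_000, 350_000, 350_000])
print(build_context(m))
```

In a production pipeline the context string would be prepended to the LLM prompt, and an output validator would confirm that every number in the generated deliverable matches a computed fact.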
The Evidence for Tool Augmentation
Research consistently demonstrates that pairing LLMs with deterministic computation tools dramatically improves accuracy:
GPT-4 accuracy on the MATH benchmark: 42.2% without tools, 84.3% with Code Interpreter — a 2x improvement (Zhou et al., ICLR 2024).
Medical calculation accuracy: LLaMa at 11% without tools, 88% with deterministic tools — an 8x improvement (npj Digital Medicine, 2025).
Apple GSM-Symbolic study (ICLR 2025): adding a single irrelevant clause to math problems caused performance drops of up to 65%, demonstrating that LLMs lack true mathematical reasoning.
The pattern is consistent across domains: LLMs paired with deterministic computation layers outperform LLMs operating alone by 2x to 8x on quantitative tasks.
Without a computation layer:
- ✗ LLM generates financial projections
- ✗ Formulas are pattern-matched, not computed
- ✗ No uncertainty quantification
- ✗ Benchmarks from training data
- ✗ No audit trail for calculations

With a computation layer:
- ✓ Deterministic engine computes all figures
- ✓ Formulas execute as auditable code
- ✓ Sensitivity analysis on every output
- ✓ Benchmarks from verified databases
- ✓ Full computation path per metric
Practical Framework
Not every advisory deliverable carries the same risk. This three-tier framework helps you match your verification investment to the stakes of each engagement.
Tier 1: Internal analysis, preliminary research, directional estimates
Approach: AI leads with spot-check validation
- Internal market sizing for strategic planning
- Preliminary competitive analysis
- Early-stage opportunity assessment

Tier 2: Client-facing deliverables, board presentations, budget recommendations
Approach: Mandatory human verification of all financial outputs
- Quarterly business reviews with financial projections
- Budget reallocation recommendations
- Vendor evaluation with cost analysis

Tier 3: Investment decisions, M&A, restructuring, pricing strategy
Approach: Computation-layer architecture or fully human-computed
- M&A synergy models and valuations
- Workforce restructuring financial impact
- Capital allocation and investment decisions
12-Point Pre-Send Checklist
Before any financial deliverable leaves your desk, answer these twelve questions. If you cannot confidently answer “yes” to all twelve, the model is not ready for the client.
Red Flags
AI output matches training-data patterns too closely
If the ROI projection looks like a textbook example, it probably is one.
No sensitivity analysis available
If the model does not show how outputs change when inputs shift, the numbers are unreliable.
Benchmarks without citations
If you cannot trace a benchmark to a specific, dated source, the AI likely generated it.
False precision in uncertain contexts
If a five-year projection shows numbers to the cent, the model is presenting guesses as calculations.
Recommendations
Different stakeholders need to take different actions. Here are targeted recommendations for each audience.
For Advisory Firm Founders
- Treat financial model accuracy as your primary retention strategy. The $1.21M three-year cost of one wrong model dwarfs any AI efficiency savings.
- Implement a mandatory computation-layer requirement for all Tier 3 (high-stakes) deliverables. No AI-generated financial projections reach clients without deterministic verification.
- Audit your current AI-assisted deliverables this quarter. Select five recent financial models and have an analyst manually verify every number against source data.
- Update your E&O insurance to explicitly address AI-generated deliverables. Get written confirmation that coverage applies when AI tools are used in the workflow.
- Build a “model accuracy” metric into your firm's KPIs. Track projected vs. actual outcomes for every financial model you deliver.
For Analysts & Associates
- Never submit an AI-generated financial model without rebuilding the core calculations independently. Use the AI output as a structure reference, not a source of truth.
- Maintain a personal validation checklist for every model type you produce. Document which calculations you verified and how.
- Flag AI-generated benchmarks in every deliverable. If a comparison figure came from the model rather than a verified source, mark it as unverified.
- Build your own library of verified benchmarks, industry data, and reference models. Your value as an analyst is your ability to verify, not your ability to prompt.
- Document your AI usage for every deliverable. If the model is ever questioned, you need to show your verification trail.
For Clients Hiring Advisory Firms
- Ask your advisor how financial projections are computed. If the answer involves AI without a computation layer, you are receiving generated estimates, not calculated projections.
- Request sensitivity analysis on every financial model. If the advisor cannot show you how outputs change when assumptions shift by 10-20%, the model may not be grounded in real calculations.
- Include accuracy clauses in advisory contracts. Define acceptable variance thresholds and require the advisor to document their verification methodology.
- Have your internal finance team independently verify key assumptions before committing capital based on advisory projections.
- Ask for the computation path behind any number that drives a material decision. A credible advisor should be able to show you exactly how each figure was derived.
For Industry & Professional Bodies
- Develop AI disclosure standards for advisory deliverables. Clients deserve to know which parts of a financial model were computed vs. generated.
- Update professional standards of care to address AI-assisted financial modeling. The current framework does not account for the unique risks of LLM-generated outputs.
- Create certification programs for AI-augmented financial analysis that require demonstrated competence in verification methodology.
- Publish guidance on E&O insurance requirements for advisory firms using AI tools. The current coverage landscape has dangerous gaps.
- Establish industry benchmarks for acceptable variance between AI-projected and actual financial outcomes in advisory deliverables.
Conclusion
AI is transforming advisory. The firms that leverage it effectively will deliver better work, faster, at lower cost. These are real advantages that benefit both advisors and their clients.
But the current approach of allowing language models to generate financial projections is an unmanaged risk that threatens the foundation of advisory businesses: trust. When 78% of advisory professionals use AI for ROI projections but only 12% systematically validate the outputs, the industry is building on sand.
For boutique firms and fractional leaders, the math is stark. A single wrong financial model can cascade into $1.21M in lost revenue over three years. In a business where 60-80% of growth comes from referrals, you cannot afford a single model that detonates in a client's boardroom.
The advisory firms that win the next decade will not be the ones who adopt AI the fastest. They will be the ones who architect their AI workflows with computation layers that guarantee every number is calculated, not generated. Every projection is sensitivity-tested. Every deliverable carries an audit trail.
The fix is not less AI.
It is better-engineered AI.
Financial models must be computed, not generated. Advisory trust must be engineered, not assumed. Your reputation depends on it.
Ready to fix the math in your advisory AI?
Dojo Labs builds computation layers that sit between your data sources and your AI tools. Every number verified. Every calculation deterministic. Every deliverable audit-ready.
References & Sources
International Journal of Financial Studies (2024). “Evaluating Large Language Models in Financial Analysis: GPT-4o and M&A Valuation Errors.” IJFS, MDPI.
FM Magazine, IMA/CIMA (May 2025). “Testing AI Copilots on Financial Statement Analysis: Depreciation, EBIT, and Net Income Errors in Microsoft Copilot.”
Review of Accounting Studies (2024). “Can Large Language Models Pass the CPA Exam? Performance of ChatGPT-4 With and Without Calculator Tools.”
Issues in Accounting Education (BYU, 327 co-authors, 186 institutions). “ChatGPT 3.5 Performance on Accounting Assessments: 47.4% vs. Students' 76.7%.”
RGP CFO AI Survey (December 2025, n=200). “Only 14% of CFOs Have Seen Clear, Measurable ROI from AI Investments.”
Bain Capital Ventures (late 2024, n=50). “71% of CFOs Not Currently Using GenAI in Finance/Accounting Despite 94% Seeing Potential Benefit.”
Gartner AI in Finance Survey (November 2025, n=183 CFOs). “58% of Finance Functions Using AI; 91% Report Low or Moderate Initial Impact.”
McKinsey & Company (2025 & H2 2024). “The State of AI: Global Survey” — 79% of respondents report regular GenAI use.
CustomerGauge. Professional services industry average annual churn rate: 27%.
Bain & Company / Frederick Reichheld. “Prescription for Cutting Costs” — improving retention by 5% increases profits 25-95%.
Harvard Business Review. Referred customers have 16% higher lifetime value and 18% lower churn than non-referred customers.
Meetanshi. 85% of professional services firm new business comes from referrals and word-of-mouth.
Zhou et al. (ICLR 2024). “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification.” MATH benchmark: 42.2% without tools, 84.3% with Code Interpreter.
npj Digital Medicine (2025). “Medical Calculation Accuracy of LLMs: 11% Without Tools, 88% With Deterministic Tools.”
Apple Research / GSM-Symbolic (ICLR 2025). “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models” — performance drops of up to 65% from irrelevant clauses.
OpenAI SimpleQA (2024). Evaluation of factual accuracy and hallucination rates across frontier LLMs.
HaluEval (EMNLP 2023). “HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models.”
Stanford HAI (2025). “AI Index Report” — annual survey of AI capabilities, limitations, and adoption trends.
SEC AI Washing Enforcement Action (March 2024). Securities and Exchange Commission enforcement actions against firms for misleading AI claims.
FINRA Rule 3110. Supervision requirements applicable to broker-dealers using AI-generated communications and analysis.
EU AI Act. European Union regulatory framework classifying AI systems by risk level, with requirements for high-risk financial applications.
W.R. Berkley Corporation. AI liability exclusion provisions in professional liability insurance policies.
About Dojo Labs
Dojo Labs builds and fixes AI systems where every number is computed, not guessed. We specialize in computation layer architecture, deterministic verification pipelines, and numerical accuracy engineering for advisory firms, fractional executives, and service businesses deploying AI at scale.
© 2026 Dojo Labs. All rights reserved. This whitepaper is published for educational purposes. Data points are from Dojo Labs internal research and externally cited peer-reviewed studies, industry surveys, and publicly available sources including McKinsey, Gartner, RGP, Bain Capital Ventures, Harvard Business Review, and academic publications. See References section for full citations. Individual results may vary.