
AI Financial Controller: What It Actually Does, What It Doesn't, and How We're Building One

Last updated: 15 May 2026. Living document, updated as the build at our enterprise client progresses.

> About the author. Geordie Wardman runs TestVentures.net, fractional AI ops for enterprise finance teams and youth nonprofits. Currently building an AI financial controller for a mid-market enterprise client. Twenty years operating businesses, building daily in public since 2026. Bermudian by background, Swiss by residence. Reach: geordie@testventures.net · LinkedIn.

A senior partner at one of our enterprise clients asked me last month whether they should hire a financial controller. I'd been working the other side of the same question for a month: whether the same job could be done by software with a human in the loop for the calls that mattered. The conversation forced me to write down what I actually thought.

This is that write-up.

It's for the CFO or operating partner who is staring at a controller hire, a salary number that won't go down, a six-to-twelve month ramp, and a quiet question in the back of their head: is this actually the right answer in 2026, or is there now a different one? I'm not going to tell you the answer is obvious. It isn't. But I will tell you what we know, what we don't know, and what we're building right now with one of our clients.

An AI financial controller, also called an AI accounting controller, agentic finance controller, or AI CFO assistant, is a system that sits on top of your ledger and does most of what a human controller does, with a human reviewing the calls that need judgement. Different name, same job. The disambiguation matters because "AI controller" on its own usually returns industrial automation (PLCs, motor controllers, robotics). This piece is about the finance role.

What is an AI financial controller?

An AI financial controller is a system of scripts and agents that performs the day-to-day work of a financial controller, with a human reviewing the calls that require judgement.

It is not a chatbot. It is not a copilot bolted onto a spreadsheet. It is not a single SaaS tool. It is a small operating system stitched together around an existing ledger, doing the reconciliation, the AP capture, the recurring journals, the FX revaluation, the inter-company plumbing, and the close prep. The human stays on the strategy work, the judgement calls, and signing the financial statements.

The simple way to think about it: most of a controller's day is rules-based work running against rules-based data. That work can be automated. The remainder, which is the judgement work and the regulatory representation, stays human. Gartner and Deloitte's CFO Signals have both flagged "agentic finance" as the 2026 trend most likely to compress mid-market finance team headcount over a 24-month horizon. That's the trend we're building inside, not predicting from the outside.

What does a financial controller actually do day to day?

The role varies by company size, but the workload is consistent across mid-market and lower-enterprise finance teams.

A controller owns the books. That means reconciling bank statements to the ledger, capturing and coding accounts payable, processing accounts receivable, running recurring journals, handling FX revaluation, managing inter-company accounting, preparing the month-end close, building the audit-ready trail, and feeding inputs into the board pack and cash digest.

Roughly 60 to 75% of those tasks are reconciliation against rules. The rules are sometimes ledger rules (which GL account does this hit?), sometimes policy rules (what is the materiality threshold?), sometimes data rules (does this transaction match this invoice?). The work is repetitive, rule-bound, and high-volume. A good controller knows the rules cold and applies them fast.

The remaining 25 to 40% is judgement. Provisions. Accruals with subjective estimates. Related-party transactions. Going-concern assessment. Audit responses. Investigation of anomalies that the rules can't classify. Conversations with the CFO about what to do when something looks off.

The rules portion is what AI can do well. The judgement portion is what AI cannot do, and where the human stays. If you've never written this list down for your own team, the audit prompt from Day 46 is the right starting point.

Can an AI controller replace a human controller?

Not entirely, and the people who say it can are either selling something or haven't tried to do it.

The honest answer is that a well-built AI controller does roughly 60 to 75% of the controller workload by hours, freeing the senior finance leader to do the strategy work that companies actually pay senior finance leaders for. The remaining workload still needs a human, but that human can be a part-time CFO, a fractional finance lead, or an existing senior finance leader who used to spend 80% of their time in reconciliation hell and now spends 80% of their time on strategy and board work.

So the right question isn't "can AI replace my controller?" The right question is "can AI take 60 to 75% of the controller workload off my senior finance leader so they can do the work I'm actually paying them to do?" The answer to that question, in 2026 and beyond, is yes.

How does an AI financial controller actually work?

At the level a CFO needs to understand, the architecture is straightforward.

Inputs come in from the existing data sources: bank exports, vendor invoices, expense receipts, payroll files, sales records, FX rates. Each one arrives in a slightly different format. A small script cleans each one and converts it into the single format the ledger expects. The technical term is "reshape script"; the plain-English version is "translator." It doesn't change what the data says, only the shape it arrives in.
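A minimal sketch of one such translator, with hypothetical column names and formats (every bank's export differs; the point is that each source gets its own small reshape function, and all of them emit the same ledger-ready shape):

```python
# Illustrative only: a "translator" for one imagined bank's CSV export.
# Column names ("Booking Date", "Amount", "Ccy", "Reference") are assumptions.
import csv
import io
from datetime import datetime

def reshape_bank_csv(raw_csv: str) -> list:
    """Convert one bank's export into the single format the ledger expects."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            # Normalise the date to ISO regardless of the bank's format.
            "date": datetime.strptime(rec["Booking Date"], "%d.%m.%Y").date().isoformat(),
            # Amounts become signed integer cents so float rounding can't drift.
            "amount_cents": round(float(rec["Amount"].replace("'", "")) * 100),
            "currency": rec["Ccy"],
            "reference": rec["Reference"].strip(),
            "source": "bank_csv",  # provenance, consumed later by the audit row
        })
    return rows

sample = "Booking Date,Amount,Ccy,Reference\n03.02.2026,1'250.00,EUR,INV-1041\n"
rows = reshape_bank_csv(sample)
```

The design choice worth copying is the integer-cents convention: it makes every downstream comparison exact.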

A matcher then runs against the cleaned-up data, pairing transactions to ledger entries on a set of rules (date proximity, amount match, reference number, vendor code, prior reconciliation pattern). Matches are written back to the ledger as reconciliation entries. Unmatched items are queued as exceptions for a human to review.
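A stripped-down version of that matcher, under assumed field names (the production rules are richer, with vendor codes and learned reconciliation patterns, but the shape is the same):

```python
# Sketch only: pair bank lines to open ledger entries on amount, reference,
# and date proximity. Anything unmatched lands in the exceptions queue.
from datetime import date

def match(bank_lines, ledger_open, max_days=3):
    matched, exceptions = [], []
    remaining = list(ledger_open)
    for line in bank_lines:
        hit = next(
            (e for e in remaining
             if e["amount_cents"] == line["amount_cents"]
             and e["reference"] == line["reference"]
             and abs((date.fromisoformat(e["date"]) -
                      date.fromisoformat(line["date"])).days) <= max_days),
            None,
        )
        if hit:
            matched.append((line, hit))
            remaining.remove(hit)    # each ledger entry matches at most once
        else:
            exceptions.append(line)  # queued for a human to review
    return matched, exceptions

bank = [{"amount_cents": 125000, "reference": "INV-1041", "date": "2026-02-03"}]
ledger = [{"amount_cents": 125000, "reference": "INV-1041", "date": "2026-02-01"}]
matched, exceptions = match(bank, ledger)
```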

AP capture works on a similar pattern. A vendor invoice arrives. OCR extracts the line items. A coding rule assigns the GL account based on vendor history and line description. A three-way match checks against the PO and goods receipt. Items above the materiality threshold are routed to a human for approval. Items below the threshold are auto-coded and written to the ledger.
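The routing decision at the end of that pipeline reduces to a few lines. This is a hedged sketch, not the client's actual policy: the threshold value and field names are illustrative.

```python
# Assumed policy: €500 materiality threshold, expressed in integer cents.
MATERIALITY_CENTS = 50_000  # set with the senior finance leader, reviewed quarterly

def route_invoice(invoice: dict) -> str:
    """Decide whether an extracted invoice is auto-coded or sent to a human."""
    if not invoice.get("three_way_match_ok"):
        return "human_review"      # PO / goods-receipt mismatch: always a human
    if invoice["amount_cents"] >= MATERIALITY_CENTS:
        return "human_approval"    # above threshold: a human signs off first
    return "auto_code"             # below threshold: coded and written to ledger
```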

Recurring journals run on a schedule. FX revaluation runs nightly with a published rate source. Inter-company eliminations run at month-end. Close prep produces the audit-ready trail and the inputs the senior finance leader needs to sign off.

Every write to the ledger produces an audit row. Who wrote what, when, from what input, against what rule, with what confidence score. That audit row is the difference between a system the auditors trust and a system the auditors will refuse to sign off on. We'll come back to this.
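What such a row might contain, as a sketch. The field set here is our assumption of the minimum an auditor needs, not a standard schema:

```python
# One audit row per ledger write: who, what, when, from what input,
# against what rule, with what confidence, reviewed by whom.
from datetime import datetime, timezone

def audit_row(actor, entry_id, source, rule_id, confidence, reviewer=None):
    return {
        "who": actor,                  # script/agent name, or a human username
        "what": entry_id,              # the ledger entry that was written
        "when": datetime.now(timezone.utc).isoformat(),
        "from_input": source,          # e.g. which bank-export batch
        "against_rule": rule_id,       # version-controlled rule identifier
        "confidence": confidence,      # matcher score at write time
        "reviewed_by": reviewer,       # None means auto-write below threshold
    }

row = audit_row("matcher-v3", "JE-10412", "bank_csv:2026-02", "recon.amount+ref", 0.98)
```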

What the first month of a build actually looks like (anonymised). When we started the live build with our current client, I assumed phase one would be mostly engineering. It wasn't. The first month was maybe 70% credential negotiation, 30% code. Getting six specific write permissions approved by the ledger vendor's support team across three time zones, registering a developer portal account, waiting on sandbox copies that arrived with the wrong month in the name. The bottleneck in the early build is not engineering. It is access. If you're scoping a build like this, budget two to four weeks of calendar time before any code does anything useful in the target environment. That's normal. That's not the team underperforming.

How much does an AI financial controller cost compared to hiring one?

This is where the hedging needs to start, because the comparison is not a clean one.

A full-time financial controller in a Western finance market costs somewhere in the range of €80,000 to €120,000 per year all in (salary plus social charges plus benefits plus recruiter fee amortised), depending on jurisdiction and seniority. Robert Half's 2026 salary guide and the Hays salary guide both put the same role inside the €80,000 to €120,000 base salary band for mid-market finance teams. When I queried an AI model on the same question, it returned €185,000 to €260,000. That number isn't wrong for senior controllers at large multinationals in expensive cities, but it's miscalibrated to where this question usually lands. The honest take: AI is confident about finance numbers it shouldn't be confident about. Verify the salary band from a primary source before quoting it to a board.

Then add six to twelve months of ramp before they're useful at full capacity. Add the cost of the senior finance leader's time supervising them. Add the cost of turnover at the two-year mark, which is roughly when financial controllers in the open market start looking again. That stack of costs is significant, and the headline salary number is only half of it, which is usually why finance leaders start asking what AI can take off the table.

A fractional AI controller build, done right, is a one-time engineering cost of roughly €40,000 to €80,000 (depending on complexity and ledger system) plus a monthly retainer for ongoing tuning and operation in the range of €2,000 to €8,000 per month. That's a build cost roughly equal to one year of a controller's loaded salary, plus a monthly operating cost that's a fraction of the same. Over 24 months, the total cost of ownership for the AI option lands meaningfully below the hire option, even with generous assumptions on build slippage.

But: this comparison is novel. Very few mid-market finance teams have actually built one of these and run it side-by-side with a hired controller for long enough to confirm the numbers. I'm running one of those builds right now. I'll publish the actual numbers when I have them. Anyone telling you the comparison is settled is overselling.

The honest read for a finance leader making the decision today: the build option looks materially cheaper over a two-year horizon, but the hire option is a known quantity and the build option is not. That's the actual trade.

Comparison: hire, build, or run both in parallel

| Criterion | Hire a controller | Build an AI controller | Run both in parallel for 6 months |
|---|---|---|---|
| Upfront cost | €15-25k (recruiter + onboarding) | €40-80k (engineering build) | €40-80k build + €15-25k light hire |
| Ongoing monthly cost | €7-10k (loaded salary) | €2-8k (retainer + cloud) | €4-6k light hire + €2-8k retainer |
| Time to useful capacity | 6-12 months ramp | 3-6 months phase one, 6-12 months fully trusted | 6 months controlled comparison |
| Audit familiarity | Standard, auditor knows the pattern | Audit trail is new to most auditors, requires up-front conversation | Both paths visible to auditor |
| Scaling cost | Linear (more volume needs more headcount) | Near-zero (same system handles more volume) | Hybrid |
| Risk profile | Hiring risk, turnover, training | Build risk, novelty risk, auditor pushback | Optionality, doubled cost in the window |
| Captures institutional knowledge | In a person who can leave | In code that compounds | Both |
| Right answer if | You need someone walking in tomorrow | You have a senior finance leader to supervise | You want hard data before committing |

> "We're freeing roughly 25 hours a month of senior finance time, and we haven't even moved past phase two."
> — Senior partner, mid-market enterprise client, May 2026.

What guardrails have to be in place before it touches the ledger?

This is the part of the answer that most "AI in finance" coverage skips, and it's the part that determines whether the system is trustworthy or a liability. The reason it matters: Patronus AI's FinanceBench benchmark found that leading large language models answered roughly 79% of financial questions incorrectly without retrieval. Other independent benchmarks (Vectara's HHEM hallucination leaderboard) put hallucination rates on factual content somewhere between 1% and 15% depending on the model. Either way, the implication is the same: an AI financial controller cannot run on a general-purpose model alone. It needs rules, retrieval, and dual-control. That's where the guardrails come in.

Read-only first, write later. The first phase of any build runs against a sandbox copy of the ledger with no write permissions. The system observes, proposes, and produces a parallel reconciliation report. The human controller (or senior finance leader) reviews the parallel report against the hand-built truth for one full month-end cycle. Discrepancies get diagnosed. Only when parity is demonstrated does write access get granted.

Dual-control on write methods. No single agent or script writes to the production ledger without a second check. The second check is either a hard rule (materiality threshold below which auto-write is permitted) or a human approval (above the threshold). The rules are documented and version-controlled. This pattern is consistent with PCAOB guidance on the use of automated tools in audit, and with IAASB ISA 315 (revised) on risk assessment for systems-generated entries.

Materiality thresholds. Every write rule has a monetary threshold above which a human reviews the write before it lands. The thresholds are set in conjunction with the senior finance leader and reviewed quarterly.
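The dual-control and materiality rules above compose into a single write gate. A sketch under assumed names (threshold, entry shape, and approval record are all illustrative): every proposed write must carry a second check, either a hard rule or a recorded human approval, or it doesn't land.

```python
# No single agent writes alone: the second check is a rule (below
# materiality) or a named human approval (above it). Otherwise: refuse.
def gated_write(ledger, entry, approvals, materiality_cents=50_000):
    if abs(entry["amount_cents"]) < materiality_cents:
        second_check = "rule:below-materiality"
    elif entry["id"] in approvals:
        second_check = "human:" + approvals[entry["id"]]
    else:
        raise PermissionError(entry["id"] + " needs human approval before write")
    ledger.append({**entry, "second_check": second_check})  # check travels with the entry
    return second_check

ledger = []
gated_write(ledger, {"id": "JE-1", "amount_cents": 9_900}, {})
gated_write(ledger, {"id": "JE-2", "amount_cents": 250_000}, {"JE-2": "cfo"})
```

Note that the second check is recorded on the entry itself, so the audit trail shows not just that dual-control existed, but which control fired.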

Parity tests against hand-built truth. For the first three to six months, every batch the AI runs is also run by the existing human controller (or by hand, if no controller exists). The two outputs are compared. Discrepancies are reconciled. The parity rate is the primary signal of whether the system is trustworthy yet.

The story behind that line. Our first parity test in the current build ran 47 entries. 43 matched the human's output. Four didn't. We dug into the four. Three were because the AI was using yesterday's published FX rate while the human had used today's, which was easy to fix. The fourth was a human typo the AI hadn't replicated. The AI was actually right. That's the kind of detail you only catch when you run a real parity test against real data. Anyone telling you the build was "100% accurate from day one" is either lying or hasn't actually run parity yet.
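The comparison itself doesn't need to be clever. A minimal version, with assumed record shapes, is a line-by-line diff that reports a rate and keeps every mismatch for diagnosis rather than averaging it away:

```python
# Sketch of a parity check: compare AI output to the hand-built truth
# field by field, and surface every discrepancy for a human to diagnose.
def parity(ai_entries, human_entries, keys=("account", "amount_cents")):
    mismatches = []
    for ai, human in zip(ai_entries, human_entries):
        diffs = {k: (ai.get(k), human.get(k)) for k in keys if ai.get(k) != human.get(k)}
        if diffs:
            mismatches.append({"entry": ai.get("id"), "diffs": diffs})
    rate = 1 - len(mismatches) / len(ai_entries)
    return rate, mismatches

ai = [{"id": 1, "account": "6100", "amount_cents": 5000},
      {"id": 2, "account": "6200", "amount_cents": 7100}]
human = [{"id": 1, "account": "6100", "amount_cents": 5000},
         {"id": 2, "account": "6200", "amount_cents": 7200}]  # e.g. a human typo
rate, mismatches = parity(ai, human)
```

Keeping both values in each mismatch is what let us see, in the case above, that the human's number was the wrong one.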

Per-posting audit trail. Every write produces an audit row: who, what, when, from what input, against what rule, with what confidence. The audit trail is reviewable by the auditors and is the single most important artefact in the whole build. Without it, the system is a liability. With it, the system is auditable in a way that even a human controller often isn't. Sit with that for a second. That's a pretty nice feature to have if you're on the fence either way.

Reconciliation rollback. Every write has a known undo. If a posting goes wrong, the system can reverse it cleanly without manual intervention.
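In double-entry terms, the clean undo is a reversal posting, not a deletion. A sketch, with an assumed entry shape:

```python
# The known undo: post an equal-and-opposite entry that references the
# original. History is never deleted, so the audit trail stays intact.
def reversal(entry):
    return {**entry,
            "id": entry["id"] + "-REV",
            "amount_cents": -entry["amount_cents"],
            "reverses": entry["id"]}

bad = {"id": "JE-10412", "amount_cents": 125000, "account": "6100"}
undo = reversal(bad)
```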

Human-on-close. No month-end is signed off without a human review of the close pack. The AI does not sign financial statements. The AI does not represent the company to regulators or auditors. That is permanent. The system does the volume work; the human does the representation.

Sandbox first, production second. Every change to the rules, every new module, every script update goes through a sandbox cycle before it touches production. Same as any responsible engineering team. The pattern that lets this work day-to-day is the one I wrote about in the AI operations playbook: thin operational harness, audited write methods, deterministic code doing the heavy lifting.

These eight items are non-negotiable. If you're being sold an "AI controller" without them, you're not being sold an AI controller. You're being sold a liability.

One more thing the brochures don't tell you. All of this is tested and dry-run before each monthly run, to catch anything that's broken or changed. Ledger vendors and banks push changes to their APIs and export formats without advance notice. (In our current build the bank's CSV export format changed three times in the first four months. Each change broke the reshape script. We added a header-row checksum that halts the pipeline if the format drifts, instead of letting the script silently mis-route entries.) A technical human needs to be in the loop to check before each monthly run. It's not a build-once-and-let-it-run setup. But with the right operator, the monthly check is only 5 to 20 minutes of work.
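That header-row checksum is a few lines of code. A sketch, with a hypothetical expected header (the real one is whatever your bank currently exports):

```python
# Format-drift guard: hash the header row of each incoming export and halt
# the pipeline loudly if it changes. Silent mis-routing is the failure
# mode we are designing against.
import hashlib

EXPECTED_HEADER_SHA = hashlib.sha256(
    b"Booking Date,Amount,Ccy,Reference"   # assumed header; pin your real one
).hexdigest()

def check_format(raw_csv: str) -> None:
    header = raw_csv.splitlines()[0].strip().encode()
    if hashlib.sha256(header).hexdigest() != EXPECTED_HEADER_SHA:
        raise RuntimeError("Bank CSV header changed; update the reshape script")
```

When the format legitimately changes, updating the pinned hash is a deliberate, reviewed act rather than a silent one.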

---

> Building one yourself? Want to compare notes?
>
> Email geordie@testventures.net for a 20-minute call. No pitch. I'll tell you what I'd actually do in your situation.

---

What are the limits of an AI financial controller?

The clearest way to name the limits is to list the things a human controller does that an AI cannot do, or cannot yet do safely.

Provisions and accruals with subjective estimates (warranty reserves, bad debt provisions, restructuring accruals). Going-concern assessment. Related-party transaction review. Novel transactions the rules haven't seen before. Auditor relationship management. Conversations with the bank, the tax authority, the board, the investors. Signing the financial statements. Representing the company in regulatory matters.

A real example from the live build. Early in phase one the system tried to auto-code a related-party transaction as a vendor invoice. From the rules' point of view it looked exactly like a vendor invoice: same payee format, same amount range, same coding pattern. From a controller's point of view it needed to be flagged for related-party disclosure. The system did what the rules told it to do. The rules were wrong. We added a related-party detector in the next iteration. But this is exactly the kind of judgement call we are explicit about keeping with the human. The AI is good at applying rules. It is not good at noticing when the rule itself is the wrong rule for the situation.

Some of these limits will narrow over time as the systems improve. Some of them will not, because the regulatory framework treats them as inherently human responsibilities. Signing the accounts is a person's signature, not a system's signature, and that won't change quickly. Maybe ever.

So the honest scoping: AI takes the volume work. The human takes the judgement, the relationships, and the signature. I don't see that going away. Do you?

How do you know an AI financial controller is working?

Four signals.

Parity rate against hand-built truth. For the first three to six months, the AI's output is compared line-by-line to a human's output on the same data. The parity rate should trend toward 100% over the first quarter. If it doesn't, the rules are wrong. (In our live build we hit 91% parity in month one, 96% in month two, 99% in month three. The residual 1% is consistently the same three rule categories, which we're working on.)

Exception volume trending down. The number of items the AI flags as unable to classify should drop over time as the rules learn the company's patterns. If exception volume stays flat or rises, the system isn't learning the business.

Close cycle time trending down. The number of days from month-end to signed close should compress. A well-built AI controller closes the books in a week or less for a mid-market business. Two weeks is OK in the first quarter; longer than that means something is broken.

Audit findings. The human auditor's review of the system should produce zero material findings on the AI-generated portions. If the auditor finds the audit trail acceptable, the system is working. If the auditor refuses to rely on the AI's outputs, the system is not yet ready. The AICPA's guidance on AI-assisted audit evidence is the cleanest reference for what an auditor will actually look for.

A senior finance leader watches all four. If three of the four are trending right, the system is on track. If two or more are flat or wrong, the build needs intervention.

Should I hire a controller or build an AI one?

I won't tell you the answer. I'll tell you the three honest options.

Hire a controller. Known quantity. Six-to-twelve-month ramp. Annual cost in the range above. Familiar to your auditor, your bank, your board. Carries hiring risk, turnover risk, and the risk that you're paying a senior salary for work that doesn't actually need a senior person to do. Right answer if you need someone to walk in tomorrow and own the books, and you're not in a position to invest engineering time.

Build an AI controller. Lower total cost of ownership over a two-year horizon. Captures the company's accounting rules in code, which compounds over time. Frees senior finance time for strategy. Carries build risk, novelty risk, and the risk that the auditor pushes back. Right answer if you have a senior finance leader who can supervise the build, and you're willing to invest six months before the system is fully trusted.

Run both in parallel for six months. Hire light (a part-time or fractional controller, or an existing senior person carrying the load). Build the AI underneath. Compare output at the end of the six-month window. Decide which dominates. Right answer if you want optionality, can afford the parallel cost, and want hard data before committing.

The decision I see most often in the age of AI: option three, with the parallel period treated as a controlled experiment rather than a hedge. The companies that try option two without the parallel period are the ones where the build fails publicly, because the senior finance leader doesn't have time to supervise it. The companies that try option one without considering option three are the ones that wake up 18 months from now wondering why a competitor's finance team is half the cost. The pattern of why "fractional + AI" beats "full-time hire" in this seat is the same one I wrote about in Three Smart Execs, One Vibe Coder, and How to Hire a Fractional AI Lead.

What does a build like this actually look like in practice?

We're building one right now with an enterprise client. I won't name them. I will publish the steps as we go, anonymised, on this blog.

The build is in phase two of four. Phase one was read-only observation and parity-testing against the existing finance team's output. Phase two is sandboxed write access on a subset of the ledger with dual-control on every posting. Phase three is production write access on the same subset with materiality thresholds. Phase four is extension to the rest of the ledger and the full month-end cycle.

Specific colour from the live build. When we moved from read-only to first sandboxed write, the first test was not a batch. It was a single $375 vendor invoice payment. The smallest, cleanest, hand-picked transaction we could find. If that didn't round-trip through every step (reshape, match, write, audit row, rollback test), nothing else would. The whole pipeline can pass an 1,800-line test suite and still fall over on a real transaction because the live data has a quirk the synthetic data didn't have. So you start with one transaction, watch it the whole way through, and only then move to ten, then fifty, then a batch.

We are not promising magic. We are promising disciplined, audit-ready, dual-control automation of 60 to 75% of the controller workload, with the senior finance leader's time freed for the work that actually generates value.

The build is also not finished. Anyone who tells you they've finished one of these in 2026 is either at a much bigger scale than mid-market, or is overselling. The honest read: this is a frontier we're working on, with a real client, with real money, and we'll publish what we learn.

What does 60 to 75% look like in practice? On the build we're running, it's around 25 to 40 hours per month of senior finance time recovered. That's a CFO who was drowning in mundane tasks now freed up to be proactive and creative. Rather than doing the monkey work of daily manual entries, they're building dashboards, advising the founders, and producing the forecasts only they can produce. That is revenue-moving activity that any senior team would want.

How do I get started thinking about this for my own business?

Three questions to answer first.

What does your controller or finance team actually do? Write it down. Be specific. By hour. By task. If you don't have a controller yet and are about to hire one, write the job description's task list with hours next to each task. This document is the input to every other decision. The audit prompt in Are You Actually Automating Anything? is a clean way to do this exercise in 40 minutes.

What percentage of that task list is rule-based reconciliation? Be honest. In most mid-market finance teams the answer is 60 to 75%. If your answer is materially lower, the AI option is less compelling.

Who would supervise the build? If the answer is "no-one," don't start. The build needs a senior finance leader who owns the parity tests, the rule changes, and the audit relationship. If you don't have that person, hire that person first (or engage a fractional version of that person), and then revisit the build question. You may not need a finance person to do the engineering work itself. I'm leading the engineering on the build we're running now, and I have a strong technical background but I am not a CFO. With the CFO guiding the financial part, we make a team that can complete phase one (read-only observation and parity testing) in about two months of part-time engagement, and reach a fully trusted production system in six to twelve months.

If those three questions have clean answers, you're in a position to decide. If they don't, the right next step is to make them have clean answers, not to start a build. The first time I scoped a build like this, I had the wrong supervisor model in mind and the wrong ledger access assumed. It took me a month to recalibrate. The "make them have clean answers" step is the cheapest month you'll spend on the whole project.

Frequently asked questions

Is an AI financial controller the same as accounting software?

No. Accounting software (the ledger itself) records transactions. An AI financial controller sits on top of the ledger and performs the work that a human controller does, including reconciliation, AP capture, journals, close prep, and audit trail generation. The two are complementary.

Will my auditor sign off on accounts produced by an AI?

In our experience so far, yes, if the audit trail is properly built and dual-control is in place on every write. The auditor's question is not "did a human or a machine produce this entry?" The auditor's question is "can I trace this entry back to source, can I verify the rule that produced it, and can I see who reviewed it?" If the answers are yes, the audit signs off. If the answers are no, the audit doesn't sign off. PCAOB and IAASB guidance on system-generated entries is the reference your auditor will use.

What about regulatory representation?

The AI does not represent the company to regulators. A human, typically the CFO or finance director, signs the financial statements and represents the company. The AI produces the work; the human signs the work. That distinction is permanent.

How long does a build like this take?

Phase one (read-only observation and parity testing) runs for one full month-end cycle, sometimes two, so roughly four to eight weeks. Phase two (sandboxed write on a subset) runs one to two months. Phase three (production write on a subset, with materiality thresholds) runs three to six months. Phase four (extension to the full ledger and the full close) runs in parallel with phase three for the back half of that window. So six to twelve months from start to a fully trusted system, with material time savings starting from month three onward. None of this is full-time work. On a daily basis the senior finance leader's supervision is one to two hours, and the engineering side fits cleanly into a fractional engagement.

What if my ledger isn't a modern API-friendly cloud system?

It still works, but the engineering is harder. The integration layer has to do more work. Older ledgers tend to require file-based exports rather than direct API writes, which means the writeback is asynchronous and the dual-control rules need to handle batching. The build is still possible. The cost is higher.

What does fractional AI ops actually mean in this context?

It means we do the engineering, the rule-setting, the parity testing, and the ongoing tuning, on a fractional retainer rather than a full-time hire. Your senior finance leader supervises the relationship. Our team does the technical work. When I say "we," I mean: me leading the build, a CTO-level second opinion on the engineering, and your CFO in the loop directing the financial logic. All three are part-time fractional roles. The combined cost is materially lower than a full-time engineering hire plus a full-time controller. The fractional model is the pattern that fits this work cleanly.

Who is the senior finance leader you keep mentioning?

The CFO, the finance director, the head of finance, or the fractional CFO, depending on company stage. The person who owns the books, signs the accounts, and represents the company to the board and the auditor. The AI controller works for that person.

How is this different from RPA in finance?

RPA (robotic process automation) automates clicks on existing user interfaces. It's brittle: every UI change breaks the bot. An AI financial controller works at the data and API level, not the UI level, and includes a rules layer plus a reasoning layer that RPA doesn't have. The two can coexist (some clients use RPA for legacy systems that have no API), but they are not the same thing.

Is this safe?

With the eight guardrails above, yes. Without them, no. The single most important guardrail is the per-posting audit trail. If you cannot answer the question "who wrote this entry, on what input, against what rule, when, and who reviewed it?" for every line, the system is not ready for production.

What's next

Three supporting posts will follow over the next week.

Want the 8-guardrail checklist as a PDF? Email geordie@testventures.net with "guardrails PDF" in the subject. I'll send the working checklist we use on the live build, anonymised, in return for your email so I can let you know when the supporting posts land.

If you're a finance leader weighing this decision and want to talk through it for your own business, email me directly. No pitch. Twenty minutes. I'll tell you what I actually think for your situation.

geordie@testventures.net

---

This is a living document. Updated 15 May 2026. As the build at our enterprise client progresses, the cost numbers, the parity rates, and the phase timings will be updated with real figures. Subscribe to the build-in-public blog to get the updates as they happen.
