The full end-of-month bank files landed in the shared folder on a Friday. Several entities, several currencies, hundreds of lines. Three months ago this was a person's week. This week it was the last input I needed before going live.
I did not go live with it that day. I ran it as a dry run first.
Everyone wants the launch-day story, the moment the machine does the thing. The real work of deploying AI in production is the weeks before, when the machine does the thing into a copy of the books and you check every line against what a human already did by hand. You are not testing whether it works. You are trying to catch the one row where it is confidently wrong.
The dry run is the whole job
I have been running this for weeks. Take a month of real bank data. Run it through the classifier and the matcher. Get back a set of proposed journal entries, each one tagged with which rule fired and how confident it was. Then put that next to the same month the client's finance lead coded by hand, and diff the two.
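A minimal sketch of that diff step, assuming each entry can be keyed by a bank transaction reference. The `Entry` fields, the `txn_id` key, and the account codes are hypothetical, made up for illustration, and not the actual schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    txn_id: str       # bank transaction reference (hypothetical key)
    account: str      # ledger account code
    amount: Decimal   # signed amount; Decimal avoids float rounding on money


def diff_entries(proposed, human):
    """Compare machine-proposed entries with the hand-coded ones, keyed by txn_id."""
    proposed_by_id = {e.txn_id: e for e in proposed}
    human_by_id = {e.txn_id: e for e in human}

    matched, mismatched = [], []
    for txn_id, h in human_by_id.items():
        p = proposed_by_id.get(txn_id)
        if p is None:
            continue  # reported below as missing
        if p.account == h.account and p.amount == h.amount:
            matched.append(txn_id)
        else:
            mismatched.append((txn_id, p, h))

    # Lines the human coded but the machine did not propose, and the reverse.
    missing = sorted(human_by_id.keys() - proposed_by_id.keys())
    extra = sorted(proposed_by_id.keys() - human_by_id.keys())
    return {"matched": matched, "mismatched": mismatched,
            "missing": missing, "extra": extra}


# Made-up sample data, not real client figures.
human = [
    Entry("T1", "6100", Decimal("-120.00")),
    Entry("T2", "4000", Decimal("950.00")),
    Entry("T3", "6200", Decimal("-35.10")),
]
proposed = [
    Entry("T1", "6100", Decimal("-120.00")),
    Entry("T2", "4010", Decimal("950.00")),  # right amount, wrong account
]

report = diff_entries(proposed, human)
```

The `mismatched` bucket is the one that matters most. A missing line is an honest "I do not know". A mismatched line is the confidently wrong row.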
The first time I did this, most of the lines came back unclassified. The system did not know what they were. So you read the exceptions, you find the pattern, you write the rule, you run it again. Unclassified drops. Run it again. Drops more. You are not chasing a hundred percent on day one. You are chasing a number low enough that the human review of what is left takes minutes, not days.
When the dry run matched on a sandbox copy of the live accounts, twice, that was the green light. Not a demo. Not a slide. The same month of money, posted into a mirror of the real books, reconciling to the penny against a human's work. This was a quiet win that nobody really knows about, but it still felt good. Until this point, everything was still theory.
The safeguards come before the go-live, not after
I will not let AI write to a client's live accounts without three things in place first. None of them are about the AI. All of them are about what happens when it is wrong.
A fallback the client controls. The finance lead can flip any pattern back to manual, through a config file, without calling me. One master switch flips everything off at once. If I am on a plane and something looks off, they are not stuck waiting for me.
A rollback on every write. Every auto-posted entry has a one-click reversal inside a set time window. Nothing is permanent the moment it posts. That time window can be as long as the human in the loop wants. I am thinking a week.
An audit log that cannot quietly lie. Every write is hash-chained and mirrored to an off-site repository within minutes. If the chain breaks, the system halts itself. The point is not to trust the AI. The point is to never have to.
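One way a hash-chained log can work, sketched with Python's standard `hashlib`. This is an assumption about the general technique, not the production implementation: the `AuditLog` class, the record layout, and the all-zero genesis value are hypothetical. Each record stores the hash of the one before it, so editing any earlier record breaks every check after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self.records = []

    def append(self, payload):
        prev_hash = self.records[-1]["hash"] if self.records else GENESIS
        record = {"prev": prev_hash, "payload": payload,
                  "hash": _digest(prev_hash, payload)}
        self.records.append(record)
        return record

    def verify(self):
        """Return True only if every record still links to the one before it."""
        prev_hash = GENESIS
        for record in self.records:
            expected = _digest(prev_hash, record["payload"])
            if record["prev"] != prev_hash or record["hash"] != expected:
                return False
            prev_hash = record["hash"]
        return True


log = AuditLog()
log.append({"entry_id": "JE-1", "amount": "120.00"})
log.append({"entry_id": "JE-2", "amount": "35.10"})
intact = log.verify()

log.records[0]["payload"]["amount"] = "1200.00"  # simulate a quiet edit
still_intact = log.verify()
```

In a setup like this, a failed `verify()` would be the trigger for the halt. The off-site mirror matters because a chain stored in only one place can be rewritten end to end by whoever controls that place.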
This is what human-in-the-loop AI actually means in an accounting context. Not a person babysitting every entry. A person who can see everything, reverse anything, and stop the whole thing cold, while the machine does the volume.
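The "reverse anything, stop the whole thing cold" part can be sketched as two small checks. Everything here is an assumption for illustration: the config keys, the pattern names, and the seven-day window are hypothetical, and the dict stands in for whatever the real config file looks like once loaded.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical config, as it might look after being loaded from a file.
config = {
    "master_enabled": True,                  # one switch for everything
    "patterns": {"bank_fees": "auto", "payroll": "manual"},
    "reversal_window_days": 7,               # assumed window length
}


def may_auto_post(config, pattern):
    """A pattern posts on its own only if the master switch and its own switch allow it."""
    if not config["master_enabled"]:
        return False
    # Unknown patterns default to manual, the safe side.
    return config["patterns"].get(pattern, "manual") == "auto"


def can_reverse(config, posted_at, now):
    """An entry can be reversed while it is still inside the window."""
    window = timedelta(days=config["reversal_window_days"])
    return now - posted_at <= window


# Arbitrary timestamps for the example.
posted_at = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
within = can_reverse(config, posted_at, posted_at + timedelta(days=3))
expired = can_reverse(config, posted_at, posted_at + timedelta(days=8))

fees_auto = may_auto_post(config, "bank_fees")
config["master_enabled"] = False             # the finance lead flips the master switch
fees_after_kill = may_auto_post(config, "bank_fees")
```

Defaulting unknown patterns to manual is a deliberate choice in this sketch: a typo in the config should cost a human a few minutes, not post an entry.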
There is a reason almost nobody has deployed AI into live accounting. I get into it on the pillar page, but the short version is this: most finance teams that try it find the time spent checking the machine equals the time it saved. So why bother? That was the thing I had to design around from day one. The checks have to be fast, and they have to be rare. The system handles the volume on its own, and a human only looks at the exceptions, the handful of rows it could not place with confidence. If the finance lead has to re-check every entry, I have built nothing worth paying for.
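A rough sketch of that exception routing, assuming each proposal carries the rule that fired and a confidence score. The 0.95 threshold, the field names, and the sample rows are hypothetical; the real cut-off would be tuned against the dry runs.

```python
def route(proposals, threshold=0.95):
    """Split proposals into auto-post and human-review queues."""
    auto, review = [], []
    for p in proposals:
        # No rule fired, or the rule was not confident enough: a human looks at it.
        if p["rule"] is not None and p["confidence"] >= threshold:
            auto.append(p)
        else:
            review.append(p)
    return auto, review


# Made-up sample proposals.
proposals = [
    {"txn_id": "T1", "rule": "bank_fees", "confidence": 0.99},
    {"txn_id": "T2", "rule": "card_settlement", "confidence": 0.97},
    {"txn_id": "T3", "rule": "card_settlement", "confidence": 0.62},
    {"txn_id": "T4", "rule": None, "confidence": 0.0},
]

auto, review = route(proposals)
review_share = len(review) / len(proposals)  # the number to drive down
```

The share of rows landing in `review` is the figure each new rule is meant to shrink. If it stays high, the human is back to checking everything.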
What go-live week actually looks like
The plan is boring on purpose. The full dry run on the month's data, reviewed, any rule updates applied. Then live writes, one small entry per entity, fully reversed in the same sitting with the finance lead watching. Then the trusted patterns post on their own, with everything else flagged for same-day review. The goal is every dry run done and live writes done across every entity by midweek, with a day or two of slack built in before I go offline, in case something breaks. You do not schedule a go-live for the last possible hour. You leave room for the thing to go wrong.
Bank reconciliation is just the first workflow
Reconciliation is the part that has to work first. This is where it connects back to what an AI financial controller actually does, the pillar I wrote a few weeks ago. Going live on the bank side proves the process can work. The rest of the controller plugs into the same process.
Accounts payable is next. A tool that reads PDF invoices, pulls the supplier, the amount, the date, the coding, and drafts the entry for review. The invoices that arrive as email attachments and used to be typed in by hand become a queue a human approves instead of a stack a human transcribes.
Then the partner tax reads at the back end of the month. If you are a CFO, you already know what K-1s are and how much manual work they involve. I did not, but I do now. This is structured data that has to be pulled out of dozens of near-identical documents and reconciled against a master. Same idea as reconciliation, different documents. Read, match, draft, review.
And on top of all of it, one dashboard. It does not show model accuracy or token counts or anything I would find interesting and the client would not. It shows two numbers. How many entries the AI posted this month. And the estimated hours of manual work that saved.
That second number is the only KPI that matters. Not because the others are wrong, but because that is the one the client repeats to their own boss. Hours saved is the only AI metric worth tracking. Everything in the plan above exists to make that number real, and trustworthy, and reversible.
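One plausible way to estimate that number, as a sketch only. The formula, the minutes-per-entry figure, and the sample inputs are assumptions, not the dashboard's actual calculation. Subtracting review time keeps the estimate honest about the checking the human still does.

```python
def estimated_hours_saved(entries_posted, minutes_per_manual_entry, review_minutes=0.0):
    """Hours of hand entry avoided, net of the time spent reviewing exceptions."""
    saved_minutes = entries_posted * minutes_per_manual_entry - review_minutes
    return max(saved_minutes, 0.0) / 60


# Made-up inputs: 600 auto-posted entries, 2 minutes each by hand,
# 90 minutes of human review across the month.
hours = estimated_hours_saved(600, 2.0, review_minutes=90.0)
```

If review time ever catches up with the time saved, this clamps to zero, which is the "so why bother" case showing up on the dashboard.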
The model was the easy part. It was the month of dry runs that made it safe to switch on. If you want to read why I do any of this, it is in the fund I started for my son.
Monthly Revenues $9,200 | Clients 2 | Prospects (Meta to WhatsApp campaign built, WABA pending) | Team: Me + Jan (CTO)
Day 66 of 365.