When AI Reconciliation Misses Half the Statement

"It is just one month. Expand the report to cover May and send it back to me."

That was my message to the CFO who also acts as the main accountant. I thought I was handing her a five-minute job on the AI reconciliation I have been building for an enterprise client.

Her reply came back a minute later. "This isn't just May. This is for the full year."

I read it twice. At first my stomach sank and panic started to creep in. Wait, could it be that I am completely hosing their books? Surely not. We had been so careful in the setup. But until the live run has gone through, everything is just theory, and we need to see results that reconcile to the statements.

Then I went back into the numbers myself, already knowing my idea of the job and the actual job were not the same. The dread was still there, but I kept it under control until I had a better idea of what was going on. No need to panic yet, not until I had the full picture.

Fifty lines on the bank statement for the month. Twenty-three of them had never made it into the general ledger. Nearly half. The automation had quietly dropped every one of those lines into an exceptions file and finished its run with a green light. No error, no warning, nothing flashing red. As far as the machine was concerned, it had done its job. It had done about half of it, and I needed to find out how this could be, because up to this point we had a solid plan.

What the setup got right

The honest review has to start here, not with the faults, because the bones held and that is the only reason this is a fixable story instead of a disaster.

The real ledger was never touched. Every bit of this ran on a sandbox copy of the books with live writing switched off, so a mistake cost me an afternoon, not a client's accounts. The steps that actually matter run through fixed code, deterministic, same input giving the same answer every time, not a model improvising over a company's money. That is important to understand. I had written it down myself in the skill and the rules we were building on the fly as the project advanced.

Every write that does happen lands in an audit log, and a re-run will not double-post. And here is the one I am most glad I built: when the system was not sure about a line, it refused to post a guess. It set the line aside instead. All of it was caught in the audit log, which the agent suggested we have from the start.

That last instinct was right, and it is the same one I wrote about yesterday on the trust ladder. An AI that is unsure should not be quietly posting its best guess to a client's books. The failure was not that it missed twenty-three lines. The failure was that nothing then forced me to look at where they had gone, and we missed this. The CFO checking by hand and telling us was also part of the plan, but as it was her telling me what had happened, it made me look bad, like I did not know what was going on. Which, in that case, was true.

What I missed

The first miss was about that exceptions file. When I built the pipeline I pictured exceptions the way most people do: a handful of strange lines a month that need a human eye. So when the run dropped some transactions into exceptions, I read it as normal. It was not a handful. It was twenty-three out of fifty. The file I had treated as the rare case was holding almost half the month.

The second was the matching. To post a payment to the right supplier, the system compares each bank line against a list of open bills. The version running last month was comparing against the previous month's list. Stale by one cycle. So most of the bills genuinely had no match to find, and the system did the only safe thing it knew and excepted them. The logic worked as written. What was written was wrong.

The third is the real lesson, and it is the one I want anyone hiring for this kind of work to take away. Nothing in the system ever checked that what got posted equalled what was on the statement. There was no step that added up the posted lines, added up the excepted lines, and asked whether the total matched the bank. The run could finish, report success, and be half done, and not one line of code would notice. "Success" meant the script ran to the end without crashing. It did not mean the books were correct. I had been trusting the process because I was becoming complacent that the agent had it under control.

What I learned, and the control I built

The lesson is small and it is boring and that is exactly why it is easy to skip: a reconciliation has to reconcile itself. The total has to tie, or the bank run has not passed, no matter how green the light is.

So this week, before anything else, I built the check that was missing. Every time we learned a lesson, it was added to the skill and checked and double-checked. It is read-only, it writes nothing, it does one job. Every bank line on the statement has to be accounted for in one of two ways: posted to the books, or sitting on a named held list with a stated reason. Posted plus held has to equal the bank statement, to the cent. FX fees, bank wires, cross-entity posting, everything. Separately, the bank balance in the books has to tie to the bank's own closing balance. If any of those three checks fails, the run stops and prints the exact lines that are unaccounted for. It is no longer allowed to go green while it is half done.

I proved it on the sandbox copy, live writing still off. Twenty-seven posted, twenty-three held with reasons, zero unaccounted for, tied to the cent. Then I ran the whole thing again to be sure it would not double-post on a re-run. Same answer, no duplicates. That is what I want from a control: same input, same output, and it tells me the moment something does not add up. The deterministic plumbing under the AI financial controller matters here far more than any model does.

The reshuffle: one message turned a clean-up into a year

"This isn't just May. This is for the full year." I had been treating this as a tool to tidy up one month's reconciliation. The CFO was telling me the actual scope, and they were right.

So the plan reshuffled. The job is not a monthly clean-up I run and hand back. It is a finance operation where every cycle, the month, the quarter, the full year, runs through automation that checks its own work, so the accountant stops spending their days matching bank lines by hand and gets that time back for the work only they can do. Tax structure. How the business is actually built. The judgement that does not belong in a script.

That is the north star now, and I should name it plainly: an AI controller that makes the person it works for the strongest version of their role, not a faster version of the manual grind. The completeness check is not a patch I bolted on after a bad month. It is the first real piece of that year-long build, because you cannot automate a full year you cannot prove was complete. A few weeks ago an independent review caught eighteen things I had missed; this month the client caught one more. The pattern is the same. Finishing and being correct are different claims, and the only way to run a full year unattended is to make the system prove the second one every single time.

Every time we learn something, we add it to the best place. Is it in the code? The skill, or the north star plan? The agent knows the best practice for each, so I do not tell it what to do. You tell me, as per best practice, where this should go. I do not care, as long as we learn better each time we make a write or change anything, and it is not lost to the next session. The day that stops being true is the day we could lose a client, and maybe the business with it.

How we get better

Next month's close runs through the completeness check before any numbers or confirmations go out. If the automation excepts a line, the total will not tie, and it will tell me which line and why instead of waving me through. The exact failure I had last month, a green light over a half-finished job, is the one thing this control makes impossible.

The bigger correction is in the workflow. I had let "the run finished" stand in for "the work is right." It does not, and now the system will not let me pretend it does. I would rather write this post than the one where everything tied and I looked clever. The clever version teaches nobody anything. This one has a number in it: twenty-three of fifty. That number is the reason every run now has to prove it added up before I trust a word it says.

Monthly Revenues $11,000 | Clients 2 | Prospects (Meta to WhatsApp campaign built, WABA pending) | Team: Me + part time Jan (CTO)

Day 75 of 365.

My AI Reconciliation Ran Clean and Still Missed Half the Statement

What the setup got right

What I missed

What I learned, and the control I built

The reshuffle: one message turned a clean-up into a year

How we get better

AI Deployment as a Service. One workflow at a time.