Reduce AI Research Costs: Halving a $36k Bill

I was chatting with my CTO after a fairly routine client call yesterday. We were doing a self-assessment, and he asked in the chat: how do you feel that went?

As a sales guy, eternally confident and optimistic, my answer was: fine. We're the subject matter experts. The client doesn't want to have to figure this out himself.

I knew the real power of the call was in the Granola transcript: every detail captured, ready to run straight through Opus 4.8. The strength of an AI-first mentality is that you digitise the call and then AI-fy it to build the proper plan.

The client is a small financial-services firm. Some months ago they wired a mostly prompt-based AI agent into their research desk to reduce their research costs and free up the analysts. Every morning it reads what two people used to read by hand: regulatory filings, broker notes, the right slice of X. It ranks what matters and emails a short list before the market opens. It saves each analyst roughly two hours a day, so do the maths and that is a big saving for a research team of four or five people. It also costs about $100 a day to run. A hundred a day is roughly $36,000 a year to read the news.

That number is not small, and it is exactly the kind of thing a business-minded fractional partner, one who thinks the same way as the founder, looks at first. It is real money, and most of it was being spent in the dumbest possible way.

The expensive model was doing the cheap work

Here is what was actually happening. One frontier model, the best and most expensive on the market, was doing everything: downloading the X feed, scraping filings, writing first-pass summaries, and then also doing the one part that genuinely needs a top model, the ranking and the judgment about what the desk needs to read. Paying premium rates for the donkey work is where the money goes.

So think of the work in three jobs, not one. Gathering, which is just pulling data and needs no judgment. First pass, which is summarising and tagging each item. And judgment, the ranking and synthesis where a wrong answer costs real money. The first two are cheap work. Only the third needs the expensive model.

The move is to route by the job. Send the gathering and the first pass to a cheap model, and keep the frontier model only for the judgment. The independent write-ups on this put the saving at roughly 60 to 75 percent on the routed work, because you stop paying a premium rate to download a web page. For this team it takes the bill from about $100 a day toward $40 or $50. That is $50 to $60 a day saved, so call it a conservative $15,000 a year, found without touching the quality of the morning report.
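For anyone who wants the routing and the arithmetic spelled out, here is a rough Python sketch. The tier labels, the function names and the 80 percent routable share are my assumptions for illustration, not figures from the client. The $100 daily bill and the 60 to 75 percent saving on routed work are the numbers quoted above.

```python
from enum import Enum


class Job(Enum):
    GATHER = "gather"          # pulling data, no judgment needed
    FIRST_PASS = "first_pass"  # summarising and tagging each item
    JUDGMENT = "judgment"      # ranking and synthesis


# Hypothetical tier labels, not real model identifiers.
CHEAP = "cheap"
FRONTIER = "frontier"

# Only the judgment job stays on the expensive model.
ROUTES = {
    Job.GATHER: CHEAP,
    Job.FIRST_PASS: CHEAP,
    Job.JUDGMENT: FRONTIER,
}


def pick_tier(job: Job) -> str:
    """Return the model tier a job should run on."""
    return ROUTES[job]


def routed_daily_bill(current_bill: float, routable_share: float,
                      saving_on_routed: float) -> float:
    """Daily bill after the routable share of spend moves to the cheap tier."""
    return current_bill * (1 - routable_share * saving_on_routed)


def annual_saving(current_bill: float, new_bill: float, days: int = 365) -> float:
    """Yearly saving implied by the drop in the daily bill."""
    return (current_bill - new_bill) * days


if __name__ == "__main__":
    ROUTABLE_SHARE = 0.80  # assumption: share of spend on gathering and first pass
    for saving in (0.60, 0.75):
        new_bill = routed_daily_bill(100.0, ROUTABLE_SHARE, saving)
        print(f"saving {saving:.0%}: ${new_bill:.0f} a day, "
              f"${annual_saving(100.0, new_bill):,.0f} a year saved")
```

With those assumed inputs the bill lands between about $40 and $52 a day, the same range as above. The real routable share comes out of the token logs, not out of a guess.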

That's the value of that one phone call.

What the open-source pricing actually looks like

The cheap models here are the open-weight ones, and the price gap is not subtle. A frontier model runs around $5 per million input tokens and $25 per million output. A capable open-weight model runs nearer $0.14 and $0.28. That is roughly thirty-five times cheaper on the way in and ninety times cheaper on the way out, for work where you would not feel the difference.
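To check those ratios, and to see what they mean for a single call, here is a small Python sketch. The per-million-token prices are the ones quoted above. The 20,000-token filing and 500-token summary are made-up sizes for illustration, and the function names are mine.

```python
# USD per million tokens, taken from the figures quoted in the text.
FRONTIER_PRICES = {"input": 5.00, "output": 25.00}
OPEN_WEIGHT_PRICES = {"input": 0.14, "output": 0.28}


def price_ratio(direction: str) -> float:
    """How many times cheaper the open-weight model is for 'input' or 'output'."""
    return FRONTIER_PRICES[direction] / OPEN_WEIGHT_PRICES[direction]


def call_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the given per-million-token prices."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000


if __name__ == "__main__":
    print(f"input ratio:  {price_ratio('input'):.1f}x")
    print(f"output ratio: {price_ratio('output'):.1f}x")

    # Hypothetical first-pass job: read a 20,000-token filing, write a 500-token summary.
    frontier = call_cost(FRONTIER_PRICES, 20_000, 500)
    open_weight = call_cost(OPEN_WEIGHT_PRICES, 20_000, 500)
    print(f"frontier: ${frontier:.4f}  open-weight: ${open_weight:.5f}")
```

First-pass work is mostly input tokens, so the blended saving on a call like this sits closer to the input ratio than the output one.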

You would not feel the difference because the old line that open models are a year behind the frontier is out of date. On most real tasks the gap is now about three months, and for reading filings and summarising headlines, three months behind is nothing. You do not need the best model on earth to download a document and tell you what is in it. You need it for the last mile only.

Use the feeds you already pay for

If you run a research desk in financial services, you almost certainly already pay for Bloomberg and AlphaSense. That changes the maths. A lot of what the AI is paying to gather, those two could hand over already structured.

Both have grown real automation in the last year. AlphaSense has custom workflow agents that can schedule pulls of earnings, filings and broker research and deliver them into your pipeline. Bloomberg can run the query on its own servers and return just the answer, and it can schedule a custom morning brief natively. The point is not the plumbing, it is the principle: where you already own the data, let it arrive clean and structured, and stop paying a model to re-read a terminal page to find a number. Cheaper and more accurate at the same time. The one thing worth a call to each account manager is whether the programmatic access sits inside what you already pay or costs extra.

Keep it private, keep it compliant

None of this is worth doing if it leaks your data, and in financial services that is the first question, not the last. So two rules.

The cheap tier only ever sees public data: filings, public news, public posts. Nothing under an NDA, no positions, no data-room material. And the cheap models do not run on a model maker's own consumer API, some of which route to servers offshore and reserve the right to train on what you send. You run the identical open-weight model through an EU or US enterprise provider that offers zero data retention, no training, and the same SOC 2 and data-processing terms your compliance team already demands of every vendor. Same model, compliant pipes. Anything confidential stays on the frontier path you already trust. Done that way, the bill comes down and the privacy posture does not move an inch.
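The first rule is simple enough to enforce in code rather than by good intentions. Here is a minimal Python sketch of a gate in front of the cheap tier. The source labels, the Item fields and the function names are hypothetical, and a real pipeline would take its classification from whatever your compliance team already uses.

```python
from dataclasses import dataclass

# Source labels treated as public, per the rule above. The label names are hypothetical.
PUBLIC_SOURCES = {"filing", "public_news", "public_post"}


@dataclass(frozen=True)
class Item:
    source: str
    text: str
    confidential: bool = False  # NDA material, positions, data-room documents


def allowed_on_cheap_tier(item: Item) -> bool:
    """Deny by default: only known public sources, never anything flagged confidential."""
    return item.source in PUBLIC_SOURCES and not item.confidential


def route_item(item: Item) -> str:
    """Send public items to the cheap tier and everything else to the frontier path."""
    return "cheap" if allowed_on_cheap_tier(item) else "frontier"


if __name__ == "__main__":
    items = [
        Item("filing", "Annual report text"),
        Item("broker_note", "Client-only research", confidential=True),
        Item("unlabelled", "Something nobody classified"),
    ]
    for item in items:
        print(item.source, "->", route_item(item))
```

The design choice that matters is the default. Anything unlabelled or unknown falls through to the frontier path you already trust, so a tagging mistake costs a few cents instead of a compliance incident.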

This is non-negotiable for the financial-services sector. Without it in place, the conversation about cost is a non-starter.

The deliverable is a five-minute report, not a science project

Worth saying what the output actually is, because it is unglamorous and that is the point. It is a daily report the research team reads in five minutes with their coffee. Ranked, sourced, short. A dashboard to drill into when someone wants more. That is the whole product. The AI is in the details: prompting, reasoning logic, and yes, some coding, all sitting behind a report a human reads. The human still makes the calls.

The $15,000 was a phone call, not a project

We did not build this firm a model. We did not run a six-month transformation. We looked at where the money was going, saw the expensive model doing cheap work, and worked out which knob to turn. The saving was a phone call.

Will we really solve this in a week? Probably not. But with this forty-five-minute phone call, we know where to start looking, and chances are we will figure it out, probably faster than we think. I try to under-promise and over-deliver, a fine balance for a sales guy who leans towards a certain amount of puffery.

That is what fractional AI ops is, most days. Not the grand rebuild. The person who has seen enough of these setups to know that your $36,000 robot is overpaying by about $15,000, and can tell you exactly which part to change without breaking the thing that already works.

If you have an AI brief bolted onto a Bloomberg and AlphaSense desk, three things are worth checking: where your tokens actually go, whether your feeds can hand you structured data instead of raw pages, and whether your cheap-model calls run on servers your compliance team has cleared. Most desks I have seen are overpaying on all three.

Monthly Revenues $9,200 | Clients 2 | Prospects (Meta to WhatsApp campaign built, WABA pending) | Team: Me + Jan (CTO). Day 69 of 365.
