How to build an LLM CFO function.
20 June 2026
Someone has to own the AI bill — not just watch it. Owning it means setting budgets, attributing spend to teams, approving the model changes that move cost, and reconciling the dashboard against the invoice. This is the blueprint for standing that function up, whether it ends up being a person, a virtual team, or an outside service.
What an "LLM CFO function" actually is
It is the standing responsibility for the economics of your LLM usage: a named owner, a recurring cadence, and a small set of metrics that decisions are made against. It is the AI-spend analogue of what cloud FinOps became for infrastructure. The difference is that token pricing, cache discounts, and reasoning costs make the unit economics stranger, so the function needs its own playbook rather than a reused cloud one. We lay out the discipline itself in AI FinOps; this page is about turning that discipline into an org function.
A dashboard is not a function
Most teams start with visibility: a cost dashboard wired up from gateway logs. Visibility is necessary but not sufficient. A dashboard shows the number; a function changes it. The gap is ownership: who is accountable when spend per request creeps up 30% after a model swap, who signs off on routing changes, who tells a team its feature is now the most expensive surface in the product. Without that owner, the dashboard becomes wallpaper that everyone has stopped looking at.
The four responsibilities
| Responsibility | What it means in practice |
|---|---|
| Attribute | Every dollar of spend maps to a team, product surface, environment, and model. Below ~90% attribution coverage, every other number is suspect. |
| Optimize | Own the levers — routing, caching, batch — and the decision to pull them, with quality regression checks attached. |
| Govern | Set budgets and guardrails, approve model and prompt changes that move cost, and define escalation policy. |
| Reconcile | Compare derived spend (token counts × price table) against the provider invoice monthly, with a documented tolerance. |
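The Reconcile row can be sketched as a small check. This is a minimal illustration, not a reference implementation: the `Usage` record, the `PRICE_PER_MILLION` table, its placeholder prices, and the 2% default tolerance are all assumptions made for the example, and a real price table would come from your provider's published rates.

```python
from dataclasses import dataclass

# Hypothetical price table in USD per 1M tokens.
# Placeholder values for illustration, not real provider prices.
PRICE_PER_MILLION = {
    "model-a": {"input": 3.00, "cache_read": 0.30, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int        # uncached input tokens
    cache_read_tokens: int   # input tokens served from cache
    output_tokens: int


def derived_cost(usage: Usage) -> float:
    """Derived spend for one usage record: token counts x price table."""
    price = PRICE_PER_MILLION[usage.model]
    total = (
        usage.input_tokens * price["input"]
        + usage.cache_read_tokens * price["cache_read"]
        + usage.output_tokens * price["output"]
    )
    return total / 1_000_000


def reconcile(usages, invoice_total: float, tolerance: float = 0.02) -> dict:
    """Compare derived spend with the invoice; gap is relative to the invoice."""
    derived = sum(derived_cost(u) for u in usages)
    gap = (derived - invoice_total) / invoice_total
    return {
        "derived": derived,
        "invoice": invoice_total,
        "gap": gap,
        "within_tolerance": abs(gap) <= tolerance,
    }
```

A negative gap means the logs under-count what the provider billed, which usually points at untracked traffic or a stale price table rather than a billing error.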
Who runs it
The function lives at the seam between platform engineering and finance, and it fails when it sits entirely on one side. Engineering owns the instrumentation, the gateway, and the levers. Finance owns the budgets, the allocation model, and the reconciliation against the real bill. In a small org this is one engineer with a finance partner and an hour a week; in a larger one it is a virtual team with a clear RACI. What matters is that one named person is accountable for the number, even if many are responsible for moving it. Allocation mechanics — when showback becomes chargeback — are covered in AI chargeback and showback.
The metrics it lives by
- Cost per successful task. Total spend divided by completed units of work, with the cost of retries and failed attempts counted in the numerator. The headline unit-economics number.
- Cache-read ratio. Cache-read tokens as a share of input tokens. It is the single biggest input-cost lever, and it is invisible if you only watch totals.
- Model mix. Share of spend by model tier, so a drift toward the most expensive model is caught in days, not at invoice time.
- Attribution coverage. Percentage of spend mapped to a known owner. The credibility gate for everything else.
- Forecast accuracy. Predicted vs. actual monthly spend. A function that cannot forecast cannot budget.
The mechanics of capturing these live in LLM cost monitoring and token usage tracking.
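As a rough sketch, the five metrics above can be computed from per-request records. The record fields (`team`, `model`, `cost`, `success`, `input_tokens`, `cache_read_tokens`) and the `summarize` function are hypothetical names chosen for this example. It also assumes that `input_tokens` is the total input count including cache reads, and that forecast accuracy is reported as a signed relative error.

```python
from collections import defaultdict


def summarize(records, predicted_spend: float) -> dict:
    """Compute the headline metrics from a list of per-request dicts."""
    total_spend = sum(r["cost"] for r in records)
    successes = sum(1 for r in records if r["success"])

    # Failed and retried requests still cost money, so they stay in the numerator.
    cost_per_successful_task = total_spend / successes if successes else float("inf")

    # Assumption: input_tokens is the total input, cache reads included.
    input_tokens = sum(r["input_tokens"] for r in records)
    cache_reads = sum(r["cache_read_tokens"] for r in records)
    cache_read_ratio = cache_reads / input_tokens if input_tokens else 0.0

    # Share of spend by model.
    spend_by_model = defaultdict(float)
    for r in records:
        spend_by_model[r["model"]] += r["cost"]
    model_mix = {m: s / total_spend for m, s in spend_by_model.items()} if total_spend else {}

    # Spend with a known owner; team=None marks unattributed requests.
    attributed = sum(r["cost"] for r in records if r["team"] is not None)
    attribution_coverage = attributed / total_spend if total_spend else 0.0

    # Signed relative error: negative means the forecast was too low.
    forecast_error = (predicted_spend - total_spend) / total_spend if total_spend else 0.0

    return {
        "cost_per_successful_task": cost_per_successful_task,
        "cache_read_ratio": cache_read_ratio,
        "model_mix": model_mix,
        "attribution_coverage": attribution_coverage,
        "forecast_error": forecast_error,
    }
```

Keeping all five in one function is a convenience for the sketch; in practice each would likely be a query against the same tagged request log.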
The operating cadence
- Weekly: scan cost per task and model mix for anomalies; flag any surface trending up.
- Monthly: reconcile against the invoice; review budgets vs. actuals with each team; ship one optimization.
- Quarterly: re-forecast, revisit budgets, and review whether the price table and routing policy still reflect reality.
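The weekly scan can start as something very simple. The sketch below flags any surface whose latest weekly cost per task exceeds its trailing average by more than a threshold. The function name, the 15% threshold, and the four-week baseline are assumptions for illustration; tune them to your own noise level.

```python
from statistics import mean


def flag_trending_up(weekly_cost_per_task, threshold=0.15, baseline_weeks=4):
    """Return {surface: relative_change} for surfaces trending up.

    weekly_cost_per_task maps a surface name to a list of weekly values,
    oldest first, with the most recent week last.
    """
    flagged = {}
    for surface, series in weekly_cost_per_task.items():
        if len(series) < 2:
            continue  # not enough history to compare against
        *history, latest = series
        baseline = mean(history[-baseline_weeks:])
        if baseline <= 0:
            continue  # avoid dividing by zero on idle surfaces
        change = (latest - baseline) / baseline
        if change > threshold:
            flagged[surface] = change
    return flagged
```

A trailing-average comparison will miss slow drift, which is one reason the monthly and quarterly reviews still matter.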
The first 90 days
- Weeks 1–2 — Instrument and attribute. Get per-request token data tagged by team, product, environment, and model. Drive attribution coverage above 90% before optimizing anything.
- Weeks 3–4 — Baseline and reconcile. Establish cost per successful task and cache-read ratio, then reconcile the derived total against last month's invoice so the numbers are trusted.
- Weeks 5–8 — Set budgets and guardrails. Give each team a budget and an alert; add per-request and per-agent ceilings so runaway loops can't surprise you.
- Weeks 9–12 — Optimize and institutionalize. Pull the first high-confidence lever (usually caching or routing), measure it against quality, and lock in the weekly/monthly cadence with a named owner.
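The per-request and per-agent ceilings from weeks 5–8 can be sketched as a guard object that an agent loop consults before each model call. `AgentSpendGuard`, `BudgetExceeded`, and the dollar figures in the usage note are hypothetical; the sketch also assumes you can estimate a call's cost before making it.

```python
class BudgetExceeded(Exception):
    """Raised when a call would break a spend ceiling."""


class AgentSpendGuard:
    def __init__(self, per_request_ceiling: float, per_run_ceiling: float):
        self.per_request_ceiling = per_request_ceiling
        self.per_run_ceiling = per_run_ceiling
        self.spent = 0.0  # actual spend recorded so far in this agent run

    def authorize(self, estimated_cost: float) -> None:
        """Check a planned call against both ceilings before it is made."""
        if estimated_cost > self.per_request_ceiling:
            raise BudgetExceeded("request exceeds per-request ceiling")
        if self.spent + estimated_cost > self.per_run_ceiling:
            raise BudgetExceeded("agent run would exceed per-run ceiling")

    def record(self, actual_cost: float) -> None:
        """Record what the call actually cost once it returns."""
        self.spent += actual_cost
```

With `AgentSpendGuard(per_request_ceiling=0.50, per_run_ceiling=1.00)`, two calls at 0.40 each pass, and a third is refused before it is sent, which is the behavior that stops a runaway loop.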
Build, or buy the function
Standing this up in-house is the right call when LLM spend is large enough to justify a dedicated owner and the expertise exists internally. When either condition fails (spend is real but nobody has the bandwidth, or the team would rather not build the instrumentation and reconciliation muscle from scratch), the function can be run as a managed service. That is precisely what LLM CFO does: the attribution, optimization, governance, and reconciliation, run for you. Either way, the test is the same: is there a named owner, a cadence, and a number that decisions are made against?
Related
- AI FinOps — the operating model this function runs
- LLM cost monitoring — what to track and how
- AI chargeback and showback — the allocation mechanics
- AI governance for finance leaders — policy and accountability
- Agent spend guardrails — the budgets and loop controls