Now taking the first cohort of audit engagements — limited slots. Book a free audit
NEW · Managed AI cost optimization · Pay on results

Cut your LLM API bill in half.

LLM CFO is managed FinOps for AI — for engineering teams spending $20K+/month on LLMs. We audit your OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, and Azure OpenAI usage, then implement model routing, semantic caching, prompt caching, prompt compression, and batch routing. Every month we send a statement reconciled against your raw provider invoices. You only pay on the savings we deliver.

Targeting 40–60% reduction · 3–5 weeks to first savings · No savings, no fee

// Reconciled against every major provider & gateway

OpenAI Anthropic Gemini Bedrock Azure Groq Together Mistral
§ 01 · Targets

What we target per engagement.

We're an early-stage service taking our first cohort. These are the targets we commit to at audit kickoff, sourced from public benchmarks and our own engineering work — not customer claims.
// reduction range
40–60%

Architecture-dependent. Confirmed against your invoices at audit.

// time to savings
3–5wks

Kickoff → first reconciled provider invoice.

// quality SLO
A/B 7d

Every change A/B tested 7 days minimum. Auto-rollback on regression.

// audit fee
$0

Free. No savings, no fee — performance pricing.

§ 02 · Techniques

Six levers we pull, every engagement.

Most engagements combine three or four of these. The audit tells us which dominate your spend surface.
01
30–50%
model routing

Right model, right request.

Classify each request and route to the cheapest model that passes your quality bar. Reasoning goes to flagships; extraction and classification go to small fast models.
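
A minimal sketch of the pattern. The tier names are placeholders and the keyword classifier is a toy; a production router uses a trained classifier plus the per-endpoint quality SLOs agreed at the audit.

```python
# Model-routing sketch: classify each request, send it to the cheapest tier
# that clears the quality bar. Tier names and the classifier are illustrative.

TIERS = {
    "reasoning": "flagship-model",     # multi-step reasoning, planning
    "extraction": "small-fast-model",  # structured extraction, rewriting
    "classification": "mini-model",    # labels, yes/no, routing itself
}

def classify(request: str) -> str:
    text = request.lower()
    if any(k in text for k in ("why", "plan", "prove", "step by step")):
        return "reasoning"
    if any(k in text for k in ("extract", "parse", "json", "fields")):
        return "extraction"
    return "classification"

def route(request: str) -> str:
    return TIERS[classify(request)]
```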

02
20–40%
semantic cache

Near-duplicates, served instantly.

Fingerprint prompts with embeddings and serve near-duplicate requests from a sub-10ms cache. Similarity thresholds and invalidation tuned per feature.
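
A sketch of the core lookup, assuming embed() returns unit-normalized vectors so a dot product is cosine similarity. Production adds TTLs, per-feature namespaces, and invalidation.

```python
# Semantic-cache sketch. embed() is a stand-in for any embedding call and is
# assumed to return unit-normalized numpy vectors.
import numpy as np

class SemanticCache:
    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed            # callable: str -> unit-norm np.ndarray
        self.threshold = threshold    # tuned per feature
        self.keys: list[np.ndarray] = []
        self.values: list[str] = []

    def get(self, prompt: str) -> str | None:
        if not self.keys:
            return None
        q = self.embed(prompt)
        sims = [float(q @ k) for k in self.keys]          # cosine similarity
        best = max(range(len(sims)), key=sims.__getitem__)
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, prompt: str, answer: str) -> None:
        self.keys.append(self.embed(prompt))
        self.values.append(answer)
```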

03
15–30%
prompt compression

Tokens that don't earn their keep.

Audit system prompts for redundancy. Deduplicate examples. Compress retrieved context with LLMLingua-style techniques. Every change A/B tested.
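
One concrete pass from that audit, sketched: dropping few-shot examples that are identical after whitespace and case normalization, a common source of silent token waste when prompts are assembled from multiple templates.

```python
# Example-deduplication sketch from a prompt-compression audit.
import hashlib

def dedupe_examples(examples: list[str]) -> list[str]:
    seen: set[str] = set()
    kept: list[str] = []
    for ex in examples:
        # Normalize whitespace and case before hashing so trivially
        # reformatted duplicates still collapse to one entry.
        key = hashlib.sha256(" ".join(ex.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(ex)
    return kept
```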

04
10–50%
batch & async

Non-interactive work pays less.

Route background jobs to batch endpoints (up to 50% discount) with SLO-aware queueing for anything time-sensitive.
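
A sketch of batch submission following OpenAI's published Batch API (JSONL upload, 24-hour completion window, roughly 50% off list price). The job IDs and messages are placeholders; polling for results is omitted.

```python
# Batch-submission sketch via the OpenAI Batch API.
import json
from openai import OpenAI

client = OpenAI()

def submit_batch(jobs: list[dict], model: str = "gpt-4o-mini") -> str:
    """jobs: [{"id": "...", "messages": [...]}] -- shapes are placeholders."""
    with open("batch_input.jsonl", "w") as f:
        for job in jobs:
            f.write(json.dumps({
                "custom_id": job["id"],
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {"model": model, "messages": job["messages"]},
            }) + "\n")
    batch_file = client.files.create(
        file=open("batch_input.jsonl", "rb"), purpose="batch"
    )
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    return batch.id
```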

05
20–35%
provider arbitrage

Same capability, different price.

Identical tasks often cost 2–3× more at one provider. We route by capability-per-dollar, not by the SDK your team happened to start with.
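
The routing rule itself is simple; the work is in the data. A sketch with placeholder providers, quality scores, and prices (in practice both come from the audit: benchmarked quality per task family, live provider rate cards).

```python
# Capability-per-dollar routing sketch. All figures are illustrative.

CANDIDATES = [
    # (model, quality on this task family 0-1, blended $ per 1M tokens)
    ("provider-a/flagship", 0.95, 12.00),
    ("provider-b/flagship", 0.94,  6.00),
    ("provider-b/small",    0.88,  0.60),
]

def cheapest_above(quality_floor: float) -> str:
    eligible = [c for c in CANDIDATES if c[1] >= quality_floor]
    return min(eligible, key=lambda c: c[2])[0]  # cheapest that clears the bar

print(cheapest_above(0.90))  # provider-b/flagship: same capability, half the price
```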

06
5–15%
fallback chains

Graceful degradation, not over-provisioning.

Smart retries and tiered fallbacks beat worst-case over-provisioning. Maintain SLO without paying flagship prices on every call.
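
A sketch of the chain: start at the cheapest tier, retry with exponential backoff, escalate only on persistent failure. call_model() stands in for any provider SDK call; tier names are placeholders.

```python
# Tiered-fallback sketch with bounded retries and backoff.
import time

CHAIN = ["mini-model", "small-fast-model", "flagship-model"]

def complete(prompt: str, call_model, retries: int = 2) -> str:
    last_err: Exception | None = None
    for model in CHAIN:
        for attempt in range(retries):
            try:
                return call_model(model, prompt)
            except Exception as err:       # rate limit, timeout, 5xx
                last_err = err
                time.sleep(2 ** attempt)   # back off before retrying
    raise last_err                          # every tier exhausted
```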

§ 03 · How it works

Audit. Implement. Reconcile. Repeat.

Every engagement follows the same four-phase shape; most reach month-end reconciliation by week five.
WEEK 01 · AUDIT
01

Map every dollar.

Ingest invoices and gateway logs. Baseline locked and countersigned.

WEEK 02 · PLAN
02

Rank by waste.

Optimizations ranked by dollar impact. Quality SLOs agreed per endpoint.

WEEKS 03–05 · SHIP
03

Ship & A/B.

Feature-flagged, tested 7 days vs. baseline, graduated to 100%.

ONGOING · RECONCILE
04

Signed, monthly.

Statement of Savings reconciled to raw provider invoices. Every month.

§ 04 · Pricing

We only get paid when you save.

Performance pricing, tied to your provider invoices. No savings, no fee. No multi-year lock-ins.
Free audit,
then 15–25%
of verified savings.
  • Audit is free and credited against implementation.
  • Savings measured vs. locked pre-engagement baseline, reconciled to raw provider invoices.
  • Quality SLOs enforced; regressions auto-rollback.
  • Minimum engagement: $20K/month LLM spend.
  • Cancel for any reason — no multi-year lock-ins.

// illustrative example · $165K baseline · 50% reduction

Baseline AI spend · $165,000
After optimization (≈50%) · $82,500
Monthly savings · $82,500
LLM CFO fee (20%) · − $16,500
Customer keeps · $66,000 /mo
// ILLUSTRATIVE ANNUAL NET — NOT A CUSTOMER FIGURE
$792,000
Mid-range example only. Your audit produces a real number.
§ 05 · FAQ

Common questions.

Missing something? Write to hello@llmcfo.com — we respond within one business day.

What is LLM CFO?
LLM CFO is a managed FinOps for AI service. We audit LLM spend across OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, and Azure OpenAI; implement model routing, semantic caching, prompt caching, and prompt compression; and deliver monthly reconciled savings statements that tie end-to-end to raw provider invoices.
What is FinOps for AI?
FinOps for AI applies cloud-cost-management principles — visibility, accountability, optimization — to LLM and inference spend. The unit of cost is the token; the challenges are attribution (which feature, team, or user is burning tokens) and architectural waste (over-routing to flagship models, missing cache opportunities, uncompressed prompts, no batch routing). LLM CFO delivers all three pillars as a managed service.
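The attribution piece, sketched: most gateways (LiteLLM, Helicone) let you tag requests with metadata, after which cost per feature is a roll-up. The log-row field names here are assumptions.

```python
# Attribution sketch: roll tagged gateway logs up to spend per feature.
from collections import defaultdict

def spend_by_feature(log_rows: list[dict]) -> dict[str, float]:
    totals: defaultdict[str, float] = defaultdict(float)
    for row in log_rows:
        feature = row.get("metadata", {}).get("feature", "untagged")
        totals[feature] += row["cost_usd"]
    # Largest spenders first: the top of this list is the audit's priority queue.
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))
```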
How much can I reduce my OpenAI or Anthropic bill?
Published research and our engineering work suggest a 40–60% reduction is achievable for most production workloads. The biggest savings usually come from semantic caching (20–40%), model routing (30–50%), and prompt compression (15–30%). We commit to a target after the audit. If we don't deliver savings, you don't pay.
How is pricing structured?
Performance pricing. Free audit, then 15–25% of delivered monthly savings vs. a locked pre-engagement baseline. No savings in a given month, no fee. No multi-year lock-ins.
What's the minimum monthly spend?
$20K+/month on LLM APIs, or teams projected to cross that within a quarter. Below that, self-serve observability tooling (Helicone, Langfuse) is a better fit.
Which LLM providers do you support?
OpenAI, Anthropic, Gemini & Vertex AI, AWS Bedrock, Azure OpenAI, Groq, Together AI, Mistral, Cohere, Fireworks, OpenRouter, and most OSS endpoints. We also work with gateway and observability layers such as LiteLLM, Helicone, and Langfuse. Multi-provider setups typically have the largest savings surface.
What's the difference between LLM observability and LLM cost optimization?
Observability tools (Helicone, Langfuse, LangSmith) tell you what your spend is. Cost optimization is the engineering work that reduces it: caching, routing, compression, batching, provider arbitrage, cache-read token strategy. LLM CFO is the managed-service layer on top — we take the observability data and deliver reduction with reconciled monthly statements.
How are savings verified?
We lock a pre-engagement baseline from your provider invoices. Each month, our reconciliation report compares optimized spend to that baseline, traffic-adjusted, reconciled to raw billing. Your finance team can audit the math.
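The traffic adjustment in miniature (illustrative numbers, not a customer figure):

```python
# Traffic-adjusted reconciliation sketch. The baseline locks a cost per
# request at kickoff; monthly savings = what this month's traffic would have
# cost at the baseline rate, minus what it actually cost.

def verified_savings(baseline_cost: float, baseline_requests: int,
                     month_cost: float, month_requests: int) -> float:
    baseline_rate = baseline_cost / baseline_requests   # $/request, locked
    counterfactual = baseline_rate * month_requests     # traffic-adjusted baseline
    return counterfactual - month_cost

# e.g. $165K over 1.1M requests at baseline; 1.2M requests this month for $90K
print(verified_savings(165_000, 1_100_000, 90_000, 1_200_000))  # 90000.0
```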
Are you SOC 2, GDPR, or HIPAA certified?
Not yet — we're being honest. We're an early-stage service and don't claim SOC 2, GDPR, or HIPAA certification. We follow read-only-by-default access for billing and gateway data, and can sign a mutual NDA and DPA before any engagement. Happy to share our security practices on request.
Will optimization hurt output quality?
Every change is A/B tested against your production baseline for seven days minimum. Regressions roll back automatically. Quality SLOs are agreed per endpoint at the start of the engagement and monitored continuously alongside cost.
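A sketch of the guardrail check that runs during that window; the quality metric, SLO margin, and rollback hook are placeholders, each agreed per endpoint at kickoff.

```python
# Regression-guardrail sketch for the 7-day A/B window.

SLO_MARGIN = 0.02  # tolerate at most a 2-point absolute drop vs. control

def check_quality(control_score: float, treatment_score: float, rollback) -> bool:
    """Return True if the optimized variant stays live."""
    if treatment_score < control_score - SLO_MARGIN:
        rollback()  # flip the feature flag back to baseline routing
        return False
    return True
```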
§ 06 · Vocabulary

The vocabulary of AI FinOps.

If your team is searching for any of these terms, we work on all of them; the audit shows which ones dominate your spend.
·
prompt caching

Prompt & context caching.

OpenAI prompt caching, Anthropic prompt caching, Gemini context caching. Cache-read tokens tracked separately from input tokens (50–90% discount). We tune cache hit rates without breaking determinism.
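
A sketch following Anthropic's published cache_control API; the model ID and prompt text are placeholders. (OpenAI caches long stable prefixes automatically; Gemini uses explicit context caching.)

```python
# Prompt-caching sketch: mark the large static system block as cacheable so
# repeat calls bill cheap cache-read tokens instead of full input tokens.
import anthropic

client = anthropic.Anthropic()
LONG_STATIC_SYSTEM_PROMPT = "...several thousand tokens of stable instructions..."

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=[{
        "type": "text",
        "text": LONG_STATIC_SYSTEM_PROMPT,        # identical across requests
        "cache_control": {"type": "ephemeral"},   # cache this prefix
    }],
    messages=[{"role": "user", "content": "What changed in this contract?"}],
)
print(response.usage)  # cache_creation_input_tokens vs cache_read_input_tokens
```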

·
semantic caching

Near-duplicate request collapsing.

~31% of enterprise LLM queries are semantically identical. Embedding fingerprints + similarity thresholds + per-feature invalidation. Sub-10ms responses on cache hits.

·
model routing

Tiered LLM routing.

Reasoning → flagship. Extraction → small fast. Classification → mini. The largest single savings lever for most teams (30–50%).

·
prompt compression

Token-efficient prompts.

System-prompt audit, example deduplication, LLMLingua-style context compression. Quality A/B-tested 7 days minimum before rollout.

·
batch API

Async & batch routing.

Background and async work routed to batch endpoints — up to 50% discount. SLO-aware queueing for anything time-sensitive.

·
provider arbitrage

OpenAI · Anthropic · Bedrock · Vertex · Azure.

Identical capability often costs 2–3× more at one provider. We route by capability-per-dollar across OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure OpenAI, Groq, Together, Fireworks.

§ 07 · Start here

Book the audit. Keep the savings.

Two weeks. Read-only access. We return with a full map of your spend, ranked by waste, and a baseline our platform can track against. Free. No implementation commitment.

Book free audit → Talk to us

30-MIN DISCOVERY · READ-ONLY ACCESS · NDA + DPA ON REQUEST