RESEARCH · TOOLING

Langfuse vs LiteLLM vs OpenLIT.

20 June 2026

By the LLM CFO team

We already compared LiteLLM, Helicone, and LangFuse as gateway vs. proxy vs. platform. This is the other question we keep getting: of the open-source tools that emit LLM cost and usage data, which ones speak OpenTelemetry — so the numbers land in the stack you already run (Grafana, Datadog, Honeycomb, ClickHouse) instead of a tool-specific dashboard somebody has to babysit?

The one-line version

Tool | Primary role | OpenTelemetry-native?
LiteLLM | Multi-provider gateway / SDK: unify the API, route, budget, fail over | Partial: emits OTLP via a callback
LangFuse | Tracing + eval + prompt platform with its own data model | Partial: ingests OTLP, stores in its own schema
OpenLIT | OpenTelemetry-native GenAI observability: traces, metrics, cost | Yes: OTLP by design

LiteLLM

A multi-provider gateway and SDK: one OpenAI-shaped API surface across ~100 providers, with virtual keys, per-team budgets, fallbacks, and a local cost table. It is where you put routing and spend caps. We cover it in depth in the three-way gateway comparison. For this discussion the relevant fact is that LiteLLM can emit OpenTelemetry spans through a callback, so the gateway you already route through can become your cost-telemetry source without a second integration.

LangFuse

An observability platform built around traces, evals, prompt management, and datasets. It accepts OTLP and has SDKs, but it stores data in its own model and you read it primarily in the LangFuse UI. Pick it when you need agent traces, LLM-as-judge evals, and prompt versioning: the analysis surface, not just the cost line. Its cost numbers are derived from token counts and a price table that someone has to keep current, the same caveat that applies to everything else here.

OpenLIT

OpenLIT is the OpenTelemetry-native option. You add one auto-instrumentation call and it wraps your LLM, vector-DB, and framework calls, emitting OTLP traces and metrics that follow the OpenTelemetry GenAI semantic conventions (gen_ai.* attributes). Cost is computed from token counts against a pricing file and attached to the span, so it flows to whatever OTLP backend you already run.
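To make "cost attached to the span" concrete, here is a minimal sketch of the idea in plain Python. It is not OpenLIT's code. The pricing values and the model name are made up, and the gen_ai.usage.cost attribute name is an assumption for illustration; the token and model attribute names follow the OpenTelemetry GenAI conventions.

```python
# Hypothetical pricing table in USD per 1M tokens. The values are invented
# for illustration and do not reflect any provider's real prices.
PRICING = {
    "example-model": {"input": 3.00, "output": 15.00},
}


def span_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build gen_ai.* span attributes, including a cost derived from tokens."""
    price = PRICING[model]
    cost = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Assumed attribute name: cost is a derived value that the emitter
        # adds on top of the token counts.
        "gen_ai.usage.cost": round(cost, 6),
    }


attrs = span_attributes("example-model", input_tokens=1200, output_tokens=300)
print(attrs)
```

Because the cost is just another span attribute, any OTLP backend that can sum a numeric attribute can chart it without knowing anything about LLMs.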

What it does well:

What it doesn't do (or does weakly):

Where the cost number actually comes from

All three compute spend the same way: token counts multiplied by a pricing table they ship or you maintain. None of them read your invoice. That means they share the same three failure modes: a stale price table, mis-accounted cache-read tokens (the baseline trap), and provider-specific discounts the table doesn't know about. Whatever you pick, reconcile the derived number against the provider bill monthly, or the dashboard quietly drifts from reality.
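A rough sketch of how the cache-read failure mode shows up, and what a monthly reconciliation check could look like. Everything here is assumed for illustration: the prices, the discounted cache-read rate, the convention that the reported input token count includes cached tokens, and the 5% tolerance.

```python
# Hypothetical prices in USD per 1M tokens; cache reads are assumed to be
# billed at a discounted rate.
PRICE = {"input": 3.00, "cache_read": 0.30, "output": 15.00}


def naive_cost(input_tokens: int, output_tokens: int) -> float:
    """Prices every input token at the full rate, ignoring cache reads."""
    return (input_tokens * PRICE["input"] + output_tokens * PRICE["output"]) / 1e6


def cache_aware_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Prices cached tokens at the cache-read rate.

    Assumes input_tokens already includes cached_tokens.
    """
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICE["input"]
        + cached_tokens * PRICE["cache_read"]
        + output_tokens * PRICE["output"]
    ) / 1e6


def drift(derived_usd: float, invoiced_usd: float) -> float:
    """Relative gap between the dashboard number and the provider bill."""
    return (derived_usd - invoiced_usd) / invoiced_usd


naive = naive_cost(10_000, 500)
aware = cache_aware_cost(10_000, 8_000, 500)
gap = drift(naive, aware)  # treat the cache-aware figure as the "bill" here

TOLERANCE = 0.05  # hypothetical threshold for flagging a month
needs_review = abs(gap) > TOLERANCE
print(f"naive=${naive:.4f} cache_aware=${aware:.4f} drift={gap:.0%}")
```

With a heavily cached workload the naive figure overstates spend by more than double in this example, which is the kind of gap a monthly comparison against the invoice is meant to catch.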

The OpenTelemetry question

The reason "OTel-native" is worth caring about is not purity. It is that a gen_ai.* span looks the same whether it came from your checkout service or your support agent, so one query answers "cost per request, by team" across the whole system — and it lives next to your latency and error telemetry instead of in a separate tool. If your telemetry schema is already an OpenTelemetry decision, an OTel-native emitter like OpenLIT (or LiteLLM's OTLP callback) keeps LLM cost in that same pipe. If you are buying an analysis product anyway, LangFuse's richer surface may matter more than schema portability.
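As an illustration of the "one query" point, the sketch below aggregates cost per request by team over spans from two different services. The span records are invented, and both the team attribute and the gen_ai.usage.cost attribute are assumed names; in practice this would be a query in your backend rather than Python.

```python
from collections import defaultdict

# Invented span records. They have the same shape regardless of which
# service emitted them, which is the point of a shared schema.
spans = [
    {"service.name": "checkout", "team": "payments", "gen_ai.usage.cost": 0.004},
    {"service.name": "checkout", "team": "payments", "gen_ai.usage.cost": 0.006},
    {"service.name": "support-agent", "team": "support", "gen_ai.usage.cost": 0.020},
]


def cost_per_request_by_team(records: list[dict]) -> dict[str, float]:
    """Average derived cost per LLM request, grouped by the team attribute."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for record in records:
        team = record["team"]
        totals[team] += record["gen_ai.usage.cost"]
        counts[team] += 1
    return {team: totals[team] / counts[team] for team in totals}


by_team = cost_per_request_by_team(spans)
print(by_team)
```

The grouping key is ordinary resource or span metadata, so the same aggregation can sit next to latency and error breakdowns for the same services.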

How to pick

Need | Recommended
Routing, virtual keys, per-team budgets across providers | LiteLLM
LLM cost and usage inside your existing OTel/Grafana/Datadog stack | OpenLIT
Agent traces, evals, and prompt versioning as a product surface | LangFuse
You want all three things | LiteLLM to route + OpenLIT to emit OTLP; add LangFuse if you need evals
You have no observability backend yet | LangFuse hosted is the fastest path to a usable dashboard

Combining them is normal

These are layers, not competitors. A common 2026 stack is LiteLLM as the gateway (routing + budgets), OpenLIT auto-instrumentation emitting OTLP to your collector, and LangFuse where teams that need evals and prompt management want them. The unifying thread is the OpenTelemetry GenAI convention: if every layer speaks gen_ai.*, the cost number survives swapping any single tool out.

The honest caveats
