Tokens Are Fuel. We're Measuring the Wrong Thing

AI tokens are the new fuel line item, and finance is being told to count them. But gallons burned is not the metric that runs a fleet. Here is why value-per-token is the next frontier in AI economics.

By Christopher Rafter · 8 min read · Perspective

The fastest-growing line item in corporate technology budgets right now is AI tokens.

Deloitte's January 2026 analysis on AI spend dynamics put real numbers behind what most CIOs and CFOs are already feeling in their bones. AI now consumes up to half of IT spend at some firms. Cloud bills jumped 19% in 2025, driven almost entirely by generative AI workloads. Nearly half of executives expect basic AI automation to take three years to deliver ROI. Only 28% of global finance leaders report clear, measurable value from their AI investments.

Deloitte's recommendation is the predictable one: bring FinOps discipline to AI. Forecast token demand. Set budgets. Right-size models. Build chargebacks. Adopt rigorous governance.

Good advice. Necessary advice.

But it answers the wrong question.

We're Optimizing Fuel Consumption, Not Delivered Value

Imagine if every trucking company in America measured its drivers strictly by how much fuel they consumed.

Driver A burned 800 gallons last week. Driver B burned 1,200 gallons. Therefore, Driver A is the better operator. Promote Driver A. Audit Driver B.

This is approximately the conversation the industry is currently having about AI tokens.

Trucking companies don't measure drivers that way. They measure goods hauled. Miles driven. Deliveries completed on time and intact. The efficiency metric, the one that actually drives operational decisions, is miles per gallon. Not gallons consumed.

Fuel is the cost. Value is what gets delivered. Efficiency is the ratio between them.
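To make the ratio concrete, here is a minimal sketch in Python. The gallons figures come from the example above; the miles figures are hypothetical, chosen only to show how the ranking can flip once delivered value enters the calculation.

```python
# Gallons come from the example above; miles are hypothetical illustration values.
drivers = {
    "Driver A": {"gallons": 800, "miles": 4_800},
    "Driver B": {"gallons": 1_200, "miles": 9_600},
}


def miles_per_gallon(miles: float, gallons: float) -> float:
    """Efficiency is delivered value (miles) divided by cost (gallons)."""
    return miles / gallons


efficiency = {
    name: miles_per_gallon(d["miles"], d["gallons"]) for name, d in drivers.items()
}

# Ranking by cost alone versus ranking by the ratio.
lowest_fuel = min(drivers, key=lambda name: drivers[name]["gallons"])
most_efficient = max(efficiency, key=efficiency.get)

print(f"Least fuel burned: {lowest_fuel}")
print(f"Best miles per gallon: {most_efficient}")
```

Under these assumed mileages, the driver who burned more fuel is the more efficient one. Swap "gallons" for tokens and "miles" for delivered value and the same two rankings can disagree in the same way.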

Tokens are the AI equivalent of fuel. They're a proxy for energy consumed, for GPU cycles, for inference time, for the electricity flowing through an AWS Bedrock or Azure OpenAI tenant. They are measurable, billable, and increasingly the variable that finance leaders are being asked to control.

But the value an AI interaction produces is something else entirely. It's the report that gets actioned. The decision that gets made faster. The customer question that gets answered in four minutes instead of two days. The margin erosion that gets caught three weeks before the P&L would have surfaced it.

That value can be a thousand times larger than the token cost. Or it can be zero. A million tokens spent generating slop is still a million tokens billed.

We have FinOps tools that count tokens beautifully. We have almost nothing that counts value.

The Hard Problem: How Do You Score Value?

The reason nobody measures value-per-token yet is that measuring value is genuinely hard.

Picture an analyst sitting at her desk on a Tuesday afternoon, running a one-hour exploratory session in ChatGPT. She asks 40 questions. She gets 40 answers. She copies three of them into a Word document that becomes the basis of a board memo. She discards the other 37.

The session consumed somewhere north of a million tokens.

What value did it produce?

Three answers landed in a board memo. Two were affirmations of what she already believed. One was a new framing that materially shifted the board's discussion. The other 37 answers, forgotten. Unused. Possibly correct, but unread.

Was the session worth it? By what measure?

The honest answer right now is: nobody knows. We can show you the cloud bill for the session. We cannot show you the dollar value of the board discussion that resulted from it.
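The measurable half of that session fits in a few lines. The sketch below uses the counts from the example and a hypothetical blended price per million tokens (an assumption for illustration, not a quoted rate). The value side is deliberately left empty, because nothing in today's tooling supplies it.

```python
# Counts come from the example above; the price is a hypothetical placeholder.
TOKENS_USED = 1_000_000          # "north of a million", taken as a floor
ANSWERS_TOTAL = 40
ANSWERS_USED = 3                 # copied into the board memo
ANSWERS_NOVEL = 1                # the framing that shifted the discussion
PRICE_PER_MILLION_TOKENS = 10.0  # assumed USD rate, for illustration only

# The cost side: everything here can be read off a bill or a session log.
session_cost = TOKENS_USED / 1_000_000 * PRICE_PER_MILLION_TOKENS
utilization = ANSWERS_USED / ANSWERS_TOTAL
cost_per_used_answer = session_cost / ANSWERS_USED
cost_per_novel_answer = session_cost / ANSWERS_NOVEL

# The value side: the numerator of value-per-token is what nobody can fill in yet.
session_value = None

print(f"Utilization: {utilization:.1%}")
print(f"Cost per used answer: ${cost_per_used_answer:.2f}")
print(f"Cost per novel answer: ${cost_per_novel_answer:.2f}")
print(f"Session value: {session_value}")
```

Every figure except the last can be computed with what already exists. The last one is the whole problem.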

This is the gap the next generation of AI economics tooling will need to close. And it is a much harder problem than counting tokens.

What Value-Per-Token Will Actually Look Like

If I had to predict where this category goes in the next 24 months, I would bet on three shifts.

Interaction-level value scoring. Every AI interaction will get scored on a value-delivered scale, likely a composite of explicit signals (the output was saved, shared, used in a downstream artifact) and implicit signals (the user kept iterating, the conversation reached a decision, the output was referenced later, the resulting action moved a metric). The early versions of this will be crude. The mature versions will be the difference between organizations that scale AI confidently and ones that quietly cap it.
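A crude first version of such a composite might look like the sketch below. The signal names mirror the ones listed above; the weights, the 0-to-1 normalization, and the per-thousand-token scaling are all assumptions made for illustration, not a description of any shipping product.

```python
from dataclasses import dataclass


@dataclass
class InteractionSignals:
    # Explicit signals
    saved: bool = False
    shared: bool = False
    used_downstream: bool = False
    # Implicit signals
    kept_iterating: bool = False
    reached_decision: bool = False
    referenced_later: bool = False
    moved_metric: bool = False


# Hypothetical weights; a real system would have to calibrate these against outcomes.
WEIGHTS = {
    "saved": 1.0,
    "shared": 2.0,
    "used_downstream": 3.0,
    "kept_iterating": 0.5,
    "reached_decision": 2.0,
    "referenced_later": 1.5,
    "moved_metric": 5.0,
}


def value_score(signals: InteractionSignals) -> float:
    """Weighted sum of the signals that fired, normalized to the range 0..1."""
    total = sum(WEIGHTS.values())
    earned = sum(w for name, w in WEIGHTS.items() if getattr(signals, name))
    return earned / total


def value_per_token(signals: InteractionSignals, tokens: int) -> float:
    """Value score per thousand tokens, so sessions of different sizes compare."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return value_score(signals) / (tokens / 1000)


# Two hypothetical interactions that consumed the same number of tokens.
discarded = InteractionSignals()
memo_answer = InteractionSignals(
    saved=True, shared=True, used_downstream=True, reached_decision=True
)

print(value_per_token(discarded, 25_000))
print(value_per_token(memo_answer, 25_000))
```

The two interactions cost the same on the bill, and only the score separates them. Choosing and calibrating the weights is where the real difficulty sits.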

Per-user and per-team efficiency benchmarking. Just as fleet managers benchmark drivers by miles-per-gallon, AI teams will benchmark employees and departments by value-per-token. Not to punish high consumption, but to identify the patterns that produce the most value per unit of fuel. The marketing team that uses 10x more tokens than legal but produces 100x more business outcomes is not wasteful. It is the case study.
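The arithmetic behind that example is worth spelling out. In the sketch below, the 10x and 100x multipliers come from the text, while the baseline numbers for legal are made up so that there is something to multiply.

```python
# Baseline numbers for legal are hypothetical; the 10x and 100x multipliers come from the text.
LEGAL_TOKENS = 1_000_000
LEGAL_OUTCOMES = 5

teams = {
    "legal": {"tokens": LEGAL_TOKENS, "outcomes": LEGAL_OUTCOMES},
    "marketing": {"tokens": 10 * LEGAL_TOKENS, "outcomes": 100 * LEGAL_OUTCOMES},
}


def outcomes_per_million_tokens(team: dict) -> float:
    """Business outcomes delivered per million tokens consumed."""
    return team["outcomes"] / (team["tokens"] / 1_000_000)


benchmark = {name: outcomes_per_million_tokens(t) for name, t in teams.items()}
advantage = benchmark["marketing"] / benchmark["legal"]

print(benchmark)
print(f"Marketing delivers {advantage:.0f}x the outcomes per token")
```

Whatever baseline is chosen, 100x the outcomes on 10x the tokens works out to ten times the value per token. A consumption-only dashboard would flag that team as the problem.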

Workload routing by value-efficiency, not just cost-efficiency. You don't run a long-haul freight load on a delivery van. You don't run a high-value strategic synthesis on the cheapest available model. The systems that can score value-per-token will eventually route workloads automatically: smaller models for low-value drafts, frontier models for high-stakes synthesis, owned infrastructure for the predictable bulk.

Deloitte gestures at the third shift when they discuss the "AI factory" architecture: hybrid hosting decided by cost dynamics. Routing for value-efficiency, not just cost-efficiency, is the next layer up.
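A value-aware router could be as simple as the sketch below. It assumes an estimated value score between 0 and 1 is already available for the workload; the threshold and the tier names are hypothetical placeholders, not real model or product identifiers.

```python
HIGH_VALUE_THRESHOLD = 0.7  # hypothetical cut-off; would need tuning in practice


def route_workload(estimated_value: float, predictable_bulk: bool = False) -> str:
    """Pick a hosting tier from an estimated value score in the range 0..1."""
    if not 0.0 <= estimated_value <= 1.0:
        raise ValueError("estimated_value must be between 0 and 1")
    # High-stakes work goes to the strongest model regardless of volume.
    if estimated_value >= HIGH_VALUE_THRESHOLD:
        return "frontier-model"
    # Steady, forecastable volume is the case for owned infrastructure.
    if predictable_bulk:
        return "owned-infrastructure"
    # Everything else is a low-value draft: use the cheapest adequate option.
    return "small-model"


print(route_workload(0.9))                         # strategic synthesis
print(route_workload(0.2))                         # throwaway draft
print(route_workload(0.3, predictable_bulk=True))  # nightly batch job
```

Checking value before volume is a design choice in this sketch: it lets a high-stakes request escape the bulk tier. A purely cost-driven router would have no reason to make that exception.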

The Honest Discomfort

Here's what makes this hard for most enterprises to talk about openly.

The reason we can't currently measure AI value-per-token isn't really that the tooling is missing. It's that most organizations have never rigorously measured the value of knowledge work in the first place.

How much business value did your senior analyst produce last quarter? Your finance team? Your data team? Most companies cannot tell you. They've never built the measurement infrastructure because the inputs (salary, headcount, hours) felt close enough to the outputs (the work got done, the reports got delivered).

AI breaks that approximation.

When you can spin up effectively unlimited analyst-equivalents at a measurable token cost, the question "did this work produce value?" stops being rhetorical. It becomes the only question that matters.

The organizations that figure out how to answer it will run their AI programs an order of magnitude better than the ones that just count tokens.

Where This Work Is Happening

For the last two years, we've been building toward exactly this measurement problem at Inzata.

Our product, DataBlueprint, pulls data from the business systems an organization already runs (Excel files, QuickBooks, NetSuite, HubSpot, and 800+ others) into a Knowledge Graph and lets operators talk to it in plain English. It runs sandboxed LLMs inside a single-tenant AWS Bedrock environment. No third-party APIs. Customer data never used for training. The free-forever plan is live at inzata.ai.

The piece we're building right now: a value-per-token scoring model that ranks every BI interaction on a value-delivered scale, so that an organization can identify not just the employees and activities using AI the most, but specifically which ones generate the most value-per-token. The patterns that work. The patterns that waste fuel.

If Deloitte is right that AI economics is the new strategic reckoning, and I think they are, then value-per-token is the metric that lets enterprises ride the wave instead of being crushed by it.

Tokens are fuel. The real question is what you're hauling, where it's going, and whether anyone is going to use it when it arrives.

That's the next frontier.


Reference: Deloitte Insights, "AI tokens: How to navigate AI's new spend dynamics," January 19, 2026.
