Token Undressed: What AI Costs Are About to Expose

I remember sitting in a windowless conference room in 2012, staring at a printed spreadsheet. I needed an extra two hundred dollars a month for a staging server. The finance lead looked at me like I had asked the company to buy a private jet. We spent forty-five minutes debating whether QA could share hardware with the build server. Every dollar had a name, a justification, and an owner.

That was the old way. Then cloud happened. Engineers could spin up instances with a few clicks, and for a while, nobody asked questions. The scrutiny returned eventually. The pattern is predictable: a new resource arrives, it feels infinite, spending runs unchecked, the bills land, and someone in finance starts asking questions.

Right now, that resource is AI tokens.

For most of 2025, tokens felt like a rounding error. Models improved so quickly that the per-request cost barely registered. Executives told teams to use them freely. Engineering leaders shared screenshots of million-token workflows and overnight agent runs like badges of honor. The implicit message: burning tokens meant you were building the future.

Then the monthly invoices arrived, and the future turned out to be expensive.

A prominent AI company CEO admitted earlier this year that the cost question "came up quite suddenly at the beginning of 2026." Before that, he said, nobody ever raised the issue. Spending was invisible. Now it is the main topic of every budget meeting. When a single engineering team burns over a million dollars a month on AI agents, someone in the C-suite notices. And once that attention lands, it stays.

The token free-for-all is ending. What replaces it will quietly reshape how engineering teams are structured, how performance is measured, and how careers advance. Most companies are not ready for it.

The most immediate change will be token budgets joining salary, equity, and benefits as standard engineering compensation. You receive a yearly token stipend. Spend it all, you break even. Come in under budget, you pocket the difference as a bonus. On paper, it is clever incentive design: it pushes engineers to treat tokens like real money.

But incentives are slippery. Some people will genuinely optimize for efficiency. They will learn to write tighter prompts, break large tasks into smaller validation steps, and route low-stakes work through cheaper models. Others will take the opposite path. They will figure out the minimum visible output required to justify their spend and optimize for that number instead. The gap between looking efficient and being efficient will grow wider. Most managers will not be able to tell the difference.
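To make "route low-stakes work through cheaper models" concrete, here is a minimal sketch under loudly stated assumptions: the two model tiers, their names, and their flat per-million-token prices are all hypothetical, and real provider pricing (which usually differs for input and output tokens) will not look this tidy. The point is only the shape of the decision: tag work by stakes, send the low-stakes batch to the cheap tier, and see the cost gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical flat price, not a real quote


# Hypothetical tiers; names and prices are placeholders for illustration only.
CHEAP = Model("small-model", 0.50)
PREMIUM = Model("large-model", 15.00)


def route(task_stakes: str) -> Model:
    """Send low-stakes work to the cheap tier and everything else to the premium tier."""
    return CHEAP if task_stakes == "low" else PREMIUM


def estimated_cost(model: Model, tokens: int) -> float:
    """Dollar cost of a request at the model's flat per-token price."""
    return model.usd_per_million_tokens * tokens / 1_000_000


# The same 200k-token job, routed as low stakes vs. high stakes.
print(estimated_cost(route("low"), 200_000))   # 0.1
print(estimated_cost(route("high"), 200_000))  # 3.0
```

Under these made-up prices the same job costs thirty times more on the premium tier, which is exactly the kind of gap a stipend holder learns to notice.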

This gets messier when budgets are allocated at the team or department level. Suddenly your token usage is not private. It is visible. Auditable. Political.

Picture a ten-person team with a shared quarterly token pool. One engineer runs a series of overnight agent sessions, burning through a third of the pool on a feature that a competent developer could have built manually in two days. The remaining nine now ration their tokens. Standup meetings acquire a new silent metric. Colleagues notice who is burning the budget and who is carrying the weight. The engineer who treated the pool as a playground becomes a liability fast.
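The rationing in that scenario is simple arithmetic, sketched here with a hypothetical 30-million-token pool (the pool size is an assumption; only the ten-person team and the one-third burn come from the example). Exact fractions keep the numbers honest.

```python
from fractions import Fraction


def shares(pool_tokens: int, team_size: int, burned_fraction: Fraction) -> tuple[Fraction, Fraction]:
    """Compare an even split of the pool with what is left for each remaining
    engineer after one person burns `burned_fraction` of it."""
    even_share = Fraction(pool_tokens, team_size)
    leftover = pool_tokens * (1 - burned_fraction)
    rationed_share = leftover / (team_size - 1)
    return even_share, rationed_share


# Hypothetical 30M-token quarterly pool, ten engineers, one burns a third.
even, rationed = shares(30_000_000, 10, Fraction(1, 3))
print(int(even), int(rationed))                 # 3000000 2222222
print(round(float(1 - rationed / even), 2))     # 0.26
```

Whatever the pool size, each of the other nine loses 7/27 of their fair share, roughly a quarter, because of one person's overnight runs.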

The most uncomfortable practice to emerge will be something engineers already do informally but will soon formalize: scrutinizing each other's prompts before they go live. When a single poorly worded instruction can cost hundreds of dollars in agent execution time, the stakes are high enough that no one wants to be the person who wasted the budget. Your prompt becomes a pull request. Your teammates become editors. Your burn rate becomes your reputation. It sounds extreme. So did mandatory code review. Now it is the baseline. Prompt review will follow the same arc, pushed by the same force: cost.

And then there is the measurement problem. This is where things get genuinely dangerous.

Once token budgets become a fixed expense, companies will want to know who deserves the largest share. The most obvious metric, the one that requires zero thought to implement, is lines of code produced. Brian shipped a hundred lines last month. Timmy shipped ten thousand. Guess who gets the bigger token budget next quarter.

The trouble is that AI makes generating volume trivial. Anyone can produce thousands of lines in an afternoon by feeding a spec into an agent. The harder skills, the ones that actually matter, are the ones that reduce line count. Deleting dead code. Simplifying architecture. Choosing a well-maintained library over a custom implementation. Refactoring a sprawling function into something readable and testable. None of that shows up in a lines-of-code report.

Companies that reward volume will get volume. They will also get codebases so bloated with AI-generated output that shipping anything new becomes a multi-month exercise in untangling dependencies nobody fully understands. I watched teams bury themselves under unnecessary frameworks and over-engineered abstractions long before language models existed. AI just accelerates the burial. The shovel is motorized now.

This is not a warning about the tools. The tools are genuinely useful, and they improve every quarter. The warning is about the organizational patterns that surround the tools. Those patterns have not changed in decades.

Measuring the wrong thing. Optimizing for visibility over value. Confusing activity with progress. Every engineer with more than a few years of experience has seen some version of this. RAM was once the scarce resource people fought over. Then cloud compute. Then headcount. Now tokens. The currency changes. The dysfunction does not.

The engineers who come through this shift intact will not be the ones who produce the most output. They will be the ones who understand that a tool, no matter how powerful, does not replace thinking. The value of an engineer has never been measured in keystrokes per minute. It is measured in judgment. Knowing what to build, what to leave alone, and what to tear down.

That judgment costs tokens too. But it costs far less than building the wrong thing at scale.
