By Bob Gregor, Sidney Glinton, Bill Murdock

We’re entering a phase of AI adoption where the constraint is no longer the model.

It’s human.

Not capability. Not a context window. Not even cost.

Cognitive bandwidth.

The Mistake: Treating Tokens Like Throughput

Most teams think about tokens the way they think about CPU:

– More tokens → more work done  

– Larger context → better outcomes  

– Higher usage → higher leverage  

This is wrong. 😑 It’s like measuring non-comment lines of code as a proxy for productivity.

Tokens are not compute.

They are shared working memory between a human and a machine.
And that memory has a bottleneck: you.

A Grounded Reality Check on Token Consumption

  • 1 token ≈ 0.75 words  
  • 10k tokens ≈ 7.5k words  
  • 50k tokens ≈ 37.5k words

At typical human reading speeds:

  • 10k tokens → ~30 minutes of reading  
  • 50k tokens → ~2–3 hours of reading
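Those reading-time figures fall out of two rough constants: ~0.75 words per token and ~250 words per minute for a typical reader. Neither number is exact, but the sketch makes the arithmetic explicit:

```python
# Rough reading-time estimate for generated text.
# Assumed constants (approximations, not from any API):
WORDS_PER_TOKEN = 0.75    # typical for English text
WORDS_PER_MINUTE = 250    # typical adult reading speed

def reading_minutes(tokens: int) -> float:
    """Minutes a human needs to actually read `tokens` of output."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_MINUTE

for n in (10_000, 50_000):
    print(f"{n:>6} tokens ≈ {reading_minutes(n):.0f} min of reading")
```

Run it and 50k tokens comes out to 150 minutes: the low end of that 2–3 hour range, assuming you never slow down to think.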

Now ask:

Did you actually integrate 3 hours of generated text today?

For most people, the answer is no.

They skimmed. They scrolled. They felt productive.
But no decisions were made. It’s pandemic-era doomscrolling all over again. 😷

The Hidden Cost: Token Inflation

In real systems, token usage compounds aggressively.

We pulled the receipts from ~34,000 sessions and 2.5B tokens of real agent traffic.                                                                            

The pattern is brutal and consistent.

For every 1 output token the model produces, ~293 total tokens get billed.                                    

In the average session:

  • 0.74% is model output (the text a human actually reads)
  • 0.08% is fresh user input (what the human actually typed)                                                   
  • ~99% is context replayed every turn: system prompts, tool schemas, prior conversation, retrieved context
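These fractions are easy to reproduce against your own logs. A minimal sketch, assuming a hypothetical per-turn record with `input_tokens`, `output_tokens`, and `fresh_user_tokens` fields (real billing logs will name these differently):

```python
# Sketch: split an agent session's billed tokens into model output,
# fresh user input, and replayed context. Field names are hypothetical.
def session_breakdown(turns):
    billed = sum(t["input_tokens"] + t["output_tokens"] for t in turns)
    output = sum(t["output_tokens"] for t in turns)
    fresh = sum(t["fresh_user_tokens"] for t in turns)
    replay = billed - output - fresh
    return {
        "amplification": billed / output,      # billed tokens per output token
        "output_pct": 100 * output / billed,   # what a human actually reads
        "fresh_pct": 100 * fresh / billed,     # what the human actually typed
        "replay_pct": 100 * replay / billed,   # context resent every turn
    }
```

Feed it a session where each turn carries a 9k-token context for 100 tokens of output and the amplification factor lands in the same ugly neighborhood as the numbers above.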

And it gets worse as sessions scale. Not better.

A 170-turn session produces ~155× more output than a single-turn one, but costs ~536× more tokens to get there. Long sessions don’t generate more signal. They generate more overhead around the same amount of signal.
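The superlinear cost curve falls straight out of replay: output grows linearly with turns, while billed tokens grow roughly quadratically, because every turn resends the preamble plus all prior turns. A toy model with made-up per-turn numbers (illustrative, not the measured traffic above):

```python
# Toy model of why long sessions cost superlinearly. Each turn replays
# the fixed preamble plus the entire conversation so far.
PREAMBLE = 10_000        # system prompt + tool schemas (assumed)
USER, OUTPUT = 50, 400   # fresh tokens per turn (assumed)

def session_cost(turns: int):
    billed = produced = history = 0
    for _ in range(turns):
        billed += PREAMBLE + history + USER + OUTPUT  # full replay, every call
        history += USER + OUTPUT                      # conversation keeps growing
        produced += OUTPUT
    return billed, produced

b1, p1 = session_cost(1)
bN, pN = session_cost(170)
print(f"output grows {pN / p1:.0f}x, billed tokens grow {bN / b1:.0f}x")
```

With these invented constants the billing multiple outruns the output multiple by several hundred x, the same shape as the measured 155× versus 536× gap.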

And we’re leaning into it. In one month, weekly billed tokens grew 10.9× while cost-per-output-token fell 2.5×. Cheaper per unit, 3.6× more spent, same number of humans on the other end trying to integrate it.

Cheaper tokens didn’t buy us leverage. They bought us volume.

Every call replays the entire whiteboard:

  • system prompts  
  • tool definitions  
  • prior conversation  
  • retrieved context  

Sessions often start with tens of thousands of tokens before you type anything.

Then:

  • irrelevant history stays in context  
  • new tasks pile on old ones  
  • output quality degrades

This is the equivalent of running a production system that spends 99% of its capacity on logging and 1% on useful work.

The Insight: Tokens Are a Budget, Not a Goal

The right question is not:

How many tokens can we consume?

It is:

How many tokens can a human meaningfully convert into decisions?

A practical range:

  • 1k–5k/day → casual use  
  • 10k–30k/day → effective knowledge work  
  • 30k–50k/day → high-leverage operator
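If you want to hold yourself to a band, the cheapest guardrail is a counter over the tokens you actually read. A minimal sketch; the class, its API, and the exact cut-offs are illustrative:

```python
# Minimal daily guardrail over output tokens (the text a human reads).
# Band thresholds follow the ranges above; everything else is illustrative.
class TokenBudget:
    BANDS = [(5_000, "casual"), (30_000, "effective"), (50_000, "high-leverage")]

    def __init__(self):
        self.read_today = 0

    def record(self, output_tokens: int) -> str:
        """Add tokens read today and name the band you are in."""
        self.read_today += output_tokens
        for limit, band in self.BANDS:
            if self.read_today <= limit:
                return band
        return "over budget: likely over-generating"
```

Note what it counts: output tokens only. Replayed context costs money, but it is the output that competes for your attention.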

Beyond that, you are likely duplicating context, over-generating, and under-integrating. It feels like god mode, but someone still has to integrate, support, and test the output, and sit with frustrated users. Supporting and integrating software has always been the expensive part; raw features are only bliss once a system’s requirements and constraints have been chiseled out through iterative loops.

The Shift: From Prompt Engineering → Context Economics

Old world:

  • prompt engineering  
  • clever phrasing  
  • ask better questions

New world:

  • context management  
  • token allocation  
  • information compression
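Token allocation in practice often starts as something blunt: a context budget, keeping the newest turns that fit and dropping the rest. A sketch, using a crude word count where a real system would use the model’s tokenizer:

```python
# Sketch of context management under a token budget: keep the newest
# turns that fit, drop older ones instead of replaying everything.
def trim_context(turns: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for turn in reversed(turns):      # walk newest-first
        cost = len(turn.split())      # crude stand-in for real token counting
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))       # restore chronological order
```

Recency-based truncation is the simplest policy, not the best one; smarter systems summarize or retrieve instead of dropping. But even this one-liner of a policy stops the whiteboard from being replayed in full on every call.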

The best engineers won’t be the ones who generate the most tokens.

They’ll be the ones who waste the fewest.

The Deeper Point: Ambient Systems Need Human Constraints

Ambient AI systems assume continuous generation:

– agents looping  

– tools chaining  

– context expanding

Without constraints, this becomes expensive, slow, and noisy.

The real constraint isn’t infrastructure.

It’s attention.

A Working Heuristic

If you can’t answer:

What decisions did I make from this interaction?

You are not using AI.

You are consuming it.

Closing

We don’t have a model problem.

We have a signal-to-noise problem at the human interface boundary.

The teams that win won’t be the ones with the largest context windows.

They’ll be the ones who treat tokens like capital, memory, and attention.

Because that’s what they actually are: Fossilized human intelligence, reflecting back at us.
