By Bob Gregor, Sidney Glinton, Bill Murdock
We’re entering a phase of AI adoption where the constraint is no longer the model.
It’s the human.
Not capability. Not a context window. Not even cost.
Cognitive bandwidth.
The Mistake: Treating Tokens Like Throughput
Most teams think about tokens the way they think about CPU:
- More tokens → more work done
- Larger context → better outcomes
- Higher usage → higher leverage
This is wrong. 😑 It’s like measuring non-comment lines of code as a proxy for productivity.
Tokens are not compute.
They are shared working memory between a human and a machine.
And that memory has a bottleneck: you.
A Grounded Reality Check on Token Consumption
- 1 token ≈ 0.75 words
- 10k tokens ≈ 7.5k words
- 50k tokens ≈ 37.5k words
At typical human reading speeds of roughly 250 words per minute:
- 10k tokens → ~30 minutes of reading
- 50k tokens → ~2–3 hours of reading
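The arithmetic is easy to sanity-check yourself. A minimal sketch, assuming ~0.75 words per token and a reading speed of roughly 250 words per minute (both rough averages, not measured constants):

```python
# Back-of-the-envelope arithmetic: tokens -> words -> reading time.
# Assumes ~0.75 words per token and ~250 words per minute of reading.

def reading_minutes(tokens: int, words_per_token: float = 0.75,
                    words_per_minute: float = 250) -> float:
    """Estimate how long a human needs to actually read `tokens` worth of text."""
    return tokens * words_per_token / words_per_minute

for t in (10_000, 50_000):
    print(f"{t:>6} tokens ≈ {reading_minutes(t):.0f} minutes of reading")
# 10000 tokens ≈ 30 minutes of reading
# 50000 tokens ≈ 150 minutes of reading
```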
Now ask:
Did you actually integrate 3 hours of generated text today?
For most people, the answer is no.
They skimmed. They scrolled. They felt productive.
But no decisions were made. It’s pandemic-era doomscrolling all over again. 😷
The Hidden Cost: Token Inflation
In real systems, token usage compounds aggressively.
We pulled the receipts from ~34,000 sessions and 2.5B tokens of real agent traffic.
The pattern is brutal and consistent.
For every 1 output token the model produces, ~293 total tokens get billed.
In the average session:
- 0.74% is model output (the text a human actually reads)
- 0.08% is fresh user input (what the human actually typed)
- ~99% is context replayed every turn: system prompts, tool schemas, prior conversation, retrieved context
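To see what that split looks like in your own traffic, here’s a rough sketch. The counts are illustrative placeholders chosen to roughly match the proportions above, not our measured numbers; plug in your own sessions.

```python
# Rough decomposition of one session's billed tokens.
# These counts are illustrative placeholders, not measured data.
session = {
    "model_output": 2_000,        # text a human actually reads
    "fresh_user_input": 220,      # what the human actually typed
    "replayed_context": 270_000,  # system prompts, tool schemas, prior turns, retrieval
}

total = sum(session.values())
for part, tokens in session.items():
    print(f"{part:>16}: {tokens:>8,} tokens ({tokens / total:.2%})")
print(f"{'total billed':>16}: {total:>8,} tokens")
```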
And it gets worse as sessions scale. Not better.

A 170-turn session produces ~155× more output than a single-turn one, but costs ~536× more tokens to get there. Long sessions don’t generate more signal. They generate more overhead around the same amount of signal.
And we’re leaning into it. In one month, weekly billed tokens grew 10.9× while cost-per-output-token fell 2.5×. Cheaper per unit, 3.6× more spent, same number of humans on the other end trying to integrate it.
Cheaper tokens didn’t buy us leverage. They bought us volume.
Every call replays the entire whiteboard:
- system prompts
- tool definitions
- prior conversation
- retrieved context
Sessions often start with tens of thousands of tokens before you type anything.
Then:
- irrelevant history stays in context
- new tasks pile on old ones
- output quality degrades
This is the equivalent of running a production system where 99% of every request is logging and 1% is useful work.
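Here’s a rough sketch of why the overhead compounds, assuming a naive loop that replays the full history on every call. The constants are illustrative, not pulled from our traffic, but the shape of the curve is the point: output grows linearly with turns while billed tokens grow roughly quadratically.

```python
# Naive agent loop: every call replays the whole whiteboard, so billed tokens
# grow roughly quadratically with turn count while useful output grows linearly.
# All constants are illustrative, not measured traffic.
BASE_CONTEXT = 20_000    # system prompt + tool schemas replayed on every call
INPUT_PER_TURN = 50      # fresh user tokens per turn
OUTPUT_PER_TURN = 400    # model output per turn

def session_cost(turns: int) -> tuple[int, int]:
    """Return (total billed tokens, total output tokens) for a session."""
    billed = output = history = 0
    for _ in range(turns):
        billed += BASE_CONTEXT + history + INPUT_PER_TURN + OUTPUT_PER_TURN
        output += OUTPUT_PER_TURN
        history += INPUT_PER_TURN + OUTPUT_PER_TURN  # this history is replayed next turn
    return billed, output

for turns in (1, 10, 170):
    billed, output = session_cost(turns)
    print(f"{turns:>3} turns: {output:>7,} output tokens, {billed:>10,} billed tokens")
```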
The Insight: Tokens Are a Budget, Not a Goal
The right question is not:
How many tokens can we consume?
It is:
How many tokens can a human meaningfully convert into decisions?
A practical range:
- 1k–5k/day → casual use
- 10k–30k/day → effective knowledge work
- 30k–50k/day → high-leverage operator
Beyond that, you are likely duplicating context, over-generating, and under-integrating. It feels like god mode, but someone still has to integrate, support, test, and sit with frustrated users. Supporting and integrating software has always been the expensive part. Raw features feel like bliss once the requirements and constraints of a system have been chiseled out through iterative loops.
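If you want to keep score against those bands, the check is trivial. The thresholds below are just the ranges above, treated as fuzzy heuristics rather than hard limits:

```python
# Trivial self-check against the bands above; the thresholds are this post's
# heuristics, not a standard, and the boundaries are deliberately fuzzy.
def usage_band(tokens_per_day: int) -> str:
    if tokens_per_day <= 5_000:
        return "casual use"
    if tokens_per_day <= 30_000:
        return "effective knowledge work"
    if tokens_per_day <= 50_000:
        return "high-leverage operator"
    return "likely over-generating and under-integrating"
```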
The Shift: From Prompt Engineering → Context Economics
Old world:
- prompt engineering
- clever phrasing
- ask better questions
New world:
- context management
- token allocation
- information compression
The best engineers won’t be the ones who generate the most tokens.
They’ll be the ones who waste the fewest.
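What does wasting fewer look like in practice? One minimal sketch: keep the system prompt, a running summary, and only the last few turns, instead of replaying everything verbatim. The `summarize` callable here is hypothetical; substitute whatever compression step you actually use.

```python
# A hypothetical context-trimming policy: keep the system prompt, a running
# summary of older history, and only the last few turns, instead of replaying
# everything verbatim. `summarize` is a stand-in for whatever compression
# step you use (an LLM call, a heuristic, a map-reduce over old turns).
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TrimmedContext:
    system_prompt: str
    keep_last: int = 6
    summary: str = ""                                     # compressed older history
    recent_turns: list[str] = field(default_factory=list)

    def add_turn(self, turn: str, summarize: Callable[[str, str], str]) -> None:
        self.recent_turns.append(turn)
        # Fold overflow into the summary instead of carrying it forever.
        while len(self.recent_turns) > self.keep_last:
            oldest = self.recent_turns.pop(0)
            self.summary = summarize(self.summary, oldest)

    def render(self) -> str:
        parts = [self.system_prompt, self.summary, *self.recent_turns]
        return "\n\n".join(p for p in parts if p)
```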
The Deeper Point: Ambient Systems Need Human Constraints
Ambient AI systems assume continuous generation:
- agents looping
- tools chaining
- context expanding
Without constraints, this becomes expensive, slow, and noisy.
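One hedged sketch of the kind of constraint this implies: cap the loop on a token budget tied to what a human can actually absorb, rather than on the context window. `run_step` is a stand-in for a single agent turn in whatever framework you use.

```python
# A hypothetical guardrail: cap the loop on a token budget instead of letting
# agents run until they converge. `run_step` is a stand-in for one agent turn,
# assumed to report how many tokens it billed and whether it finished.
from typing import Callable

def run_with_budget(run_step: Callable[[], tuple[int, bool]],
                    budget: int = 30_000) -> int:
    """Run agent turns until done, or until the token budget is exhausted."""
    spent = 0
    while spent < budget:
        tokens_used, done = run_step()
        spent += tokens_used
        if done:
            break
    return spent
```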
The real constraint isn’t infrastructure.
It’s attention.
A Working Heuristic
If you can’t answer:
What decisions did I make from this interaction?
You are not using AI.
You are consuming it.
Closing
We don’t have a model problem.
We have a signal-to-noise problem at the human interface boundary.
The teams that win won’t be the ones with the largest context windows.
They’ll be the ones who treat tokens like capital, memory, and attention.
Because that’s what they actually are: Fossilized human intelligence, reflecting back at us.