Normalized Tokens
- Why AutoTalk normalizes AI tokens before billing them
- How the per-model weight is computed from vendor prices
- How normalized tokens roll up into the tokens_in and tokens_out meters
AutoTalk works with many different language models from many different vendors, and each vendor charges a different price per token. If we simply counted raw tokens, using a cheap model would look exactly the same on your invoice as using the most expensive one -- which would be unfair to everyone.
To keep billing fair and consistent across vendors, every token is converted into a normalized token before it touches the tokens_in and tokens_out meters on your plan. Normalization does not change the tokens the model actually consumed -- it just re-weights them so that expensive tokens count for more than cheap tokens.
The idea in one sentence
1 normalized token = 1 raw token × how expensive the model is compared to our baseline.
If a model's input is exactly as expensive as the baseline, 1 raw input token is 1 normalized input token. If the model costs twice as much, 1 raw input token is 2 normalized input tokens. If it costs half as much, 1 raw input token is 0.5 normalized input tokens.
The baseline
AutoTalk fixes an internal baseline price used to define what "1x" means. These values are maintained internally and used only for weighting -- they are not the prices you see on your invoice.
| Baseline | Value |
|---|---|
| Input per 1K tokens | $0.005 (equivalent to $5 / 1M) |
| Output per 1K tokens | $0.015 (equivalent to $15 / 1M) |
A model whose input is priced at $0.005 per 1K gets an input weight of 1.0. A model whose input is priced at $0.0025 per 1K gets a weight of 0.5. A model whose input is priced at $0.010 per 1K gets a weight of 2.0.
The formula
For each model, AutoTalk computes two weights -- one for input, one for output:
weight_in = model_input_price_per_1K / baseline_input_price_per_1K
weight_out = model_output_price_per_1K / baseline_output_price_per_1K
Then normalized tokens are:
normalized_in_tokens = raw_prompt_tokens * weight_in
normalized_out_tokens = raw_completion_tokens * weight_out
These are the numbers that accumulate in your tokens_in and tokens_out meters every time an AI agent runs.
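The two formulas above can be sketched in Python (a minimal illustration; the function and variable names are ours, not AutoTalk's API):

```python
# Baseline prices per 1K tokens (from the table above).
BASELINE_IN = 0.005   # $ per 1K input tokens
BASELINE_OUT = 0.015  # $ per 1K output tokens

def normalize(raw_prompt_tokens, raw_completion_tokens,
              model_input_price_per_1k, model_output_price_per_1k):
    """Convert raw token counts into normalized token counts."""
    weight_in = model_input_price_per_1k / BASELINE_IN
    weight_out = model_output_price_per_1k / BASELINE_OUT
    return (raw_prompt_tokens * weight_in,
            raw_completion_tokens * weight_out)

# A model priced exactly at baseline counts 1:1.
print(normalize(1000, 500, 0.005, 0.015))  # -> (1000.0, 500.0)
```

A model priced at baseline passes through unchanged; everything else is scaled by its price ratio.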
Safety clamps
Vendor prices occasionally change, and occasionally contain outliers or temporary glitches. To protect both you and AutoTalk from a surprise invoice caused by a bad price feed, every computed weight is clamped to a safe range:
| Clamp | Value |
|---|---|
| Minimum weight | 0.05 (no model counts for less than 5% of a baseline token) |
| Maximum weight | 8 (no model counts for more than 8× a baseline token) |
If a model's price is missing or unknown, its weight defaults to 1.0 (treated as baseline).
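The clamping and missing-price rules above can be combined into one small helper (a sketch under the stated rules; names are illustrative):

```python
MIN_WEIGHT = 0.05  # no model counts for less than 5% of a baseline token
MAX_WEIGHT = 8.0   # no model counts for more than 8x a baseline token

def clamped_weight(model_price_per_1k, baseline_price_per_1k):
    """Per-model weight, clamped to [MIN_WEIGHT, MAX_WEIGHT];
    defaults to 1.0 when the vendor price is missing or unknown."""
    if model_price_per_1k is None:
        return 1.0  # unknown price: treat as baseline
    weight = model_price_per_1k / baseline_price_per_1k
    return min(max(weight, MIN_WEIGHT), MAX_WEIGHT)

print(clamped_weight(0.0025, 0.005))  # -> 0.5  (within range, untouched)
print(clamped_weight(0.0001, 0.005))  # -> 0.05 (clamped up from 0.02)
print(clamped_weight(0.10, 0.005))    # -> 8.0  (clamped down from 20)
print(clamped_weight(None, 0.005))    # -> 1.0  (price unknown)
```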
A worked example
Suppose an AI agent call consumes 1,000 raw input tokens and 500 raw output tokens using gpt-4o, which is priced at $0.0025 / $0.01 per 1K (input/output).
Compute the weights:
weight_in = 0.0025 / 0.005 = 0.5
weight_out = 0.01 / 0.015 ≈ 0.667
Apply them:
normalized_in_tokens = 1000 * 0.5 = 500
normalized_out_tokens = 500 * 0.667 ≈ 333
The agent counts 500 against your tokens_in allowance and 333 against your tokens_out allowance -- not the raw 1000 / 500 it actually consumed.
The same call with gpt-4o-mini (priced at $0.00015 / $0.0006 per 1K) yields raw weights of 0.03 and 0.04; the minimum clamp raises both to 0.05, so the call burns only 50 and 25 normalized tokens respectively -- still reflecting how much cheaper that model is.
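The gpt-4o arithmetic above, checked in Python (a quick verification, not AutoTalk code):

```python
BASELINE_IN, BASELINE_OUT = 0.005, 0.015  # baseline $ per 1K tokens

# gpt-4o vendor prices per 1K tokens: $0.0025 input, $0.01 output
weight_in = 0.0025 / BASELINE_IN    # 0.5
weight_out = 0.01 / BASELINE_OUT    # ~0.667

normalized_in = 1000 * weight_in          # 500.0
normalized_out = round(500 * weight_out)  # 333

print(normalized_in, normalized_out)  # -> 500.0 333
```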
Overage reporting
At the end of each billing period, AutoTalk aggregates the tokens_in and tokens_out counters for overage billing. To keep the numbers manageable, normalized tokens are reported in units of 1,000 (one billing unit = 1,000 normalized tokens).
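The roll-up to billing units might look like this (rounding up per period is our assumption; the exact rounding rule is not specified above):

```python
import math

UNIT = 1000  # one billing unit = 1,000 normalized tokens

def billing_units(normalized_tokens):
    """Convert an accumulated normalized-token counter into billing units,
    rounding up so a partial unit still counts (assumed behavior)."""
    return math.ceil(normalized_tokens / UNIT)

print(billing_units(2500))  # -> 3
print(billing_units(1000))  # -> 1
```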
Why this matters for you
- Your cheapest workflow stays cheap. Running an agent on gpt-4o-mini consumes a tiny fraction of a normalized token per raw token, so your monthly allowance goes much further.
- Your most expensive workflow is priced fairly. Running an agent on a top-tier model consumes more normalized tokens per raw token, which matches AutoTalk's real cost of serving that model.
- Switching models is predictable. Because every model is priced relative to the same baseline, you can reason about plan capacity without memorizing vendor price tables.
Next steps
- Plan selection -- See which meters your plan caps and by how much
- Usage history -- Review how your tokens_in and tokens_out meters have trended over time