Every modern LLM — GPT, Claude, Llama, Gemini — speaks in tokens, not characters or words. A token is roughly 3/4 of an English word. The phrase "Creative Genius helps businesses" is about 5 tokens.
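The 3/4-of-a-word rule of thumb can be turned into a quick back-of-envelope estimator. This is a sketch, not a real tokenizer — `estimate_tokens` is a hypothetical helper, and actual counts depend on the model (see the gotcha below):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the rule of thumb: 1 token ~= 3/4 of an
    English word, so tokens ~= words * 4/3. Real counts vary by model."""
    words = len(text.split())
    return max(1, round(words * 4 / 3))

# "Creative Genius helps businesses" is 4 words -> ~5 tokens
print(estimate_tokens("Creative Genius helps businesses"))  # 5
```

Good enough for budgeting; never use it for billing-critical math.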
Why this matters in production
- Pricing. APIs charge per million input + output tokens. A 5,000-word brief is roughly 6,500 tokens — that's real money at scale.
- Latency. Models generate output tokens one at a time, at roughly 10–40ms each. A 1,000-token response therefore takes on the order of 10–40 seconds to finish, even when streamed.
- Context limits. A "128K context window" means the model sees at most ~128,000 tokens — around 96,000 English words — per request. Past that, APIs typically reject the request, and chat frontends often drop the oldest messages silently.
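The pricing and latency bullets above can be combined into a single back-of-envelope calculator. The per-million prices and the 25ms midpoint below are illustrative placeholders, not any provider's real numbers:

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    # APIs charge per million input and output tokens, at different rates.
    # The prices passed in here are hypothetical; check your provider's page.
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

def latency_seconds(output_tokens: int, ms_per_token: float = 25.0) -> float:
    # ~10-40ms per output token; 25ms is used as a rough midpoint.
    return output_tokens * ms_per_token / 1000

# A ~6,500-token brief with a 1,000-token reply, at illustrative $3/$15 per M:
cost = api_cost_usd(6_500, 1_000, 3.0, 15.0)
wait = latency_seconds(1_000)
print(f"${cost:.4f} per call, ~{wait:.0f}s to generate the reply")
```

Multiply the per-call cost by daily request volume and "real money at scale" stops being abstract.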
Common gotcha
Different model families tokenize differently. The same prompt sent to GPT-4o, Claude, and Llama 3 will have three different token counts — and three different costs. Always estimate using the target model's own tokenizer.
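The divergence is easy to demonstrate with two toy segmentation rules — these are NOT real model tokenizers, just an illustration that different rules give different counts for identical text. For real counts, use the target model's own tokenizer (e.g. the `tiktoken` library for OpenAI models):

```python
import re

def whitespace_tokens(text: str) -> int:
    # Toy rule 1: split on whitespace only.
    return len(text.split())

def subword_tokens(text: str) -> int:
    # Toy rule 2: crude subword split into letter chunks of <= 4
    # characters plus standalone punctuation.
    return len(re.findall(r"\w{1,4}|[^\w\s]", text))

prompt = "Creative Genius helps businesses"
# Same prompt, two different counts -> two different bills.
print(whitespace_tokens(prompt), subword_tokens(prompt))  # 4 9
```

Real BPE tokenizers differ less dramatically than these toys, but the counts still diverge enough across model families to matter at scale.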