beginner · 3h · 3 lessons
How LLMs Actually Work
Tokens, context windows, attention, and the difference between training and inference — explained without the math jargon.
By the end of this course you will be able to:
- Explain why a 128K context window doesn't mean the model 'remembers' your conversation
- Predict how many tokens a prompt costs before sending it
- Choose the right model for a task by weighing latency, cost, and capability
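As a taste of the second objective, here is a minimal sketch of pre-flight token estimation. It assumes the common rule of thumb that English text averages roughly 4 characters per token; exact counts require the model's own tokenizer (e.g. OpenAI's tiktoken library), since tokenization is model-specific.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic.

    This is an approximation for English prose only; code, non-Latin
    scripts, and unusual formatting tokenize very differently.
    """
    return max(1, round(len(text) / chars_per_token))

prompt = "Explain context windows in one paragraph."
print(estimate_tokens(prompt))  # → 10 (41 characters / 4)
```

Lesson 1 covers why this heuristic breaks down and when you need a real tokenizer instead.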
Lessons
LESSON 1
What is a Token?
Tokens are the atoms of LLM communication. Get this right and everything downstream gets easier.
12 min →
LESSON 2
Context Windows: The Real Limits
A model's advertised context window is the ceiling, not the floor. Real-world performance degrades well before you hit it.
16 min →
LESSON 3
Temperature, Top-p, and Why Output Varies
The two dials that control how creative — or how deterministic — a model is. Most developers leave them at the defaults and pay for it later.
14 min →
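To preview Lesson 3: temperature rescales the model's logits before the softmax, and top-p (nucleus) truncation keeps only the most likely tokens up to a cumulative probability threshold. A minimal sketch, using made-up logits for a toy 5-token vocabulary; real models do the same math over vocabularies of tens of thousands of tokens.

```python
import math

def sample_distribution(logits, temperature=1.0, top_p=1.0):
    """Probability distribution after temperature scaling and top-p truncation."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize over that set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

fake_logits = [2.0, 1.0, 0.5, 0.1, -1.0]  # hypothetical values, not model output
print(sample_distribution(fake_logits, temperature=0.5, top_p=0.9))
```

With temperature 0.5 and top-p 0.9, the tail tokens get zero probability and the remaining mass concentrates on the top candidates — which is why the same prompt can produce different outputs run to run, and why these settings matter.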