"Chain of thought" prompting — asking the model to reason out loud before answering — was one of the biggest accuracy unlocks of the GPT-3.5 era. Today, OpenAI's o-series and DeepSeek-R1 do this internally, and asking them to "think step by step" can actually hurt performance.
Quick decision tree
- Using GPT-4o, Claude Sonnet, Llama 3 → Add "Let's think step by step before answering." to the system prompt for math/logic tasks.
- Using o1, o3, DeepSeek-R1, Gemini Flash Thinking → Don't add chain-of-thought instructions; these models already reason internally before answering. Trust the internal reasoning.
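The decision tree above can be sketched as a small routing helper. This is a minimal illustration, not an official API: the model-name strings, the prefix check, and the `build_system_prompt` helper are all assumptions for the example, so adapt them to however your stack identifies models.

```python
# Route chain-of-thought instructions by model family.
# Model names below are illustrative assumptions, not an official registry.

COT_SUFFIX = "Let's think step by step before answering."

# Models that reason internally; explicit CoT instructions can hurt them.
REASONING_PREFIXES = ("o1", "o3", "deepseek-r1", "gemini-flash-thinking")


def build_system_prompt(model: str, base_prompt: str) -> str:
    """Append a CoT instruction only for models without internal reasoning."""
    name = model.lower()
    if name.startswith(REASONING_PREFIXES):
        return base_prompt  # trust the internal reasoning
    return f"{base_prompt}\n\n{COT_SUFFIX}"


print(build_system_prompt("gpt-4o", "You are a math tutor."))
print(build_system_prompt("o3-mini", "You are a math tutor."))
```

The prefix check is a deliberate simplification: in practice you would key off whatever model identifier your provider returns, and keep the reasoning-model list in config rather than code, since it changes with every release.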