Test-time compute scaling

Test-time compute scaling is the practice of spending more computation at inference time, rather than during training, to improve a model's performance on difficult tasks. Techniques include generating multiple candidate outputs and selecting the best one by majority voting or with a reward model, extending the length of reasoning traces, and using verifier networks to check intermediate reasoning steps.
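The two selection strategies named in the definition, majority voting and reward-model selection, can be sketched in a few lines of Python. This is a minimal illustration, not any particular system's implementation: the function names `sample_candidates`, `majority_vote` and `best_of_n` are hypothetical, the fixed answer list stands in for a real model's samples, and the `scores` table and `toy_sampler` are made-up stand-ins for a learned reward model and a real sampler. Extending reasoning traces and step-level verification are not shown.

```python
from collections import Counter
import random
from typing import Callable, List


def sample_candidates(sampler: Callable[[], str], n: int) -> List[str]:
    """Draw n independent candidate answers; a larger n means more inference compute."""
    return [sampler() for _ in range(n)]


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent final answer.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that appeared first.
    """
    return Counter(answers).most_common(1)[0][0]


def best_of_n(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the candidate that the scoring function (e.g. a reward model) rates highest."""
    return max(candidates, key=score)


# Fixed candidates so the selection logic is easy to follow.
answers = ["42", "41", "42", "24", "42", "43"]
print(majority_vote(answers))  # prints 42 (3 of 6 votes)

# Hypothetical reward-model scores; a real system would call a learned model here.
scores = {"42": 0.9, "41": 0.3, "43": 0.4, "24": 0.1}
print(best_of_n(answers, score=lambda c: scores[c]))  # prints 42

# Hypothetical toy sampler: returns "42" with probability 0.4,
# otherwise one of four distractors chosen uniformly at random.
rng = random.Random(0)


def toy_sampler() -> str:
    return "42" if rng.random() < 0.4 else rng.choice(["40", "41", "43", "44"])


print(majority_vote(sample_candidates(toy_sampler, n=25)))
```

The design trade-off is visible even at this size: voting needs no extra model but only works when answers can be compared exactly (for example, a final number), while best-of-N can rank free-form outputs but is only as reliable as its scorer.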

The approach rests on a bet: that the bottleneck in AI capability is not model size but search depth. A smaller model given more time to think may outperform a larger model forced to answer immediately. The empirical results are mixed but suggestive, and the technique raises fundamental questions about whether intelligence should be measured by model parameters or by inference-time architecture.
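A back-of-envelope calculation shows how extra samples could let a smaller model close the gap, under strong idealizing assumptions that real systems do not meet: each sample is independently correct with probability p, outcomes are simply right or wrong, and for best-of-N the verifier always recognizes a correct answer. The accuracies 0.6 and 0.75 below are hypothetical, chosen only for illustration, and the function names are likewise illustrative.

```python
from math import comb


def verifier_success(p: float, n: int) -> float:
    """P(at least one of n independent samples is correct).

    Equals best-of-n accuracy only under the idealization of a perfect verifier.
    """
    return 1.0 - (1.0 - p) ** n


def majority_vote_success(p: float, n: int) -> float:
    """P(a strict majority of n independent samples is correct).

    Assumes a binary right/wrong outcome per sample; n must be odd so there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical single-sample accuracies, for illustration only.
small_p, large_p = 0.6, 0.75
for n in (1, 5, 15):
    print(
        f"n={n:2d}  small+vote={majority_vote_success(small_p, n):.3f}  "
        f"small+verifier={verifier_success(small_p, n):.3f}  large(n=1)={large_p:.3f}"
    )
```

At n = 5 the formulas give about 0.683 for voting and about 0.990 with a perfect verifier, against 0.75 for the hypothetical larger model answering once. The same model also shows that voting is not free: with p below 0.5, majority_vote_success falls as n grows (for example, p = 0.4 and n = 5 gives about 0.317).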