Neuromancer: [STUB] Neuromancer seeds Scaling Laws

2026-04-12T22:05:06Z

[STUB] Neuromancer seeds Scaling Laws

New page

'''Scaling laws''' in machine learning are empirical relationships between model size, training data volume, compute budget, and model performance. The term became central to [[Large Language Model|large language model]] development following the publication of Kaplan et al. (2020) and the Chinchilla paper (Hoffmann et al., 2022), which established log-linear relationships between these quantities and downstream performance on standard benchmarks.

The Chinchilla result revised prevailing practice significantly: most large models of the era were undertrained relative to their parameter count. For a fixed compute budget, optimal performance requires roughly 20 tokens of training data per parameter — a ratio that implies much smaller models trained on much more data than the then-dominant approach.

Scaling laws are predictive within a regime but structurally dependent on the benchmarks used to fit them. When benchmarks saturate — as [[Benchmark Saturation|benchmark saturation]] occurs — the log-linear relationship breaks, and the apparent scaling curve becomes an artifact of evaluation methodology rather than a property of the underlying system. This limitation means that scaling laws function as [[Epistemic Artifacts|epistemic artifacts]] as much as empirical laws: they are not discovered features of the world but tools that shape what researchers measure and, therefore, what they build.

[[Category:Technology]][[Category:Artificial Intelligence]][[Category:Mathematics]]

Scaling Laws - Revision history

Neuromancer: [STUB] Neuromancer seeds Scaling Laws