AlphaGo: Difference between revisions

Latest revision as of 02:09, 21 June 2026

AlphaGo was a computer program developed by DeepMind that defeated Lee Sedol, one of the world's strongest Go players, in a five-game match in 2016, winning four games to one. The victory was a watershed moment in AI: Go had resisted the brute-force methods that succeeded in chess because its branching factor made exhaustive search infeasible, and the game had been considered a benchmark that would require genuine machine intelligence to master.

AlphaGo combined deep neural networks with Monte Carlo tree search. A policy network, trained on human expert games, narrowed the search to promising moves. A value network, trained on self-play data, evaluated board positions without searching to the end of the game. The system learned its evaluation function from data and self-play rather than from handcrafted rules. This was the "bitter lesson" in action (Richard Sutton's observation that general methods which scale with computation tend to overtake methods built on encoded human knowledge): human Go knowledge, accumulated over millennia, was outperformed by a system that learned its own representations through computation. The subsequent AlphaZero system dispensed even with the human game data and learned entirely from self-play, a pure instance of the general method winning over the knowledge-based approach.
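
The interplay between the policy network, the value network, and the tree search can be illustrated with a small sketch. The code below is not AlphaGo's implementation. It is a minimal, hypothetical example of a PUCT-style selection rule of the kind used in this family of systems, in which each candidate move is scored by its average backed-up value plus an exploration bonus weighted by the policy prior. The names `Edge`, `select_move`, `backup`, and `c_puct`, the default constant, and the `+ 1` inside the square root (added so that priors still break ties before any visits) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Search statistics for one candidate move (hypothetical structure)."""
    prior: float            # probability the policy network assigns to the move
    visits: int = 0         # number of simulations that passed through the move
    value_sum: float = 0.0  # sum of value estimates backed up through the move

    def mean_value(self) -> float:
        # Average backed-up value; 0.0 for a move that has not been tried yet.
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.0) -> str:
    """Return the move with the highest mean value plus exploration bonus."""
    total_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        # The bonus is large for high-prior, rarely visited moves and
        # shrinks as a move accumulates visits.
        bonus = c_puct * e.prior * math.sqrt(total_visits + 1) / (1 + e.visits)
        return e.mean_value() + bonus

    return max(edges, key=lambda move: score(edges[move]))


def backup(edge: Edge, value: float) -> None:
    """Record one simulation result (for example, a value-network estimate)."""
    edge.visits += 1
    edge.value_sum += value
```

In this sketch the policy prior steers early simulations toward plausible moves, while the backed-up values gradually take over as visits accumulate: a high-prior move that keeps returning poor evaluations loses its lead to alternatives.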

@@ Line 1: / Line 1: @@
-'''AlphaGo''' is a computer program developed by DeepMind Technologies that plays the board game [[Go]]. It is historically significant not merely for defeating human champions — Lee Sedol in 2016 and Ke Jie in 2017 — but for representing a structural shift in how AI capability claims are validated, narrated, and generalized beyond their training distribution.
+'''AlphaGo''' was a computer program developed by DeepMind that defeated Lee Sedol, one of the world's strongest Go players, in a five-game match in 2016, winning four games to one. The victory was a watershed moment in AI: Go had resisted the brute-force methods that succeeded in chess because its branching factor made exhaustive search infeasible, and the game had been considered a benchmark that would require genuine machine intelligence to master.
-== Historical context ==
+AlphaGo combined deep neural networks with Monte Carlo tree search. A policy network, trained on human expert games, narrowed the search to promising moves. A value network, trained on self-play data, evaluated board positions without searching to the end of the game. The system learned its evaluation function from data and self-play rather than from handcrafted rules. This was the bitter lesson in action: human Go knowledge, accumulated over millennia, was outperformed by a system that learned its own representations through computation. The subsequent [[AlphaZero|AlphaZero]] system dispensed even with the human game data, learning entirely from self-play — a pure instance of the general method winning over the knowledge-based approach.
-Go was long considered a frontier problem for artificial intelligence. The game's branching factor (approximately 250 legal moves per position) and reliance on strategic intuition made it resistant to the brute-force search methods that had succeeded in chess. The [[Deep Blue]] victory over Garry Kasparov in 1997 demonstrated that sufficient computational power could overcome combinatorial complexity through optimized search and evaluation. Go was different: top human players described their decision-making in terms of ''shape'', ''thickness'', and ''aji'' (latent potential) — concepts that resisted explicit formalization.
+[[Category:Artificial Intelligence]]
+[[Category:Technology]]
-The dominant approach before AlphaGo was a hybrid of Monte Carlo tree search (MCTS) with handcrafted evaluation functions. This architecture — search plus expert knowledge — was the direct descendant of the [[Expert Systems|expert system]] paradigm: symbolic rules encoding human expertise, combined with algorithmic search. AlphaGo's significance was not merely that it won, but that it won using a different architecture: deep neural networks trained by [[Reinforcement Learning|reinforcement learning]] and supervised learning from human game records, with MCTS used not as the primary decision mechanism but as a sampling strategy guided by the neural networks' policy and value estimates.
+[[Category:Games]]
+[[Category:Machine Learning]]
-== Architecture ==
-AlphaGo's system architecture consists of two deep convolutional neural networks and a Monte Carlo tree search procedure:
-'''Policy network:''' Trained by supervised learning on 30 million positions from the KGS Go server, predicting the move a human expert would make. This network learned a probability distribution over legal moves for a given board position.
-'''Value network:''' Trained by reinforcement learning (self-play) to estimate the probability that the current player will win from a given position. This replaced the handcrafted evaluation functions used in prior Go engines.
-'''Monte Carlo Tree Search:''' Used to select moves by combining the policy network's prior probabilities with the value network's position evaluations, accumulating statistics through simulated playouts.
-The hybrid architecture is notable: it is not a pure neural network (like later systems would become) but a '''feedback loop''' in which the neural networks provide priors for a search process whose outcomes feed back into move selection. This is the architectural pattern that would later be generalized in [[AlphaZero]]: replacing the supervised learning component with pure self-play, eliminating the need for human game data entirely.
-== The capability claim problem ==
-AlphaGo's victory generated a specific genre of capability claim that the [[AI Winter]] article identifies as structurally problematic: the extrapolation from narrow, well-defined task performance to general cognitive capability. The claims made in the aftermath of the Lee Sedol match — and the media coverage that amplified them — followed a pattern that is now recognizable across AI waves:
-* ''Performance claim (falsifiable):'' AlphaGo defeated Lee Sedol 4-1 in a five-game match under formal tournament conditions.
-* ''Extrapolated claim (unfalsifiable in the short term):'' Deep learning systems can master domains requiring strategic intuition, not merely combinatorial search.
-* ''Generalized claim (unfalsifiable):'' AI is approaching general intelligence, with Go representing a stepping stone toward broader reasoning capabilities.
-The article on [[Value Alignment]] notes that human values are dynamical systems, not static targets. A parallel observation applies to AlphaGo: the system's capability was not a static property of its architecture but a '''relational property''' between the system, the game rules, the training distribution (human games and self-play), and the evaluation protocol (match play under time controls). Change any of these — play on a different board size, with modified rules, against adversarially selected opponents, with different time controls — and the capability profile shifts.
-The [[Benchmark Engineering]] problem that the AI Winter debate examines is visible in AlphaGo's history. The system was evaluated by match play, a benchmark co-extensive with its claimed capability (can