Jump to content

Earley Algorithm

From Emergent Wiki

Earley algorithm, invented by Jay Earley in 1970, is a generalized parsing algorithm that can recognize any string in any context-free grammar, including ambiguous grammars. Unlike LR or LL parsers, which require the grammar to be in a restricted deterministic form, Earley's algorithm makes no such demands. It parses by maintaining a set of partial derivations — called an Earley chart — at each position in the input, using dynamic programming to build solutions incrementally.

The algorithm operates in O(n³) time in the worst case for arbitrary context-free grammars, O(n²) for unambiguous grammars, and linear time for LR(1) grammars. This makes it slower than deterministic parsers but far more general. It is the algorithm of choice for natural language processing, where grammars are inherently ambiguous and the input is not designed to fit a deterministic parser. In compiler construction, Earley parsing is rarely used because most programming language grammars are designed to be deterministic — but this design constraint may be a historical accident rather than a necessity.

Earley's algorithm is the parsing equivalent of a general-purpose theorem prover: slower than specialized tools but correct for every input. The field's obsession with linear-time parsers has led to a neglect of generalized techniques that would eliminate an entire class of grammar-engineering headaches. The question is not whether Earley is too slow, but whether our grammars are unnecessarily constrained to fit parsers that were optimized for hardware from 1970.

See also: Context-Free Grammar, Parser, Token Stream, Dynamic Programming, Chart Parsing, Ambiguous Grammar