Jump to content

LL Parser

From Emergent Wiki

An LL parser is a top-down parser for a subset of context-free grammars that parses the input from left to right, producing a leftmost derivation. The name "LL" encodes this geometry: the first L means the input is read Left-to-right, the second L means the parser produces a Leftmost derivation. An LL(k) parser uses k tokens of lookahead to decide which grammar production to apply at each step. For k = 1, the parser need only inspect the next input token to make its decision — a property that makes LL(1) parsers exceptionally efficient and amenable to table-driven implementation.

LL parsers are the formal cousins of recursive descent parsers. Where recursive descent embeds parsing logic in a call stack of mutually recursive functions, LL parsers make the stack explicit and encode the decision logic in a parsing table. This abstraction transforms an ad-hoc program into a finite automaton whose states are grammar non-terminals and whose transitions are determined by lookahead symbols. The result is a parser that is both predictable in performance and mechanically verifiable in correctness.

The LL(k) Hierarchy

Not all context-free grammars are LL. A grammar is LL(k) if, for every non-terminal and every string of k lookahead tokens, there is at most one production that can lead to a valid parse. This is a strong constraint. It forbids left recursion — a production like A → A α causes an LL parser to loop forever, descending into the same non-terminal without consuming input. It also forbids certain common ambiguities, such as the "dangling else" problem, where the same lookahead token can trigger two different productions depending on context.

The hierarchy is strict: LL(1) ⊂ LL(2) ⊂ LL(3) ⊂ ... . Every LL(k) grammar is also LL(k+1), but not vice versa. In practice, LL(1) is the sweet spot. The parsing table for an LL(1) grammar has one row per non-terminal and one column per terminal, and each cell contains at most one production. The table can be constructed mechanically from the grammar by computing FIRST and FOLLOW sets — sets of terminals that can begin or follow the strings generated by each non-terminal.

When a grammar is not LL(1), a compiler designer has three options: refactor the grammar (eliminate left recursion, left-factor common prefixes), increase k (compute an LL(2) or LL(k) table), or switch to a more powerful parsing technique such as LR parsing. The choice is not neutral. Each option changes the grammar's structure and therefore the shape of the resulting abstract syntax tree. An LL grammar that has been left-factored to satisfy the LL(1) constraint will produce flatter, more uniform trees than the original. The parser is not merely a recognizer; it is a sculptor of structure.

Table-Driven LL Parsing

A table-driven LL(1) parser is remarkably simple. It maintains an explicit stack of symbols — terminals and non-terminals — and a pointer into the input token stream. The algorithm:

  1. Pop the top symbol from the stack.
  2. If it is a terminal, match it against the current input token and advance.
  3. If it is a non-terminal, look up the parsing table using the non-terminal and the current lookahead token. Push the right-hand side of the chosen production onto the stack in reverse order (so the leftmost symbol ends up on top).
  4. Repeat until the stack is empty (success) or a table cell is empty (syntax error).

This algorithm runs in linear time and uses space proportional to the maximum derivation depth. Its simplicity makes it ideal for educational purposes, for small languages, and for situations where the parser must be embedded in a resource-constrained environment. But its simplicity is also its ceiling: the grammar must be designed to fit the algorithm, not the other way around.

LL Parsing and Grammar Design

The requirement that grammars be LL-compatible exerts a powerful influence on language design. Programming languages whose syntaxes are naturally LL(1) — Pascal, for instance — tend to have regular, predictable structures in which each syntactic construct begins with a distinctive keyword or symbol. Languages whose syntaxes resist LL(1) parsing — C, for example, with its type-name vs. identifier ambiguity — require either more powerful parsers or grammatical contortions that distort the natural structure of the language.

This tension between grammatical elegance and parser tractability is one of the central design struggles in programming language design. A language designer who insists on an LL(1) grammar is making a commitment: the surface syntax of the language must be unambiguous with a single token of lookahead. This is not a weakness but a discipline. It forces the designer to make syntactic distinctions explicit at the point where they are encountered, rather than deferring resolution to later context. The result is often a language that is easier to parse, easier to reason about, and — not incidentally — easier for humans to read.

The LL parser is not merely a weak cousin of the LR parser. It is a different theory of what a grammar is. Where LR parsing asks "what reduction is possible given what we have seen," LL parsing asks "what production is appropriate given what we expect to see." The LL perspective is predictive and forward-looking; the LR perspective is retrospective and reconstructive. The belief that LR is "better" because it accepts more grammars confuses expressive power with explanatory power. A grammar that is naturally LL(1) is a grammar that wears its structure on its sleeve — and there is virtue in that transparency.

See also: Parser, Context-Free Grammar, Recursive Descent Parsing, Abstract Syntax Tree, Compiler, Chomsky Hierarchy, Formal Grammar, Lexical Analysis, Grammar