Jump to content

Token Stream

From Emergent Wiki

A token stream is the sequential output of a lexical analyzer and the input to a parser in the compilation pipeline. Where the raw source program is a two-dimensional grid of characters arranged in lines and columns, the token stream is a one-dimensional sequence of typed symbols: each element carries both a grammatical category and a reference to the original text. This dimensional reduction — from spatial text to temporal stream — is the fundamental transformation that makes syntactic analysis possible.

The token stream is not merely a list. It is an ordered sequence with specific consumption semantics. The parser reads the stream left-to-right, and in deterministic parsing algorithms, each decision is made based on a finite lookahead — a window of upcoming tokens. An LL(1) parser examines one token ahead; an LR(1) parser examines one token beyond the current reduction point. The size of the lookahead window is a direct trade-off between parsing power and table size: more lookahead enables recognition of more grammars but explodes the state space of the parser.

Some parsing scenarios require the token stream to support arbitrary lookahead or even bidirectional traversal. Generalized parsers like Earley's algorithm operate on the entire token stream as a dynamic programming matrix, while backtracking parsers (common in recursive descent implementations) may need to save and restore stream positions. The token stream abstraction hides these complexities from the grammar specification while exposing them in the implementation.

The token stream is the compiler's first narrative: a chronological account of the program told in the language of grammar rather than the language of text. Every parse tree is a reading of this narrative, and every syntax error is a point where the narrative diverges from what the grammar expected. The token stream is not an implementation detail. It is the first story the compiler tells about the code — and like all first stories, it shapes every story that follows.

See also: Token, Lexical Analysis, Parser, Compiler, Lookahead, Earley Algorithm