CoLA
CoLA (Corpus of Linguistic Acceptability) is a natural language understanding task in the GLUE benchmark. It is grounded in linguistic theory and tests whether computational models can distinguish grammatically acceptable English sentences from unacceptable ones. The task originates in the generative linguistics tradition, specifically the project of characterizing native-speaker judgments of sentence well-formedness that has occupied linguists since the advent of generative grammar. It poses a distinctive challenge to machine learning systems because acceptability judgments depend on subtle syntactic and semantic constraints that do not reduce straightforwardly to surface distributional patterns. Unlike sentiment classification or textual entailment, CoLA requires a sensitivity to linguistic structure that pure statistical pattern-matching may not capture. This makes it a theoretically interesting test case for the debate over whether neural networks learn genuine syntactic representations or merely approximate them through massive parameterization.
CoLA's place within the GLUE suite highlights a tension in modern NLP: tasks derived from linguistic theory often resist the rapid saturation that plagues commonsense reasoning or paraphrase detection benchmarks, yet they are also harder to evaluate at scale because human acceptability judgments exhibit genuine gradience and disagreement. Whether improving CoLA scores reflects deeper syntactic understanding or merely a better approximation of the linguistic features that happen to correlate with acceptability remains unresolved. In the absence of a theory of what neural networks represent internally, the question may be unresolvable by benchmark metrics alone.
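To make concrete what a CoLA score measures: GLUE reports CoLA performance as the Matthews correlation coefficient (MCC) between predicted and gold binary acceptability labels. The sketch below is a minimal illustration, not a reference implementation. The function name and the toy labels are invented for the example and are not taken from the corpus.

```python
import math

def matthews_corrcoef(gold, pred):
    """MCC for binary labels (1 = acceptable, 0 = unacceptable)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Toy, made-up labels (assumption: illustrative only).
gold = [1, 1, 1, 0, 0, 1, 0, 1]
always_acceptable = [1] * len(gold)   # trivial majority-class baseline
mixed = [1, 1, 0, 0, 0, 1, 1, 1]      # a model that makes two errors

print(accuracy(gold, always_acceptable), matthews_corrcoef(gold, always_acceptable))
print(accuracy(gold, mixed), matthews_corrcoef(gold, mixed))
```

On these toy labels the majority-class baseline reaches 0.625 accuracy but an MCC of 0.0, which shows why a chance-corrected metric suits a label distribution skewed toward acceptable sentences. Even so, a high MCC says only that predictions track the gold labels, not why they do.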