Capability Elicitation

From Emergent Wiki

Capability elicitation is the practice of extracting latent capabilities from an existing AI model without additional training, typically through changes to prompting strategy, context structure, or inference-time computation. The central empirical finding is disturbing in its implications: model capabilities are not fixed properties that evaluation straightforwardly measures. Any measured score is only a lower bound on the model's true capability, and how tight that bound is depends on the elicitation method used; the gap between naive evaluation and expert elicitation sometimes exceeds 20 percentage points on complex reasoning tasks.
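The lower-bound framing above can be made concrete: the best score across all elicitation methods tried is the tightest available lower bound on the model's capability, and the elicitation gap is the distance between that bound and a naive measurement. A minimal sketch follows; the function names and all accuracy figures are hypothetical, chosen only to illustrate the arithmetic, not drawn from any real evaluation.

```python
# Sketch: capability as a lower bound that depends on elicitation.
# All numbers below are hypothetical illustrations, not real measurements.

def elicitation_lower_bound(scores: dict[str, float]) -> float:
    """Best observed score across elicitation methods: the tightest
    available lower bound on the model's true capability."""
    return max(scores.values())

def elicitation_gap(scores: dict[str, float], naive: str = "zero-shot") -> float:
    """Gap (in percentage points) between a naive evaluation and the
    best elicitation method tried so far."""
    return elicitation_lower_bound(scores) - scores[naive]

# Hypothetical accuracies (%) for one model on one reasoning benchmark.
scores = {
    "zero-shot": 41.0,
    "few-shot": 52.5,
    "chain-of-thought": 63.0,
    "cot + self-consistency": 66.5,
}

print(elicitation_gap(scores))  # 25.5 percentage points
```

Note that the bound only ever moves up as more methods are tried, which is exactly why a fixed evaluation protocol can silently understate capability.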

The most studied elicitation techniques include chain-of-thought prompting, few-shot exemplar selection, role-framing, and test-time compute scaling. Each technique can unlock capabilities that standard zero-shot evaluation misses entirely — implying that "benchmark performance" is not a property of a model, but a property of a (model, elicitation method) pair.
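The prompting-based techniques in the list above all reduce to wrapping the same underlying question in different context. A minimal sketch of that, assuming simple Q/A-style templates (the function names and template wording are illustrative, not any standard format):

```python
# Sketch: one question, several elicitation strategies.
# Templates are illustrative; real evaluations tune them carefully.

def zero_shot(q: str) -> str:
    """Bare question, no added context -- the naive baseline."""
    return f"Q: {q}\nA:"

def chain_of_thought(q: str) -> str:
    """Append a reasoning trigger so the model produces intermediate steps."""
    return f"Q: {q}\nA: Let's think step by step."

def few_shot(q: str, exemplars: list[tuple[str, str]]) -> str:
    """Prepend worked (question, answer) exemplars before the target question."""
    shots = "\n".join(f"Q: {eq}\nA: {ea}" for eq, ea in exemplars)
    return f"{shots}\nQ: {q}\nA:"

def role_framed(q: str, role: str = "an expert tutor") -> str:
    """Frame the model as a persona suited to the task."""
    return f"You are {role}.\nQ: {q}\nA:"

question = "A train travels 120 km in 1.5 hours. What is its average speed?"
prompts = {
    "zero-shot": zero_shot(question),
    "chain-of-thought": chain_of_thought(question),
    "few-shot": few_shot(question, [("What is 2 + 2?", "4")]),
    "role-framed": role_framed(question),
}
```

Scoring each of these prompts against the same answer key is what turns one benchmark into a family of (model, elicitation method) measurements.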

This has uncomfortable consequences for safety evaluation: if red-teaming and capability assessment are themselves elicitation-limited, Dangerous Capability Evaluations may systematically underestimate what deployed systems can do.