Representation engineering

Representation engineering is the systematic manipulation of a neural network's internal representations (activation vectors, attention patterns, hidden-layer states) to achieve desired behavioral outcomes without modifying the model's underlying parameters. Unlike prompt engineering, which operates on the input surface, representation engineering treats the model's latent activations as a control surface, using techniques such as activation patching, steering vectors, and contrastive representation learning to redirect the course of computation.

The field emerges from the recognition that large neural networks do not merely store knowledge in their weights; they also construct dynamic, task-specific representations during each forward pass. From a neural computation perspective, these representations act as the system's working memory, and their geometry shapes the landscape of possible outputs. Representation engineering is therefore the attempt to sculpt that geometry directly, bypassing the opaque interface of natural language to reach the substrate of computation.

See also: Latent space steering, LLM, Neural Computation.
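As a rough illustration of the steering-vector technique named in the definition, the sketch below derives a direction as the difference between the mean activations of two contrastive example sets, then adds a scaled copy of it to a hidden state. This is only one common way to build such a vector, not a canonical method. The function names (`mean_vector`, `steering_vector`, `steer`), the scale `alpha`, and the toy activation values are all assumptions made for illustration; in practice the activations would be read from a chosen layer of a real model.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of class means: one simple way to derive a steering direction."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction; model weights are never touched."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations, invented for illustration (not taken from any real model).
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]  # examples showing the target behavior
negative_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]  # examples showing the opposite behavior

direction = steering_vector(positive_acts, negative_acts)
steered = steer([0.5, 0.5, 0.5], direction, alpha=0.5)

print(direction)
print(steered)
```

The intervention changes only the activation passed to later layers, which is what separates this approach from fine-tuning. The scale `alpha` controls how strongly the behavior is pushed.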