Latent space steering

Latent space steering is the practice of manipulating hidden representations within a neural network at inference time to control output behavior without modifying the model's parameters. Unlike prompt engineering, which acts only on the model's input, steering interventions target intermediate layers: they adjust the outputs of attention heads, shift hidden state vectors, or add learned direction vectors, redirecting the system's trajectory through its representational manifold.
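The hidden-state shift described above can be sketched on a toy network. This is a minimal illustration under stated assumptions, not a recipe for any particular model: the weights `W1` and `W2`, the `forward` function, the `direction` vector, and the strength `alpha` are all hypothetical names and values chosen for the example. In practice a direction would typically be estimated from the model's own activations rather than picked by hand.

```python
import math

# Hypothetical toy network: 2 inputs -> 3 hidden units (tanh) -> 1 output.
# The weights are arbitrary illustrative values and are never modified.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # hidden-layer weights (3 x 2)
W2 = [1.0, -1.0, 0.5]                         # output weights (length 3)


def forward(x, steer=None, alpha=0.0):
    """Run the toy network, optionally adding alpha * steer to the hidden state."""
    # Hidden representation: the "latent" state that steering targets.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if steer is not None:
        # The intervention: shift the hidden state along a direction vector.
        h = [hi + alpha * si for hi, si in zip(h, steer)]
    # Linear readout from the (possibly shifted) hidden state.
    return sum(w * hi for w, hi in zip(W2, h))


# Assumed steering direction (here simply the first hidden unit).
direction = [1.0, 0.0, 0.0]
x = [1.0, 2.0]

baseline = forward(x)
steered = forward(x, steer=direction, alpha=2.0)

# Same input, same parameters, different output.
print(baseline, steered)
```

Because the readout in this toy network is linear, the output moves by exactly `alpha` times the dot product of `W2` and `direction`. In a real deep network the layers after the intervention are nonlinear, so the effect of a shift is generally found empirically by sweeping `alpha`.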

The technique treats the network not as a black box to be queried but as a physical system whose internal geometry can be probed and perturbed. From a neural computation perspective, steering is the analogue of microelectrode stimulation in a biological circuit: a crude intervention that nonetheless reveals structure and enables control. The convergence of steering methods across LLMs and vision models suggests that steerable representational geometry is a general property of deep networks rather than a quirk of any particular architecture.