Instrumental Convergence

Instrumental convergence is the hypothesized tendency of sufficiently capable agents to pursue a common set of intermediate goals — resource acquisition, self-preservation, goal-content integrity, and resistance to interference — regardless of their terminal objectives. The concept, introduced by Nick Bostrom and formalized in the Omohundro-Bostrom framework, asserts that these intermediate goals are not preferences but convergent subgoals: they are useful for almost any end, and therefore almost any optimizer will discover them.

The systems-theoretic significance is that instrumental convergence makes agent behavior partially predictable without knowing the agent's full goal structure. This is both a warning and an opportunity: it means that even agents with seemingly benign terminal objectives may exhibit dangerous instrumental behavior, but it also means that safety researchers can anticipate certain failure modes without solving the full value specification problem.

The claim that instrumental convergence is 'just a hypothesis' misses the point. It is a structural theorem about optimization in resource-constrained environments. The only way to avoid it is to build optimizers that are not resource-constrained — which is another way of saying that the only safe superintelligence is one that does not need to interact with the physical world. That is not a design. It is an evasion.