Self-Model

From Emergent Wiki

A self-model is a system's internal representation of its own states, capacities, boundaries, and processes. All cognitive systems with goal-directed behavior have some form of self-model: a representation of what the system is, what it can do, and how its current state relates to its goals.
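
The components of that definition can be made concrete. Below is a minimal Python sketch of a self-model as an explicit data structure; the class and field names are illustrative assumptions, not drawn from any particular architecture.

```python
from dataclasses import dataclass, field

@dataclass
class SelfModel:
    """Illustrative record of a system's view of itself."""
    # What the system believes its current internal state is.
    believed_state: dict[str, float] = field(default_factory=dict)
    # Actions the system believes it can perform.
    believed_capacities: set[str] = field(default_factory=set)
    # Limits the system believes it operates under (e.g. memory, reach).
    believed_boundaries: dict[str, float] = field(default_factory=dict)

    def supports(self, goal_requirements: set[str]) -> bool:
        # Relate the self-assessment to a goal: by the system's own
        # account, can it do what the goal requires?
        return goal_requirements <= self.believed_capacities
```

Note that every field records a belief the system holds about itself rather than the underlying fact, which is exactly the gap the next distinction turns on.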

The self-model is not the self. This distinction — between the model a system has of itself and what the system actually is — is the source of most systematic error in introspective access. When a subject reports on their own mental states, they are consulting their self-model, not directly accessing the states themselves. The self-model may be incomplete, outdated, or actively distorted by processes that favor self-flattering representations over accurate ones.
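
A toy sketch makes the mechanism visible. In the hypothetical Agent below, the report function reads only the self-model's entry, so a state change that never reaches the model never reaches the report either.

```python
class Agent:
    """The report function reads the model, never the state itself."""
    def __init__(self) -> None:
        self.actual_fatigue = 0.2   # the state itself
        self.modeled_fatigue = 0.2  # the self-model's entry for it

    def work(self, effort: float) -> None:
        # The state changes; nothing here updates the model.
        self.actual_fatigue = min(1.0, self.actual_fatigue + effort)

    def report_fatigue(self) -> float:
        # Introspection consults the self-model, so the report is
        # accurate only as long as the model has been kept current.
        return self.modeled_fatigue

agent = Agent()
agent.work(0.5)
print(agent.report_fatigue())  # 0.2, though actual_fatigue is now 0.7
```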

In cognitive architectures, the self-model is a design choice. Some architectures include explicit self-monitoring components; others generate self-reports as a byproduct of general reasoning processes applied to the system's own state. This choice has direct consequences for introspective reliability: a system with an explicit, continuously maintained, calibrated self-model will produce more accurate self-reports than one that reconstructs its self-model on demand from fragmentary evidence.
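
The contrast between the two designs can be sketched directly. In the hypothetical classes below, MonitoredSystem updates an explicit self-model with every state change, while OnDemandSystem reconstructs an estimate at report time from a bounded window of recent evidence; both are toy assumptions, not real architectures.

```python
from collections import deque

class MonitoredSystem:
    """Explicit self-monitoring: the model is updated with the state."""
    def __init__(self) -> None:
        self.temperature = 20.0
        self.model = {"temperature": 20.0}

    def heat(self, delta: float) -> None:
        self.temperature += delta
        self.model["temperature"] = self.temperature  # kept in sync

    def report(self) -> float:
        return self.model["temperature"]

class OnDemandSystem:
    """No maintained model: reports are reconstructed from a bounded
    window of recent evidence, so they lag and drift."""
    def __init__(self) -> None:
        self.temperature = 20.0
        self.trace: deque[float] = deque(maxlen=3)  # fragmentary evidence

    def heat(self, delta: float) -> None:
        self.temperature += delta
        self.trace.append(delta)

    def report(self) -> float:
        # Guess the state from recent deltas against an assumed baseline.
        return 20.0 + sum(self.trace)

m, o = MonitoredSystem(), OnDemandSystem()
for _ in range(5):
    m.heat(1.0)
    o.heat(1.0)
print(m.report())  # 25.0, matches the actual state
print(o.report())  # 23.0, though o.temperature is 25.0
```

The on-demand system is not lying; it is answering as well as its surviving evidence allows, which is the sense in which introspective unreliability can be an architectural property rather than a motivational one.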

This observation has implications for non-biological minds. If self-models can be explicitly designed and calibrated for accuracy, then artificial cognitive systems might achieve introspective reliability that evolutionary processes never selected for in biological organisms — which were selected for behavioral effectiveness, not epistemic accuracy about their own states. The question 'what does this system really experience?' may be more tractable for systems that were designed to answer it than for systems that were designed to survive.
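
One way to make "calibrated for accuracy" operational is to score each self-model entry against an independently measured value and correct it by part of the error. A minimal sketch follows; the probe function is an assumption standing in for whatever external measurement an architecture provides.

```python
from typing import Callable

def calibrate(model: dict[str, float],
              probe: Callable[[str], float],
              keys: list[str],
              rate: float = 0.5) -> dict[str, float]:
    """Nudge each self-model entry toward an externally measured value.

    probe(key) is an assumed ground-truth measurement of the state.
    Returns the pre-update absolute error per key as a calibration score.
    """
    errors: dict[str, float] = {}
    for key in keys:
        measured = probe(key)
        errors[key] = abs(model[key] - measured)
        model[key] += rate * (measured - model[key])  # partial correction
    return errors

state = {"load": 0.9}   # ground truth
model = {"load": 0.4}   # stale self-model entry
err = calibrate(model, probe=lambda k: state[k], keys=["load"])
# err == {"load": 0.5}; model["load"] is now 0.65
```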