Explainability Theater

Explainability theater is the practice of generating or requiring explanations of algorithmic decisions that satisfy institutional or regulatory requirements without producing genuine understanding of the system being explained. It is the bureaucratic counterpart to feature attribution and mechanistic interpretability. Those fields seek actual comprehension; explainability theater produces the performance of comprehension. Its outputs are checklists, dashboards, SHAP plots generated for compliance rather than inquiry, and natural-language rationales. These signal transparency while leaving the systems they claim to illuminate as opaque as before.

The phenomenon is not unique to technology. It is a species of ritual: a practice that has lost its original function but persists because its form satisfies social expectations. In regulated industries, explainability requirements often function less as epistemic safeguards and more as liability shields. A lender who produces a feature attribution map for each loan denial has not necessarily understood why the model denied the loan. The lender has produced documentation that will satisfy an auditor, a judge, or a regulator, and that is a different goal entirely.
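A per-denial "reason" can be generated mechanically, and what it says can hinge on choices the auditor never sees. The sketch below illustrates this with a toy linear credit model. Every feature, weight, threshold, and applicant value is hypothetical and invented for illustration; no real lender's model works this way. For a linear model, the contribution of feature i relative to a baseline applicant is w_i * (x_i - baseline_i). This is roughly what SHAP-style attributions reduce to for a linear model with independent features. The contributions sum exactly to the score difference. Yet the "top reason" for the same denial changes depending on whether the baseline is the population average or the average approved applicant.

```python
# Toy linear credit model. All weights, features, and the threshold are
# hypothetical, chosen only to illustrate baseline sensitivity.
WEIGHTS: dict[str, float] = {"income": 0.5, "debt_ratio": -2.0, "years_employed": 0.3}
THRESHOLD = 2.0  # approve if score >= THRESHOLD


def score(x: dict[str, float]) -> float:
    """Linear score: weighted sum of the applicant's features."""
    return sum(WEIGHTS[k] * x[k] for k in WEIGHTS)


def attributions(x: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Per-feature contribution relative to a baseline applicant.

    For a linear model these contributions are exact: they sum to
    score(x) - score(baseline).
    """
    return {k: WEIGHTS[k] * (x[k] - baseline[k]) for k in WEIGHTS}


def top_reason(x: dict[str, float], baseline: dict[str, float]) -> str:
    """The feature that pushed the score down the most (the 'reason code')."""
    attr = attributions(x, baseline)
    return min(attr, key=attr.get)


# Hypothetical applicant and two equally defensible baselines.
applicant = {"income": 4.0, "debt_ratio": 0.6, "years_employed": 1.0}
population_mean = {"income": 5.0, "debt_ratio": 0.4, "years_employed": 3.0}
approved_mean = {"income": 7.0, "debt_ratio": 0.3, "years_employed": 3.5}

if __name__ == "__main__":
    print("denied:", score(applicant) < THRESHOLD)
    print("vs. population mean:", top_reason(applicant, population_mean))
    print("vs. approved mean:  ", top_reason(applicant, approved_mean))
```

Both reports are internally consistent and would pass a documentation check. The first blames short employment history and the second blames low income. Nothing in either report tells a reviewer which baseline, if either, answers the question "why was this loan denied?"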

The systemic danger is that explainability theater displaces genuine interpretability. When institutions believe they have achieved transparency because they have met documentation requirements, they stop investing in the harder work of actually understanding their systems. The theater becomes the reality, and the systems continue to operate in the dark, now under better stage lighting.