Watchdog Timer

From Emergent Wiki

A watchdog timer is a hardware or software timer that automatically resets a system when it does not receive a periodic heartbeat signal (often called a "kick") within a specified interval. It is among the simplest and most robust failure-detection mechanisms in embedded and safety-critical systems: if the main control task stalls, crashes, or enters an infinite loop, the heartbeat stops, the timer expires, and the resulting reset returns the system to a known safe state. In the engine control module examined in the Toyota unintended acceleration litigation, expert testimony described watchdog supervision that could not detect the death of individual tasks. As a result, a task-death scenario, in which the throttle-monitoring task stopped executing while the throttle-control task continued, could persist indefinitely without automatic recovery.
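The heartbeat-and-expiry behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Watchdog`, `kick`, `poll`, and `FakeClock` are hypothetical, the "reset" is a callback rather than a real hardware reset, and a fake clock stands in for real time so the run is deterministic.

```python
import time
from typing import Callable, List


class Watchdog:
    """Software watchdog: runs on_expire once if kick() stops arriving in time."""

    def __init__(self, timeout_s: float, on_expire: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.on_expire = on_expire
        self.clock = clock
        self.deadline = clock() + timeout_s
        self.fired = False

    def kick(self) -> None:
        """Heartbeat from the supervised task: push the deadline forward."""
        self.deadline = self.clock() + self.timeout_s

    def poll(self) -> bool:
        """Called by the supervising side; triggers the reset action on expiry."""
        if not self.fired and self.clock() >= self.deadline:
            self.fired = True
            self.on_expire()
        return self.fired


class FakeClock:
    """Manually advanced clock (an assumption made for a repeatable demo)."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, dt: float) -> None:
        self.now += dt


def demo() -> List[str]:
    clock = FakeClock()
    events: List[str] = []
    wd = Watchdog(1.0, lambda: events.append(f"reset at t={clock.now:.1f}"), clock)

    # Healthy phase: the task kicks every 0.5 s, well inside the 1.0 s timeout.
    for _ in range(3):
        clock.advance(0.5)
        wd.kick()
        wd.poll()

    # Stalled phase: the task stops kicking; only the supervisor keeps polling.
    for _ in range(3):
        clock.advance(0.5)
        wd.poll()
    return events


if __name__ == "__main__":
    print(demo())
```

In this sketch the last kick arrives at t=1.5, so the deadline is t=2.5 and the callback fires on the first poll at or after that time. A hardware watchdog follows the same logic, except that the countdown and the reset line live in circuitry the stalled software cannot reach.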

The watchdog timer embodies a principle that transcends its specific implementation: supervision cannot be performed by the supervised. A process cannot reliably detect its own failure, because the failure may corrupt the very mechanism that would perform the detection. The watchdog must therefore be an independent agent that observes the system from outside: a separate hardware timer, a different processor, or at minimum a task with higher priority and isolated resources. This architectural requirement echoes a principle also found in Byzantine fault tolerance: trust cannot be assumed; it must be engineered through structural independence.
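One way to picture supervision from outside is a monitor that requires a check-in from every task in each time window, instead of letting one task vouch for the others. The following is a minimal simulation, not a description of any real controller: `TaskMonitor`, `simulate`, and the task names are hypothetical, and in a real system the monitor would run on independent hardware or in an isolated, higher-priority context and would withhold the hardware watchdog kick whenever a task is missing.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

TASKS = ("throttle_control", "throttle_monitor")  # hypothetical task names


class TaskMonitor:
    """Supervisor that demands a check-in from every task in each window."""

    def __init__(self, task_names: Iterable[str]) -> None:
        self.expected: Set[str] = set(task_names)
        self.seen: Set[str] = set()

    def check_in(self, task: str) -> None:
        """Record that a task proved it is still running in this window."""
        self.seen.add(task)

    def end_window(self) -> List[str]:
        """Close the window and return the tasks that stayed silent."""
        missing = sorted(self.expected - self.seen)
        self.seen.clear()
        return missing


def simulate(windows: int,
             dies_at: Dict[str, int]) -> Optional[Tuple[int, List[str]]]:
    """Run the tasks for some windows; dies_at maps a task to the window
    in which it stops executing. Returns (window, missing tasks) at the
    first detection, or None if every task stayed alive."""
    monitor = TaskMonitor(TASKS)
    for window in range(windows):
        for task in TASKS:
            if window < dies_at.get(task, windows):
                monitor.check_in(task)
        missing = monitor.end_window()
        if missing:
            # A real supervisor would stop kicking the hardware watchdog
            # here, forcing a reset.
            return window, missing
    return None


if __name__ == "__main__":
    print(simulate(5, {"throttle_monitor": 2}))
    print(simulate(3, {}))
```

Here the death of one task is caught in the first window it misses, even though the other task keeps running normally. The design choice is that liveness is asserted per task and judged by a party that none of those tasks can silence.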

A system without a watchdog is not self-healing. It is self-deceiving.