Federated learning

Federated learning is a machine learning paradigm in which multiple parties collaboratively train a model without sharing their raw data. Instead of centralizing data in a single server, each participant trains a local model on their own data and shares only model updates — gradients, weights, or aggregated parameters — with a central coordinator. The coordinator synthesizes these updates into a global model, which is then distributed back to participants for further local training. The architecture appears to solve the central tension of modern AI: how to learn from data that is too sensitive, too large, or too legally restricted to collect in one place.

The technique has found applications in healthcare (training diagnostic models across hospitals without sharing patient records), mobile computing (improving keyboard prediction on smartphones without uploading keystrokes), and finance (detecting fraud across institutions without revealing transaction details). It is often presented as a privacy-preserving alternative to centralized training, and it is frequently paired with differential privacy to provide formal guarantees that individual data points cannot be reconstructed from the shared model updates.

The Architecture of Federated Learning

A typical federated learning system operates in rounds. In each round, the coordinator sends the current global model to a subset of participants. Each participant trains the model on their local data for a few epochs and sends the updated weights back. The coordinator aggregates these updates — usually by averaging them — and produces a new global model. The process repeats until convergence.

This architecture has three important structural features. First, data locality: raw data never leaves the participant's device or server. Second, asymmetric computation: participants do the heavy lifting of training, while the coordinator does lightweight aggregation. Third, implicit homogenization: the global model forces all participants to share the same architecture, loss function, and optimization schedule, even when their local data distributions differ.

The third feature is the most consequential and the least discussed. The global model is not merely a synthesis of local knowledge; it is an imposition of a single representational framework on heterogeneous local realities. A hospital with predominantly elderly patients and a hospital with predominantly pediatric patients are forced to share the same model architecture. The heterogeneity is not accommodated; it is averaged out.

The Power Asymmetry Problem

The federated learning literature frames its central challenge as the privacy-utility tradeoff: how much noise can be added to model updates before the global model degrades? But this framing obscures a deeper structural issue. The coordinator controls the model architecture, the training schedule, the aggregation method, and the privacy budget — the total amount of information that can be extracted from any participant's data. The participants contribute their data and their computation, but they have no say in how the model is designed, no visibility into the privacy budget's allocation, and no recourse when the budget is exceeded.

This is not a technical problem of optimization. It is an institutional problem of governance. Federated learning distributes computation but centralizes authority. The privacy-utility frontier is not a mathematical law; it is a menu of options presented by a powerful institution to a powerless population. The choice of where to operate on that frontier is a political decision, not an engineering one, and the mathematical formalism of differential privacy provides a fig leaf of legitimacy for decisions that are fundamentally undemocratic.

The real question is not how much noise to add. It is who gets to decide, and what institutional mechanisms ensure that the privacy budget cannot be revised downward by the same institution that profits from the model. Federated learning solves the data-centralization problem while preserving the power-centralization problem. That is not a bug in the math. It is a feature of the institutional design.