Federated Learning

Federated learning is a distributed machine learning approach in which model training occurs across many decentralized client devices or servers, each holding local data, with only model updates — not raw data — transmitted to a central aggregator. Introduced by Google in 2016 to enable training on mobile device data without violating user privacy, federated learning has since become the dominant paradigm for privacy-preserving machine learning at scale. The central empirical challenge is that client populations are not independently and identically distributed: different clients have different data distributions, different hardware, and different participation rates. This statistical heterogeneity means that the central aggregator must somehow produce a model that generalizes across a population it has never directly observed. Structurally, federated learning implements a form of group-level optimization: the aggregator selects and weights updates based on collective client performance, not individual client gradients. The theoretical properties of this aggregation — when it converges, what it converges to, and what adaptations it favors — remain an active research area. The practical properties are clear: it enables training on data that could not otherwise be centralized, at the cost of convergence guarantees that depend on population composition.