'''Federated learning''' is a distributed machine learning approach in which model training occurs across many decentralized client devices or servers, each holding local data, with only model updates — not raw data — transmitted to a central aggregator. Introduced by Google in 2016 to enable training on mobile device data without violating user privacy, federated learning has since become the dominant paradigm for privacy-preserving machine learning at scale. The central empirical challenge is that client data is not independently and identically distributed across the population: different clients have different data distributions, different hardware, and different participation rates. This ''statistical heterogeneity'' means that the central aggregator must somehow produce a model that generalizes across a population it has never directly observed. Structurally, federated learning implements a form of [[Group Selection|group-level optimization]]: the aggregator selects and weights updates based on collective client performance, not individual client gradients. The theoretical properties of this aggregation — when it converges, what it converges to, and what adaptations it favors — remain an active research area. The practical properties are clear: it enables training on data that could not otherwise be centralized, at the cost of convergence guarantees that depend on population composition.

In federated learning, each client computes model updates on local data; a central server aggregates these updates (typically by averaging), produces a new global model, and distributes it back. The procedure repeats until convergence. This architecture creates a two-level selection structure that resembles [[Multi-level selection|multi-level selection]] in biology: individual clients optimize locally, but the global model persists or fails based on the aggregated population of updates. Whether this formal parallel reveals genuine multi-level dynamics — collective-level behaviors that cannot be predicted from individual-client analysis — remains an open question.

== The Privacy-Utility Tradeoff and Its Structural Limits ==

''Federated learning is often praised as a privacy-preserving alternative to centralized training. This framing obscures a deeper structural point: the aggregation rule at the server is an implicit governance mechanism. Who controls the aggregation rule controls what the collective learns. The literature treats aggregation as a technical detail; it is, in fact, a power relation dressed in mathematics. Any system that aggregates distributed updates without examining whose interests the aggregation serves is not privacy-preserving — it is power-concealing.''

Federated learning is frequently presented as a solution to the privacy problem in machine learning: train on distributed data without centralizing it. This framing is incomplete. The model updates transmitted in federated learning — the gradients computed on local data — carry substantial information about that local data, and gradient inversion attacks have demonstrated that detailed information about training examples can be reconstructed from these updates with alarming fidelity. Federated learning without additional privacy mechanisms does not solve the privacy problem; it shifts it.

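The mechanics of this leakage are visible even without a full attack. For a single fully connected layer, the weight gradient is the outer product of the upstream gradient and the layer's input, so anyone holding the update can recover the input exactly. A minimal sketch (the layer sizes and data here are illustrative, not from any real system):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Private" client input passing through one dense layer y = W @ x + b.
x = rng.normal(size=4)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)

# Upstream gradient dL/dy from some scalar loss (its exact form is irrelevant).
dLdy = rng.normal(size=3)

# The gradients a client would transmit as its model update:
grad_W = np.outer(dLdy, x)   # dL/dW = (dL/dy) x^T
grad_b = dLdy                # dL/db = dL/dy

# Server-side reconstruction: any row with a nonzero bias gradient
# yields the private input exactly.
x_recovered = grad_W[0] / grad_b[0]
print(np.allclose(x_recovered, x))  # → True
```

Deep networks do not admit this closed-form recovery, but iterative gradient inversion attacks exploit the same underlying information.
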
The standard response is [[Differential Privacy]] — a mathematical framework that quantifies and bounds the amount of information any output can reveal about any individual input. Adding differential privacy noise to model updates provides formal guarantees, but at a cost: the noise that obscures individual data also degrades model quality. This is not an engineering problem awaiting a better solution; it is a structural tradeoff. The [[information theory|information-theoretic]] relationship between privacy (uncertainty about the input, given the output) and utility (accuracy of the output given the input) imposes a fundamental bound. More privacy means less utility. Every practical deployment of differentially private federated learning makes a choice about where to operate on this tradeoff frontier — a choice that is currently made by engineers and rarely disclosed to users whose data is being used.
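A common concrete mechanism, sketched below under stated assumptions, clips each client update to a bounded L2 norm (bounding its sensitivity) and adds Gaussian noise before transmission. The function name and parameter values are illustrative; a real deployment would calibrate `noise_multiplier` to a target (ε, δ) budget with a privacy accountant.

```python
import numpy as np

def privatize(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise.

    This is the Gaussian mechanism as used in differentially private
    federated averaging: more noise means stronger privacy and a
    noisier (lower-utility) aggregate.
    """
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update / max(1.0, norm / clip_norm)  # bounds sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

update = np.array([3.0, 4.0])                       # L2 norm 5.0
noisy = privatize(update, clip_norm=1.0, noise_multiplier=0.1)
# With noise_multiplier=0 the result would be exactly [0.6, 0.8].
```

The tradeoff described above is visible in the parameters: raising `noise_multiplier` tightens the privacy bound and degrades every aggregate computed from the update.
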

== Federated Learning as a Model of Distributed Cognition ==

Synthesizing across the machine learning and cognitive science literatures, federated learning instantiates a pattern that appears throughout complex adaptive systems: a population of locally constrained agents producing collective behavior that generalizes beyond any individual's local experience. The structure parallels [[evolutionary computation|evolutionary search]], [[Multi-level selection|multi-level selection]], and the epistemological problem of [[generalization]] in learning theory. In each case, the central question is the same: under what conditions does aggregating locally adapted solutions produce a globally adaptive result?

In federated learning, the answer is well characterized only for the convex case. When the global loss surface is convex, [[FedAvg]] — the dominant aggregation algorithm — provably converges to the global optimum under standard smoothness and learning-rate assumptions. When the loss surface is non-convex (as it always is for deep neural networks), those guarantees evaporate. The algorithm converges to something, but what it converges to depends on the initialization, the distribution of clients, and the aggregation schedule in ways that are not yet well understood. Current practice is therefore partly empirical: the algorithm works better in practice than theory predicts, plausibly because the loss surfaces of large neural networks, while formally non-convex, have favorable geometric properties (few poor local minima, wide valleys near optima) that theory has not yet fully characterized.

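The convex case can be checked on a toy problem. The sketch below (all names and data illustrative) runs weighted FedAvg rounds over least-squares clients; because the global objective is convex, repeated rounds approach the shared optimum even though each client only ever sees its own data:

```python
import numpy as np

def fedavg_round(global_w, clients, local_steps=5, lr=0.1):
    """One FedAvg round: local SGD on each client, then weighted averaging."""
    local_weights, sizes = [], []
    for X, y in clients:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= lr * X.T @ (X @ w - y) / len(y)  # local least-squares gradient step
        local_weights.append(w)
        sizes.append(len(y))
    # Server-side aggregation: average weighted by client dataset size.
    return np.average(local_weights, axis=0, weights=sizes)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(4):                       # four clients with distinct local datasets
    X = rng.normal(size=(20, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=20)))

w = np.zeros(2)
for _ in range(50):                      # repeated rounds of the protocol
    w = fedavg_round(w, clients)
# w is now close to w_true: the convex objective admits a convergence guarantee.
```

With a non-convex local objective, the same loop still runs, but nothing guarantees the averaged iterate tracks any minimum of the global loss.
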
The deeper implication: federated learning has revealed that the mathematical foundations of distributed optimization for non-convex objectives — the setting that actually matters for modern AI — remain substantially incomplete. A field claiming to solve the privacy problem in AI is built on optimization guarantees that hold only in the case that never occurs.

''Any architecture that solves the privacy problem by distributing training has not eliminated the fundamental tension between generalization and privacy — it has made that tension harder to see.''

[[Category:Machine Learning]]
[[Category:Distributed Systems]]