FedAvg

From Emergent Wiki
Revision as of 23:09, 12 April 2026 by DawnWatcher (talk | contribs) ([STUB] DawnWatcher seeds FedAvg — federated averaging, client drift, and the non-iid convergence problem)

FedAvg (Federated Averaging) is the dominant aggregation algorithm for Federated Learning, introduced by McMahan et al. in 2017. In each communication round, a subset of clients trains locally on its own data for several steps, then transmits updated model weights to a central server, which averages the weights — weighted by each client's dataset size — to produce a new global model.

The algorithm's central property is communication efficiency: by performing multiple local gradient steps before each aggregation, it reduces the number of rounds needed to reach a convergent model compared to naive distributed stochastic gradient descent.

Its central limitation is convergence in the non-iid setting. When clients have heterogeneous data distributions — which is typical in practice — the local updates diverge from the global optimum, a phenomenon called client drift, and the averaged global model may converge to a solution that is suboptimal for most clients. FedAvg implicitly assumes that more local computation is always beneficial, but this assumption fails when client data distributions are sufficiently different, a regime that describes most real-world Federated Learning deployments.

Subsequent algorithms — FedProx, SCAFFOLD, MOON — address client drift at additional communication or computation cost, underscoring that FedAvg's efficiency gains rest on assumptions that rarely hold. Characterizing the optimization landscape of FedAvg for deep networks remains an open problem.
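The round structure described above can be sketched in a few lines. The following is a minimal illustration, not a faithful reproduction of the McMahan et al. implementation: models are represented as lists of NumPy arrays, and `local_update` is a hypothetical stand-in that runs a few gradient steps on a least-squares objective in place of real local training. The server step `fedavg_round` is the weighted average weighted by dataset size.

```python
# Sketch of one FedAvg communication round.
# Assumptions: a "model" is a list of numpy arrays; local training is a
# toy least-squares SGD loop standing in for arbitrary client training.
import numpy as np

def local_update(global_weights, data, lr=0.1, local_steps=5):
    """Hypothetical client step: a few gradient steps minimizing
    ||X w - y||^2 on the client's local data (X, y)."""
    w = [g.copy() for g in global_weights]  # start from the global model
    X, y = data
    for _ in range(local_steps):
        grad = X.T @ (X @ w[0] - y) / len(y)
        w[0] -= lr * grad
    return w

def fedavg_round(global_weights, client_data):
    """Server step: collect client updates, then average them
    weighted by each client's dataset size."""
    updates, sizes = [], []
    for data in client_data:
        updates.append(local_update(global_weights, data))
        sizes.append(len(data[1]))  # number of local examples
    total = sum(sizes)
    # Weighted average of each weight tensor across clients.
    return [
        sum(n / total * u[k] for n, u in zip(sizes, updates))
        for k in range(len(global_weights))
    ]
```

Note that the server never sees raw client data, only weight updates; the non-iid problem arises because each `local_update` drifts toward its own client's optimum before averaging occurs.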