Differential privacy

Differential privacy is a mathematical framework for quantifying and bounding how much the output of a computation can reveal about any single individual in its input. Introduced by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith in 2006, it gives privacy a rigorous definition: a randomized algorithm is differentially private if changing any single individual's data in the input dataset changes the probability of any output by at most a multiplicative factor of e^ε, where epsilon (ε) is a tunable privacy parameter. The smaller the epsilon, the stronger the privacy guarantee, but the more noise must be added to the output, which degrades its utility.
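One standard way to meet this definition is the Laplace mechanism. The sketch below applies it to a counting query; the article does not name a mechanism, and the counting query and all function names are illustrative assumptions. Changing one person's record changes a count by at most 1 (its sensitivity), so adding Laplace noise with scale 1/ε is ε-differentially private. The helper `max_density_ratio` is a hypothetical check: it scans possible outputs for two neighboring true counts and reports the largest ratio between their output densities. That ratio should never exceed e^ε.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-DP via the Laplace mechanism.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon:
    a smaller epsilon gives wider noise and a stronger guarantee.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


def laplace_density(x: float, center: float, scale: float) -> float:
    """Density of Laplace(center, scale) evaluated at x."""
    return math.exp(-abs(x - center) / scale) / (2.0 * scale)


def max_density_ratio(count: int, epsilon: float) -> float:
    """Worst-case density ratio (in either direction) between the outputs
    for neighboring true counts `count` and `count + 1`, on a grid of outputs."""
    scale = 1.0 / epsilon
    worst = 0.0
    for i in range(2101):
        x = count - 10 + i * 0.01  # candidate outputs from count-10 to count+11
        r = laplace_density(x, count, scale) / laplace_density(x, count + 1, scale)
        worst = max(worst, r, 1.0 / r)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    for eps in (0.1, 1.0):
        print("epsilon", eps, "noisy count", private_count(42, eps, rng))
        # The worst-case ratio should match the e^epsilon bound.
        print("max ratio", max_density_ratio(42, eps), "bound", math.exp(eps))
```

Running it with ε = 0.1 and ε = 1.0 shows the trade-off the paragraph describes. The smaller epsilon yields noisier counts, and in both cases the worst-case ratio sits at the e^ε bound.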

Differential privacy has become the gold standard for privacy-preserving data analysis in both academic research and industrial practice. Apple uses it to collect usage statistics without identifying individual users, and the U.S. Census Bureau applied it to the 2020 Census to protect respondent confidentiality.

Yet the framework's mathematical elegance masks a political problem: the privacy budget (the total epsilon allocated across all queries) is typically set by the institution that collects the data, not by the individuals whose data is being protected. In federated learning, differentially private noise is added to model updates to limit what gradient inversion attacks can recover, but the choice of epsilon remains a unilateral decision by the model aggregator. The framework bounds the privacy loss of every query, but it cannot guarantee that the institution will not exhaust the budget, or drop it altogether tomorrow.
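Epsilon behaves like a budget because of sequential composition. Under its basic form, running several differentially private queries on the same data is differentially private with an epsilon equal to the sum of the individual epsilons. The following is a minimal sketch of a budget accountant built on that rule. The class `PrivacyBudget` and its policy of refusing queries once the total is spent are hypothetical illustrations, not any particular institution's system.

```python
class PrivacyBudget:
    """Hypothetical accountant using basic sequential composition:
    the epsilons of successive queries on the same data add up."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total - self.spent

    def charge(self, epsilon: float) -> None:
        """Record a query costing `epsilon`; refuse it if the total would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance absorbs floating-point drift from repeated additions.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError(
                f"budget exhausted: {self.remaining:.3f} of {self.total} remaining"
            )
        self.spent += epsilon


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    for _ in range(4):
        budget.charge(0.25)  # four queries at 0.25 each spend the whole budget
    try:
        budget.charge(0.25)  # a fifth query is refused
    except RuntimeError as err:
        print(err)
```

In this sketch the total is just a constructor argument supplied by whoever operates the accountant. Composition fixes how spending adds up, but nothing in the mathematics determines who picks the total or stops them from constructing a fresh accountant with a larger one.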