A recent DeepMind paper on the ethical and social risks of language models identified the leakage of sensitive information about training data by large language models as a potential risk that organisations working on these models have a responsibility to address. Another recent paper showed that similar privacy risks can also arise in standard image classification models: a fingerprint of each individual training image can be found embedded in the model parameters, and malicious parties could exploit such fingerprints to reconstruct the training data from the model.
Privacy-enhancing technologies such as differential privacy (DP) can be deployed at training time to mitigate these risks, but often result in a significant decrease in model performance. In this work, we make substantial progress toward unlocking high-accuracy training of image classification models under differential privacy.
Figure 1: (left) Illustration of training data leakage in GPT-2 [credit: Carlini et al. “Extracting Training Data from Large Language Models”, 2021]. (right) CIFAR-10 training examples reconstructed from a 100K parameter convolutional neural network [credit: Balle et al. “Reconstructing Training Data with Informed Adversaries”, 2022]
Differential privacy was proposed as a mathematical framework to capture the requirement of protecting individual records during statistical data analysis (including the training of machine learning models). DP algorithms protect individuals from any inferences about the features that make them unique (including full or partial reconstruction) by injecting carefully calibrated noise during the computation of the desired statistic or model. Using DP algorithms provides robust and rigorous privacy guarantees both in theory and in practice, and has become a de-facto gold standard adopted by a number of public and private organisations.
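For reference, the standard formal definition behind these guarantees (stated here for completeness; the post itself does not spell it out) says that a randomised algorithm M is (ε, δ)-differentially private if, for any two datasets D and D′ differing in a single individual's record and any set of outcomes S,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta.$$

Smaller ε and δ mean the output distribution barely changes when one record is added or removed, which is exactly what limits inferences about any individual.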
The most popular DP algorithm for deep learning is differentially private stochastic gradient descent (DP-SGD), a modification of standard SGD obtained by clipping the gradients of individual examples and adding enough noise to mask the contribution of any individual to each model update:
Figure 2: Illustration of how DP-SGD processes the gradients of individual examples and adds noise to produce model updates with privatized gradients.
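To make the mechanism in Figure 2 concrete, here is a minimal JAX sketch of a single DP-SGD step. It is not the open-sourced implementation from the paper; the toy linear model, loss, and hyper-parameter values are illustrative assumptions. Per-example gradients are computed with `vmap`, clipped to a norm bound, summed, and perturbed with Gaussian noise before the parameter update.

```python
import jax
import jax.numpy as jnp


def model(params, x):
    # Tiny linear classifier, used only to keep the sketch self-contained.
    return x @ params['w'] + params['b']


def loss_fn(params, x, y):
    # Cross-entropy loss for a single example (y is one-hot).
    return -jnp.sum(jax.nn.log_softmax(model(params, x)) * y)


def dp_sgd_update(params, batch_x, batch_y, rng,
                  clip_norm=1.0, noise_multiplier=1.0, learning_rate=0.1):
    # 1. Per-example gradients: vmap the gradient over the batch dimension.
    per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(
        params, batch_x, batch_y)

    # 2. Clip each example's gradient to an L2 norm of at most `clip_norm`.
    def clip(grad_tree):
        leaves = jax.tree_util.tree_leaves(grad_tree)
        total_norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
        scale = jnp.minimum(1.0, clip_norm / (total_norm + 1e-12))
        return jax.tree_util.tree_map(lambda g: g * scale, grad_tree)

    clipped = jax.vmap(clip)(per_example_grads)

    # 3. Sum the clipped gradients and add Gaussian noise with standard
    #    deviation noise_multiplier * clip_norm, enough to mask any single
    #    example's contribution.
    summed = jax.tree_util.tree_map(lambda g: jnp.sum(g, axis=0), clipped)
    treedef = jax.tree_util.tree_structure(summed)
    keys = jax.tree_util.tree_unflatten(
        treedef, list(jax.random.split(rng, treedef.num_leaves)))
    noisy = jax.tree_util.tree_map(
        lambda g, k: g + noise_multiplier * clip_norm *
        jax.random.normal(k, g.shape), summed, keys)

    # 4. Plain SGD step with the privatised, batch-averaged gradient.
    batch_size = batch_x.shape[0]
    return jax.tree_util.tree_map(
        lambda p, g: p - learning_rate * g / batch_size, params, noisy)


# Example usage with random data (shapes are illustrative).
rng = jax.random.PRNGKey(0)
params = {'w': jnp.zeros((32, 10)), 'b': jnp.zeros(10)}
x = jax.random.normal(rng, (64, 32))
y = jax.nn.one_hot(jax.random.randint(rng, (64,), 0, 10), 10)
params = dp_sgd_update(params, x, y, rng)
```

In real training, the noise multiplier, clipping norm, batch size and number of steps together determine the final (ε, δ) guarantee via a privacy accountant.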
Unfortunately, previous work has found that in practice, the privacy protection afforded by DP-SGD often comes at the cost of significantly less accurate models, which presents a major obstacle to the widespread adoption of differential privacy in the machine learning community. According to empirical evidence from prior work, this utility degradation in DP-SGD becomes more severe on larger neural network models, including the ones routinely used to achieve the best performance on challenging image classification benchmarks.
Our work investigates this phenomenon and proposes a series of simple modifications to both the training procedure and the model architecture, yielding a significant improvement in the accuracy of DP training on standard image classification benchmarks. The most striking observation to emerge from our research is that DP-SGD can be used to effectively train much deeper models than previously thought, as long as one ensures the model's gradients are well-behaved. We believe the substantial jump in performance achieved by our research has the potential to unlock practical applications of image classification models trained with formal privacy guarantees.
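As an illustrative diagnostic (our own sketch, not a procedure described in the paper), one way to check whether per-example gradients are "well-behaved" under DP-SGD is to monitor how many of them actually exceed the clipping bound; if nearly all of them are clipped, the clipping dominates the learning signal. Reusing the names from the sketch above:

```python
def clipping_fraction(params, batch_x, batch_y, clip_norm=1.0):
    # Fraction of per-example gradients whose L2 norm exceeds `clip_norm`.
    # Assumes the `loss_fn` defined in the earlier sketch.
    per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(
        params, batch_x, batch_y)
    leaves = jax.tree_util.tree_leaves(per_example_grads)
    # Per-example squared norm, summed over all parameter dimensions.
    norms = jnp.sqrt(sum(jnp.sum(g ** 2, axis=tuple(range(1, g.ndim)))
                         for g in leaves))
    return jnp.mean(norms > clip_norm)
```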
The figure below summarises two of our main results: a ~10% improvement on CIFAR-10 compared to previous work when privately training without additional data, and a top-1 accuracy of 86.7% on ImageNet when privately fine-tuning a model pre-trained on a different dataset, almost closing the gap with the best non-private performance.
Figure 3: (left) Our best results on training WideResNet models on CIFAR-10 without additional data. (right) Our best results on fine-tuning NFNet models on ImageNet. The best performing model was pre-trained on an internal dataset separate from ImageNet.
These results were obtained at ε=8, a standard setting for calibrating the strength of the protection offered by differential privacy in machine learning applications. We refer to the paper for a discussion of this parameter, as well as additional experimental results at other values of ε and also on other datasets. Together with the paper, we are also open-sourcing our implementation to enable other researchers to verify our findings and build on them. We hope this contribution will help others interested in making practical DP training a reality.
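For readers who want to see how a concrete ε value is obtained for DP-SGD, the sketch below uses Google's open-source dp_accounting package; the API names are written from memory and the hyper-parameter values are placeholders, not the settings used in the paper.

```python
# Privacy accounting sketch for DP-SGD with Poisson subsampling.
from dp_accounting import dp_event, rdp

noise_multiplier = 1.0        # noise stddev relative to the clipping norm
sampling_probability = 0.01   # batch_size / dataset_size under Poisson sampling
num_steps = 10_000            # number of DP-SGD updates
target_delta = 1e-5           # delta in the (epsilon, delta) guarantee

# Each step is a Gaussian mechanism applied to a Poisson-sampled batch.
event = dp_event.PoissonSampledDpEvent(
    sampling_probability, dp_event.GaussianDpEvent(noise_multiplier))

# Compose the per-step events with an RDP accountant and convert to epsilon.
accountant = rdp.RdpAccountant()
accountant.compose(event, num_steps)
print("epsilon =", accountant.get_epsilon(target_delta))
```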
Download our JAX implementation on GitHub.