Machine learning has been adopted in a growing number of fields in recent years thanks to its performance. With the computing capacity of modern computers and graphics cards, deep learning can achieve results that sometimes exceed those of human experts. However, its use in sensitive areas such as medicine or finance raises confidentiality issues. Differential privacy (DP) is a formal privacy guarantee that limits what adversaries with access to a machine learning model can learn about specific training points. The most common differentially private training approach for image recognition is differentially private stochastic gradient descent (DPSGD). However, the deployment of differential privacy is limited by the performance deterioration that current DPSGD systems cause.
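For readers unfamiliar with DPSGD, a single update in its commonly used formulation can be sketched as follows: each example's gradient is clipped to a maximum norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged before the usual gradient step. This is a minimal illustration only. The article does not spell out these details, and the function name and the parameters `clip_norm` and `noise_multiplier` are assumptions chosen for this sketch.

```python
import math
import random


def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy gradient step (illustrative sketch, not the paper's code).

    weights: list of floats.
    per_example_grads: one gradient (list of floats) per training example.
    clip_norm, noise_multiplier: hypothetical parameter names for this sketch.
    rng: a random.Random instance used to draw the Gaussian noise.
    """
    dim = len(weights)
    total = [0.0] * dim
    for grad in per_example_grads:
        # Clip each example's gradient so its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(x * x for x in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += grad[i] * scale
    batch_size = len(per_example_grads)
    # Add Gaussian noise scaled to the clipping norm, then average over the batch.
    noisy_grad = [
        (total[i] + rng.gauss(0.0, noise_multiplier * clip_norm)) / batch_size
        for i in range(dim)
    ]
    # Ordinary gradient descent step using the noisy gradient.
    return [w - lr * g for w, g in zip(weights, noisy_grad)]
```

Because the noise is drawn independently of the data, an individual step can move the weights in a direction that makes the objective worse, which is the effect discussed next.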
Existing methods for differentially private deep learning leave room for improvement because, during stochastic gradient descent, they accept every model update regardless of whether the corresponding objective function value improves. In some model updates, adding noise to the gradients can worsen the objective function value, especially when convergence is imminent. As a result, the optimization target degrades, the privacy budget is wasted, and the resulting models get worse. To address this problem, a research team from Shanghai University in China proposes a simulated annealing-based differentially private stochastic gradient descent (SA-DPSGD) approach that accepts a candidate update with a probability that depends on the quality of the update and the number of iterations.
Concretely, a model update is accepted if it gives a better objective function value. Otherwise, the update is rejected with a certain probability. To avoid settling into a local optimum, the authors use probabilistic rather than deterministic rejections and limit the number of consecutive rejections. In other words, the simulated annealing algorithm selects model updates probabilistically during the stochastic gradient descent process.
The following gives a high-level explanation of the proposed approach.
1- DPSGD generates candidate updates iteratively, and the objective function value is computed after each one. The energy change from the previous iteration to the current one and the total number of accepted solutions are then used to calculate the acceptance probability of the current solution.
2- When the energy change is negative, the acceptance probability is always 1. That means updates that step in the right direction are accepted. Because the model updates are noisy, the actual energy change may be positive with a very small probability, but training is still guaranteed to move mostly in the direction of convergence.
3- When the energy change is positive, accepting the solution would make the energy worse, so the acceptance probability falls exponentially as the number of accepted solutions rises. Rejecting such updates deterministically, however, can leave the final solution stuck in a local optimum. Therefore, the authors propose accepting updates with positive energy changes with a small, decreasing probability.
4- The number of consecutive rejections is limited, so an update is still accepted after too many rejections in a row. As training approaches convergence, the acceptance probability may drop so low that almost all solutions with positive energy changes are rejected, and training may become stuck in a local optimum. Limiting the number of rejections prevents this by accepting a solution when it is essential.
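The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the exact acceptance formula is not given in this article, so the form exp(-beta * delta_e * n_accepted) and the constant `beta` are assumptions chosen only to match the description (probability 1 for negative energy change, exponential decay with the number of accepted solutions otherwise). The `train` loop uses a one-dimensional quadratic objective with a noisy gradient as a stand-in for DPSGD and does no privacy accounting.

```python
import math
import random


def acceptance_probability(delta_e, n_accepted, beta=1.0):
    """Probability of accepting a candidate update.

    delta_e: new objective value minus old one (negative means improvement).
    n_accepted: number of updates accepted so far.
    beta: hypothetical scale constant (an assumption, not from the article).
    """
    if delta_e < 0:
        return 1.0  # Step 2: improving updates are always accepted.
    # Step 3 (assumed form): decays exponentially as accepted updates accumulate.
    return math.exp(-beta * delta_e * n_accepted)


def should_accept(delta_e, n_accepted, rejected_in_a_row, max_rejections, rng, beta=1.0):
    """Decide whether to keep a candidate update."""
    if rejected_in_a_row >= max_rejections:
        return True  # Step 4: cap on consecutive rejections forces acceptance.
    return rng.random() < acceptance_probability(delta_e, n_accepted, beta)


def train(steps=200, lr=0.1, noise_std=0.5, max_rejections=5, beta=1.0, seed=0):
    """Toy loop: minimize w**2 using noisy gradient steps with update screening."""
    rng = random.Random(seed)
    w = 5.0
    energy = w * w
    n_accepted = 0
    rejected_in_a_row = 0
    for _ in range(steps):
        # Step 1: generate a candidate from a noisy gradient (stand-in for DPSGD).
        noisy_grad = 2.0 * w + rng.gauss(0.0, noise_std)
        candidate = w - lr * noisy_grad
        delta_e = candidate * candidate - energy
        if should_accept(delta_e, n_accepted, rejected_in_a_row, max_rejections, rng, beta):
            w, energy = candidate, candidate * candidate
            n_accepted += 1
            rejected_in_a_row = 0
        else:
            rejected_in_a_row += 1
    return w, n_accepted
```

Calling `train()` returns the final weight and the number of accepted updates. Raising `beta` or lowering `max_rejections` in this sketch trades off how aggressively worsening updates are screened against how often one is let through.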
SA-DPSGD is evaluated on three datasets: MNIST, FashionMNIST, and CIFAR10. The experiments show that SA-DPSGD significantly outperforms the state-of-the-art schemes DPSGD, DPSGD(tanh), and DPSGD(AUTO-S) in terms of privacy cost or test accuracy.
According to the authors, SA-DPSGD significantly narrows the accuracy gap between private and non-private image classification. With random update screening, the differentially private gradient descent proceeds in the right direction in each iteration, making the obtained result more accurate. In experiments under the same hyperparameters, SA-DPSGD achieves higher accuracies on MNIST, FashionMNIST, and CIFAR10 than the state-of-the-art results. Under freely adjusted hyperparameters, the proposed approach achieves even higher accuracies.
Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor’s degree in physical science and a master’s degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He produced several scientific articles about person re-
identification and the study of the robustness and stability of deep