ppml-news.github.io

News in Privacy-Preserving Machine Learning

February 2020

Papers

Private Summation in the Multi-Message Shuffle Model

January 2020

Papers

Approximating Activation Functions

February 2019

Papers

Secure Evaluation of Quantized Neural Networks
TensorSCONE: A Secure TensorFlow Framework using Intel SGX
Achieving GWAS with Homomorphic Encryption
CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning
Interesting solution for offloading/outsourcing model training to a set of workers while ensuring strong privacy guarantees; based on Lagrange coded computations.
Towards Federated Learning at Scale: System Design

Bonus

A Marauder’s Map of Security and Privacy in Machine Learning, a lecture on security and privacy. By Nicolas Papernot.
A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance
Some of the greatest minds from cryptography join in on adversarial examples: “We develop a simple mathematical framework which enables us to think about this baffling phenomenon [and] explain why we should expect to find targeted adversarial examples in arbitrarily deep neural networks.”

January 2019

Papers

Privacy-preserving semi-parallel logistic regression training with Fully Homomorphic Encryption
CaRENets: Compact and Resource-Efficient CNN for Homomorphic Inference on Encrypted Medical Images
Secure predictions using FHE with careful packing.
Differentially Private Markov Chain Monte Carlo
Improved Accounting for Differentially Private Learning
Secure Computation for Machine Learning With SPDZ
Looks at regression tasks using the general-purpose reference implementation and with active security.
Secure Two-Party Feature Selection
Privacy-preserving chi-squared test for binary feature selection from Paillier encryption.
Contamination Attacks and Mitigation in Multi-Party Machine Learning
Making models more robust to tainted training data by minimizing the ability to predict the providing parties.
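
For readers new to the statistic behind the Secure Two-Party Feature Selection entry above, here is a minimal cleartext sketch of the chi-squared score for one binary feature against a binary label. It only shows what is being computed; the Paillier-based protocol from the paper is not reproduced, and the function name and the example counts are made up for illustration.

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 contingency table.

    a: feature=1, label=1    b: feature=1, label=0
    c: feature=0, label=1    d: feature=0, label=0
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a constant feature or a constant label carries no signal
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts: the feature is spread identically across both labels,
# so the score is zero and the feature would be ranked last.
independent = chi_squared_2x2(10, 10, 20, 20)

# Hypothetical counts with a strong association between feature and label.
associated = chi_squared_2x2(40, 5, 5, 40)
```

In a private variant the four counts would never be seen in the clear by a single party; only the final score (or the resulting ranking) would be revealed.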

News

Videos from Hacking Deep Learning 2 online, including talks on adversarial attacks and privacy. Via @BIUCrypto.
Videos from CCS’18 online, including presentation of ABY3. Via @lzcarl.
Simons Institute program on Data Privacy: Foundations and Applications kicked off this week with several workshops around differential privacy.
Program for SP’19 is out with four accepted papers on differential privacy. Via @IEEESSP.

Bonus

Deep Learning to Evaluate Secure RSA Implementations
Turbospeedz: Double Your Online SPDZ! Improving SPDZ using Function Dependent Preprocessing
Excellent summary of what happened last year in the world of privacy-preserving machine learning by Dropout Labs.
Real World Crypto happened this week, with (temporary?) recordings available on YouTube. The talk on Deploying MPC for Social Good in particular has received significant attention, while the talk on the Foreshadow attack on Intel SGX reminded us that enclaves are not perfect yet.

31 December 2018

Papers

News

Google AI team releases new TensorFlow Privacy library for training machine learning models with differential privacy for training data. Via @NicolasPapernot.

14 December 2018

30 November 2018

Papers

nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data
“One of the biggest accelerators in deep learning has been frameworks that allow users to describe networks and operations at a high level while hiding details … A key challenge for building large-scale privacy-preserving ML systems using HE has been the lack of such a framework; as a result data scientists face the formidable task of becoming experts in deep learning, cryptography, and software engineering”. Amen!
CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs
“In many respects, programming FHE applications today is akin to low-level assembly … Our central hypothesis is that future applications will benefit from a compiler and runtime that targets a compact and well-reasoned interface”. Amen! Also describes several ways in which the compiler can optimize encrypted computations.
Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference
Solid work on using weight quantization and other ML techniques to adapt neural networks for the encrypted setting, significantly improving performance relative to CryptoNets. Interestingly, second-degree approximations of the Swish activation function are used over ReLUs and squaring. Gives plenty of references for those not coming from an ML background.
Privacy-Preserving Collaborative Prediction using Random Forests
Train models locally on independent data sets and apply ensemble techniques to serve private predictions using these.
FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions
Private predictions via FHE and GC. Interestingly, values are first converted to the frequency domain using the FFT and there’s a protocol for softmax.
The AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs
A Fully Private Pipeline for Deep Learning on Electronic Health Records
Distributed and Secure ML with Self-tallying Multi-party Aggregation
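
Why would one move to the frequency domain, as the FALCON entry above mentions? A convolution there turns into one pointwise product per frequency, which is a much friendlier operation under encryption. The sketch below checks that identity in the clear for a circular convolution. It uses a naive DFT instead of an FFT to stay dependency-free; the function names and sample values are made up, and nothing here is taken from the paper's protocol.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; an FFT computes the same result faster."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def circular_convolution(a, b):
    """Direct circular convolution in the original domain."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def convolution_via_frequency_domain(a, b):
    """Same result, using a single pointwise product per frequency."""
    fa, fb = dft(a), dft(b)
    product = [x * y for x, y in zip(fa, fb)]
    return [z.real for z in dft(product, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0]   # hypothetical input values
kernel = [0.5, -1.0, 0.0, 0.0]  # hypothetical filter, zero-padded to the same length
direct = circular_convolution(signal, kernel)
via_dft = convolution_via_frequency_domain(signal, kernel)
```

The two results agree up to floating-point rounding.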

News

List of accepted papers for NeurIPS’18 privacy workshop is out! Via @mortendahlcs.

31 October 2018

Papers

Privado: Practical and Secure DNN Inference

28 September 2018

Papers

Encrypted Databases for Differential Privacy

27 July 2018

Papers

27 June 2018

25 May 2018

Papers

Logistic Regression over Encrypted Data from Fully Homomorphic Encryption
From Keys to Databases – Real-World Applications of Secure Multi-Party Computation
Jana: private SQL databases; Sharemind: secure analytics; Partisia, Sepior: auctions and key management; Unbound Technology: enterprise secrets.
Minimising Communication in Honest-Majority MPC by Batchwise Multiplication Verification
SPDZ2k: Efficient MPC mod 2^k for Dishonest Majority
To improve the efficiency of MPC it is interesting to perform operations over rings that fit closely with native CPU instructions, as opposed to over e.g. a prime field. Doing so is straightforward when the attacker is honest-but-curious, and this paper addresses the case when the attacker is fully malicious.
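
To make the "native CPU instructions" point concrete, here is a minimal sketch of additive secret sharing over the ring of integers mod 2^64, where reduction is exactly what unsigned 64-bit arithmetic does on overflow. It covers only the honest-but-curious case; the checks SPDZ2k adds against a malicious attacker are not shown, and all names are made up for illustration.

```python
import secrets

K = 64
MOD = 1 << K  # the ring Z_{2^k}; reduction mod 2^64 is free on a 64-bit CPU


def share(value, parties=3):
    """Split a value into additive shares that sum to it modulo 2^k."""
    shares = [secrets.randbelow(MOD) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares


def reconstruct(shares):
    return sum(shares) % MOD


def add_shared(xs, ys):
    """Addition is local: each party adds its own two shares."""
    return [(x + y) % MOD for x, y in zip(xs, ys)]


x_shares = share(1234)
y_shares = share(MOD - 1)  # wraps around like an unsigned 64-bit integer
z_shares = add_shared(x_shares, y_shares)
```

Reconstructing `z_shares` gives 1233, i.e. 1234 + (2^64 - 1) reduced mod 2^64.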

News

GDPR has come into effect!
Slides from UCL course on privacy enhancing technologies available. Via @emilianoucl.
Keystone: An Open-source Secure Hardware Enclave. Via @Daeinar.
The reference SPDZ implementation is being prepared for production. Via @SmartCryptology.
Next week’s TPMPC workshop will be live-streamed if you happen to be elsewhere than Aarhus! Via @claudiorlandi.

Bonus

Cautious Deep Learning

18 May 2018

Small but good: we only dug up one paper this week but it comes with very interesting claims.

Papers

SecureNN: Efficient and Private Neural Network Training
Following recent approaches but reporting significant performance improvements via specialized protocols for the 3- and 4-server settings: the claimed cost of encrypted training is in some cases only 13-33 times that of training on cleartext data. A big factor in this is the avoidance of bit-decomposition and garbled circuits when computing comparisons and ReLUs.

11 May 2018

If anyone had any doubt that private machine learning is a growing area then this week might take care of that.

Papers

Secure multiparty computation:

ABY3: A Mixed Protocol Framework for Machine Learning
One of the big guys in secure computation for ML is back with new protocols in the 3-server setting for training linear regression, logistic regression, and neural network models. Impressive performance improvements for both training and prediction.
EPIC: Efficient Private Image Classification (or: Learning from the Masters)
An update to work from last year on efficient private image classification using SPDZ and support vector machines. Includes great overview of recent related work.

Homomorphic encryption:

Unsupervised Machine Learning on Encrypted Data
Implements K-means privately using fully homomorphic encryption and a bit-wise rational encoding, with suggestions for tweaking K-means to make it more practical for this setting. The TFHE library (see below) is used for experiments.
TFHE: Fast Fully Homomorphic Encryption over the Torus
Proclaimed as the fastest FHE library currently available, this paper is the extended version of previous descriptions of the underlying scheme and optimizations.
Homomorphic Secret Sharing: Optimizations and Applications
Further work on a hybrid scheme between homomorphic encryption and secret sharing: operations can be performed locally by each share holder as in the former, yet a final combination is needed in the end to recover the result as in the latter: “this enables a level of compactness and efficiency of reconstruction that is impossible to achieve via standard FHE”.

Secure enclaves:

SecureCloud: Secure Big Data Processing in Untrusted Clouds
A joint European research project to develop a platform for pushing critical applications to untrusted cloud environments, using secure enclaves and supporting big data. Envisioned use cases come from finance, health care, and smart grids.
SecureStreams: A Reactive Middleware Framework for Secure Data Stream Processing
Presents concrete work done in the above SecureCloud project, namely a high-level Lua-based framework for privately processing streams at scale using dataflow programming and secure enclaves.

Differential privacy:

Privately Learning High-Dimensional Distributions
Tackles the problem that privacy “comes almost for free when data is low-dimensional but comes at a steep price when data is high-dimensional” as measured in amount of samples needed. Two mechanisms are presented for learning respectively a multivariate Gaussian and a product distribution.
SynTF: Synthetic and Differentially Private Term Frequency Vectors for Privacy-Preserving Text Mining
A differentially private mechanism is used to prevent author re-identification in texts used for training models where anonymized feature vectors can be used instead of the actual body text. Concrete experiments include topic classification of newsgroups postings.
Distributed Differentially-Private Algorithms for Matrix and Tensor Factorization
Correlated noise is used to privately perform two common operations via a centralized but curious party or directly between data holders, respectively. Interestingly, the correlated noise is not uniform as in typical secure aggregation settings.
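
One common way noise can be correlated across parties is to make the individual terms cancel in the aggregate, so each party's contribution is masked while the sum is unaffected. The sketch below illustrates that general idea with Gaussian noise. It is an assumption-laden toy: a single function draws all the noise (a real deployment would need a protocol for that), the names and inputs are made up, and the paper's actual mechanisms and privacy analysis are not reproduced.

```python
import random


def zero_sum_noise(parties, scale, rng):
    """Gaussian noise terms, shifted so that together they sum to zero (up to rounding)."""
    raw = [rng.gauss(0.0, scale) for _ in range(parties)]
    mean = sum(raw) / parties
    return [r - mean for r in raw]


rng = random.Random(0)          # seeded only to make the illustration repeatable
inputs = [4.0, 7.5, 1.5, 3.0]   # hypothetical private values, one per party
noise = zero_sum_noise(len(inputs), scale=10.0, rng=rng)
masked = [x + e for x, e in zip(inputs, noise)]
aggregate = sum(masked)         # the noise cancels here
```

Each masked value looks very different from its input, yet the aggregate matches the true sum up to floating-point rounding.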

Bonus

An Empirical Analysis of Anonymity in Zcash
A little reminder that anonymity is hard.

27 April 2018

Papers

Towards Dependable Deep Convolutional Neural Networks (CNNs) with Out-distribution Learning
“in this paper we propose to add an additional dustbin class containing natural out-distribution samples” “We show that such an augmented CNN has a lower error rate in the presence of adversarial examples because it either correctly classifies adversarial samples or rejects them to a dustbin class.”
Weak labeling for crowd learning
“weak labeling for crowd learning is proposed, where the annotators may provide more than a single label per instance to try not to miss the real label”
Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
“In this article, we place ourselves in a context where the amount of transferred data must be anticipated but a limited portion of the local training sets can be shared. We also suppose a minimalist topology where each node can only send information unidirectionally to a single central node which will aggregate models trained by the nodes” “Using shared data on the central node, we then train a probabilistic model to aggregate the base classifiers in a second stage.”
Securing Distributed Machine Learning in High Dimensions
Some results towards the issue of input pollution in federated learning, where a fraction of gradient providers may give arbitrarily malicious inputs to an aggregation protocol. “The core of our method is a robust gradient aggregator based on the iterative filtering algorithm for robust mean estimation”.
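
To see why plain averaging is fragile in the setting of the Securing Distributed Machine Learning entry above, the sketch below compares the mean with a coordinate-wise median when one provider submits an arbitrary gradient. Note the swap: the median is a simpler stand-in and not the iterative filtering algorithm the paper builds on; names and values are made up for illustration.

```python
from statistics import median


def mean_aggregate(gradients):
    dims = len(gradients[0])
    return [sum(g[i] for g in gradients) / len(gradients) for i in range(dims)]


def median_aggregate(gradients):
    """Coordinate-wise median: a simple robust aggregator (not iterative filtering)."""
    dims = len(gradients[0])
    return [median(g[i] for g in gradients) for i in range(dims)]


honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
malicious = [[1000.0, 1000.0]]  # a single arbitrary input
all_gradients = honest + malicious

averaged = mean_aggregate(all_gradients)   # dragged far away by one outlier
robust = median_aggregate(all_gradients)   # stays with the honest majority
```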

20 April 2018

Papers

Nothing Refreshes Like a RePSI: Reactive Private Set Intersection
PSI has several applications in private data processing, including object linking in advertising and data augmentation. This paper takes a step towards mitigating exhaustive attacks where a party learns too much by simply asking for many intersections.

News

Sharemind, one of the biggest and earliest players pushing MPC to industry, has launched a new privacy service based on secure computation using secure enclaves with the promise that it can handle big data. Via @positium.
Interesting interview with Lea Kissner, the head of Google’s privacy team NightWatch. Few details are given but “She recently tried to obscure some data using cryptography, so that none of it would be visible to Google upon upload … but it turned out that [it] would require more spare computing power than Google has” sounds like techniques that could be related to MPC or HE. Via @rosa.
Google had two AI presentations at this year’s RSA conference, one on fraud detection and one on adversarial techniques. Via @goodfellow_ian.

Bonus

Privacy-Preserving Multibiometric Authentication in Cloud with Untrusted Database Providers
Relevant application of secure computation to authentication using sensitive data. Relatively black-box use of existing protocols, yet experimental performance is <1 sec.
Private Anonymous Data Access
Interesting mix of private information retrieval and oblivious RAM: “We consider a scenario where a server holds a huge database that it wants to make accessible to a large group of clients while maintaining privacy and anonymity … with the goal of getting the best of both worlds: allow many clients to privately and anonymously access the database as in PIR, while having an efficient server as in ORAM”.
Adversarial Attacks Against Medical Deep Learning Systems
A discussion around some of the concrete consequences the medical profession may face from adversarial examples in machine learning systems with a warning of “caution in employing deep learning systems in clinical settings”.

13 April 2018

Papers

Differentially Private Confidence Intervals for Empirical Risk Minimization
Addresses the question of computing confidence intervals in a private manner, using either DP or concentrated DP. Gives concrete examples and experiments using logistic regression and SVM.

News

Facebook hosts a privacy summit but seems a bit sparse on details. Via @sweis.

Bonus

PowerHammer: Exfiltrating Data from Air-Gapped Computers through Power Lines
More work on leaking data from air-gapped computers through obscure side-channels, this time through power lines by varying the CPU utilization, achieving bit rates of 10-1000 bit/sec for different attacks.

30 March 2018

Papers

Private Nearest Neighbors Classification in Federated Databases
Great read on custom MPC protocols allowing k-NN classification of a sample (such as document classification with cosine similarity) using a distributed data set, without leaking either the sample or the data set. This includes feature extraction, similarity computation, and top-k selection.
Chiron: Privacy-preserving Machine Learning as a Service
Interesting look at protecting both privacy of training data and model specifics via secure enclaves. The technology is promising despite having experienced a few issues recently and e.g. avoids use of heavy cryptography.
Locally Private Bayesian Inference for Count Models
When applying differential privacy one may either ignore the fact that noise has been added to the data or try to take it into account; the latter is done here with good illustrations of the improvements this can give.
Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries
Interesting peer-to-peer protocol for privately computing the exact average of a distributed data set via gossiping directly between the peers. No heavy cryptography is used in case of honest peers, with a PHE-based extension for detecting malicious cheating.
Comparing Population Means under Local Differential Privacy
Cloud-based MPC with Encrypted Data
Gives two schemes for private Model Predictive Control by a central authority (who might have a better understanding of the environment than individual sensors), one based on PHE and another on MPC.
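
The gossiping idea in the Hiding in the Crowd entry above rests on a simple fact: when two peers replace their values with their pairwise average, the global sum is unchanged, so repeated random exchanges drive everyone towards the overall average. The sketch below shows only this cleartext averaging dynamic. The privacy protection and the PHE-based cheating detection from the paper are not reproduced, and the names and values are made up.

```python
import random


def gossip_average(values, rounds, rng):
    """Repeatedly pick two random peers and let both keep their pairwise average."""
    values = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(values)), 2)
        values[i] = values[j] = (values[i] + values[j]) / 2
    return values


rng = random.Random(0)  # seeded only to make the illustration repeatable
initial = [3.0, 8.0, 1.0, 12.0, 5.0, 7.0, 9.0, 2.0, 6.0, 4.0]  # hypothetical data
final = gossip_average(initial, rounds=2000, rng=rng)
true_average = sum(initial) / len(initial)
```

After enough exchanges every peer holds (numerically) the true average without any central aggregator.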

16 March 2018

Papers

Model-Agnostic Private Learning via Stability
More work on ensuring privacy of training data via differentially private query mechanisms. Compared to the paper from a few weeks ago, this one focuses on “algorithms that are agnostic to the underlying learning problem [with] formal utility guarantees [and] provable accuracy guarantees”.
Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters
The Paillier cryptosystem is used to securely evaluate simplified similarity functions so users don’t leak biometric information during authentication. Performance numbers included.
Efficient Determination of Equivalence for Encrypted Data
Reminder that even a simpler task such as privately linking identities and records together is relevant in industry.
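
For those who have not met Paillier before, the property that makes it useful in entries like the speaker recognition paper above is that multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a power scales the plaintext. That is enough to compute an inner product between an encrypted vector and a cleartext one. Below is a toy sketch with tiny hard-coded primes, so it is completely insecure; the function names are made up, and the simplified similarity functions from the paper are not reproduced. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy key generation; real deployments use primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def encrypted_inner_product(n, encrypted_xs, plain_ws):
    """Multiplying ciphertexts adds plaintexts; exponentiation scales them."""
    n2 = n * n
    acc = 1  # 1 is a trivial encryption of 0
    for c, w in zip(encrypted_xs, plain_ws):
        acc = acc * pow(c, w, n2) % n2
    return acc


n, private_key = keygen()
rng = secrets.SystemRandom()
features = [3, 1, 4]  # hypothetical private input, encrypted by the user
weights = [2, 7, 1]   # hypothetical cleartext weights (non-negative integers)
encrypted = [encrypt(n, x, rng) for x in features]
result = decrypt(n, private_key, encrypted_inner_product(n, encrypted, weights))
```

Here `result` is 3*2 + 1*7 + 4*1 = 17, computed without the evaluator ever seeing `features` in the clear.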

Bonus

The Morning Paper: When coding style survives compilation
Anonymity is hard! Random forests can be trained to identify your coding style from source code as well as compiled programs.

9 March 2018

News

The 2018 Gödel Prize is awarded to Oded Regev for his paper On lattices, learning with errors, random linear codes, and cryptography. This had a huge influence on later work in cryptography, not least homomorphic encryption. Via @hoonoseme.
OpenMined is now maintaining a list of papers and tools around private machine learning: https://github.com/OpenMined/awesome-ai-privacy! Via @iamtrask.
Lab41 has released a Python wrapper around Microsoft’s SEAL homomorphic encryption library: https://github.com/Lab41/PySEAL. Via @mortendahl.
The list of accepted contributed talks for this year’s Theory and Practice of MPC workshop has been announced. This is the definitive annual event dedicated to secure multi-party computation. Via @claudiorlandi.

Papers

Generating Differentially Private Datasets Using GANs
Interesting idea of using GANs to produce artificial differential privacy-preserving datasets from sensitive data that are safe to release for further training purposes. This is done on the client side, meaning there’s no need for a trusted aggregator.
Faster Homomorphic Linear Transformations in HElib
The masters are at it again, giving algorithmic improvements to perhaps the most well-known homomorphic encryption library and thereby making it 30-75 times faster.
Logistic Regression Model Training based on the Approximate Homomorphic Encryption
Private fitting of several logistic regression models on smaller genomic data sets using the HEAAN homomorphic encryption scheme. The approach is somewhat typical gradient descent with a polynomial approximation of the sigmoid, but with significant concrete performance improvements over other work using HEAAN.
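
The "gradient descent plus sigmoid polynomial approximation" recipe can be sketched in the clear as follows. Homomorphic schemes evaluate additions and multiplications, so the sigmoid is swapped for a low-degree polynomial. This sketch assumes the degree-3 Taylor expansion around zero, which is only accurate for small inputs; the papers above choose their own approximations, which are not reproduced here, and the data and names are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def poly_sigmoid(z):
    """Degree-3 Taylor expansion of the sigmoid around 0: additions and multiplications only."""
    return 0.5 + z / 4.0 - z ** 3 / 48.0


def train(xs, ys, steps=20, learning_rate=0.1):
    """One-parameter logistic regression using the polynomial in place of the sigmoid."""
    w = 0.0
    for _ in range(steps):
        gradient = sum((y - poly_sigmoid(w * x)) * x for x, y in zip(xs, ys))
        w += learning_rate * gradient
    return w


xs = [-1.0, -0.5, 0.5, 1.0]  # hypothetical one-dimensional features
ys = [0, 0, 1, 1]            # hypothetical binary labels
weight = train(xs, ys)
```

The learned weight is positive, so positive features are classified as label 1; in an encrypted run the same loop would operate on ciphertexts with a fixed number of steps.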

Blogs

The Building Blocks of Interpretability
Nothing to do with private machine learning, yet this is so neat that it warrants a mention. Go play!

2 March 2018

News

@mvaria’s talk about a real-world application of MPC at this year’s ENIGMA conference is online and well worth a watch: https://www.youtube.com/watch?v=d9rMokeYx9I. Via @lcyqn.

Papers

Scalable Private Learning with PATE
Follow-up work to the celebrated Student-Teacher way of ensuring privacy of training data via differential privacy, now with better privacy bounds and hence less added noise. This is partially achieved by switching to Gaussian noise and more advanced (trusted) aggregation mechanisms.
Privacy-Preserving Logistic Regression Training
Fitting a logistic model from homomorphically encrypted data using the Newton-Raphson iterative method, but with a fixed and approximated Hessian matrix. Performance is evaluated on the iDASH cancer detection scenario.
Privacy-Preserving Boosting with Random Linear Classifiers for Learning from User-Generated Data
Presents the SecureBoost framework for mixing boosting algorithms with secure computation. The former uses randomly generated linear classifiers at the base and the latter comes in three variants: RLWE+GC, Paillier+GC, and SecretSharing+GC. Performance experiments on both the model itself and on the secure versions are provided.
Machine learning and genomics: precision medicine vs. patient privacy
Non-technical paper illustrating that secure computation techniques are finding their way into otherwise unrelated research areas, and hitting a home run with “data access restrictions are a burden for researchers, particularly junior researchers or small labs that do not have the clout to set up collaborations with major data curators”.
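
As a rough picture of the aggregation step in the PATE entry above: each teacher votes for a label, noise is added to the vote counts, and only the noisy winner is released to the student. The sketch below assumes the simplest Gaussian noisy-max form; the more advanced aggregation mechanisms and the privacy accounting from the paper are not reproduced, and the names and numbers are made up.

```python
import random
from collections import Counter


def noisy_max(teacher_votes, num_classes, sigma, rng):
    """Add Gaussian noise to each label's vote count and return the noisy winner."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(label, 0) + rng.gauss(0.0, sigma)
             for label in range(num_classes)]
    return max(range(num_classes), key=noisy.__getitem__)


rng = random.Random(0)  # seeded only to make the illustration repeatable

# Hypothetical ensemble of 100 teachers with a strong consensus on label 2.
votes = [2] * 95 + [0] * 3 + [1] * 2
released = noisy_max(votes, num_classes=3, sigma=4.0, rng=rng)
```

With a strong consensus the noise does not change the outcome, which is the intuition for why such queries cost little privacy.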

Blogs

Uber’s differential privacy .. probably isn’t
@frankmcsherry looks at Uber’s SQL differential privacy project and shares experience gained from implementing these things in Microsoft’s PINQ.

23 February 2018

Papers

The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets
Concrete study of what a model can leak about sensitive information in the training data. Perhaps not surprisingly, “only by developing and training a differential private model are we able to … protect against the extraction of secrets”.
Doing Real Work with FHE: The Case of Logistic Regression
The heavyweights of homomorphic encryption apply HElib to logistic regression with a focus on implementing “optimized versions of many bread and butter FHE tools. These tools include binary arithmetic, comparisons, partial sorting, and low-precision approximation of complicated functions such as reciprocals and logarithms”.
On the Connection between Differential Privacy and Adversarial Robustness in Machine Learning …
Reading in the Dark: Classifying Encrypted Digits with Functional Encryption
Develops a functional encryption scheme for “efficient computation of quadratic polynomials on encrypted vectors” and applies this to private MNIST prediction (i.e. using a model trained on unencrypted data) via suitable quadratic models.