Applied Federated Learning: Improving Google Keyboard Query Suggestions
Update on the concrete use of federated learning at Google; neither secure computation nor differential privacy is used, but it includes thoughts on dealing with unseen training data.
Differentially Private User-based Collaborative Filtering Recommendation Based on K-means Clustering
Privacy Partitioning: Protecting User Data During the Deep Learning Inference Phase
Optimising for privacy loss at early layers suggests a pragmatic approach for protecting the privacy of prediction inputs without cryptography or DP.
A Review of Homomorphic Encryption Libraries for Secure Computation
NeurIPS workshop on Privacy Preserving Machine Learning happened this week with a very interesting selection of papers.
Intel’s HE Transformer for nGraph released as open source!
nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data
“One of the biggest accelerators in deep learning has been frameworks that allow users to describe networks and operations at a high level while hiding details … A key challenge for building large-scale privacy-preserving ML systems using HE has been the lack of such a framework; as a result data scientists face the formidable task of becoming experts in deep learning, cryptography, and software engineering”. Amen!
CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs
“In many respects, programming FHE applications today is akin to low-level assembly … Our central hypothesis is that future applications will benefit from a compiler and runtime that targets a compact and well-reasoned interface”. Amen! Also describes several ways in which the compiler can optimize encrypted computations.
Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference
Solid work on using weight quantization and other ML techniques to adapt neural networks for the encrypted setting, significantly improving performance relative to CryptoNets. Interestingly, second-degree approximations of the Swish activation function are used over ReLUs and squaring. Gives plenty of references for those not coming from an ML background.
Privacy-Preserving Collaborative Prediction using Random Forests
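To make the Swish remark concrete, here is a minimal sketch of fitting a second-degree polynomial to Swish by ordinary least squares. The fitting interval [-4, 4], the grid size, and the helper names `solve` and `fit_quadratic` are assumptions for illustration; the paper's own fitting procedure and coefficients are not reproduced here.

```python
import math

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_quadratic(f, lo, hi, points=201):
    """Least-squares fit of c0 + c1*x + c2*x^2 to f on an even grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    s = [sum(x ** k for x in xs) for k in range(5)]          # power sums
    t = [sum(f(x) * x ** k for x in xs) for k in range(3)]   # moments of f
    gram = [[s[i + j] for j in range(3)] for i in range(3)]  # normal equations
    return solve(gram, t)

c0, c1, c2 = fit_quadratic(swish, -4.0, 4.0)  # interval is an assumption
```

A degree-2 polynomial needs only additions and multiplications, which are the operations homomorphic schemes evaluate natively. On a symmetric interval the linear coefficient comes out at 0.5, because the odd part of Swish is exactly x/2.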
Train models locally on independent data sets and apply ensemble techniques to serve private predictions using these.
FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions
Private predictions via FHE and GC. Interestingly, values are first converted to the frequency domain using the FFT and there’s a protocol for softmax.
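The frequency-domain trick rests on the convolution theorem: a circular convolution becomes an element-wise product after a Fourier transform. The sketch below checks this in the clear with a naive O(n^2) DFT; it involves no encryption and is not FALCON's protocol, and the function names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [v / n for v in out] if inverse else out

def circular_conv_direct(a, b):
    """Circular convolution computed directly from its definition."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]

def circular_conv_freq(a, b):
    """Same convolution via element-wise multiplication in the frequency domain."""
    product = [x * y for x, y in zip(dft(a), dft(b))]
    return [v.real for v in dft(product, inverse=True)]

direct = circular_conv_direct([1, 2, 3, 4], [1, 1, 0, 0])
via_freq = circular_conv_freq([1, 2, 3, 4], [1, 1, 0, 0])
```

Both routes give the same result up to floating-point error, but the frequency-domain route replaces the nested sum with one multiplication per element.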
A Fully Private Pipeline for Deep Learning on Electronic Health Records
Distributed and Secure ML with Self-tallying Multi-party Aggregation
Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware
TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service
DeepObfuscation: Securing the Structure of Convolutional Neural Networks via Knowledge Distillation
Logistic Regression over Encrypted Data from Fully Homomorphic Encryption
From Keys to Databases – Real-World Applications of Secure Multi-Party Computation
Jana: private SQL databases
Sharemind: secure analytics
Partisia, Sepior: auctions and key management
Unbound Technology: enterprise secrets
Minimising Communication in Honest-Majority MPC by Batchwise Multiplication Verification
SPDZ2k: Efficient MPC mod 2^k for Dishonest Majority
To improve efficiency of MPC it is interesting to perform operations over rings that fit closely with native CPU instructions, as opposed to over e.g. a prime field. Doing so is straightforward when the attacker is honest-but-curious, and this paper addresses the case when he is fully malicious.
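As a rough illustration of the honest-but-curious case, the sketch below does additive secret sharing over the ring of integers modulo 2^64, so reductions match native 64-bit wrap-around. It deliberately omits the authentication that SPDZ2k adds for malicious security, and the function names are hypothetical.

```python
import secrets

K = 64          # ring Z_{2^K}; matches a native 64-bit machine word
MOD = 1 << K

def share(x, n=3):
    """Split x into n additive shares that sum to x modulo 2^K."""
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % MOD

def add_shares(a, b):
    """Each party adds its own two shares locally; no communication needed."""
    return [(x + y) % MOD for x, y in zip(a, b)]

def mul_public(a, c):
    """Each party scales its share by a public constant c."""
    return [(x * c) % MOD for x in a]

total = reconstruct(add_shares(share(20), share(22)))
```

In a language with fixed-width integers the `% MOD` steps disappear entirely, since unsigned 64-bit overflow performs the reduction for free.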
GDPR has come into effect!
Slides from UCL course on privacy enhancing technologies available. Via @emilianoucl.
Keystone: An Open-source Secure Hardware Enclave. Via @Daeinar.
The reference SPDZ implementation is being prepared for production. Via @SmartCryptology.
Next week’s TPMPC workshop will be live-streamed if you happen to be elsewhere than Aarhus! Via @claudiorlandi.
Small but good: we only dug up one paper this week but it comes with very interesting claims.
If anyone had any doubt that private machine learning is a growing area then this week might take care of that.
Secure multiparty computation:
ABY3: A Mixed Protocol Framework for Machine Learning
One of the big guys in secure computation for ML is back with new protocols in the 3-server setting for training linear regression, logistic regression, and neural network models. Impressive performance improvements for both training and prediction.
EPIC: Efficient Private Image Classification (or: Learning from the Masters)
An update to work from last year on efficient private image classification using SPDZ and support vector machines. Includes a great overview of recent related work.
Homomorphic encryption:
Unsupervised Machine Learning on Encrypted Data
Implements K-means privately using fully homomorphic encryption and a bit-wise rational encoding, with suggestions for tweaking K-means to make it more practical for this setting. The TFHE library (see below) is used for experiments.
TFHE: Fast Fully Homomorphic Encryption over the Torus
Proclaimed as the fastest FHE library currently available, this paper is the extended version of previous descriptions of the underlying scheme and optimizations.
Homomorphic Secret Sharing: Optimizations and Applications
Further work on a hybrid scheme between homomorphic encryption and secret sharing: operations can be performed locally by each share holder as in the former, yet a final combination is needed in the end to recover the result as in the latter: “this enables a level of compactness and efficiency of reconstruction that is impossible to achieve via standard FHE”.
Secure enclaves:
SecureCloud: Secure Big Data Processing in Untrusted Clouds
A joint European research project to develop a platform for pushing critical applications to untrusted cloud environments, using secure enclaves and supporting big data. Envisioned use cases from finance, health care, and smart grids.
SecureStreams: A Reactive Middleware Framework for Secure Data Stream Processing
Presents concrete work done in the above SecureCloud project, namely a high-level Lua-based framework for privately processing streams at scale using dataflow programming and secure enclaves.
Differential privacy:
Privately Learning High-Dimensional Distributions
Tackles the problem that privacy “comes almost for free when data is low-dimensional but comes at a steep price when data is high-dimensional” as measured in amount of samples needed. Two mechanisms are presented for learning respectively a multivariate Gaussian and a product distribution.
SynTF: Synthetic and Differentially Private Term Frequency Vectors for Privacy-Preserving Text Mining
A differentially private mechanism is used to prevent author re-identification in texts used for training models where anonymized feature vectors can be used instead of the actual body text. Concrete experiments include topic classification of newsgroups postings.
Distributed Differentially-Private Algorithms for Matrix and Tensor Factorization
Correlated noise is used to privately perform two common operations via a centralized but curious party or directly between data holders, respectively. Interestingly, the correlated noise is not uniform as in typical secure aggregation settings.
Towards Dependable Deep Convolutional Neural Networks (CNNs) with Out-distribution Learning
“in this paper we propose to add an additional dustbin class containing natural out-distribution samples”
“We show that such an augmented CNN has a lower error rate in the presence of adversarial examples because it either correctly classifies adversarial samples or rejects them to a dustbin class.”
Weak labeling for crowd learning
“weak labeling for crowd learning is proposed, where the annotators may provide more than a single label per instance to try not to miss the real label”
Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
“In this article, we place ourselves in a context where the amount of transferred data must be anticipated but a limited portion of the local training sets can be shared. We also suppose a minimalist topology where each node can only send information unidirectionally to a single central node which will aggregate models trained by the nodes”
“Using shared data on the central node, we then train a probabilistic model to aggregate the base classifiers in a second stage.”
Securing Distributed Machine Learning in High Dimensions
Some results towards the issue of input pollution in federated learning, where a fraction of gradient providers may give arbitrarily malicious inputs to an aggregation protocol. “The core of our method is a robust gradient aggregator based on the iterative filtering algorithm for robust mean estimation”.
Sharemind, one of the biggest and earliest players pushing MPC to industry, has launched a new privacy service based on secure computation using secure enclaves with the promise that it can handle big data. Via @positium.
Interesting interview with Lea Kissner, the head of Google’s privacy team NightWatch. Few details are given but “She recently tried to obscure some data using cryptography, so that none of it would be visible to Google upon upload … but it turned out that [it] would require more spare computing power than Google has” sounds like techniques that could be related to MPC or HE. Via @rosa.
Google had two AI presentations at this year’s RSA conference, one on fraud detection and one on adversarial techniques. Via @goodfellow_ian.
Privacy-Preserving Multibiometric Authentication in Cloud with Untrusted Database Providers
Relevant application of secure computation to authentication using sensitive data. Relatively black-box use of existing protocols, yet experimental performance is <1sec.
Private Anonymous Data Access
Interesting mix of private information retrieval and oblivious RAM: “We consider a scenario where a server holds a huge database that it wants to make accessible to a large group of clients while maintaining privacy and anonymity … with the goal of getting the best of both worlds: allow many clients to privately and anonymously access the database as in PIR, while having an efficient server as in ORAM”.
Adversarial Attacks Against Medical Deep Learning Systems
A discussion around some of the concrete consequences the medical profession may face from adversarial examples in machine learning systems with a warning of “caution in employing deep learning systems in clinical settings”.
Private Nearest Neighbors Classification in Federated Databases
Great read on custom MPC protocols allowing k-NN classification of a sample (such as document classification with cosine similarity) using a distributed data set, without leaking either the sample or the data set. This includes feature extraction, similarity computation, and top-k selection.
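For readers unfamiliar with the pipeline, the three stages can be sketched in the clear as follows. This is a plaintext reference only: the paper's contribution is performing these steps under MPC, which is not shown, and the bag-of-words feature extractor and all names are assumptions.

```python
import math
from collections import Counter

def features(text):
    """Feature extraction: bag-of-words term counts (hypothetical extractor)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Similarity computation: cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u)  # Counter yields 0 for missing terms
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0

def knn_classify(sample, labelled_docs, k=3):
    """Top-k selection followed by a majority vote over the k nearest labels."""
    query = features(sample)
    scored = sorted(
        ((cosine(query, features(doc)), label) for doc, label in labelled_docs),
        reverse=True,
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]

docs = [
    ("secure multiparty computation protocol", "crypto"),
    ("homomorphic encryption scheme", "crypto"),
    ("secret sharing protocol", "crypto"),
    ("gradient descent training", "ml"),
    ("neural network training", "ml"),
]
label = knn_classify("secure protocol for secret sharing", docs)
```

In the private setting each of these stages has to be replaced by a protocol that hides both the query and the documents, and top-k selection is typically the hardest one because it requires comparisons.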
Chiron: Privacy-preserving Machine Learning as a Service
Interesting look at protecting both privacy of training data and model specifics via secure enclaves. The technology is promising despite having experienced a few issues recently and e.g. avoids use of heavy cryptography.
Locally Private Bayesian Inference for Count Models
When applying differential privacy one may either ignore the fact that noise has been added to the data or try to take it into account; the latter is done here with good illustrations of the improvements this can give.
Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries
Interesting peer-to-peer protocol for privately computing the exact average of a distributed data set via gossiping directly between the peers. No heavy cryptography is used in case of honest peers, with a PHE-based extension for detecting malicious cheating.
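The reason gossiping can yield the exact average is that both steps preserve the global sum. The sketch below illustrates this with pairwise-cancelling masks followed by pairwise averaging; it is a simplified illustration rather than the paper's protocol, the bounded integer masks provide no real privacy guarantee, and all names and parameters are hypothetical.

```python
import random
from fractions import Fraction

def mask_pairwise(values, rounds, rng):
    """Random pairs exchange a mask: one peer adds it, the other subtracts it."""
    masked = [Fraction(v) for v in values]
    for _ in range(rounds):
        i, j = rng.sample(range(len(masked)), 2)
        r = Fraction(rng.randint(-1000, 1000))  # toy mask range (assumption)
        masked[i] += r
        masked[j] -= r
    return masked

def gossip_average(values, rounds, rng):
    """Random pairs repeatedly replace both of their values by the pair's mean."""
    state = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(state)), 2)
        mean = (state[i] + state[j]) / 2
        state[i] = state[j] = mean
    return state

rng = random.Random(0)
private_values = [10, 20, 30, 40, 50]
masked = mask_pairwise(private_values, 20, rng)
final = gossip_average(masked, 300, rng)
```

Exact fractions are used so the sum invariant can be checked without rounding error; every entry of `final` ends up arbitrarily close to the true average of 30 even though peers only ever see masked values.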
Cloud-based MPC with Encrypted Data
Gives two schemes for private Model Predictive Control by a central authority (who might have a better understanding of the environment than individual sensors), one based on PHE and another on MPC.
Model-Agnostic Private Learning via Stability
More work on ensuring privacy of training data via differentially private query mechanisms. Compared to the paper from a few weeks ago, this one focuses on “algorithms that are agnostic to the underlying learning problem [with] formal utility guarantees [and] provable accuracy guarantees”.
Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters
The Paillier cryptosystem is used to securely evaluate simplified similarity functions so users don’t leak biometric information during authentication. Performance numbers included.
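To show why Paillier suits this kind of task, here is a toy implementation together with an inner product between an encrypted vector and plaintext weights. The inner product stands in for a similarity function and is an assumption; the paper's actual functions are not reproduced. The primes are tiny illustrative values, far too small for real use.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy key generation; p and q are tiny illustrative primes, not secure sizes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt m as (1 + n)^m * r^n mod n^2 for a fresh random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

def scale(n, c, k):
    """Raising a ciphertext to k multiplies the plaintext by the public scalar k."""
    return pow(c, k, n * n)

def encrypted_dot(n, enc_x, weights):
    """Inner product of an encrypted vector with plaintext weights."""
    acc = encrypt(n, 0)
    for c, w in zip(enc_x, weights):
        acc = add(n, acc, scale(n, c, w))
    return acc

n, priv = keygen()
enc_template = [encrypt(n, v) for v in [3, 1, 4]]
score = decrypt(n, priv, encrypted_dot(n, enc_template, [2, 7, 1]))
```

The party holding the weights never sees the encrypted vector in the clear, and only the key holder learns the resulting score.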
Efficient Determination of Equivalence for Encrypted Data
Reminder that even a simpler task such as privately linking identities and records together is relevant in industry.
The 2018 Gödel Prize is awarded to Oded Regev for his paper On lattices, learning with errors, random linear codes, and cryptography. This had a huge influence on later work in cryptography, not least homomorphic encryption. Via @hoonoseme.
OpenMined is now maintaining a list of papers and tools around private machine learning: https://github.com/OpenMined/awesome-ai-privacy! Via @iamtrask.
Lab41 has released a Python wrapper around Microsoft’s SEAL homomorphic encryption library: https://github.com/Lab41/PySEAL. Via @mortendahl.
The list of accepted contributed talks for this year’s Theory and Practice of MPC workshop has been announced. This is the definitive annual event dedicated to secure multi-party computation. Via @claudiorlandi.
Generating Differentially Private Datasets Using GANs
Interesting idea of using GANs to produce artificial, differentially private datasets from sensitive data that are safe to release for further training purposes. This is done on the client side, meaning there’s no need for a trusted aggregator.
Faster Homomorphic Linear Transformations in HElib
The masters are at it again, giving algorithmic improvements to perhaps the most well-known homomorphic encryption library and thereby making it 30-75 times faster.
Logistic Regression Model Training based on the Approximate Homomorphic Encryption
Private fitting of several logistic regression models on smaller genomic data sets using the HEAAN homomorphic encryption scheme. Approach is somewhat typical gradient descent and sigmoid polynomial approximation but with significant concrete performance improvements over other work using HEAAN.
Scalable Private Learning with PATE
Follow-up work to the celebrated Student-Teacher way of ensuring privacy of training data via differential privacy, now with better privacy bounds and hence less added noise. This is partially achieved by switching to Gaussian noise and more advanced (trusted) aggregation mechanisms.
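The basic idea of aggregating teacher votes under Gaussian noise can be sketched as below. This is only the plain noisy-argmax step: the noise scale and function name are hypothetical, and the more advanced aggregation mechanisms and the privacy accounting from the paper are not reproduced.

```python
import random

def noisy_max_label(teacher_votes, num_classes, sigma, rng):
    """Return the class with the highest vote count after adding Gaussian noise."""
    counts = [0] * num_classes
    for vote in teacher_votes:
        counts[vote] += 1
    # Independent Gaussian noise per class hides any single teacher's vote.
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]
    return max(range(num_classes), key=noisy.__getitem__)

rng = random.Random(0)
votes = [1] * 95 + [0] * 5   # 100 hypothetical teachers, strong consensus
label = noisy_max_label(votes, num_classes=3, sigma=5.0, rng=rng)
```

When teachers agree strongly the noise almost never changes the answer, which is what lets the student learn accurate labels at a small privacy cost.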
Privacy-Preserving Logistic Regression Training
Fitting a logistic model from homomorphically encrypted data using the Newton-Raphson iterative method, but with a fixed and approximated Hessian matrix. Performance is evaluated on the iDASH cancer detection scenario.
Privacy-Preserving Boosting with Random Linear Classifiers for Learning from User-Generated Data
Presents the SecureBoost framework for mixing boosting algorithms with secure computation. The former uses randomly generated linear classifiers at the base and the latter comes in three variants: RLWE+GC, Paillier+GC, and SecretSharing+GC. Performance experiments on both the model itself and on the secure versions are provided.
Machine learning and genomics: precision medicine vs. patient privacy
Non-technical paper illustrating that secure computation techniques are finding their way into otherwise unrelated research areas, and hitting a home run with “data access restrictions are a burden for researchers, particularly junior researchers or small labs that do not have the clout to set up collaborations with major data curators”.
The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets
Concrete study of what a model can leak about sensitive information in the training data. Perhaps not surprisingly, “only by developing and training a differential private model are we able to … protect against the extraction of secrets”.
Doing Real Work with FHE: The Case of Logistic Regression
The heavyweights of homomorphic encryption apply HElib to logistic regression with a focus on implementing “optimized versions of many bread and butter FHE tools. These tools include binary arithmetic, comparisons, partial sorting, and low-precision approximation of complicated functions such as reciprocals and logarithms”.
On the Connection between Differential Privacy and Adversarial Robustness in Machine Learning …
Reading in the Dark: Classifying Encrypted Digits with Functional Encryption
Develops a functional encryption scheme for “efficient computation of quadratic polynomials on encrypted vectors” and applies this to private MNIST prediction (i.e. using a model trained on unencrypted data) via suitable quadratic models.