Model Extraction and Active Learning
Abstract
Machine learning models are increasingly offered as a service by large companies such as Google, Microsoft and Amazon. Under Machine Learning as a Service (MLaaS), these companies expose their models to end users through cloud-based Application Programming Interfaces (APIs). Such APIs allow users to query ML models with data samples in a black-box fashion, returning only the corresponding output predictions. MLaaS models are generally monetized by billing the user for each query made. Prior work has shown that these models can be extracted: model extraction attacks construct an approximation of the MLaaS model by making black-box queries to it. However, none of these attacks satisfies all four criteria essential for practical model extraction: (i) the ability to extract deep learning models, (ii) no requirement of domain knowledge, (iii) the ability to work with a limited query budget and (iv) no requirement of annotations. In collaboration with Pal et al., we propose a novel model extraction attack that uses active learning techniques and unannotated public data to satisfy all of these criteria. However, as we show in the experiments, no single active learning technique is well suited to all datasets and all query budget constraints. Given the plethora of active learning techniques at the adversary’s disposal and the black-box nature of the model under attack, the choice of technique is difficult but integral: the chosen technique is a strong determinant of the quality of the extracted model. In this work, we aim to devise an active learning technique that combines the benefits of existing active learning techniques, as applicable to different budgets and different datasets, yielding on average extracted models that exhibit high test agreement with the MLaaS model. In particular, we show that a combination of the DFAL technique of Ducoffe et al. and the Coreset technique of Sener et al. is able to leverage the benefits of both base techniques, outperforming both DFAL and Coreset in a majority of our experiments. The model extraction attack using this technique achieves, on average, a performance of 4.70× over the uniform noise baseline while using only 30% (30,000 data samples) of the unannotated public data. Moreover, the attack using this technique remains undetected by PRADA, a state-of-the-art model extraction detection method.
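As a rough illustration of the ideas in the abstract, the sketch below shows a query-budgeted extraction loop in Python with NumPy. It is not the method of the thesis: it uses only a greedy k-center selection (the idea behind Coreset), leaves out DFAL and the combined technique, and selects in raw input space instead of a substitute model's learned representation. The names `query_victim`, `extract_labels`, `k_center_greedy` and `agreement` are hypothetical and chosen for this example.

```python
import numpy as np


def agreement(victim_labels, substitute_labels):
    """Fraction of samples on which both models predict the same label."""
    a = np.asarray(victim_labels)
    b = np.asarray(substitute_labels)
    return float(np.mean(a == b))


def k_center_greedy(pool, selected_idx, k):
    """Greedy k-center selection (assumption: plain Euclidean distance).

    Repeatedly picks the pool point that is farthest from every point
    chosen so far, so the selected set spreads over the pool.
    """
    pool = np.asarray(pool, dtype=float)
    # Distance from each pool point to its nearest already-selected point.
    min_dist = np.full(len(pool), np.inf)
    for i in selected_idx:
        min_dist = np.minimum(min_dist, np.linalg.norm(pool - pool[i], axis=1))
    chosen = []
    for _ in range(k):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(pool - pool[nxt], axis=1))
    return chosen


def extract_labels(query_victim, pool, budget, rounds, seed=0):
    """Spend `budget` black-box queries over `rounds` selection rounds.

    `query_victim` is a hypothetical stand-in for the MLaaS API: it takes
    a batch of samples and returns only the predicted labels.
    """
    pool = np.asarray(pool, dtype=float)
    rng = np.random.default_rng(seed)
    per_round = budget // rounds
    # Round 0: random seed set drawn from the unannotated pool.
    idx = [int(i) for i in rng.choice(len(pool), size=per_round, replace=False)]
    labels = list(query_victim(pool[idx]))
    for _ in range(rounds - 1):
        new = k_center_greedy(pool, idx, per_round)
        labels += list(query_victim(pool[new]))
        idx += new
        # A full attack would retrain the substitute model here and run
        # the selection on features derived from that substitute.
    return idx, np.array(labels)
```

The returned indices and labels form the training set for a substitute model. Calling `agreement` on held-out test inputs, with the labels from the MLaaS model and from the substitute, gives the test agreement that the abstract uses as its quality measure.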
Related items
Showing items related by title, author, creator and subject.
- Efficient Algorithms for Structured Output Learning
Balamurugan, P (2018-05-08) Structured output learning is the machine learning task of building a classifier to predict structured outputs. Structured outputs arise in several contexts in diverse applications like natural language processing, computer ...
- On Learning k-Parities and the Complexity of k-Vector-SUM
Gadekar, Ameet (2018-02-06) In this work, we study two problems: the first, learning sparse parities, is one of the central problems in learning theory, and the other, k-Vector-SUM, is an extension of the notorious k-SUM problem. We first consider the problem ...
- Solution Of Delayed Reinforcement Learning Problems Having Continuous Action Spaces
Ravindran, B (2012-05-29)