dc.description.abstract | Machine learning models are increasingly offered as a service by major companies such as Google, Microsoft, and Amazon. Through Machine Learning as a Service (MLaaS), these companies expose their models to end-users via cloud-based Application Programming Interfaces (APIs). Such APIs allow users to query ML models with data samples in a black-box fashion, receiving only the corresponding output predictions. MLaaS models are generally monetized by billing the user for each query. Prior work has shown that these models can be extracted: model extraction attacks obtain an approximation of the MLaaS model by making black-box queries to it. However, none of these attacks satisfies all four criteria essential for practical model extraction: (i) the ability to extract deep learning models, (ii) no requirement of domain knowledge, (iii) the ability to work within a limited query budget, and (iv) no requirement of annotations. In collaboration with Pal et al., we propose a novel model extraction attack that uses active learning techniques and unannotated public data to satisfy all of the aforementioned criteria. However, as our experiments show, no single active learning technique is well-suited to all datasets and query budget constraints. Given the plethora of active learning techniques at the adversary's disposal and the black-box nature of the model under attack, the choice of technique is difficult yet integral: the chosen technique is a strong determinant of the quality of the extracted model. In this work, we devise an active learning technique that combines the benefits of existing techniques across different budgets and datasets, yielding extracted models that, on average, exhibit high test agreement with the MLaaS model. In particular, we show that a combination of the DFAL technique of Ducoffe et al. and the Coreset technique of Sener et al. leverages the benefits of both base techniques, outperforming both DFAL and Coreset in a majority of our experiments. The model extraction attack using this technique achieves, on average, 4.70× the performance of a uniform-noise baseline while using only 30% (30,000 data samples) of the unannotated public data. Moreover, the attack remains undetected by PRADA, a state-of-the-art model extraction detection method. | en_US |