Model Extraction Defense using Modified Variational Autoencoder
Machine Learning as a Service (MLaaS) exposes machine learning (ML) models trained on confidential datasets to users through an Application Programming Interface (API). Since MLaaS models are deployed for commercial purposes, the API is offered as a pay-per-query service. A malicious user, or attacker, can exploit such an API to extract a close approximation of the MLaaS model by training a substitute model using only black-box query access to the API, in a process called model extraction. Because the service is paid, the attacker must extract the MLaaS model within a limited query budget. A model extraction attack is mounted by firing queries drawn from a substitute dataset consisting of (i) Synthetic Non-Problem Domain (SNPD), (ii) Synthetic Problem Domain (SPD), or (iii) Natural Non-Problem Domain (NNPD) data. In this work, we propose a novel defense framework against model extraction, using a hybrid anomaly detector composed of an encoder and a detector. In particular, we propose VarDefend, a modified Variational Autoencoder with a specially designed loss function that separates the encodings of queries fired by malicious users from those fired by benign users. We consider two scenarios: (i) a stateful defense, where the MLaaS provider stores the queries made by each client to discover any malicious pattern, and (ii) a stateless defense, where individual queries are discarded if they are flagged as out-of-distribution. Treating encoded queries from benign users as normal, the stateless approach applies outlier detection models to identify encoded queries from malicious users. The stateful approach uses a statistical test known as Maximum Mean Discrepancy (MMD) to compare the distribution of the encodings of a client's queries against that of in-distribution encoded samples.
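As an illustration of the stateful test, the following is a minimal sketch of a kernel-based MMD² estimate between two sets of query encodings, using NumPy and an RBF kernel; the kernel bandwidth `gamma` and the decision threshold are hypothetical choices, not values specified in this work.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel matrix k(x_i, y_j) = exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=0.1):
    """Biased estimate of squared Maximum Mean Discrepancy between
    samples x and y (rows are encoded queries)."""
    k_xx = rbf_kernel(x, x, gamma)
    k_yy = rbf_kernel(y, y, gamma)
    k_xy = rbf_kernel(x, y, gamma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Illustrative use: compare a client's accumulated encodings against
# reference in-distribution encodings; a large MMD^2 suggests the
# client's queries are drawn from a different distribution.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(200, 8))   # stand-in for benign encodings
benign = rng.normal(0.0, 1.0, size=(200, 8))      # same distribution
shifted = rng.normal(3.0, 1.0, size=(200, 8))     # stand-in for malicious encodings

print(mmd2(reference, benign))   # close to zero
print(mmd2(reference, shifted))  # clearly larger
```

In a deployed defense, the provider would recompute this statistic as each client's query history grows and flag the client once MMD² exceeds a calibrated threshold.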
In our experiments, we observed that our stateful defense mechanism can completely block one representative attack for each of the three types of substitute datasets, without raising a single false alarm against queries made by a benign user. The number of queries required to block an attack is much smaller than that required by the current state-of-the-art model extraction defense, PRADA. Further, our proposed approach can block NNPD queries, which PRADA cannot block. Our stateless defense mechanism is useful against a group of colluding attackers without significantly impacting benign users. Our experiments demonstrate that, on the MNIST and Fashion-MNIST datasets, the proposed stateless defense rejects more than 98% of the queries made by an attacker using SNPD, SPD, or NNPD data, while rejecting only about 0.05% of all the queries made by a benign user. Our experiments also demonstrate that the proposed approach makes the MLaaS model significantly more robust to adversarial examples crafted using the substitute model, by blocking transferability.