Learning Compact Architectures for Deep Neural Networks

Srinivas, Suraj

dc.contributor.advisor	Venkatesh Babu, R
dc.contributor.author	Srinivas, Suraj
dc.date.accessioned	2018-05-22T15:04:55Z
dc.date.accessioned	2018-07-31T06:40:20Z
dc.date.available	2018-05-22T15:04:55Z
dc.date.available	2018-07-31T06:40:20Z
dc.date.issued	2018-05-22
dc.date.submitted	2017
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/3581
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/4449/G28168-Abs.pdf	en_US
dc.description.abstract	Deep neural networks with millions of parameters are at the heart of many state-of-the-art computer vision models. However, recent works have shown that models with far fewer parameters can often perform just as well. A smaller model has the advantage of being faster to evaluate and easier to store, both of which are crucial for real-time and embedded applications. While prior work on compressing neural networks has looked at methods based on sparsity, quantization and factorization of neural network layers, we look at the alternate approach of pruning neurons. Training neural networks is often described as a kind of 'black magic', as successful training requires setting the right hyper-parameter values (such as the number of neurons in a layer, the depth of the network, etc.). It is often not clear what these values should be, and these decisions often end up being either ad hoc or driven by extensive experimentation. It would be desirable to automatically set some of these hyper-parameters for the user so as to minimize trial and error. Combining this objective with our earlier preference for smaller models, we ask the following question: for a given task, is it possible to come up with small neural network architectures automatically? In this thesis, we propose methods to do so. The work is divided into four parts. First, given a neural network, we look at the problem of identifying important and unimportant neurons. We look at this problem in a data-free setting, i.e., assuming that the data the neural network was trained on is not available. We propose two rules for identifying wasteful neurons and show that these suffice in such a data-free setting. By removing neurons based on these rules, we are able to reduce model size without significantly affecting accuracy. Second, we propose an automated learning procedure to remove neurons during the process of training.
We call this procedure ‘Architecture-Learning’, as it automatically discovers the optimal width and depth of neural networks. We empirically show that this procedure is preferable to trial-and-error based Bayesian Optimization procedures for selecting neural network architectures. Third, we connect ‘Architecture-Learning’ to a popular regularizer called ‘Dropout’, and propose a novel regularizer, which we call ‘Generalized Dropout’. From a Bayesian viewpoint, this method corresponds to a hierarchical extension of the Dropout algorithm. Empirically, we observe that Generalized Dropout corresponds to a more flexible version of Dropout, and works in scenarios where Dropout fails. Finally, we apply our procedure for removing neurons to the problem of removing weights in a neural network, and achieve state-of-the-art results in sparsifying neural networks.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G28168	en_US
dc.subject	Deep Neural Networks	en_US
dc.subject	Learning Compact Architectures	en_US
dc.subject	Machine Learning	en_US
dc.subject	Binary Neural Nets	en_US
dc.subject	Architecture Learning	en_US
dc.subject	Sparse Neural Networks	en_US
dc.subject	Bayesian Neural Networks	en_US
dc.subject	Neural Network Architectures	en_US
dc.subject.classification	Computational and Data Sciences	en_US
dc.title	Learning Compact Architectures for Deep Neural Networks	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US