
dc.contributor.advisor: Sastry, P S
dc.contributor.author: Upadhya, Vidyadhar
dc.date.accessioned: 2021-02-02T06:26:47Z
dc.date.available: 2021-02-02T06:26:47Z
dc.date.submitted: 2020
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/4840
dc.description.abstract: Probabilistic generative models learn useful features from unlabeled data, which can be used for subsequent problem-specific tasks such as classification, regression or information retrieval. The restricted Boltzmann machine (RBM) is one such important energy-based probabilistic generative model. RBMs are also the building blocks for several deep generative models. It is difficult to train and evaluate RBMs mainly because the normalizing constant (known as the partition function) of the distribution they represent is computationally hard to evaluate. Therefore, various approximate methods (based on noisy gradients of the log-likelihood estimated through sampling) are used to train RBMs. Thus, building efficient learning algorithms for the RBM is an important problem. In this thesis, we consider the problem of maximum likelihood learning of RBMs, for both binary-binary RBMs (BB-RBMs) and Gaussian-binary RBMs (GB-RBMs).

We propose a new algorithm for learning BB-RBMs by exploiting the property that the BB-RBM log-likelihood function is a difference of convex functions w.r.t. its parameters. In standard difference of convex functions programming (DCP), the optimization proceeds by solving a convex optimization problem at each iteration. In the case of the RBM, this convex objective function contains the partition function and hence its gradient computation may be intractable. We propose a stochastic variant of the difference of convex functions optimization algorithm, termed S-DCP, in which the convex optimization problem at each iteration is approximately solved through a few iterations of stochastic gradient descent. The resulting algorithm is simple, and the contrastive divergence (CD) algorithm, the current standard algorithm for learning RBMs, can be derived as a special case of it. Empirical studies show that S-DCP improves the optimization dynamics of learning BB-RBMs. We further modify this algorithm to accommodate centered gradients. Through extensive empirical studies on a number of benchmark datasets, we demonstrate the superior performance of the proposed algorithms.

It is well documented in the literature that learning Gaussian-binary RBMs is more difficult than learning binary-binary RBMs. We extend the S-DCP algorithm to learn GB-RBMs by proving that the GB-RBM log-likelihood function is also a difference of convex functions w.r.t. the weights and hidden biases, under the assumption that the conditional distribution of the visible units has a fixed variance. Through extensive empirical studies on a number of benchmark datasets, we demonstrate that S-DCP learns good models more efficiently than CD and persistent CD, the current standard algorithms for learning GB-RBMs. We further modify the S-DCP algorithm to accommodate a variance update (outside the inner loop of the convex optimization) so that the variance parameter of the visible units can also be learnt rather than kept fixed. We empirically analyse the resulting algorithm and show that it is more efficient than the current algorithms.

Second-order learning methods provide invariance to re-parameterization of the model and also improve the optimization dynamics of the learning algorithm by providing parameter-specific (adaptive) learning rates. However, the Hessian of the log-likelihood, required for second-order learning, can only be estimated through sampling, and this noisy estimate makes the optimization algorithm unstable; moreover, computing the Hessian inverse is expensive. We propose a second-order learning algorithm on the convex S-DCP objective function using a diagonal approximation of the Hessian which, we show, can be computed easily along with the gradient estimates. To compensate for the noise in the Hessian estimate and to make the algorithm stable, we use exponential averaging over these estimates. We show empirically that the resulting algorithm, termed S-DCP-D, is computationally cheap, stable, and improves the performance of S-DCP further. Our empirical results show that the centered S-DCP as well as the diagonally scaled S-DCP are effective and efficient methods for learning RBMs.

In all methods for learning RBMs, the log-likelihood achieved on held-out test samples is used to evaluate the quality of the learnt RBMs and to fix the hyperparameters. However, the presence of the partition function makes estimating the log-likelihood intractable for models of large dimension, so sampling-based methods are currently used to approximate it. We provide an empirical analysis of these sampling-based estimators of the log-likelihood and suggest some simple techniques to improve the estimates.
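The following is a minimal NumPy sketch of the kind of update the abstract describes for a binary-binary RBM: the data-dependent convex term is linearised once (positive phase) and the resulting convex surrogate is then approximately minimised by a few stochastic gradient steps, each using a short Gibbs chain for the negative phase. The function and parameter names (sdcp_update, d for the number of inner steps, K for the number of Gibbs steps, eta for the learning rate) are illustrative assumptions and not the thesis's actual implementation; note that d = 1 collapses the inner loop and yields a CD-K-style update, which is how CD arises as a special case.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One block-Gibbs sweep v -> h -> v for a binary-binary RBM."""
    ph = sigmoid(v @ W + c)                         # p(h = 1 | v)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ W.T + b)                       # p(v = 1 | h)
    return (rng.random(pv.shape) < pv).astype(float)

def sdcp_update(v_data, W, b, c, eta=0.01, d=5, K=2, rng=None):
    """One S-DCP-style update on a mini-batch (illustrative sketch).

    Outer DC step: the data-dependent convex term is linearised at the
    current parameters (positive phase, held fixed below).  Inner loop:
    d stochastic gradient steps on the resulting convex surrogate, each
    using K Gibbs transitions to estimate the model expectation
    (negative phase).  With d = 1 this reduces to a CD-K style update.
    """
    rng = rng or np.random.default_rng(0)
    n = len(v_data)

    # Positive phase (the linearisation), computed once per outer step.
    ph_data = sigmoid(v_data @ W + c)
    pos_W = v_data.T @ ph_data / n
    pos_b, pos_c = v_data.mean(0), ph_data.mean(0)

    v_model = v_data.copy()                         # chain initialised at the data
    for _ in range(d):                              # inner SGD loop
        for _ in range(K):                          # negative-phase Gibbs steps
            v_model = gibbs_step(v_model, W, b, c, rng)
        ph_model = sigmoid(v_model @ W + c)
        # Surrogate gradient = (fixed) positive phase - current negative phase.
        W += eta * (pos_W - v_model.T @ ph_model / n)
        b += eta * (pos_b - v_model.mean(0))
        c += eta * (pos_c - ph_model.mean(0))
    return W, b, c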
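For the diagonally scaled variant (S-DCP-D), the abstract describes estimating the diagonal of the Hessian alongside the gradient and smoothing it with an exponential average so as to obtain per-parameter learning rates. The sketch below shows only that scaling step; the name sdcp_d_scale, the smoothing constant beta, the damping term eps, and the assumption that the curvature estimate curv_sample comes from the same samples as the gradient (e.g. as a variance of the sufficient statistics over the negative-phase samples) are illustrative choices rather than the exact formulation in the thesis.

import numpy as np

def sdcp_d_scale(grad, curv_sample, curv_avg, beta=0.99, eps=1e-4):
    """Diagonal second-order scaling in the spirit of S-DCP-D (illustrative).

    grad        : current noisy gradient estimate for one parameter array
    curv_sample : noisy estimate of the corresponding diagonal Hessian
                  entries, computed from the same samples as the gradient
    curv_avg    : running exponential average of past curvature estimates
    Returns the preconditioned update direction and the updated average.
    """
    # Exponential averaging damps the sampling noise in the curvature estimate.
    curv_avg = beta * curv_avg + (1.0 - beta) * curv_sample
    # Per-parameter (adaptive) step: divide by the smoothed, damped curvature.
    direction = grad / (np.abs(curv_avg) + eps)
    return direction, curv_avg

In practice such a routine would be called once per parameter array (weights and both bias vectors) inside the inner loop of the previous sketch, with the returned direction used in place of the raw gradient.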
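Finally, for the evaluation problem mentioned at the end of the abstract, annealed importance sampling (AIS) is the standard sampling-based way to estimate the partition function, and hence the held-out log-likelihood, of an RBM. The abstract does not name the specific estimators analysed, so the following simplified AIS sketch for a binary-binary RBM (uniform base distribution, linear temperature schedule) should be read only as an illustration of the general approach.

import numpy as np

def log_p_star(v, W, b, c, beta=1.0):
    """log of the unnormalised marginal p*(v) of a binary-binary RBM,
    with weights and biases scaled by the inverse temperature beta."""
    return beta * (v @ b) + np.sum(np.logaddexp(0.0, beta * (v @ W + c)), axis=1)

def ais_log_partition(W, b, c, n_chains=100, n_betas=1000, rng=None):
    """Annealed importance sampling estimate of log Z (simplified sketch,
    assuming a uniform base distribution and a linear beta schedule)."""
    rng = rng or np.random.default_rng(0)
    n_vis, n_hid = W.shape
    betas = np.linspace(0.0, 1.0, n_betas)
    log_z0 = (n_vis + n_hid) * np.log(2.0)          # log Z of the uniform base model
    v = (rng.random((n_chains, n_vis)) < 0.5).astype(float)
    log_w = np.zeros(n_chains)
    for beta_prev, beta_k in zip(betas[:-1], betas[1:]):
        # Importance-weight increment between consecutive intermediate models.
        log_w += log_p_star(v, W, b, c, beta_k) - log_p_star(v, W, b, c, beta_prev)
        # One Gibbs transition leaving the distribution at beta_k invariant.
        ph = 1.0 / (1.0 + np.exp(-beta_k * (v @ W + c)))
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = 1.0 / (1.0 + np.exp(-beta_k * (h @ W.T + b)))
        v = (rng.random(pv.shape) < pv).astype(float)
    # log Z is approximately log Z_0 + log-mean-exp of the importance weights.
    return log_z0 + np.logaddexp.reduce(log_w) - np.log(n_chains)

def test_log_likelihood(v_test, W, b, c, log_z):
    """Average held-out log-likelihood given an estimate of log Z."""
    return np.mean(log_p_star(v_test, W, b, c)) - log_z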
dc.language.iso: en_US
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Pattern recognition
dc.subject: Machine learning
dc.subject: Generative Models
dc.subject: Restricted Boltzmann Machines
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering
dc.title: Efficient Algorithms for Learning Restricted Boltzmann Machines
dc.type: Thesis
dc.degree.name: PhD
dc.degree.level: Doctoral
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering

