Provable Methods for Non-negative Matrix Factorization
Nonnegative matrix factorization (NMF) is an important data-analysis problem which concerns factoring a given d n matrix A with nonnegative entries into matrices B and C where B and C are d k and k n with nonnegative entries. It has numerous applications including Object recognition, Topic Modelling, Hyper-spectral imaging, Music transcription etc. In general, NMF is intractable and several heuristics exists to solve the problem of NMF. Recently there has been interest in investigating conditions under which NMF can be tractably recovered. We note that existing attempts make unrealistic assumptions and often the associated algorithms tend to be not scalable. In this thesis, we make three major contributions: First, we formulate a model of NMF with assumptions which are natural and is a substantial weakening of separability. Unlike requiring a bound on the error in each column of (A BC) as was done in much of previous work, our assumptions are about aggregate errors, namely spectral norm of (A BC) i.e. jjA BCjj2 should be low. This is a much weaker error assumption and the associated B; C would be much more resilient than existing models. Second, we describe a robust polynomial time SVD-based algorithm, UTSVD, with realistic provable error guarantees and can handle higher levels of noise than previous algorithms. Indeed, experimentally we show that existing NMF models, which are based on separability assumptions, degrade much faster than UTSVD, in the presence of noise. Furthermore, when the data has dominant features, UTSVD significantly outperforms existing models. On real life datasets we again see a similar outperformance of UTSVD on clustering tasks. Finally, under a weaker model, we prove a robust version of uniqueness of NMF, where again, the word \robust" refers to realistic error bounds.