Robust Speaker Identification System

Patra, Sabyasachi

dc.contributor.advisor	Balakrishnan, N
dc.contributor.author	Patra, Sabyasachi
dc.date.accessioned	2025-10-30T10:39:54Z
dc.date.available	2025-10-30T10:39:54Z
dc.date.submitted	2007
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/7257
dc.description.abstract	Recent advances in speaker recognition research have made speaker identification one of the most trusted methods for authorization and forensic applications. Field deployment of such systems, however, requires that they function effectively in noisy environments. Designing speaker identification systems that remain robust under such conditions has attracted significant attention from the research community and is the focus of this thesis. In this work, we explore various dimensionality reduction techniques and their application to speaker identification. Principal Component Analysis (PCA), a coordinate-based dimensionality reduction method, plays a dominant role in this domain. By projecting the original feature set onto a smaller subspace through a linear orthogonal transformation, PCA reduces both the dimensionality of the feature vectors and the correlation among their components. This transformation lowers the computational overhead of subsequent processing stages and reduces the effect of noise, thereby improving accuracy. This thesis applies a feature-dependent dimensionality reduction technique known as Weighted Principal Component Analysis (WPCA). The key advantage of WPCA is that it merges coordinate-based and weight-based methods into a unified framework. Experimental results show an improvement of up to 3% in speaker identification accuracy with WPCA over PCA across various Signal-to-Noise Ratios (SNRs). Selecting the optimal set of parameters is critical in dimensionality reduction. In speaker identification, the most significant components are those that most clearly distinguish the speech of different individuals. We conducted experiments to identify the optimal parameters in the transformed feature vectors. The thesis uses 24-dimensional Mel-Frequency Cepstral Coefficient (MFCC) feature vectors, which are transformed into either PCA or WPCA space.
In PCA space, each feature vector is divided into two parts: coefficients 1 to 12, which correspond to the higher eigenvalues, form the Principal Component Features (PCF), and coefficients 13 to 24, which correspond to the lower eigenvalues, form the Minor Component Features (MCF). Experimental and analytical results show that MCFs have greater discriminative power than PCFs. Another significant contribution of this thesis is the extraction of latent features from the speech spectrum to enable automatic noise filtration. The proposed method applies Latent Variable Decomposition (LVD) to the magnitude spectral vectors of the speech signal. The distribution of spectral vectors is modeled as a mixture of multinomial distributions, parameterized by the a priori probabilities of a fixed number of hidden classes and the conditional frequency-bin distribution of each class. Together these form the transformation matrix used to generate new feature vectors, and the number of hidden classes determines the dimensionality of the new feature vector. Since these features are inherently frequency-independent, noise effects are absorbed during this process. The features are used in the candidate selection stage, where decisions are based on the Bhattacharyya distance between speakers. Gaussian Mixture Models (GMMs) are then applied to the selected candidates using MFCC feature vectors. Results show that the proposed features yield up to a 400% improvement in speaker identification rate over MFCC features at 10 dB SNR, demonstrating high effectiveness in noisy environments.
dc.language.iso	en_US
dc.relation.ispartofseries	T06533
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject	Principal Component Analysis
dc.subject	Weighted PCA
dc.subject	Latent Variable Decomposition
dc.title	Robust Speaker Identification System
dc.degree.name	MSc Engg
dc.degree.level	Masters
dc.degree.grantor	Indian Institute of Science
dc.degree.discipline	Engineering
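
The abstract describes a candidate selection stage that ranks speakers by Bhattacharyya distance before GMM scoring. The record does not say how each speaker's feature distribution is represented, so the following is only a minimal sketch under a stated assumption: each speaker is summarized by a single diagonal-covariance Gaussian (a mean vector and a variance vector) over the feature vectors. The function names `bhattacharyya_diag` and `select_candidates`, the speaker ids, and the toy numbers are hypothetical and do not come from the thesis.

```python
import math


def bhattacharyya_diag(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    ASSUMPTION: one Gaussian per speaker; the thesis may use another model.
    """
    dist = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)  # averaged variance for this dimension
        dist += 0.125 * (ma - mb) ** 2 / v  # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(va * vb))  # variance-mismatch term
    return dist


def select_candidates(test_model, enrolled, k):
    """Return the ids of the k enrolled speakers closest to the test model.

    test_model: (mean, variance) pair for the test utterance.
    enrolled:   dict mapping speaker id -> (mean, variance) pair.
    """
    t_mean, t_var = test_model
    ranked = sorted(
        enrolled,
        key=lambda sid: bhattacharyya_diag(t_mean, t_var, *enrolled[sid]),
    )
    return ranked[:k]


# Toy two-dimensional models (hypothetical values, for illustration only).
enrolled = {
    "spk_a": ([0.0, 0.0], [1.0, 1.0]),
    "spk_b": ([2.0, 0.0], [1.0, 1.0]),
    "spk_c": ([5.0, 5.0], [1.0, 4.0]),
}
test_model = ([0.1, 0.0], [1.0, 1.0])
shortlist = select_candidates(test_model, enrolled, k=2)
```

In a two-stage design like the one the abstract outlines, only the speakers in `shortlist` would be passed on to the more expensive GMM scoring step; the shortlist size `k` is a free parameter here.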

Files in this item

Name: T06533.pdf
Size: 11.16 MB
Format: PDF

This item appears in the following Collection(s)

Supercomputer Education and Research Centre (SERC)
