
dc.contributor.advisorSarma, V V S
dc.contributor.authorHenry Mary, Dante
dc.date.accessioned2026-03-10T09:29:37Z
dc.date.available2026-03-10T09:29:37Z
dc.date.submitted1979
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/8924
dc.description.abstractThis thesis presents studies on the application of multistage pattern-recognition schemes in automatic speaker identification and verification systems for large populations. The success of speaker-recognition schemes in laboratory environments for small populations suggests the feasibility of designing such systems for several practical applications in military communications, voice banking, and forensic sciences. In these practical situations, it is necessary that the system be capable of handling large populations. In this context, the formulation and solution of the speaker-recognition problem as a multiclass pattern-recognition problem appears to be very promising. Even though pattern-recognition theory has advanced considerably in recent years, most speaker-recognition schemes reported in the literature are designed on an ad hoc or trial-and-error basis and are largely restricted to exploratory studies of particular feature sets for this application. Notable exceptions to this trend are the recent research and development efforts on speaker-verification systems by Bell Laboratories, Texas Instruments, and Philips Research. However, many questions still remain regarding the design of identification systems for large populations, forensic applications of voice-based schemes for personal identification, and environmental effects. The first part of this thesis considers the issue of large population size. Several recent studies in pattern-recognition theory consider multistage classification as an attractive approach for multiclass pattern-recognition problems. In multistage classifiers, the final decision is made only after a number of stages, which may be prescribed in advance. A theoretical model suitable for this application, involving normally distributed features, is developed in this thesis. The proposed decision scheme can be broadly divided into two phases. 
In the first phase, using some features, a small number of classes to which the test sample is likely to belong is selected with a high degree of accuracy. In the second phase, the actual identification is carried out using another feature subset. The resulting classifier is a hierarchical classifier and can be viewed as a decision tree. The number of classes rejected at each node of the decision tree is derived as a function of the measured value of the feature at that node when the features are normally distributed. Assignment of features at the two stages of the decision tree for a two-stage scheme for recognition of thirty speakers has been done initially on an ad hoc basis. But for larger populations, where many features and many stages are needed, an optimal procedure for feature allocation to various nodes of the decision tree becomes essential. The criterion for optimal allocation of features is defined as a cost function involving feature-measurement costs and recognition errors. This aspect is formulated as a stochastic optimal-control problem. The approach is illustrated by designing an optimal classifier for a sixty-speaker recognition problem. The problem of reliability estimation of voice-based personal-identity verification systems for forensic applications is considered in the second part of the thesis. The proposed model utilizes the normality assumption of features and also incorporates the multistage-recognition concepts developed in the first part of the thesis. A statistical study of some speech features is also undertaken to verify the normality assumptions. A preliminary estimate of the number of independent features and the amount of speech data needed is obtained using this model for the very high recognition accuracies required in this problem. 
The thesis concludes with a discussion of several other important practical considerations in system design, such as designing and testing the system from a finite set of labelled samples and environmental considerations such as additive channel noise.
dc.language.isoen_US
dc.relation.ispartofseriesT01628
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subjectSpeaker recognition
dc.subjectAutomatic systems
dc.subjectPattern classification
dc.titleMultistage pattern recognition schemes for automatic... ... verification
dc.typeThesis
dc.degree.namePhD
dc.degree.levelDoctoral
dc.degree.grantorIndian Institute of Science
dc.degree.disciplineEngineering

