Music And Speech Analysis Using The 'Bach' Scale Filter-Bank

Ananthakrishnan, G

dc.contributor.advisor	Ramakrishnan, A G
dc.contributor.author	Ananthakrishnan, G
dc.date.accessioned	2009-08-13T06:52:10Z
dc.date.accessioned	2018-07-31T04:57:30Z
dc.date.available	2009-08-13T06:52:10Z
dc.date.available	2018-07-31T04:57:30Z
dc.date.issued	2009-08-13T06:52:10Z
dc.date.submitted	2007
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/592
dc.description.abstract	The aim of this thesis is to define a perceptual scale for the ‘Time-Frequency’ analysis of music signals. The equal tempered ‘Bach’ scale is a suitable scale, since it covers most of the genres of music and the error is equally distributed for each semi-tone. However, it may be necessary to allow a tolerance of around 50 cents or half the interval of the Bach scale, so that the interval can accommodate other common intonation schemes. The thesis covers the formulation of the Bach scale filter-bank as a time-varying model. It makes a comparative study with other commonly used perceptual scales. Two applications for the Bach scale filter-bank are also proposed, namely automated segmentation of speech signals and transcription of singing voice for query-by-humming applications. Even though this filter-bank is suggested with a motivation from music, it could also be applied to speech. A method for automatically segmenting continuous speech into phonetic units is proposed. The results, obtained from the proposed method, show around 82% accuracy for the English and 85% accuracy for the Hindi databases. This is an improvement of around 2-3% when the performance is compared with other popular methods in the literature. Interestingly, the Bach scale filters perform better than the filters designed for other common perceptual scales, such as Mel and Bark scales. ‘Musical transcription’ refers to the process of converting a musical rendering or performance into a set of symbols or notations. A query in a ‘query-by-humming system’ can be made in several ways, some of which are singing with words, or with arbitrary syllables, or whistling. Two algorithms are suggested to annotate a query. The algorithms are designed to be fairly robust for these various forms of queries. The first algorithm is a frequency selection based method. It works on the basis of selecting the most likely frequency components at any given time instant.
The second algorithm works on the basis of finding time-connected contours of high energy in the ‘Time-Frequency’ plane of the input signal. The time domain algorithm works better in terms of instantaneous pitch estimates. It results in an error of around 10-15%, while the frequency domain method results in an error of around 12-20%. A song rendered by two different people will have quite a few different properties. Their absolute pitches, rates of rendering, timbres based on voice quality, and inaccuracies may be different. The thesis discusses a method to quantify the distance between two different renderings of musical pieces. The distance function has been evaluated by attempting a search for a particular song from a database of size 315, made up of songs sung by both male and female singers and whistled queries. Around 90% of the time, the correct song is found among the top five best choices picked. Thus, the Bach scale has been proposed as a suitable scale for representing the perception of music. It has been explored in two applications, namely automated segmentation of speech and transcription of singing voices. Using the transcription obtained, a measure of the distance between renderings of musical pieces has also been suggested.	en
dc.language.iso	en_US	en
dc.relation.ispartofseries	G21044	en
dc.subject	Speech Analysis	en
dc.subject	Speech Processing	en
dc.subject	Filter Bank	en
dc.subject	Musical Transcription	en
dc.subject	Speech Recognition	en
dc.subject	Speech - Signal Processing	en
dc.subject	Audio Signals	en
dc.subject	Music - Pitch Tracking Algorithms	en
dc.subject	Music Signals - Analysis	en
dc.subject	Time Frequency Analysis	en
dc.subject	Bach Scale	en
dc.subject	Automated Speech Segmentation	en
dc.subject.classification	Computer Science	en
dc.title	Music And Speech Analysis Using The 'Bach' Scale Filter-Bank	en
dc.type	Thesis	en
dc.degree.name	MSc Engg	en
dc.degree.level	Masters	en
dc.degree.discipline	Faculty of Engineering	en