dc.contributor.advisor: Ghosh, Prasanta Kumar
dc.contributor.author: Roy, Anwesha
dc.date.accessioned: 2022-10-26T07:06:42Z
dc.date.available: 2022-10-26T07:06:42Z
dc.date.submitted: 2022
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5882
dc.description.abstract: Real-time Magnetic Resonance Imaging (rtMRI) is a tool used extensively in speech science and linguistics to understand the dynamics of the speech production process across languages and health conditions. rtMRI has two advantages over other methods that capture articulatory movement, such as X-ray, ultrasound, and electromagnetic articulography: it is non-invasive, and it captures a complete view of the vocal tract, including pharyngeal structures. An rtMRI video provides spatio-temporal information about speech articulatory movements, which helps in modeling speech production. For this purpose, a common step is to obtain the air-tissue boundary (ATB) segmentation in every frame of the rtMRI video. Accurate estimation of the ATBs of the upper airway of the vocal tract is essential for many speech processing applications, such as speaker verification, text-to-speech synthesis, visual augmentation of synthesized articulatory videos, and analysis of vocal tract movement.

The best performance in ATB segmentation of rtMRI videos in speech production, in unseen-subject conditions, is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. In seen-subject conditions, both the 3D-CNN and a 2-dimensional deep convolutional encoder-decoder network (SegNet) show similar performance. However, the evaluation of these models, as well as of other ATB segmentation techniques reported in the literature, has been done using the Dynamic Time Warping (DTW) distance between the entire original and predicted boundaries or contours. Such an evaluation measure may not capture local errors in the predicted contour. Careful analysis of predicted contours reveals errors in regions such as the velum and the tongue base, which are not captured by a global evaluation metric like the DTW distance. In this thesis, such errors are automatically detected and a novel correction scheme is proposed for them. Two new evaluation metrics are also proposed for ATB segmentation, separately for each contour, to explicitly capture errors in these contours.

Moreover, the state-of-the-art models use overall binary cross-entropy as the loss function during model training. Such a global loss function does not place enough emphasis on the regions that are more prone to errors. In this thesis, the use of regional loss functions together with the global loss is explored, focusing on the areas of the contours identified as error-prone in the analysis. Two different losses are considered in the regions around the velum and the tongue base: binary cross-entropy (BCE) loss and dice loss. It is observed that the dice-loss-based models perform better than their BCE-loss-based counterparts.
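The combination of a global loss with a regional loss described in the abstract can be sketched briefly. The following is a minimal illustrative sketch in PyTorch, not the thesis code: it adds a dice loss computed inside a binary mask marking error-prone areas (such as the velum and tongue base) to a global binary cross-entropy term over the whole frame. The region mask, the weighting factor region_weight, and the smoothing constant eps are assumptions made for illustration only.

import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    # Soft dice loss over the given (already masked) pixels.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def combined_loss(pred, target, region_mask, region_weight=0.5):
    # pred, target : (B, H, W) tensors with values in [0, 1]
    # region_mask  : (B, H, W) binary tensor marking velum / tongue-base areas
    # Global BCE over the whole frame plus dice loss restricted to region_mask.
    global_bce = F.binary_cross_entropy(pred, target)
    regional_dice = dice_loss(pred * region_mask, target * region_mask)
    return global_bce + region_weight * regional_dice

if __name__ == "__main__":
    # Random tensors stand in for network output and ground-truth masks.
    pred = torch.rand(2, 68, 68)
    target = (torch.rand(2, 68, 68) > 0.5).float()
    region = torch.zeros(2, 68, 68)
    region[:, 20:40, 30:50] = 1.0  # hypothetical velum / tongue-base window
    print(combined_loss(pred, target, region).item())

Swapping dice_loss for a masked BCE term in combined_loss would give the BCE-loss regional variant that the abstract compares against.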
dc.language.iso: en_US
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: segmentation
dc.subject: rtMRI
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering
dc.title: Improved air-tissue boundary segmentation in real-time magnetic resonance imaging videos using speech articulator specific error criterion
dc.type: Thesis
dc.degree.name: MTech (Res)
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering

