ECU Libraries Catalog

Using synchronized audio mapping to predict velar and pharyngeal wall locations during dynamic MRI sequences / by Pooya Rahimian.

Author/creator Rahimian, Pooya, author.
Other author/creator Tabrizi, M. H. N., degree supervisor.
Other author/creator East Carolina University. Department of Computer Science.
Format Theses and dissertations, Electronic, and Book
Publication Info [Greenville, N.C.] : [East Carolina University], 2013.
Description 82 pages : illustrations (some color)
Supplemental Content Access via ScholarShip
Summary Automatic tongue, velum (i.e., soft palate), and pharyngeal movement tracking systems provide a significant benefit for the analysis of dynamic speech movements. Studies have been conducted using ultrasound, X-ray, and Magnetic Resonance Imaging (MRI) to examine the dynamic nature of the articulators during speech. Simulating the movement of the tongue, velum, and pharynx is often limited by image segmentation obstacles, where movements of the velar structures are segmented through manual tracking. These methods are extremely time-consuming, and inherent noise, motion artifacts, air interfaces, and refractions often complicate computer-based automatic tracking. Furthermore, image segmentation and processing techniques for velopharyngeal structures often suffer from leakage issues related to the poor image quality of the MRI and the lack of recognizable boundaries between the velum and pharynx during moments of contact. Computer-based tracking algorithms are developed to overcome these disadvantages by utilizing machine learning techniques and the corresponding speech signals, which may be treated as prior information. The purpose of this study is to illustrate a methodology for tracking the velum and pharynx in an MRI sequence using a Hidden Markov Model (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) extracted from the corresponding audio signals. Auditory models such as MFCC have been widely used in Automatic Speech Recognition (ASR) systems. Our method uses a customized version of the traditional audio feature extraction approach to extract visual features from the outer boundaries of the velum and the pharynx, which are marked (as selected pixels) by a novel method. The reduced audio features help to shrink the search space of the HMM and improve system performance. Three hundred consecutive images were tagged by the researcher. Two hundred of these images and the corresponding audio features (5 seconds) were used to train the HMM, and a 2.5-second audio file was used to test the model. The error rate was measured by calculating the minimum distance between predicted and actual markers. Our model was able to track and animate the dynamic articulators during speech in real time, with an overall accuracy of 81% at a one-pixel threshold. The predicted markers (pixels) indicated the segmented structures, even though the contours of the contacted areas were fuzzy and unrecognizable.
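The pipeline described in the summary (MFCC audio features driving an HMM that predicts marker positions, scored against a one-pixel threshold) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the thesis's implementation: the librosa and hmmlearn libraries, the file names, the 13-coefficient and 20-state sizes, and the mean-padded decoding step are all stand-ins for the customized feature extraction and marker-selection method described above.

```python
# Illustrative sketch only (not the thesis code): MFCC audio features plus
# marker coordinates are used to fit a Gaussian HMM, and audio alone is then
# decoded to recover approximate marker positions. Library choices, file
# names, and model sizes are assumptions for illustration.
import numpy as np
import librosa
from hmmlearn import hmm

# Synchronized training audio (the study used a 5-second training segment).
audio, sr = librosa.load("speech_train.wav", sr=None)

# Standard 13-coefficient MFCCs stand in for the thesis's customized variant.
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T   # (frames, 13)

# Hypothetical hand-tagged velum/pharynx marker coordinates, resampled to the
# MFCC frame rate: shape (frames, 2 * n_markers).
markers = np.load("tagged_markers_train.npy")

# Fit the HMM on joint audio/visual vectors so each hidden state is tied to a
# marker configuration through its emission mean.
features = np.hstack([mfcc, markers])
model = hmm.GaussianHMM(n_components=20, covariance_type="diag", n_iter=50)
model.fit(features)

# At test time (the 2.5-second clip) only audio is available; pad the visual
# dimensions with their training mean before decoding -- a simplification.
test_audio, _ = librosa.load("speech_test.wav", sr=sr)
test_mfcc = librosa.feature.mfcc(y=test_audio, sr=sr, n_mfcc=13).T
pad = np.tile(markers.mean(axis=0), (test_mfcc.shape[0], 1))
states = model.predict(np.hstack([test_mfcc, pad]))

# Predicted markers are read off the state emission means; a frame counts as
# correct when its distance to the hand-tagged markers is within one pixel.
predicted = model.means_[states][:, 13:]
actual = np.load("tagged_markers_test.npy")
dist = np.linalg.norm(predicted - actual, axis=1)
print(f"within-one-pixel accuracy: {np.mean(dist <= 1.0):.2%}")
```

Decoding audio-only input against a model trained on joint audio/visual features is only one way to tie speech frames to articulator positions; the thesis's reduced audio features and novel marker-selection method are not reproduced here.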
General note Presented to the faculty of the Department of Computer Science.
General note Advisor: M. H. Nassehzadeh Tabrizi.
General note Title from PDF t.p. (viewed October 2, 2013).
Dissertation note M.S. East Carolina University 2013.
Bibliography note Includes bibliographical references.
Technical details System requirements: Adobe Reader.
Technical details Mode of access: World Wide Web.

Available Items

Library: Electronic Resources
Location: Access Content Online
Status: Available