ECU Libraries Catalog

Using synchronized audio mapping to predict velar and pharyngeal wall locations during dynamic MRI sequences / by Pooya Rahimian.

Author/creator Rahimian, Pooya, author.
Other author/creator Tabrizi, M. H. N., degree supervisor.
Other author/creator East Carolina University. Department of Computer Science.
Format Theses and dissertations, Electronic, and Book
Publication Info [Greenville, N.C.] : [East Carolina University], 2013.
Description 82 pages : illustrations (some color)
Supplemental Content Access via ScholarShip
Summary Automatic tongue, velum (i.e., soft palate), and pharyngeal movement tracking systems provide a significant benefit for the analysis of dynamic speech movements. Studies have been conducted using ultrasound, X-ray, and Magnetic Resonance Imaging (MRI) to examine the dynamic nature of the articulators during speech. Simulating the movement of the tongue, velum, and pharynx is often limited by image segmentation obstacles, where movements of the velar structures are segmented through manual tracking. These methods are extremely time-consuming, and inherent noise, motion artifacts, air interfaces, and refractions often complicate computer-based automatic tracking. Furthermore, image segmentation and processing techniques for velopharyngeal structures often suffer from leakage issues related to the poor image quality of the MRI and the lack of recognizable boundaries between the velum and pharynx during moments of contact. Computer-based tracking algorithms are developed to overcome these disadvantages by utilizing machine learning techniques and the corresponding speech signals, which may be treated as prior information. The purpose of this study is to illustrate a methodology for tracking the velum and pharynx in an MRI sequence using a Hidden Markov Model (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) extracted from the corresponding audio signals. Auditory models such as MFCC have been widely used in Automatic Speech Recognition (ASR) systems. Our method uses a customized version of the traditional audio feature extraction approach to extract visual features from the outer boundaries of the velum and the pharynx, which are marked (as selected pixels) by a novel method. The reduced audio features help to shrink the search space of the HMM and improve system performance. Three hundred consecutive images were tagged by the researcher. Two hundred of these images and the corresponding audio features (5 seconds) were used to train the HMM, and a 2.5-second audio file was used to test the model. The error rate was measured by calculating the minimum distance between predicted and actual markers. Our model was able to track and animate the dynamic articulators during speech in real time, with an overall accuracy of 81% at a one-pixel threshold. The predicted markers (pixels) indicated the segmented structures, even though the contours of the contacted areas were fuzzy and unrecognizable.
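The pipeline described in the summary (MFCC audio features driving an HMM that predicts marker positions, scored against a one-pixel threshold) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the thesis's implementation: the librosa and hmmlearn libraries, the file names, the 13-coefficient and 20-state sizes, and the mean-padded decoding step are all stand-ins for the customized feature extraction and marker-selection method described above.

```python
# Illustrative sketch only (not the thesis code): MFCC audio features plus
# marker coordinates are used to fit a Gaussian HMM, and audio alone is then
# decoded to recover approximate marker positions. Library choices, file
# names, and model sizes are assumptions for illustration.
import numpy as np
import librosa
from hmmlearn import hmm

# Synchronized training audio (the study used a 5-second training segment).
audio, sr = librosa.load("speech_train.wav", sr=None)

# Standard 13-coefficient MFCCs stand in for the thesis's customized variant.
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T   # (frames, 13)

# Hypothetical hand-tagged velum/pharynx marker coordinates, resampled to the
# MFCC frame rate: shape (frames, 2 * n_markers).
markers = np.load("tagged_markers_train.npy")

# Fit the HMM on joint audio/visual vectors so each hidden state is tied to a
# marker configuration through its emission mean.
features = np.hstack([mfcc, markers])
model = hmm.GaussianHMM(n_components=20, covariance_type="diag", n_iter=50)
model.fit(features)

# At test time (the 2.5-second clip) only audio is available; pad the visual
# dimensions with their training mean before decoding -- a simplification.
test_audio, _ = librosa.load("speech_test.wav", sr=sr)
test_mfcc = librosa.feature.mfcc(y=test_audio, sr=sr, n_mfcc=13).T
pad = np.tile(markers.mean(axis=0), (test_mfcc.shape[0], 1))
states = model.predict(np.hstack([test_mfcc, pad]))

# Predicted markers are read off the state emission means; a frame counts as
# correct when its distance to the hand-tagged markers is within one pixel.
predicted = model.means_[states][:, 13:]
actual = np.load("tagged_markers_test.npy")
dist = np.linalg.norm(predicted - actual, axis=1)
print(f"within-one-pixel accuracy: {np.mean(dist <= 1.0):.2%}")
```

Decoding audio-only input against a model trained on joint audio/visual features is only one way to tie speech frames to articulator positions; the thesis's reduced audio features and novel marker-selection method are not reproduced here.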
General note Presented to the faculty of the Department of Computer Science.
General note Advisor: M. H. Nassehzadeh Tabrizi.
General note Title from PDF t.p. (viewed October 2, 2013).
Dissertation note M.S. East Carolina University 2013.
Bibliography note Includes bibliographical references.
Technical details System requirements: Adobe Reader.
Technical details Mode of access: World Wide Web.

Available Items

Library: Electronic Resources
Location: Access Content Online
Status: Available