Technical Program

Paper Detail

Paper: PS-2B.42
Session: Poster Session 2B
Location: Symphony/Overture
Session Time: Friday, September 7, 19:30 - 21:30
Presentation Time: Friday, September 7, 19:30 - 21:30
Presentation: Poster
Paper Title: Neural network vs. HMM speech recognition systems as models of human cross-linguistic phonetic perception
Authors: Thomas Schatz, Naomi Feldman, University of Maryland & Massachusetts Institute of Technology, United States
Abstract: The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as children. For example, native speakers of Japanese have a hard time discriminating between American English /ɹ/ and /l/, a phonetic contrast that has no equivalent in Japanese. Such effects are typically attributed to knowledge of sounds in the native language, but quantitative models of how these effects arise from linguistic knowledge are lacking. One possible source for such models is Automatic Speech Recognition (ASR) technology. We implement models based on two types of systems from the ASR literature: hidden Markov models (HMMs) and the more recent, and more accurate, neural network systems. We then ask whether, in addition to showing better performance, the neural network systems also provide better models of human perception. We find that while both types of systems can account for Japanese natives’ difficulty with American English /ɹ/ and /l/, only the neural network system successfully accounts for Japanese natives’ facility with Japanese vowel length contrasts. Our work provides a new example, in the domain of speech perception, of an often-observed correlation between task performance and similarity to human behavior.