Technical Program

Paper Detail

Paper: GS-4.1
Session: Contributed Talks IV
Location: Ormandy
Session Time: Friday, September 7, 09:50 - 10:30
Presentation Time: Friday, September 7, 09:50 - 10:10
Presentation: Oral
Paper Title: Auditory texture synthesis from task-optimized convolutional neural networks
Authors: Jenelle Feather, Josh H. McDermott, Massachusetts Institute of Technology, United States
Abstract: Models of sensory systems have traditionally been hand designed from engineering principles, but modern-day machine learning allows models to be learned from data. We sought to compare hand-engineered and learned models of the auditory system by generating synthetic sound textures. Using gradient-based optimization, we synthesized sounds that produce the same time-averaged values in each model's representation as those measured from a natural texture. Such stimuli should evoke the same texture percept if the model replicates the representations underlying auditory texture perception. Previous texture models involved statistics measured from multiple stages of standard visual or auditory processing cascades. We found that auditory textures generated simply from the time-averaged power in the first-layer activations of a task-optimized convolutional neural network were as realistic and recognizable as those from the best previous auditory texture model. Unlike textures generated from traditional models, the textures from task-optimized filters did not require statistics from earlier stages in the sensory model (i.e., the cochlear stage). Further, the textures generated from the task-optimized CNN filters were more realistic than textures generated from a widely used hand-engineered model of primary auditory cortex. The results demonstrate that better sensory models can be obtained by task-optimizing sensory representations.
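The synthesis procedure the abstract describes — iteratively adjusting a sound until the time-averaged power of each filter response matches that measured from a target texture — can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' pipeline: the paper used filters from a task-optimized CNN (with a cochlear front end for the baseline models), whereas the filters below are random stand-ins, and the optimizer is plain gradient descent rather than whatever scheme the paper employed.

```python
import numpy as np

# Statistic-matching synthesis sketch: adjust a signal x so that the
# time-averaged power of each filter's response matches target statistics.
# Filters are random here, standing in for learned CNN filters.

rng = np.random.default_rng(0)

N, L, K = 512, 32, 8                      # signal length, filter length, filter count
filters = rng.standard_normal((K, L)) / np.sqrt(L)

def stats(x):
    """Time-averaged power of each filter response (one scalar per filter)."""
    return np.array([np.mean(np.correlate(x, f, mode="valid") ** 2)
                     for f in filters])

def loss_and_grad(x, target):
    """Squared error between x's statistics and the target's, with its gradient."""
    M = N - L + 1                         # length of a 'valid' response
    diff = stats(x) - target
    grad = np.zeros(N)
    for k, f in enumerate(filters):
        r = np.correlate(x, f, mode="valid")
        # d mean(r^2)/dx is (2/M) times the full convolution of r with f
        grad += 2.0 * diff[k] * (2.0 / M) * np.convolve(r, f, mode="full")
    return np.sum(diff ** 2), grad

target = stats(rng.standard_normal(N))    # statistics of a "natural" texture
x = rng.standard_normal(N)                # start from a fresh noise seed

initial_loss, _ = loss_and_grad(x, target)
for _ in range(200):                      # gradient descent on the statistic match
    _, grad = loss_and_grad(x, target)
    x -= 0.1 * grad
final_loss, _ = loss_and_grad(x, target)
```

After the loop, `x` is a new signal whose filter-power statistics approximate those of the target; in the paper's setting, matching such statistics in a good model is what makes the synthetic texture sound like the original.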