Paper Detail

Paper: PS-2B.15
Session: Poster Session 2B
Location: H Fläche 1.OG
Session Time: Sunday, September 15, 17:15 - 20:15
Presentation Time: Sunday, September 15, 17:15 - 20:15
Presentation: Poster
Publication: 2019 Conference on Cognitive Computational Neuroscience, 13-16 September 2019, Berlin, Germany
Paper Title: Towards Global Recurrent Models of Visual Processing: Capsule Networks
License: This work is licensed under a Creative Commons Attribution 3.0 Unported License.
DOI: https://doi.org/10.32470/CCN.2019.1066-0
Authors: Adrien Doerig, Lynn Schmittwilken, EPFL, Switzerland; Mauro Manassi, UC Berkeley, United States; Michael Herzog, EPFL, Switzerland
Abstract: Classically, visual processing is described as a cascade of local feedforward computations, and Convolutional Neural Networks (CNNs) have shown how powerful such models can be. However, CNNs only roughly mimic human vision. For example, CNNs do not take the global spatial configuration of visual elements into account and often rely mainly on textures: for a CNN, a face is not different from a scrambled version of it. For this reason, CNNs fail to explain many visual paradigms in which configuration strongly matters, such as crowding. In crowding, the perception of a target deteriorates in the presence of neighboring elements. Classically, adding flanking elements was thought to always decrease performance. However, adding flankers even far away from the target can improve performance, depending on the global configuration (an effect called uncrowding). We showed previously that no classic model of crowding, including CNNs, can explain uncrowding (Doerig et al., 2019). Here, we show that Capsule Networks (CapsNets; Sabour, Frosst, & Hinton, 2017), which combine CNNs, learning algorithms, and recurrent object segmentation, explain both crowding and uncrowding. Unlike CNNs, CapsNets use recurrent computations, which leads them to perform very similarly to humans, as we show with psychophysical experiments. These recurrent networks offer a promising general framework for modeling the recurrent processing of global object shape.