Deep neural network models (DNNs) are central to cognitive computational neuroscience because they link cognition to its implementation in brains. DNNs promise to provide a language for expressing biologically plausible hypotheses about brain computation. A peril, however, is that high-parametric models are universal approximators, making it difficult to adjudicate among alternative models meant to express distinct computational hypotheses. On the one hand, modeling intelligent behavior requires high parametric capacity. On the other hand, it is unclear how we can glean theoretical insights from overly flexible high-parametric models. Here we present one approach toward a solution to this conundrum: the method of controversial stimuli. Synthetic controversial stimuli are stimuli (e.g. images, sounds, sentences) optimized to elicit distinct predictions from different models. Because synthetic controversial stimuli provide severe tests of out-of-distribution generalization, they reveal high-parametric models’ distinct inductive biases.
Controversial stimuli can be used in experiments measuring behavior or brain activity. In either case, we must first define a controversiality objective that reflects the power afforded by different stimulus sets to adjudicate among our set of DNN models. Ideally, the objective should quantify the expected reduction in our uncertainty about which model is correct (i.e. the expected reduction in the entropy of the posterior over models). In practice, however, heuristic approximations to this objective may be preferable. If the models are differentiable, controversial stimuli can be generated efficiently by gradient descent; otherwise, gradient-free optimization methods must be used.
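As a minimal sketch of the gradient-based case, the loop below (assuming PyTorch; the function name, stimulus shape, and hyperparameters are illustrative placeholders, not the published implementation) performs gradient ascent on the stimulus itself, holding the models' weights fixed, given any differentiable controversiality objective:

```python
import torch

def optimize_stimulus(controversiality, shape=(1, 1, 28, 28),
                      steps=500, lr=0.1):
    """Ascend a differentiable controversiality objective by taking
    gradient steps on the stimulus (the models' weights stay fixed)."""
    x = torch.rand(shape, requires_grad=True)   # random initial stimulus
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -controversiality(x)             # maximize the objective
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                  # keep pixels in valid range
    return x.detach()
```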
We demonstrate the method in the context of a wide range of visual recognition models, including feedforward and recurrent, discriminative and generative, conventionally and adversarially trained models (Golan, Raju & Kriegeskorte, 2020). A stimulus was defined as controversial between two models if one model classified it with high confidence as belonging to one category while the other model classified it with high confidence as belonging to a different category. Our results suggest that models with generative components best account for human visual recognition in the context of handwritten digits (MNIST) and small natural images (CIFAR-10). We will also share new results from applications of controversial stimuli in different domains and discuss how the method of controversial stimuli relates to adversarial examples, metamers, and maximally exciting stimuli, which are other types of synthetic stimuli that can reveal models' failure modes.
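The pairwise criterion just described can be expressed as a differentiable objective that plugs into the optimization loop sketched above. The version below (again assuming PyTorch; the exact objective used in Golan, Raju & Kriegeskorte, 2020 may differ in its details) scores a stimulus by the smaller of the two models' confidences in their respective, distinct target classes, so the score is high only when both models confidently commit to different categories:

```python
import torch

def pairwise_controversiality(model_a, model_b, class_a, class_b):
    """Return an objective that is high only when model_a confidently
    assigns the stimulus to class_a and model_b to a different class_b."""
    assert class_a != class_b, "the two target classes must differ"
    def objective(x):
        p_a = torch.softmax(model_a(x), dim=1)[0, class_a]
        p_b = torch.softmax(model_b(x), dim=1)[0, class_b]
        return torch.min(p_a, p_b)  # both confidences must be high
    return objective

# Hypothetical usage with the optimizer sketched earlier, for two
# MNIST classifiers net1 and net2 and target digits 3 vs. 7:
# x = optimize_stimulus(pairwise_controversiality(net1, net2, 3, 7))
```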
Controversial stimuli greatly improve our power to adjudicate among models. In addition, they provide out-of-distribution probes that reveal the inductive biases implicit in the architecture, objective function, and learning rule that define each model. The method can drive theoretical insight because it enables us to distinguish among computational hypotheses implemented in models that are sufficiently high-parametric to capture the knowledge needed for intelligent behavior.