Compositionality of Object Representations in Brains and Machines
Pramod, R T
Compositionality in object vision can be defined as the set of principles governing the relationship between whole objects and their constituent attributes. Object information falling on the retina is known to be processed in a hierarchy of cortical regions, from simple edge detectors in the primary visual cortex to complex shape representations in the higher visual cortex, yet we still do not understand how whole objects are represented in terms of their attributes. Recent advances in computer vision have, for the first time, produced highly capable machine vision systems in the form of convolutional neural networks. How do these systems compare with human vision? We argue that understanding vision in the brain and making machines see the way we do are two sides of the same coin: understanding one will give us insights into the other. With this in mind, the goal of my thesis is twofold: to study compositionality in object representations in the brain, and to compare compositionality in brains and machines with the aim of improving machine vision. I will present results from a series of studies in which we investigated object representations in brains and machines. In the first set of studies, we investigated whether whole-object responses in perception and in single neurons could be understood in terms of their parts. The main findings are: (1) Object attributes combine linearly in visual search (Pramod & Arun, 2016); (2) Although symmetry is a salient holistic property, responses to symmetric objects, like those to asymmetric objects, are explained as a sum of their parts (Pramod & Arun, 2018). Taken together, these findings confirm the compositionality of object representations in perception and in high-level visual cortex. In the second set of studies, we compared the compositionality of object representations in brains and machines.
The main findings are: (1) Object representations in virtually all computer vision models (including deep neural networks) deviate systematically from human perception (Pramod & Arun, 2016); (2) Symmetric objects are more salient in perception than in deep neural networks, and fixing this bias leads to significant improvements in object detection performance; (3) Under-sampling of the periphery in the biological retina is computationally optimal for object recognition in natural scenes, pointing to dissociable roles for object and context. Taken together, these findings show that machine vision can be understood and improved by studying biological vision.