It has been claimed that convolutional neural networks (CNNs) have now achieved human-level performance at object recognition tasks. However, modest changes to the object stimuli or to the viewing conditions can sometimes cause state-of-the-art CNNs to fail, raising questions as to whether they truly process visual information in a manner that mimics the human visual system. Here, I will present behavioral and neuroimaging data demonstrating the robustness of human vision when tasked with recognizing objects under severe levels of visual noise. Our functional MRI studies demonstrate the powerful role of top-down attentional feedback in dampening neural responses to visual noise, clutter, and competing overlapping objects. In experiments that directly pit human observers against CNNs, we find that humans outperform CNNs by a large margin and that the two are affected by white noise and spatially correlated (‘pink’) noise in qualitatively different ways. We developed a noise-training procedure, generating noisy images of objects with low signal-to-noise ratio, to investigate whether CNNs can acquire robustness that better matches human vision. After noise training, CNNs could outperform human observers while exhibiting qualitative patterns of performance more similar to those of humans. Moreover, noise-trained CNNs provided a better model for predicting human recognition thresholds on an image-by-image basis. Layer-specific analyses revealed that the contaminating effects of noise were dampened, rather than amplified, across successive stages of the noise-trained network. Our findings suggest that CNNs can learn noise-robust representations that better approximate human visual processing, though it remains an open question how the incorporation of top-down attention mechanisms might further improve the correspondence between artificial and biological visual systems.
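As a rough illustration of the kind of stimuli described above, the following sketch generates white noise and spatially correlated ('pink') noise and mixes each with an image at a low signal level. It is not the procedure used in the studies: the function names, the `signal_fraction` parameter, the 1/f amplitude spectrum, and the linear mixing rule are all assumptions made for this example, and the abstract does not specify how signal-to-noise ratio was defined.

```python
import numpy as np

def white_noise(shape, rng):
    """Spatially uncorrelated Gaussian noise (approximately unit variance)."""
    return rng.standard_normal(shape)

def pink_noise(shape, rng):
    """Spatially correlated noise with an (assumed) 1/f amplitude spectrum."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fx ** 2 + fy ** 2)
    f[0, 0] = 1.0                      # avoid dividing by zero at the DC term
    spectrum = np.fft.fft2(rng.standard_normal(shape)) / f
    spectrum[0, 0] = 0.0               # drop the DC term so the mean is zero
    noise = np.real(np.fft.ifft2(spectrum))
    return noise / noise.std()         # rescale to unit variance

def add_noise(image, signal_fraction, kind, rng):
    """Mix a grayscale image in [0, 1] with noise.

    signal_fraction is a hypothetical stand-in for the signal level:
    1.0 returns the clean image, 0.0 returns noise only.
    """
    noise_fn = {"white": white_noise, "pink": pink_noise}[kind]
    noise = noise_fn(image.shape, rng)
    mean, contrast = image.mean(), image.std()
    mixed = (signal_fraction * (image - mean)
             + (1.0 - signal_fraction) * contrast * noise)
    return np.clip(mixed + mean, 0.0, 1.0)

# Usage: a random array stands in for an object image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
noisy_white = add_noise(image, 0.2, "white", rng)
noisy_pink = add_noise(image, 0.2, "pink", rng)
```

In a noise-training setup along these lines, such a function would be applied to each training image with a randomly chosen noise type and signal level before it is passed to the network.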