Fast, recurrent, attentional modulation improves saliency representation and scene recognition

Abstract

The human brain uses visual attention to facilitate object recognition. Traditional theories and models cast this attentional mechanism either as a purely feedforward selection of regions of interest or as top-down task priming. To these well-known mechanisms we add a novel one, inspired by studies of biological vision concerning the asynchronous timing of feedforward signals across different early visual areas and the role of recurrent connections from short-latency areas in facilitating object recognition [7]. These studies suggest that recurrence elicited from short-latency dorsal areas improves the slower feedforward processing in the early ventral areas. We therefore propose a computational model that simulates this process. To test the model, we add such fast recurrent processing to a well-known feedforward saliency model, AIM [6], and show that the recurrent signals can modulate AIM's output to improve its utility for recognition by later stages. We further couple the proposed model with a back-propagation neural network for the task of scene recognition. Experimental results on standard video sequences show that the discriminative power of the modulated representation is significantly improved, and that the implementation consistently outperforms existing work, including a benchmark system without recurrent refinement. © 2011 IEEE.
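The abstract describes a two-stage pipeline: a fast, coarse recurrent signal modulates a slower feedforward saliency map (AIM-style), and the modulated map feeds a back-propagation network for scene classification. The sketch below illustrates that flow; it is not the authors' code, and the multiplicative gain function, map sizes, number of scene classes, and network shape are all illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper): a fast recurrent signal
# multiplicatively re-weights a slower feedforward saliency map, and the
# modulated map is classified by a one-hidden-layer back-propagation network.
import numpy as np

rng = np.random.default_rng(0)

def modulate(saliency, fast_recurrent, strength=0.5):
    """Fast (short-latency, dorsal-like) signal gates the slower
    feedforward saliency map before later recognition stages see it."""
    gain = 1.0 + strength * (fast_recurrent - fast_recurrent.mean())
    return np.clip(saliency * gain, 0.0, None)

# Toy stand-ins for an AIM-style saliency map and a fast coarse estimate.
saliency = rng.random((32, 32))   # slow feedforward saliency (AIM-like)
fast = rng.random((32, 32))       # short-latency recurrent signal
features = modulate(saliency, fast).ravel()

# One-hidden-layer back-propagation classifier over the modulated map.
n_in, n_hid, n_out = features.size, 64, 4   # 4 scene classes (assumed)
W1 = rng.normal(0, 0.01, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.01, (n_out, n_hid)); b2 = np.zeros(n_out)

def forward(x):
    h = np.tanh(W1 @ x + b1)
    z = W2 @ h + b2
    p = np.exp(z - z.max())
    return h, p / p.sum()

# A single back-propagation step on one labelled example.
label, lr = 2, 0.1
h, p = forward(features)
dz = p.copy(); dz[label] -= 1.0            # softmax cross-entropy gradient
dW2 = np.outer(dz, h); dh = (W2.T @ dz) * (1 - h**2)
dW1 = np.outer(dh, features)
W2 -= lr * dW2; b2 -= lr * dz
W1 -= lr * dW1; b1 -= lr * dh
print("class probabilities:", forward(features)[1])
```

The choice of a multiplicative gain is one plausible reading of "modulate"; an additive bias or spatial reweighting would slot into `modulate` equally well.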

Publication
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Date
2011