Original Articles

Modelling search for people in 900 scenes: A combined source model of eye guidance

Krista A. Ehinger Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA

Barbara Hidalgo-Sotelo Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA

Antonio Torralba Computer Science and Artificial Intelligence Laboratory, and Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

Aude Oliva Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA. Correspondence: [email protected]

Abstract

How predictable are human eye movements during search in real world scenes? We recorded 14 observers’ eye movements as they performed a search task (person detection) in 912 outdoor scenes. Observers were highly consistent in the regions fixated during search, even when the target was absent from the scene. These eye movements were used to evaluate computational models of search guidance from three sources: Saliency, target features, and scene context. Each of these models independently outperformed a cross-image control in predicting human fixations. Models that combined sources of guidance ultimately predicted 94% of human agreement, with the scene context component providing the most explanatory power. None of the models, however, could reach the precision and fidelity of an attentional map defined by human fixations. This work puts forth a benchmark for computational models of search in real world scenes. Further improvements in modelling should capture mechanisms underlying the selectivity of observers’ fixations during search.
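As a rough illustration of the kind of evaluation the abstract describes, the sketch below combines three guidance maps and checks how many fixations fall inside the highest-valued part of the combined map. Everything here is an assumption made for illustration rather than the authors' implementation: the function names, the toy 4 x 4 maps, the fixation list, the pointwise-product combination rule, the unit exponents for context and target features, and the 20% area threshold are all hypothetical. The only value borrowed from the article is the saliency exponent of .025 reported in the Notes.

```python
def combine_maps(maps, exponents):
    """Pointwise product of guidance maps, each raised to its own exponent.

    This combination rule is an assumption for illustration only.
    """
    height, width = len(maps[0]), len(maps[0][0])
    combined = [[1.0] * width for _ in range(height)]
    for guidance_map, exponent in zip(maps, exponents):
        for r in range(height):
            for c in range(width):
                combined[r][c] *= guidance_map[r][c] ** exponent
    return combined


def fixations_in_top_region(pred_map, fixations, area_fraction=0.2):
    """Fraction of fixations that land in the top `area_fraction` of the map.

    Ties at the threshold value are all counted as selected, so the selected
    region can be slightly larger than requested.
    """
    values = sorted((v for row in pred_map for v in row), reverse=True)
    n_selected = max(1, int(round(area_fraction * len(values))))
    threshold = values[n_selected - 1]
    hits = sum(1 for (r, c) in fixations if pred_map[r][c] >= threshold)
    return hits / len(fixations)


# Toy 4 x 4 maps (hypothetical values, not data from the study).
saliency = [
    [0.1, 0.2, 0.1, 0.1],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
context = [
    [0.1, 0.1, 0.1, 0.1],
    [0.8, 0.8, 0.8, 0.8],  # a horizontal band where people are likely
    [0.3, 0.3, 0.3, 0.3],
    [0.1, 0.1, 0.1, 0.1],
]
target_features = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]

# Saliency exponent from the Notes; the other two exponents are assumed.
combined = combine_maps([saliency, context, target_features], [0.025, 1.0, 1.0])

# Hypothetical fixations given as (row, column) pixel coordinates.
fixations = [(1, 1), (1, 2), (2, 1), (3, 3)]
hit_rate = fixations_in_top_region(combined, fixations, area_fraction=0.2)
print(hit_rate)
```

Sweeping `area_fraction` from 0 to 1 and plotting the resulting hit rates would give one ROC-style curve per model, which is one plausible way to compare a model against a cross-image control.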

Acknowledgements

The authors would like to thank two anonymous reviewers and Benjamin Tatler for their helpful and insightful comments on an earlier version of this manuscript. KAE was partly funded by a Singleton graduate research fellowship and by a graduate fellowship from an Integrative Training Program in Vision grant (T32 EY013935). BH-S was funded by a National Science Foundation Graduate Research Fellowship. This work was also funded by an NSF CAREER award (0546262) and an NSF contract (0705677) to AO, as well as an NSF CAREER award to AT (0747120). Supplementary information is available on the following website: http://cvcl.mit.edu/SearchModels

Notes

¹The complete dataset and analysis tools will be made available at the authors’ website.

²See additional figures on authors’ website for distribution of targets and fixations across all images in the database.

³In our validation set, the best exponent for the saliency map was .025, which is within the optimal range of .01–.3 found by Torralba et al. (2006).

⁴See people detector code at http://pascal.inrialpes.fr/soft/olt/

⁵See the authors’ website for details and results from the other implementations.

⁶See the authors’ website for the detection curves of the other model implementations.

⁷See the authors’ website for a comparison of the ROC curves of the target features model and the target oracle.

Additional information

Notes on contributors

Barbara Hidalgo-Sotelo

KAE and BH-S contributed equally to the work.
