Using Semantic Context to Rank the Results of Keyword Search

Pages 725-741 | Published online: 09 Jul 2018

ABSTRACT

In an empirical user study, we assessed two approaches to ranking the results of a keyword search by semantic contextual match, based on Latent Semantic Analysis (LSA). In both techniques, searches were initiated from words found in a seed document within a corpus. The first approach used the sentence surrounding the search query in the seed document as context, while the second used the entire document. With a corpus of 20,000 documents and a small proportion of relevant documents (<0.1%), both techniques outperformed a conventional keyword search on a recall-based information retrieval (IR) task. The context-based techniques were associated with fewer searches, higher user precision and, to a lesser extent, higher recall. This improvement was strongest when ranking was based on document, rather than sentence, context. Individuals were more effective on the IR task when the techniques returned better-ranked lists. User performance on the task also correlated with scores on a generalized IQ test but not on a test of linguistic ability.
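To make the ranking idea concrete, the sketch below filters a corpus to the documents that contain a keyword and orders those hits by cosine similarity to a context (a sentence or a whole seed document) in a reduced LSA space. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`rank_by_context`, `lsa_basis`), the toy corpus, the raw-count weighting, the tokenizer and the choice of k = 2 dimensions are all hypothetical. Production LSA systems commonly apply a term weighting such as log-entropy and keep a few hundred dimensions. To stay dependency-free, the top-k singular subspace is found by orthogonal (subspace) iteration rather than a library SVD; because cosine similarity is unchanged by rotations within that subspace, any orthonormal basis of it gives the same ranking.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Simple lower-case word tokenizer (assumption: no stemming, no stop list).
    return re.findall(r"[a-z]+", text.lower())


def orthonormalize(vectors):
    # Modified Gram-Schmidt; vectors that collapse to ~zero are dropped.
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(x * y for x, y in zip(w, b))
            w = [x - dot * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:
            basis.append([x / norm for x in w])
    return basis


def lsa_basis(matrix, k, iterations=200, seed=0):
    # Orthonormal basis of the top-k left singular subspace of the
    # term-by-document matrix X (rows = terms, columns = documents),
    # via orthogonal iteration on X X^T.
    n_terms, n_docs = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    basis = orthonormalize(
        [[rng.gauss(0.0, 1.0) for _ in range(n_terms)] for _ in range(k)])
    for _ in range(iterations):
        updated = []
        for q in basis:
            # y = X^T q (one value per document)
            y = [sum(matrix[t][d] * q[t] for t in range(n_terms))
                 for d in range(n_docs)]
            # X y (one value per term)
            updated.append([sum(matrix[t][d] * y[d] for d in range(n_docs))
                            for t in range(n_terms)])
        basis = orthonormalize(updated)
    return basis


def project(vec, basis):
    return [sum(x * b for x, b in zip(vec, q)) for q in basis]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_by_context(docs, keyword, context, k=2):
    """Indices of docs containing `keyword`, most context-similar first."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in token_lists for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [Counter(toks) for toks in token_lists]
    matrix = [[c[w] for c in counts] for w in vocab]  # raw term counts
    basis = lsa_basis(matrix, k)

    def to_vec(tokens):
        v = [0.0] * len(vocab)
        for w in tokens:
            if w in index:  # context words outside the corpus are ignored
                v[index[w]] += 1.0
        return v

    ctx = project(to_vec(tokenize(context)), basis)
    hits = [i for i, toks in enumerate(token_lists) if keyword.lower() in toks]
    scores = {i: cosine(ctx, project(to_vec(token_lists[i]), basis))
              for i in hits}
    return sorted(hits, key=lambda i: scores[i], reverse=True)


# Hypothetical toy corpus: two topics that share the keyword "detected".
docs = [
    "comet telescope orbit detected",
    "comet telescope orbit",
    "telescope orbit comet supernova",
    "airport security screening detected",
    "airport security screening",
    "airport security screening passengers",
]
ranking = rank_by_context(docs, "detected", "a comet in a new orbit")
print(ranking)
```

A plain keyword search for "detected" returns documents 0 and 3 with no principled order; here the astronomy-flavoured context pulls document 0 ahead. Passing a whole seed document as `context` instead of one sentence corresponds to the second, document-context approach.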

Notes

1 This was necessary because our initial analysis of 71% of the TREC-8 collection (i.e., 377,211 documents) found that 1.8% of the documents were in fact duplicates.

2 The original TREC-8 description for this topic (number 405) was “What unexpected or unexplained cosmic events or celestial phenomena, such as radiation and supernova outbursts or new comets, have been detected?” while the accompanying narrative was “New theories or new interpretations concerning known celestial objects made as a result of new technology are not relevant”.

3 The additional step of having two assessors verify the judgments was deemed necessary to improve the precision of the set of relevant documents by removing outlier judgments. For example, for the airport security topic, three participants marked TREC8400-14186 as relevant even though the document relates only to inspections of nuclear facilities in North Korea by the International Atomic Energy Agency.

4 A separate analysis of the data was conducted using a lenient criterion (under which a document was judged ultimately relevant if at least one of the judges declared it relevant); the pattern of results was very similar to that reported here under the strict criterion.

5 Rank-biased precision (RBP) has been shown to be robust across the values of p commonly used in the literature, including 0.5. In a reproduction of Moffat and Zobel's (2008) original study, Ferro and Silvello (2015) found good correlation between RBP scores computed with different p values (i.e., 0.5, 0.8, and 0.95).
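For readers unfamiliar with the metric, RBP as defined by Moffat and Zobel scores a ranked list as RBP = (1 - p) * sum over ranks i of r_i * p^(i - 1), where r_i is the relevance of the document at rank i and p is the persistence parameter (the probability that a user continues to the next rank). The sketch below assumes binary judgments and ignores the residual that RBP can attach to unjudged documents; the function name and the two example rankings are hypothetical and are not data from this study.

```python
def rank_biased_precision(relevance, p=0.5):
    """Rank-biased precision for one ranked list.

    relevance: judgments in rank order (1 = relevant, 0 = not relevant).
    p: persistence, the probability that a user continues to the next rank.
    Computes (1 - p) * sum_i r_i * p**(i - 1) with ranks i starting at 1;
    the residual for unjudged documents is not modelled here.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # enumerate() starts at 0, so p**i is p**(rank - 1)
    return (1.0 - p) * sum(r * p ** i for i, r in enumerate(relevance))


# Two hypothetical rankings of the same five documents.
early = [1, 1, 0, 0, 0]  # relevant documents at ranks 1 and 2
late = [0, 0, 1, 1, 0]   # relevant documents at ranks 3 and 4

for p in (0.5, 0.8, 0.95):
    print(p,
          round(rank_biased_precision(early, p), 4),
          round(rank_biased_precision(late, p), 4))
```

Smaller p concentrates the weight on the first few ranks, while larger p models a more patient user; in this toy case the list with earlier relevant documents scores higher at all three p values.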

Additional information

Notes on contributors

Marcus A. Butavicius

Marcus Butavicius is a senior research scientist with the Defence Science and Technology Group and a Visiting Research Fellow in the Psychology Department at the University of Adelaide. His research interests include human–computer interaction and cyber security.

Kathryn Parsons is a researcher with the Defence Science and Technology Group, where she applies psychological principles to human factors and organizational problems. She completed a Master of Psychology in 2005. She is an organizational psychologist and Adjunct Lecturer within the School of Psychology at the University of Adelaide.

Agata McCormac is an organizational psychologist who has worked as a research scientist at Defence Science and Technology Group since 2006. Her work focuses on applying cognitive and perceptual psychology principles to human factors and organizational problems.

Simon Dennis is a professor at the Melbourne School of Psychological Sciences. He holds qualifications in computer science, mathematics, and psychology from the University of Queensland and his research interests are in human memory, language processing, information retrieval, and machine learning.

Aaron Ceglar is a visual analytics scientist at the Defence Science and Technology Group and has a PhD in Visual Analytics from Flinders University, where he remains an Adjunct Research Fellow. He has journal and conference publications in visual analytics, data mining, and similar fields.

Derek Weber is a computer scientist at Australia’s Defence Science and Technology Group and is a PhD candidate examining computational propaganda in social media at the University of Adelaide. He has previously worked in imagery and geospatial systems, information visualization, smart meeting room collaboration systems, information retrieval, and text analytics.

Lael Ferguson earned a Bachelor of Applied Science (Mathematics and Computing) from the University of South Australia in 1997 and began working for the Department of Defence in Canberra as a software developer. In 2000, she transferred to the Defence Science and Technology Group as a system administrator and software developer.

Kenneth Treharne holds a PhD from Flinders University. His research investigates the design of cognitively natural user interfaces for presenting and interacting with information search results, with a particular focus on the role of mappings between visuospatial features of icons and their symbolic meanings in search space.

Richard Leibbrandt is a research associate in the School of Computer Science, Engineering, and Mathematics, Flinders University. His research interests include cognitive and physiological factors in human–computer interaction, as well as computational approaches to natural language learning.

David Powers is a professor of Computer and Cognitive Science at Flinders University, and director of the Centre for Knowledge and Interaction Technology. He has multiple qualifications including a PhD from the University of New South Wales (1979) in Computational Psycholinguistics. His research focuses on language and learning.
