192
Views
26
CrossRef citations to date
0
Altmetric
Research Article

Knowledge Discovery in Biology and Biotechnology Texts: A Review of Techniques, Evaluation Strategies, and Applications

, , &
Pages 31-52 | Published online: 10 Oct 2008
 

Abstract

Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.

Notes

1A confusion matrix (sometimes referred to as contingency table) is used to record and analyze the relationship between two or more variables. Usually, the variables are categorical variables. In the IR case the variables involved are relevancy (‘Doc dk Relevant?’) and retrieval (‘Doc dk Retrieved?’), each associated with the value set {Yes, No}. The cells of the matrix record the frequency of the various value co-occurrences. In , we are looking at individual documents, here the frequency for a particular value combination can either be zero or one.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 751.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.