Title: Annotating protein function through lexical analysis
Author: Burkhard Rost
Citation: AI Magazine, 25, 45-56

Abstract:

We now know the complete genomes of over 100 organisms. The experimental characterisation of the newly sequenced proteins is widely considered to lag behind this explosion of raw sequences (the sequence-function gap). Expert annotators add experimental information to the more or less controlled vocabularies of databases at an even slower pace. Most methods that annotate protein function exploit sequence similarity, transferring experimental information from homologues. A crucial development aiding such homology-based information transfer is the emergence of large-scale, labour- and management-intensive projects, like the Gene Ontology project, that aim to develop a comprehensive ontology for protein function. In parallel, fully or semi-automatic methods have successfully begun to mine the existing data through lexical analysis. Some of these tools parse controlled vocabularies from databases; others venture to mine free text from MEDLINE abstracts or full scientific papers. Automated text analysis has become a rapidly expanding discipline in bioinformatics, and a few of these text-based tools have already been embedded in research projects.
