
Title: Inferring sub-cellular localization through automated lexical analysis
Authors: Rajesh Nair & Burkhard Rost
Citation: Bioinformatics, 2002, 11, 2836-2847 (ISMB'2002 Proceedings).


Motivation: The SWISS-PROT sequence database contains keywords describing the function of many proteins. In contrast, information about sub-cellular localization is available for only a few proteins. Experts can often infer localization from keywords that describe protein function. We developed LOCkey, a fully automated method that assigns sub-cellular localization through lexical analysis of SWISS-PROT keywords. With the rapid growth in sequence data, the biochemical characterization of sequences has been falling behind. Our method may be a useful tool for supplementing functional information that is already available automatically.

Results: The method reached more than 82% accuracy in a full cross-validation test. Owing to a lack of functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five entirely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes.
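To make the general idea of assigning localization from keywords, and of measuring it by cross-validation, more concrete, here is a minimal sketch under stated assumptions. It uses a plain k-nearest-neighbour vote over keyword overlap (Jaccard similarity); this is a swapped-in textbook technique, not the LOCkey algorithm described in the full paper. The keyword sets and labels in TRAINING, the functions `predict` and `leave_one_out_accuracy`, and the choice k=3 are all hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical toy data (NOT from SWISS-PROT or the paper):
# each entry pairs a set of keywords with a known localization.
TRAINING = [
    ({"DNA-binding", "Transcription regulation", "Zinc-finger"}, "nucleus"),
    ({"DNA-binding", "Repressor"}, "nucleus"),
    ({"Transit peptide", "Oxidoreductase"}, "mitochondrion"),
    ({"Transit peptide", "Electron transport"}, "mitochondrion"),
    ({"Signal", "Glycoprotein", "Hormone"}, "extracellular"),
    ({"Signal", "Glycoprotein", "Protease inhibitor"}, "extracellular"),
]


def jaccard(a, b):
    """Shared keywords divided by all distinct keywords of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(keywords, training, k=3):
    """Hypothetical k-NN vote: weight each neighbour's localization by overlap.

    Returns None when no training protein shares any keyword, i.e. the
    query carries no informative annotation and stays unannotated.
    """
    scored = sorted(
        ((jaccard(keywords, kw), loc) for kw, loc in training),
        reverse=True,
    )
    neighbours = [(s, loc) for s, loc in scored[:k] if s > 0]
    if not neighbours:
        return None
    votes = Counter()
    for s, loc in neighbours:
        votes[loc] += s
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(training, k=3):
    """Hold out each protein in turn; score only proteins that got a prediction.

    Returns (accuracy among predicted proteins, number of proteins predicted).
    """
    correct = covered = 0
    for i, (kw, loc) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        pred = predict(kw, rest, k)
        if pred is not None:
            covered += 1
            correct += pred == loc
    return (correct / covered if covered else 0.0), covered


if __name__ == "__main__":
    print(predict({"Transit peptide", "Heme"}, TRAINING))  # mitochondrion
    print(predict({"Kinase"}, TRAINING))                   # None
    print(leave_one_out_accuracy(TRAINING))                # (1.0, 6)
```

In this toy setting a query sharing no keyword with the training proteins gets no prediction, which loosely mirrors why keyword-based inference cannot cover proteins that lack functional annotations; accuracy and coverage are therefore reported separately.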

Availability: Annotations of localization for eukaryotes are available at: http://cubic.bioc.columbia.edu/services/LOCkey.

Contact: rost@columbia.edu

Key words: genome sequence analysis, predicting sub-cellular localization, protein function, lexical analysis.
