Inferring sub-cellular localization through automated lexical analysis.

Title: Inferring sub-cellular localization through automated lexical analysis.
Publication Type: Journal Article
Year of Publication: 2002
Authors: Nair, R, Rost, B
Volume: 18 Suppl 1
Date Published: 2002
Keywords: Abstracting and Indexing as Topic; Algorithms; Animals; Cellular Structures; Databases, Protein; Humans; Information Storage and Retrieval; Natural Language Processing; Pattern Recognition, Automated; Proteins; Sequence Analysis, Protein; Tissue Distribution; Vocabulary, Controlled

MOTIVATION: The SWISS-PROT sequence database contains keywords of functional annotations for many proteins. In contrast, information about the sub-cellular localization is available for only a few proteins. Experts can often infer localization from keywords describing protein function. We developed LOCkey, a fully automated method for lexical analysis of SWISS-PROT keywords that assigns sub-cellular localization. With the rapid growth in sequence data, the biochemical characterisation of sequences has been falling behind. Our method may be a useful tool for supplementing functional information already automatically available.

RESULTS: The method reached a level of more than 82% accuracy in a full cross-validation test. Due to a lack of functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five entirely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes.
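The general idea of inferring localization from functional keywords can be illustrated with a toy sketch. This is not the LOCkey algorithm, whose details the abstract does not give; it is a simple keyword-voting scheme under stated assumptions. The training records, keyword strings, localization labels, and function names below are all hypothetical. The sketch also shows why coverage is limited: a protein whose keywords were never seen with a known localization gets no prediction.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: (functional keywords, known localization).
# Real SWISS-PROT entries and the real LOCkey procedure are not reproduced here.
TRAINING = [
    ({"Transcription regulation", "DNA-binding"}, "nuclear"),
    ({"DNA-binding", "Zinc-finger"}, "nuclear"),
    ({"Glycolysis", "Kinase"}, "cytoplasmic"),
    ({"Transit peptide", "Respiratory chain"}, "mitochondrial"),
]


def build_keyword_counts(training):
    """Count how often each keyword co-occurs with each localization."""
    counts = defaultdict(Counter)
    for keywords, localization in training:
        for keyword in keywords:
            counts[keyword][localization] += 1
    return counts


def infer_localization(keywords, counts):
    """Return the localization with the most keyword votes, or None.

    None means no keyword was informative, so no inference is made.
    """
    votes = Counter()
    for keyword in keywords:
        # .get avoids creating empty entries for unseen keywords.
        votes.update(counts.get(keyword, {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


counts = build_keyword_counts(TRAINING)
print(infer_localization({"DNA-binding"}, counts))
print(infer_localization({"Hypothetical protein"}, counts))
```

In this sketch an unannotated or sparsely annotated protein returns `None`, which mirrors the abstract's point that inference is only possible where functional keywords exist.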

Alternate Journal: Bioinformatics
PubMed ID: 12169534
Grant List: 1-P50-GM62413-01 / GM / NIGMS NIH HHS / United States
R01-GM63029-01 / GM / NIGMS NIH HHS / United States