Better prediction of sub-cellular localization by combining evolutionary and structural information.

Publication Type: Journal Article
Year of Publication: 2003
Authors: Nair, R; Rost, B
Date Published: 2003 Dec 1
Keywords: Algorithms; Amino Acids; Computational Biology; Databases, Protein; Neural Networks (Computer); Protein Structure, Secondary; Protein Transport; Proteins; Reproducibility of Results; Sequence Alignment

The native sub-cellular compartment of a protein is one aspect of its function. Thus, predicting localization is an important step toward predicting function. Short, zip-code-like sequence fragments regulate some of the shuttling between compartments. Cataloguing and predicting such motifs is the most accurate means of determining localization in silico. However, only a few motifs are currently known, and not all trafficking appears to be regulated in this way. The amino acid composition of a protein correlates with its localization, and all general prediction methods have exploited this observation. Here, we explored the evolutionary information contained in multiple alignments, together with aspects of protein structure, to predict localization in the absence of homology and targeting motifs. Our final system combined statistical rules and a variety of neural networks to achieve an overall four-state accuracy above 65%, a significant improvement over systems using only composition. The system performed best for extra-cellular and nuclear proteins; for mitochondrial proteins it was significantly less accurate than TargetP. Interestingly, all methods developed on SWISS-PROT sequences failed grossly when applied to sequences of proteins of known structure taken from the PDB. We therefore developed two separate systems: one for proteins of known structure and one for proteins of unknown structure. Finally, we applied the PDB-based system, along with homology-based inferences and automatic text analysis, to annotate all eukaryotic proteins in the PDB (…). We imagine that this pilot method, certainly in combination with similar tools, may be valuable for target selection in structural genomics.
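The abstract measures its system against methods that use only amino acid composition. The sketch below shows what such a composition-only baseline might look like. It is not the authors' system, which combines statistical rules, neural networks, evolutionary profiles and structural features. Each protein becomes a 20-dimensional vector of residue frequencies. A nearest-centroid rule, deliberately swapped in as a simple stand-in for a trained classifier, assigns a compartment. Overall accuracy is taken here to be the fraction of proteins assigned to their observed compartment. All function names, the toy sequences and the class labels are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard letters (e.g. X, B, Z) are ignored rather than counted.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def centroids(training: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Average composition vector per localization class (hypothetical baseline)."""
    sums: dict[str, list[float]] = {}
    sizes: Counter = Counter()
    for sequence, label in training:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(composition(sequence)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(sequence: str, model: dict[str, list[float]]) -> str:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    vec = composition(sequence)
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, model[label])))


def four_state_accuracy(observed: list[str], predicted: list[str]) -> float:
    """Fraction of proteins whose predicted class equals the observed class."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real proteins.
    model = centroids([("KKRKRKPKKR", "nuclear"), ("CCGCSCGCCT", "extracellular")])
    print(predict("KRKKPRKK", model))  # prints "nuclear"
```

A nearest-centroid rule keeps the example short and deterministic. A composition-only system of the kind the abstract compares against would more plausibly train a neural network or similar classifier on these 20 features. The abstract's gain comes from adding evolutionary and structural information on top of them.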

Alternate Journal: Proteins
PubMed ID: 14635133