Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments.

Title: Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments.
Publication Type: Journal Article
Year of Publication: 2007
Authors: Przybylski, D, Rost, B
Journal: Nucleic Acids Res
Volume: 35
Issue: 7
Pagination: 2238-46
Date Published: 2007
ISSN: 1362-4962
Keywords: Amino Acid Sequence, Amino Acid Substitution, Base Sequence, Consensus Sequence, Sequence Alignment, Sequence Analysis, Protein, Software
Abstract

Sequence alignments may be the most fundamental computational resource for molecular biology. The best methods for identifying sequence relatedness use profile-profile comparisons, which are much slower and more complex than sequence-sequence and sequence-profile comparisons such as BLAST and PSI-BLAST, respectively. Families of related genes and gene products (proteins) can be represented by consensus sequences that list the nucleotide or amino acid most frequent at each sequence position in that family. Here, we propose a novel approach for consensus-sequence-based comparisons. This approach improved searches and alignments as a standard add-on to PSI-BLAST without any changes to its code. Improvements were particularly significant for more difficult tasks such as the identification of distant structural relations between proteins and their corresponding alignments. Although the improvements were larger for more divergent relations, they were consistent even at high accuracy/low error rates for non-trivially related proteins. The improvements were very easy to achieve: no parameter used by PSI-BLAST was altered and not a single line of code was changed. Furthermore, the consensus sequence add-on required relatively little additional CPU time. We discuss how advanced users of PSI-BLAST can immediately benefit from using consensus sequences on their local computers. We have also made the method available through the Internet (http://www.rostlab.org/services/consensus/).
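
The abstract defines a consensus sequence as the most frequent residue at each position of a family. The Python sketch below illustrates only that definition on a small, made-up alignment; it is not the authors' procedure. The function name `consensus_sequence`, the choice to ignore gap characters, and the alphabetical tie-breaking rule are all assumptions introduced for illustration.

```python
from collections import Counter


def consensus_sequence(aligned, gap="-"):
    """Return the most frequent residue per column of an alignment.

    Assumptions (not taken from the paper): gaps are ignored when
    counting, ties are broken alphabetically, and a column that holds
    only gaps yields a gap.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        # Count residues in this column, skipping gap characters.
        counts = Counter(res for res in column if res != gap)
        if not counts:
            consensus.append(gap)
            continue
        top = max(counts.values())
        # Alphabetical tie-break keeps the result deterministic.
        consensus.append(min(r for r, c in counts.items() if c == top))
    return "".join(consensus)


# Hypothetical toy family of four aligned protein fragments.
family = ["MKVLA", "MKILA", "MRVLG", "MKV-A"]
print(consensus_sequence(family))  # prints MKVLA
```

In this toy case the consensus string can stand in for the whole family as an ordinary sequence, which is what makes it usable with tools that expect sequence input.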

DOI: 10.1093/nar/gkm107
Alternate Journal: Nucleic Acids Res.
PubMed ID: 17369271
PubMed Central ID: PMC1874647
Grant List: R01-LM07329-01 / LM / NLM NIH HHS / United States
U54-GM074958-01 / GM / NIGMS NIH HHS / United States