Table accompanying the paper:

Pitfalls of protein sequence analysis

Rost et al., 1996 (Abstract)

Contact: Burkhard Rost (rost@EMBL-Heidelberg.de)


Table: Common pitfalls and ways to avoid them

Mistake: over-interpretation of sequence similarity
  Cause:  level too low (<30%); too many gaps; similarity confused with sequence identity; composition bias
  Result: incorrectly inferred homology, hence wrong function or wrong structure
  Fix:    use thresholds considering gaps, similarity and composition bias

Mistake: insufficient description of family
  Cause:  full family not matching local motifs
  Result: pattern / homology incorrectly inferred
  Fix:    use profiles of sub-family; repeat search with different family members

Mistake: inaccurate function designation in database
  Cause:  errors in original paper; wrong annotation in database; over-interpretation of homology in database
  Result: incorrectly inferred function; too detailed predictions for function
  Fix:    check literature; compare several homologues; compare with level of divergence in family

Mistake: over-interpretation of secondary structure prediction
  Cause:  single residue assignments taken too literally
  Result: wasted time; incorrectly inferred function
  Fix:    consider mainly segment level

Mistake: over-interpretation of threading
  Cause:  trendy; false prediction; prediction not reliable
  Result: wrong or partially wrong 3D model
  Fix:    use controls appropriate to particular threading method; check functional residues

Mistake: over-interpretation of 3D model
  Cause:  3D model built on shaky alignment information, or unreliable structures; large loops modelled; wrong choice of modelling and/or checking tools
  Result: wrong interpretation of location of residue in 3D structure; wrong prediction of structure-function relationship
  Fix:    do more careful alignment; take into account reliability index for homology models; low level similarity => coarse-grained tools; high level similarity => full atom description tools
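The causes "similarity confused with sequence identity" and "too many gaps" are easier to avoid when identity, similarity and gap content are computed separately and all three are reported. The Python sketch below assumes two already-aligned sequences using '-' as the gap character, and uses a hypothetical residue grouping to decide what counts as "similar"; it is illustrative only and does not reproduce how any particular alignment program defines similarity.

```python
# Hypothetical grouping of physicochemically "similar" residues (an assumption
# for illustration); real tools usually count positive scores in a
# substitution matrix such as BLOSUM62 instead.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def alignment_stats(a: str, b: str) -> dict:
    """Identity and similarity (percent of aligned residue pairs) and gap
    fraction (gapped columns / alignment length) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Only columns where both sequences have a residue are compared.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    gapped = len(a) - len(pairs)
    identical = sum(1 for x, y in pairs if x == y)
    # Identical pairs count as similar; otherwise both residues must share a group.
    similar = sum(
        1 for x, y in pairs
        if x == y or GROUP_OF.get(x, -1) == GROUP_OF.get(y, -2)
    )
    n = len(pairs)
    return {
        "aligned_pairs": n,
        "identity": 100.0 * identical / n if n else 0.0,
        "similarity": 100.0 * similar / n if n else 0.0,
        "gap_fraction": gapped / len(a) if a else 0.0,
    }


stats = alignment_stats("ACDEFG-HIK", "ACEEYGQHLK")
print(stats)
```

For this toy pair, identity is about 67% of the aligned pairs while similarity is 100%, so quoting the similarity figure as if it were identity would considerably overstate the relationship.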
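The fix "use thresholds considering gaps, similarity and composition bias" implies that a single fixed cut-off is not enough, because short alignments reach high identity by chance. One commonly quoted length-dependent cut-off is the HSSP curve (Sander and Schneider, 1991), usually given as 290.15 × L^(-0.562) percent identity for alignment lengths L up to 80 residues and 25% beyond. The sketch below assumes that form; the constants should be checked against the original publication before use, the behaviour below 10 residues is a hypothetical choice, and `margin` is a hypothetical knob for demanding extra identity, for example for composition-biased sequences.

```python
def identity_threshold(length: int) -> float:
    """Percent-identity cut-off for an alignment of `length` aligned residue
    pairs, following the commonly quoted form of the HSSP curve (assumption)."""
    if length < 10:
        return 100.0  # hypothetical choice: too short to judge from identity
    if length <= 80:
        return 290.15 * length ** -0.562
    return 25.0


def above_threshold(identity: float, length: int, margin: float = 0.0) -> bool:
    """True if `identity` (percent) reaches the length-dependent cut-off plus
    `margin` percentage points."""
    return identity >= identity_threshold(length) + margin


print(above_threshold(30.0, 40), above_threshold(30.0, 200))  # → False True
```

With these constants, 30% identity falls below the cut-off (about 36.5%) for a 40-residue alignment but clears it (25%) for a 200-residue alignment.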
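Composition bias, such as low-complexity or repeat-rich segments, can produce high-scoring but meaningless alignments. Dedicated filters such as SEG exist for this purpose; the sketch below is a much simpler stand-in that masks every residue covered by a sliding window whose Shannon entropy falls below a cut-off. The window size and the entropy cut-off are assumptions chosen for illustration, not the parameters of any published filter.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of `window`."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, max_entropy: float = 2.2) -> str:
    """Replace residues covered by any low-entropy window with 'X'.

    `window` and `max_entropy` are illustrative defaults (assumptions)."""
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < max_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("X" if m else aa for aa, m in zip(seq, masked))


print(mask_low_complexity("Q" * 12 + "ACDEFGHIKLMN"))
```

For a 12-residue poly-Q stretch followed by 12 distinct residues, the first 17 positions are masked: the poly-Q stretch plus the neighbouring residues that share low-entropy windows with it. Window-based masking can spill over onto adjacent residues in this way, which is one reason to inspect masked regions rather than trust them blindly.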
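The fix "use profiles of sub-family" can be illustrated with a minimal position-specific scoring matrix built from a few aligned members of one sub-family. This sketch assumes equal-length, ungapped alignments, a uniform background and simple pseudocounts; real profile tools such as PSI-BLAST or HMMER also weight sequences, use realistic background frequencies and handle gaps, none of which is done here. The sub-family sequences are hypothetical toy data.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(members: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds profile (bits) from equal-length aligned sequences, scored
    against a uniform background; non-amino-acid characters are ignored."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*members):
        residues = [aa for aa in column if aa in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def profile_score(profile: list[dict[str, float]], query: str) -> float:
    """Sum of per-column log-odds for an ungapped query of the profile's length."""
    if len(query) != len(profile):
        raise ValueError("query must match profile length")
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, query))


# Hypothetical toy sub-families of one family.
p1 = build_profile(["KLVE", "KLVE", "KIVE"])
p2 = build_profile(["DWFG", "DWYG", "EWFG"])
print(profile_score(p1, "KLVD") > profile_score(p2, "KLVD"))  # → True
```

Repeating the search with different family members, as the table suggests, then amounts to scoring the query against several sub-family profiles and comparing the results, rather than relying on one pattern for the whole family.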
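For "compare several homologues", one simple policy is to transfer an annotation only when several sufficiently close database homologues agree on it, instead of trusting the single best hit. The thresholds in this sketch and the annotation labels in the example are hypothetical; they are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus_annotation(
    hits: Iterable[tuple[float, str]],
    min_identity: float = 40.0,
    min_agreement: float = 0.75,
    min_hits: int = 3,
) -> Optional[str]:
    """Return an annotation only if enough sufficiently similar homologues agree.

    `hits` holds (percent identity to the query, database annotation) pairs.
    All three thresholds are hypothetical defaults for illustration."""
    labels = [label for identity, label in hits if identity >= min_identity and label]
    if len(labels) < min_hits:
        return None  # too few close homologues: do not transfer function
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


# Hypothetical database hits for one query.
hits = [(62.0, "kinase"), (55.0, "kinase"), (48.0, "kinase"),
        (45.0, "phosphatase"), (20.0, "transporter")]
print(consensus_annotation(hits))  # → kinase
```

Returning `None` rather than a best guess reflects the table's warning against too detailed function predictions: when close homologues disagree, the sketch declines to assign a function.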
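"Consider mainly segment level" means reading a secondary structure prediction as a set of helix and strand segments rather than as exact per-residue states. The sketch assumes a prediction string in a three-state H/E/L alphabet and drops segments shorter than hypothetical minimum lengths.

```python
from itertools import groupby
from typing import Optional


def ss_segments(prediction: str, min_length: Optional[dict] = None) -> list[tuple[str, int, int]]:
    """Collapse a per-residue H/E/L string into (state, start, end) segments
    (0-based, inclusive), keeping only helices and strands at least
    min_length[state] residues long. Default minimum lengths are hypothetical."""
    if min_length is None:
        min_length = {"H": 4, "E": 3}
    segments = []
    position = 0
    for state, run in groupby(prediction):
        length = sum(1 for _ in run)
        if state in min_length and length >= min_length[state]:
            segments.append((state, position, position + length - 1))
        position += length
    return segments


print(ss_segments("LLHHHHHHLLEELLLEEEELLH"))
# → [('H', 2, 7), ('E', 15, 18)]
```

The two-residue strand and the isolated helix residue are discarded, so downstream reasoning works with the two segments instead of with individual residue assignments.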
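One generic control for threading (assumed here; not necessarily the control the authors had in mind for any particular method) is to compare the score of the real sequence with the scores of shuffled copies that have the same composition, and to report a z-score. `score_fn` stands for whatever threading score is in use; `toy_score` is a hypothetical placeholder so that the sketch runs on its own.

```python
import random
import statistics
from typing import Callable


def shuffle_zscore(
    sequence: str,
    score_fn: Callable[[str], float],
    n_shuffles: int = 100,
    seed: int = 0,
) -> float:
    """Z-score of score_fn(sequence) relative to scores of shuffled copies
    of the same sequence (same composition, random residue order)."""
    rng = random.Random(seed)
    native = score_fn(sequence)
    decoys = []
    for _ in range(n_shuffles):
        residues = list(sequence)
        rng.shuffle(residues)
        decoys.append(score_fn("".join(residues)))
    spread = statistics.stdev(decoys)
    if spread == 0:
        raise ValueError("decoy scores do not vary; shuffling cannot calibrate this score")
    return (native - statistics.mean(decoys)) / spread


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a threading score: counts 'LK' dipeptides."""
    return float(sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "LK"))


print(round(shuffle_zscore("LK" * 10, toy_score), 1))
```

A score that does not change under shuffling (for example, one that depends only on composition) raises an error, because this control cannot calibrate it. Which z-score counts as significant depends on the threading method and should come from that method's own benchmarks.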
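Two of the 3D-model causes can be checked mechanically: stretches of the target aligned to template gaps (large loops that must be modelled without template coordinates), and the choice between coarse-grained and full-atom tools according to the level of similarity. Both functions are sketches; `max_loop` and the identity `cutoff` are hypothetical defaults, and the example alignment is a toy.

```python
def unsupported_segments(target_aln: str, template_aln: str, max_loop: int = 5) -> list[tuple[int, int]]:
    """Target-residue ranges (0-based, inclusive) aligned to template gaps over
    more than `max_loop` consecutive target residues. Such loops have no
    template coordinates and must be built without a template."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    residue = 0        # index of the next target residue
    run_start = None   # first target residue of the current insertion, if any
    for t, m in zip(target_aln, template_aln):
        if t == "-":
            continue   # column without a target residue: numbering unchanged
        if m == "-":
            if run_start is None:
                run_start = residue
        else:
            if run_start is not None and residue - run_start > max_loop:
                segments.append((run_start, residue - 1))
            run_start = None
        residue += 1
    if run_start is not None and residue - run_start > max_loop:
        segments.append((run_start, residue - 1))
    return segments


def suggest_model_detail(identity: float, cutoff: float = 40.0) -> str:
    """Coarse-grained tools for low similarity, full-atom tools for high
    similarity, as the table recommends; the cut-off is hypothetical."""
    return "full-atom" if identity >= cutoff else "coarse-grained"


print(unsupported_segments("ACDEFGHIKLMNPQ", "ACD------KLMNP"))  # → [(3, 8)]
print(suggest_model_detail(25.0))  # → coarse-grained
```

Flagged segments are the regions where the location of a residue in the model says least about the real structure, so they deserve particular caution when interpreting structure-function relationships.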