* corresponding author: CUBIC, Columbia Univ, Dept Biochemistry & Mol Biophysics, 630 West 168th Street, New York, NY 10032
contact e-mail: rost@columbia.edu
We still cannot predict protein structure from sequence, in general.
However, we can do much better in predicting simplified aspects of
structure. In particular, the field of secondary structure prediction has
been revived by a breakthrough achieved through a combination
of elaborate algorithms and evolutionary information available
in ever-growing data bases. Some of the new, third generation
methods for secondary structure prediction are clearly superior
to previous methods: β-strands are
predicted more accurately; predicted segments look like those
observed; and the overall accuracy is about ten percentage points
higher than for methods from previous generations. Performance
can be improved even further by using these methods in an 'expert'
rather than in an 'automatic' mode.
The sequence-structure gap is rapidly increasing. Currently, databases for protein sequences (e.g. SWISS-PROT [1] ) are expanding rapidly, largely due to large-scale genome sequencing projects: at the beginning of 1998, we already know all sequences for a dozen entire genomes [2] . This implies that, despite significant improvements in structure determination techniques, the gap between the number of protein structures in public databases (PDB [3] ) and the number of known protein sequences is increasing. The most successful theoretical approach to bridging this gap is homology modelling. It effectively raises the number of 'known' 3D structures from 7,000 to over 50,000 [4, 5] .
No general prediction of structure from sequence, yet. John Moult (CARB, Washington) has initiated an important experiment: those who determine protein structures submitted the sequences of proteins for which they were about to solve the structure to a 'to-be-predicted' database; for each entry in that database predictors could send in their predictions before a given deadline (the public release of the structure); finally, the results were compared and discussed during a workshop (in Asilomar, California). The results of the first two CASP experiments [6, 7] demonstrated clearly that we still cannot predict structure from sequence.
Simplifying the structure prediction problem. The rapidly growing sequence-structure gap has enticed theoreticians to solve simplified prediction problems [4] . An extreme simplification is the prediction of protein structure in one dimension (1D), as represented by strings of, e.g., secondary structure, or residue solvent accessibility. Theoreticians are lucky in that simplified predictions in 1D (e.g. secondary structure, or solvent accessibility [8, 4, 9] ) are often useful even when only partially correct, e.g., for predicting protein function, or functional sites.
In this review we focus on recent secondary structure prediction
methods (for reviews on older methods see [10, 11, 12, 13, 14, 15, 16, 17] ;
for reviews on other prediction methods in 1D see [18, 4, 5] ).
We shall present some of the new, successful concepts and a few
'hints for the user' based on the currently most widely used secondary
structure prediction method: PHD.
Assignment of secondary structure. Secondary structure is most often assigned automatically based on the hydrogen bonding pattern between the backbone carbonyl and NH groups (e.g. by DSSP [19] ). DSSP distinguishes eight secondary structure classes which are often grouped into three classes: H = helix, E = strand, and L = non-regular structure. Typically the grouping is as follows: 'H' (α-helix) -> H, 'G' (3₁₀-helix) -> H, 'I' (π-helix) -> H, 'E' (extended strand) -> E, and 'B' (residue in isolated β-bridge) -> E, 'T' (turn) -> L, 'S' (bend) -> L, ' ' (blank = other) -> L, with the 'corrections': 'B ' -> EE, but 'B_B' -> LLL. Note that developers often use different projections of the eight DSSP classes onto three predicted classes; most of these yield seemingly higher levels of prediction accuracy. For example, short helices are more difficult to predict ( [20] , see also Fig. 5), thus converting 'GGG' -> LLL results, on average, in higher levels of prediction accuracy.
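The basic grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name `reduce_dssp` is hypothetical, unknown symbols are mapped to L as an assumption of the sketch, and the context-dependent 'corrections' for isolated bridges are deliberately not applied.

```python
# Three-state reduction of the eight DSSP classes, following the grouping in the text.
DSSP_TO_THREE = {
    "H": "H",  # alpha-helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi-helix
    "E": "E",  # extended strand
    "B": "E",  # residue in isolated beta-bridge
    "T": "L",  # turn
    "S": "L",  # bend
    " ": "L",  # blank = other
}


def reduce_dssp(dssp_string: str) -> str:
    """Map a DSSP eight-state string onto the three states H, E, L.

    Symbols not listed in the table are mapped to L (an assumption of this sketch).
    The 'corrections' for isolated bridges mentioned in the text are not applied.
    """
    return "".join(DSSP_TO_THREE.get(symbol, "L") for symbol in dssp_string)


if __name__ == "__main__":
    print(reduce_dssp("  HHHHGGGTT EEEE S B "))
```

Using a different table here (e.g. mapping 'G' to L) is exactly the kind of alternative projection that changes reported accuracy levels.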
Per-residue prediction accuracy. The simplest and most widely used score is the three-state per-residue accuracy, giving the percentage of residues predicted correctly in any of the three states (helix, strand, other):
$$Q_3 = 100 \cdot \frac{1}{N} \sum_{i \in \{H, E, L\}} c_i \qquad (1)$$
where $c_i$ is the number of residues predicted correctly in state i (H, E, L), and N is the number of residues in the protein (or in a given data set). As typical data sets contain about 32% H, 21% E, and 47% L, correct prediction of the non-regular class tends to dominate the three-state accuracy. More fine-grained methods that avoid this shortcoming are defined in detail elsewhere [21, 22] .
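As a worked illustration of the three-state per-residue accuracy, the following Python sketch counts the correctly predicted residues per state and sums them. The function name `q3` and the example strings are hypothetical; observed and predicted strings are assumed to be already reduced to the states H, E, and L.

```python
def q3(observed: str, predicted: str) -> float:
    """Three-state per-residue accuracy in percent.

    Both strings must have the same non-zero length and use the states H, E, L.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings differ in length")
    if not observed:
        raise ValueError("empty input")
    # c[i] = number of residues predicted correctly in state i
    correct = {"H": 0, "E": 0, "L": 0}
    for obs, pred in zip(observed, predicted):
        if obs == pred:
            correct[obs] += 1
    return 100.0 * sum(correct.values()) / len(observed)


if __name__ == "__main__":
    # 8 of 10 residues agree between the two toy strings.
    print(q3("HHHHEEEELL", "HHHLEEELLL"))
```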
Per-segment prediction accuracy. Measures for single-residue accuracy do not completely reflect the quality of a prediction [23, 24, 25, 26, 22, 14] . There are three simple measures for assessing the quality of predicted secondary structure segments: (1) the number of segments in the protein, (2) the average segment length, and (3) the distribution of the number of segments with length [27] . These measures are related. They are useful in characterising prediction methods, in particular, methods with fairly high per-residue accuracy, yet an unrealistic distribution of segments. However, more elaborate scores are based on the overlap between predicted and observed segments [22] .
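The three simple segment measures can be compiled from a three-state string as sketched below. The helper name `segment_summary` is hypothetical, and a segment is taken here to be a maximal run of identical states, which is an assumption of this sketch.

```python
from collections import Counter
from itertools import groupby


def segment_summary(states: str, state: str):
    """Return (number of segments, average length, length distribution) for one state.

    A segment is a maximal run of consecutive residues in the given state.
    """
    lengths = [len(list(run)) for symbol, run in groupby(states) if symbol == state]
    count = len(lengths)
    average = sum(lengths) / count if count else 0.0
    return count, average, Counter(lengths)


if __name__ == "__main__":
    observed = "LLHHHHLLEEELHHHHHHLL"
    print(segment_summary(observed, "H"))
    print(segment_summary(observed, "E"))
```

Comparing these summaries between observed and predicted strings reveals methods that reach a fair per-residue accuracy with unrealistically short segments.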
Conditions for evaluating sustained performance. A systematic testing of performance is a pre-condition for any prediction to become reliably useful. For example, the history of secondary structure prediction has partly been a hunt for highest accuracy scores, with over-optimistic claims by predictors seeding the scepticism of potential users. Given a separation of a data set into a training set (used to derive the method) and a test set (or cross-validation set, used to evaluate performance), a proper evaluation (or cross-validation) of prediction methods needs to meet four requirements. (1) No significant pairwise sequence identity between proteins used for training and test set, i.e., < 25% (length-dependent cut-off [28] ). (2) All available unique proteins should be used for testing, since proteins vary considerably in structural complexity; certain features are easy to predict, others harder. (3) No matter which data sets are used for a particular evaluation, a standard set should be used for which results are also always reported. (4) Methods should never be optimised with respect to the data set chosen for final evaluation. In other words, the test set should never be used before the method is set up.
Number of cross-validation experiments of NO meaning. Most methods are evaluated in n-fold cross-validation experiments (splitting the data set into n different training and test sets). How many separations should be used, i.e., which value of n yields the best evaluation? A misunderstanding is often spread in the literature: the more separations (the larger n ) the better. However, the exact value of n is not important provided the test set is representative and comprehensive, and the cross-validation results are NOT misused to change parameters again. In other words, the choice of n is of no meaning for the user.
1st generation: single residue statistics. The first
experimentally determined 3D structures of haemoglobin and myoglobin
were published in 1960 [29, 30] . Almost a decade before,
Pauling and Corey suggested an explanation for the formation of
certain local conformational patterns like α-helices
and β-strands [31, 32] . Shortly
afterwards (and still prior to the first published structure), the
first attempt was made to (positively!) correlate the content
of certain amino acids (e.g. proline) with the content of α-helix
[33] . The idea was expanded to correlate the content of
all amino acids with that of the α-helix
and the β-strand structure [34, 35] .
The field of predicting secondary structure had been opened.
Most methods of the first generation were based on single residue statistics,
i.e., from the limited data bases evidence was extracted for the
preference of particular residues for particular secondary structure
states [36, 37, 38, 39, 40, 41, 42, 43, 44] . By 1983, it
became clear that the performance accuracy had been overstated
[45] (
Fig. 1 ).
Fig. 1. Three-state per-residue accuracy of various prediction methods. Shaded bars: methods of 1st and 2nd generation; filled bars: methods of 3rd generation. The left axis showed the normalised three-state per-residue accuracy, for which a random prediction would rate 0%, and an optimal prediction by homology modelling would rate as 100% (un-normalised values according to eq. 1 shown on right axis):
$$Q_3^{norm} = 100 \cdot \frac{Q_3 - Q_3^{random}}{Q_3^{homology} - Q_3^{random}} \qquad (F1)$$
with $Q_3^{homology}$ = 88.4% and $Q_3^{random}$ = 35.2%.
Only methods were included for which the accuracy had been compiled
based on comparable data sets, the sets in particular are: K&S62
, 62 proteins taken from [45] ; LPAG60 , 60 proteins
taken from [122] ; Pre124 , 124 unique proteins taken
from [48] . The methods were: C+F Chou & Fasman
(1st generation) [42, 148] ; Lim (1st) [43] ; GORI
(1st) [53] ; Schneider (2nd) [87] ; ALB
(2nd) [62] ; GORIII (2nd) [54] ; LPAG (3rd)
[122] ; COMBINE (2nd) [17] ; S83 (2nd) [86] ;
NSSP (3rd) [84] ; PHD (3rd) [48] . Most
values were re-compiled, only those for NSSP and LPAG were taken
from the original publications. The scores for PHD on the three
different data sets illustrated that data sets can be tuned to
give more optimistic (LPAG60), or more realistic estimates for
prediction accuracy. The first two structure prediction contests
have indicated that the most conservative estimates of this graph
(Pre124) tend to be slightly too optimistic, still: e.g. PHD rates
at an average accuracy of about 72% (as originally estimated [48, 18] ).
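The normalisation used for the left axis of Fig. 1 can be illustrated with a short Python sketch. It assumes that the normalisation is linear between the two reference values quoted in the caption (35.2% for a random prediction, 88.4% for homology modelling); the function and constant names are hypothetical.

```python
# Reference values quoted in the caption of Fig. 1 (both in percent).
Q3_HOMOLOGY = 88.4  # optimal prediction by homology modelling, rated as 100%
Q3_RANDOM = 35.2    # random prediction, rated as 0%


def normalised_q3(q3_value: float) -> float:
    """Rescale a three-state accuracy so that random = 0% and homology = 100%.

    Assumes a linear rescaling between the two reference values.
    """
    return 100.0 * (q3_value - Q3_RANDOM) / (Q3_HOMOLOGY - Q3_RANDOM)


if __name__ == "__main__":
    for value in (60.0, 72.0):
        print(value, round(normalised_q3(value), 1))
```

Under this assumption an un-normalised accuracy of 72% corresponds to roughly 69% on the normalised scale.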
2nd generation: segment statistics. The principal improvement of the 2nd generation of prediction tools was a combination of a larger data base of protein structures and the usage of statistics based on segments: typically 11-21 adjacent residues are taken from a protein and statistics are compiled to evaluate how likely the residue central in that segment is to be in a particular secondary structure state. Similar segments of adjacent residues were also used to base predictions on more elaborate algorithms, some of which were spun off from artificial intelligence [46] . Almost any algorithm has meanwhile been applied to the problem of predicting secondary structure; all were limited to accuracy levels slightly higher than 60% ( Fig. 1 ; reports of higher levels of accuracy were usually based on too small, or non-representative data sets [21, 47, 25, 48] ). The most widely used algorithms were based on: (i) statistical information [49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61] ; (ii) physico-chemical properties [62] ; (iii) sequence patterns [63, 64, 65] ; (iv) multi-layered (or neural) networks [66, 67, 68, 69, 70, 71, 72, 73] ; (v) graph-theory [74, 75] ; (vi) multivariate statistics [76, 77] ; (vii) expert rules [78, 79, 80, 75, 81, 82] ; and (viii) nearest-neighbour algorithms [83, 84, 85] .
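To make the idea of segment statistics concrete, the following Python sketch counts how often each residue type occurs at each position of a window around a central residue of known state. It is a generic illustration and not a reimplementation of any of the cited methods; the function name `window_counts` and the default window width of 17 (within the 11-21 range quoted above) are assumptions.

```python
from collections import defaultdict


def window_counts(sequence: str, states: str, width: int = 17):
    """Count residues by (state of central residue, offset in window, residue type).

    Window positions that fall outside the sequence are skipped.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states differ in length")
    if width % 2 == 0:
        raise ValueError("window width must be odd")
    half = width // 2
    counts = defaultdict(int)
    for centre, state in enumerate(states):
        for offset in range(-half, half + 1):
            position = centre + offset
            if 0 <= position < len(sequence):
                counts[(state, offset, sequence[position])] += 1
    return counts


if __name__ == "__main__":
    counts = window_counts("ACD", "HHL", width=3)
    for key in sorted(counts):
        print(key, counts[key])
```

Counts of this kind, compiled over a data base of known structures, are the raw material from which segment-based preferences for the central residue can be derived.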
Problems with 1st and 2nd generation methods. All methods from the first and second generation shared at least two of the following problems (most shared all three):
(1) three-state per-residue accuracy was below 70%,
(2) β-strands were predicted at levels of 28-48%, i.e., only slightly better than random,
(3) predicted helices and strands were too short.
The first problem (<100% accuracy) has two sources: (i) secondary
structure assignments differ even between different crystals of
the same protein, and (ii) secondary structure formation is partially
determined by long-range interactions, i.e., by contacts between
residues that are not visible by any method based on segments
of 11-21 adjacent residues. The second problem (β-strands
< 50% accuracy) has been explained by the fact that β-sheet
formation is determined by more non-local contacts than is α-helix
formation. The third problem was basically overlooked by most
developers (exceptions: [86, 87] ). This problem makes predictions
very difficult to use in practice (
Fig. 2 ). As we shall show
in the next paragraph: some of the prediction methods of the third
generation address all three problems simultaneously, and are
clearly superior to the old methods (
Fig. 1 ). Nevertheless, many
of the secondary structure prediction methods available today
(e.g. in GCG [88] , or from internet services [89] ) are
unfortunately still using the dinosaurs of secondary structure
prediction.
Fig. 2. Example for typical secondary structure prediction of the 2nd generation.
The protein sequence
(SEQ ) given was the SH3 structure [131] . The observed
secondary structure (OBS ) was assigned by DSSP [19]
(H = helix; E = strand; blank = non-regular structure; the dashes
indicate the continuation of the 2nd strand that was missed by
DSSP). The typical prediction of too short segments (TYP
) poses the following problems in practice. (i) Are the residues
predicted to be strand in segments 1, 5, and 6 errors, or should
the helices be elongated? (ii) Should the 2nd and 3rd strand
be joined, or should one of them be ignored, or does the prediction
indicate two strands, here? Note: the three-state per-residue
accuracy is 60% for the prediction given.
3.2.1. Evolutionary Odyssey Informative?
Variation in sequence space. The exchange of a few residues can already destabilise a protein [90] . This implies that the majority of the 20^N possible sequences of length N form different structures. Has evolution really created such an immense variety? Random errors in the DNA sequence lead to a different translation of protein sequences. These 'errors' are the basis for evolution. Mutations resulting in a structural change are not likely to survive, since the protein can no longer function appropriately. Furthermore, the universe of stable structures is not continuous: minor changes on the level of the 3D structure may destabilise the structure (due to high complexity). Thus, residue exchanges conserving structure are statistically unlikely. However, the evolutionary pressure to conserve function has led to a record of this unlikely event: structure is more conserved than sequence [91, 92, 93] . Indeed, all naturally evolved protein pairs that have 35 of 100 pairwise identical residues have similar structures [28, 94] . However, the attractors of protein structures are even larger: the majority of protein pairs of similar structure have less than 15% pairwise sequence identity [95, 96] .
Long-range information in multiple sequence alignments. The residue substitution patterns observed between proteins of a particular family, i.e., changes that conserved structure, are highly specific for the structure of that family. Furthermore, multiple alignments of sequence families implicitly also contain information about long-range interactions: suppose residues i and i + 100 are close in 3D, then the types of amino acids that can be exchanged (without changing structure) at position i are constrained because their physico-chemical characteristics have to fit the amino acid types at position i + 100 [97, 98] .
3.2.2. Can Evolutionary Information Be Used?
Expert predictions: visual use of alignment information. The first method that used information from family alignments was proposed as early as the 1970s [99] . Furthermore, experts have based single-case predictions successfully on multiple alignments [100, 99, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116] .
Automatic use of alignment information. The simplest
way to use alignment information automatically has been proposed
first by Maxfield & Scheraga and by Zvelebil et al. [117, 118] :
predictions were compiled for each protein in an alignment, and
then averaged over all proteins. A slightly more elaborate way
of automatically using evolutionary information is to directly
base prediction on a profile compiled from the multiple sequence
alignment [21, 48, 18] . The following steps are applied in
particular for the PHD method [119, 18] (
Fig. 3 ).
(1) A sequence of unknown structure (U ) is quickly (typically
by Blast [120] ) aligned against the data base of known
sequences (i.e. no information of structure required!). (2) Proteins
with sufficient sequence identity to U to assure structural
similarity are extracted and re-aligned by a multiple alignment
algorithm MaxHom [121] . (3) For each position the
profile of residue exchanges in the final multiple alignment is
compiled, and used as input to a neural network.
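Step (3) above, compiling a profile of residue exchanges for each alignment position, can be sketched as follows. This is a simplified illustration with unweighted relative frequencies; how PHD actually weights sequences or treats gaps is not specified here, so the function name `alignment_profile` and the treatment of gaps (ignored) are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alignment_profile(aligned_sequences):
    """Return one dict per alignment column with the relative frequency of each amino acid.

    All sequences must have the same aligned length; gaps and other symbols are ignored.
    """
    if not aligned_sequences:
        raise ValueError("empty alignment")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences differ in length")
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(residue for residue in column if residue in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


if __name__ == "__main__":
    profile = alignment_profile(["ACD", "AC-", "GCD"])
    print({aa: round(freq, 2) for aa, freq in profile[0].items() if freq > 0})
```

Each column then yields a vector of 20 numbers, which is the kind of per-position input that can be fed into a network in place of a single residue.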
Fig. 3. Using evolutionary information to predict secondary structure.
Starting from a sequence of
unknown structure (SEQUENCE ) the following steps are required
to finally feed evolutionary information into the PHD
neural networks (upper right): (1) a data base search for homologues
(method Blast [120] ), (2) a refined profile-based
dynamic-programming alignment of the most likely homologues (method
MaxHom [121] ), (3) a decision for which proteins will
be considered as homologues (length-dependent cut-off for pairwise
sequence identity [91, 28] ),
and (4) a final refinement, and
extraction of the resulting multiple alignment. Numbers 1-3 indicate
the points where users of the PredictProtein service [18]
can intervene to improve prediction accuracy without changes made
to the final prediction method PHD .
3.2.3. 3rd Generation: Evolution To Better Predictions
Example chosen: PHD. We illustrate the principal concepts of 3rd generation methods based on the particular neural network-based method PHD because it is currently the most accurate method [7] , and because most of these concepts were introduced by this method [21, 48] . Meanwhile, several other methods have reported and/or achieved similar levels of performance [122, 21, 114, 48, 84, 123, 124, 16, 125, 126, 127, 18, 128, 129] .
Multiple levels of computations. PHD processes the input information on multiple levels (neural network in Fig. 3 ). The first level is a feed-forward neural network with three layers of units (input, hidden, and output). Input to this first level sequence-to-structure network consists of two contributions: one from the local sequence, i.e., taken from a window of 13 adjacent residues, and another from the global sequence. Output of the first level network is the 1D structural state of the residue at the centre of the input window. The second level is a structure-to-structure network. The next level consists of an arithmetic average over independently trained networks (jury decision). The final level is a simple filter.
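The jury decision, i.e. the arithmetic average over independently trained networks, can be illustrated for a single residue as follows. The function name `jury_decision` and the order of the output units (helix, strand, non-regular) are assumptions of this sketch; the final filter is not shown.

```python
STATES = "HEL"  # assumed order of the three output units


def jury_decision(network_outputs):
    """Average the outputs of several networks for one residue and pick the winner.

    network_outputs: list of (helix, strand, loop) tuples, one per network.
    Returns the predicted state and the averaged output values.
    """
    if not network_outputs:
        raise ValueError("no network outputs given")
    n = len(network_outputs)
    averaged = tuple(sum(output[k] for output in network_outputs) / n for k in range(3))
    winner = max(range(3), key=lambda k: averaged[k])
    return STATES[winner], averaged


if __name__ == "__main__":
    outputs = [(0.8, 0.1, 0.1), (0.4, 0.5, 0.1), (0.6, 0.2, 0.2)]
    print(jury_decision(outputs))
```

In the toy example one of the three networks favours strand, but the averaged output still favours helix.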
Balanced predictions by balanced training. The distribution
of the training examples (known structures) is rather uneven:
about 32% of the residues are observed in helix, 21% in strand,
and 47% in loop. Choosing the training examples proportional
to the occurrence in the data set (unbalanced training), results
in a prediction accuracy that mirrors this distribution, e.g.,
strands are predicted less accurately than helix or loop [21, 48, 20] .
A simple way around the data base bias is a balanced training:
at each time step one example is chosen from each class, i.e.,
one window with the central residue in a helix, one with the central
residue in a strand and one representing the loop class. This
training results in a prediction accuracy well balanced between
the output states (
Fig. 4 ).
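Balanced training as described above, one example from each class at each time step, can be sketched as a simple sampling scheme. The generator name `balanced_batches`, the representation of examples as (window, state) pairs, and the fixed random seed are assumptions made for illustration.

```python
import random


def balanced_batches(examples, steps, seed=0):
    """Yield, at each time step, one randomly chosen example per class (H, E, L).

    examples: list of (window, state) pairs, where state is the observed
    state of the central residue of the window.
    """
    rng = random.Random(seed)
    by_state = {"H": [], "E": [], "L": []}
    for window, state in examples:
        by_state[state].append((window, state))
    if any(not members for members in by_state.values()):
        raise ValueError("each class needs at least one training example")
    for _ in range(steps):
        yield [rng.choice(by_state[state]) for state in "HEL"]


if __name__ == "__main__":
    toy = [("w1", "H"), ("w2", "L"), ("w3", "L"), ("w4", "E"), ("w5", "L")]
    for batch in balanced_batches(toy, steps=3):
        print(batch)
```

Although loop examples dominate the toy data set, every time step presents the three classes equally often, which is the point of the balanced scheme.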
Fig. 4. Prediction balanced between three secondary structure states.
The pies were valid for a
simple neural network prediction not using evolutionary information:
(A + D) all correctly predicted residues, (B)
all residues in a representative subset of PDB, and (C)
all residues presented during balanced training. The basic message
is that the prediction of strand is not inferior to the one for
helix for 2nd generation methods (A)
because strand formation is more dominated by long-range interactions
(as previously argued), but because the data base distributions
differ between the three states (B).
Simply skewing the distribution (C)
resulted in an equally accurate prediction for all three states
(D).
Better segment prediction by structure-to-structure networks.
The first level sequence-to-structure network uses as input
the following information from 13 adjacent residues: (1) the profile
of amino acid substitutions for all 13 residues; (2) the conservation
weights compiled for each column of the multiple alignment; (3)
the number of insertions, and the number of deletions in each
column; (4) the position of the current segment of 13 residues
with respect to the N- and C-term; (5) the amino acid composition;
and (6) the length of the protein. Output consists of three
units coding for helix, strand, and non-regular structure. The
output coding for the second level network is identical to the
one for the first. The dominant input contribution to the second
level structure-to-structure network is the output of the first
level sequence-to-structure network. The reason for introducing
a second level is the following. Networks are trained by changing
the connections between the units such that the error is reduced
for each of the examples successively presented to the network
during training. The examples are chosen at random. Therefore,
the examples taken at time step t and at time step t+1
are usually not adjacent in sequence. This implies that the
network cannot learn that, e.g., helices contain at least three
residues. The second level structure-to-structure network introduces
a correlation between adjacent residues with the effect that predicted
secondary structure segments have length distributions similar
to the ones observed [27] . Problems arise, in particular,
for short segments (Fig 5).
Fig 5. Distribution of segment length.
(A) The number of helical segments
observed (open squares; according to DSSP [19] ) and predicted
(filled triangles; by PHD [18] ) is plotted against their length.
Obviously, most short helices are missed by the prediction.
The inset zooms in on longer helices, revealing that PHD predicts
slightly too long helices. Figures for strand and non-regular
structure are not given, as the observed and predicted distributions
agree relatively well, at least, for longer segments. However,
there are important differences for shorter segments: (B)
plots the differences between the numbers of observed - predicted
segments at given lengths (helices: open squares, strands: filled
triangles, non-regular structure: dashed line with crosses).
In particular, strands of a single residue are overpredicted;
short loop regions and 3₁₀-helices (3 residues)
are underpredicted.
3.3.1. Estimates Of Prediction Accuracy
Difference between 60% and 70% accuracy may matter a lot!
Some of the 3rd generation methods for secondary structure prediction
are clearly superior to previous methods: β-strands
are predicted more accurately; predicted segments look like those
observed; and the overall accuracy is about ten percentage points
higher. The advantage in practice is illustrated in
Fig. 6 .
Not only does the 3rd generation method (here PHD) get most segments
right, it also allows the user to focus on more reliably predicted
residues. The reliability index (Rel in
Fig. 6 ) is compiled
as the difference between the output unit with highest value (winner
unit) and the output unit with the next highest value (normalised
to a scale from 0 (low) to 9 (high)). All strongly predicted
residues (* in
Fig. 6 ) are predicted correctly.
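The reliability index described above can be sketched as follows. The text only states that the difference between the two highest output units is normalised to a scale from 0 to 9; the exact scaling used here (output values in the range 0 to 1, difference multiplied by ten, truncated, and capped at 9) is an assumption of this sketch, as is the function name.

```python
def reliability_index(outputs) -> int:
    """Reliability index from 0 (low) to 9 (high) for one residue.

    outputs: the three output unit values, assumed to lie between 0 and 1.
    The scaling int(10 * difference), capped at 9, is an assumption.
    """
    ranked = sorted(outputs, reverse=True)
    difference = ranked[0] - ranked[1]  # winner unit minus next highest unit
    return min(9, int(10 * difference))


if __name__ == "__main__":
    print(reliability_index((0.9, 0.05, 0.05)))   # clear winner
    print(reliability_index((0.45, 0.4, 0.15)))   # two units almost tied
```

Residues for which two output units are almost tied receive a low index, which is what allows the user to concentrate on the strongly predicted positions.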
Fig. 6. Example for secondary structure prediction of 1st-3rd generation.
The protein sequence (SEQ
) given was the SH3 structure [131] . The observed secondary
structure (OBS ) was assigned by DSSP [19] (H = helix;
E = strand; blank = non-regular structure; the dashes indicated
the continuation of the 2nd strand that was missed by DSSP).
The methods are 1st generation: C+F [42] ; 2nd generation:
GOR [17] (= GORIII), and 3rd generation: PHD
[18] . The levels of three-state accuracy were: C+F = 59%;
GOR = 65%; and PHD = 72%. Whereas the 1st and 2nd generation
methods performed above their average accuracy (Fig. 1) for this
protein, the PHD prediction was average (Fig. 1; Fig. 7). The
strength of the PHD prediction was reflected in the one-digit
reliability index (Rel , 0 = low, 9 = high) correlated
with prediction accuracy. All residues predicted at values of
Rel > 4 (marked by *) were predicted correctly.
Values for expected prediction accuracy are distributions.
Statements such as 'secondary structure is about 90% conserved
within sequence families' [22] refer to averages over distributions.
The same holds for the expected prediction accuracy (
Fig. 7 ).
Such distributions explain why some developers have over-estimated
the performance of their tools using data sets of only tens of
proteins (or even fewer). In general, single sequences yield
accuracy values about ten percentage points lower than multiple
alignments [21, 25, 48] . Note that for most proteins some
helix and strand residues are confused (BAD predictions in
Fig. 7 ).
Fig. 7. Expected variation of prediction accuracy with protein chain.
(A)
Three-state per-residue accuracy (eq. 1; PDB identifier given
for the proteins predicted worst); (B)
percentage of BAD predictions, i.e., residues either predicted
in helix and observed in strand, or predicted in strand and observed
in helix (introduced by [14] ); (B inset)
cumulative percentage of proteins with BADly predicted residues
(e.g. for 80% of the proteins the percentage of confused helix
and strand residues is < 7%; however, for only 30% of all
proteins such a confusion never happened). Given: distributions
(over 721 unique protein chains), averages, and one standard deviation.
Reliability of prediction correlates with accuracy. For
the user interested in a particular protein U, the fact that prediction
accuracy varies with the protein (
Fig. 7 ) implies a rather unfortunate
message: the accuracy for U could be lower than 40%, or it could
be higher than 90% (
Fig. 7 ). Is there any way to provide an estimate
at which end of the distribution the accuracy for U is likely
to be? Indeed, the reliability index correlates with accuracy.
In other words, residues with higher reliability index are predicted
with higher accuracy [21, 48, 18] . Thus, the reliability
index offers an excellent tool to focus on some key regions predicted
at high levels of expected accuracy. Furthermore, the reliability
index averaged over an entire protein correlates with the overall
prediction accuracy for this protein (
Fig. 8 ). (Note however,
that the reliability indices tend to be unusually high for alignments
of sequence families without very divergent sequences.)
Fig. 8. Correlation between reliability and accuracy.
Residues predicted at higher reliability are predicted
more accurately [21, 48, 18] . Here, we plotted the reliability
index averaged over a protein with the overall accuracy for that
protein (A). Even a simple linear
fit (A) provided a reasonably accurate
estimate of the performance: for more than 80% of all proteins
the linear fit yielded estimates in the range of < ± 10%
accuracy (B).
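A linear fit of per-protein accuracy against the averaged reliability index, and the fraction of proteins for which the fit is within ± 10 percentage points, could be computed as sketched below. The function names are hypothetical, and the numbers in the usage example are toy values chosen for illustration only; they are not data from Fig. 8.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fraction_within(x, y, slope, intercept, tolerance=10.0):
    """Fraction of points whose fitted value deviates by less than the tolerance."""
    hits = sum(
        1 for xi, yi in zip(x, y) if abs(slope * xi + intercept - yi) < tolerance
    )
    return hits / len(x)


if __name__ == "__main__":
    # Toy values only: averaged reliability index and accuracy (%) per protein.
    mean_reliability = [4.0, 5.0, 6.0, 7.0, 8.0]
    accuracy = [58.0, 66.0, 69.0, 78.0, 83.0]
    slope, intercept = linear_fit(mean_reliability, accuracy)
    print(slope, intercept, fraction_within(mean_reliability, accuracy, slope, intercept))
```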
Understandable why certain proteins predicted poorly? For some of the worst predicted proteins, the low level of accuracy could be anticipated from their unusual features, e.g., for crambin, or the antifreeze glycoprotein type III. However, this procedure turned out to be rather arbitrary. First, some proteins with the same 'unusual features' are predicted at high levels of accuracy. Second, occasionally similar proteins are predicted at very different levels of accuracy, e.g. both the phosphatidylinositol 3-kinase [130] and the Src-homology domain of cytoskeletal spectrin have homologous structures [131] but prediction accuracy varies between less than 40% (pik) and more than 70% (spectrin). None of the conclusions from studying poor predictions has yielded a way to better predictions, yet. Nevertheless, two observations may be added. First, bad alignments (i.e. non-informative and/or falsely aligned residues) result in bad predictions. Second, the BAD predictions ( Fig. 7 ), i.e., the confusion of helix and strand, are frequently observed in regions that are stabilised by long-range interactions. For example, the peptide around the fourth strand of SH3 ( Fig. 6 ) forms a helix in solution (Luis Serrano, personal communication). Furthermore, helices and strands that are confused despite a high reliability index often have functional properties, or are correlated to disease states (Rost, unpublished data).
3.3.2. Availability of Methods
Internet prediction services for secondary structure, in general. Programs for the prediction of secondary structure available as internet services have mushroomed since the first prediction service PredictProtein went on line in 1992 [119, 132] (a list of links in [133] ). Unfortunately, not all services are sufficiently tested. In general, prediction accuracy is significantly superior if predictions are based on multiple alignments [13, 16, 4] .
Completely vs. almost automatic. The PHD prediction
method is automatically available via the internet service PredictProtein
[18] (send the word help to PredictProtein@EMBL-Heidelberg.DE,
or use the WWW interface [132] ). Users have the choice between
the fully automatic procedure taking the query sequence through
the entire cycle, or expert intervention into the generation of
the alignment. Indeed, expert intervention can easily improve
predictions without much time being spent [134] .
The following notes result from the experiences one of us (BR) has gathered by offering, and running the PredictProtein [132] service and during various structure prediction workshops [135] . Some comments apply in particular to the PHD methods [18, 136] ; however, most hold also for using other secondary structure prediction methods (we strongly recommend reading the detailed 'hints' on the PredictProtein WWW page: [132] ).
How accurate are the predictions? The expected levels of accuracy (Q3 = 72±11%) are valid for typical globular, water-soluble proteins when the multiple alignment contains many and diverse sequences. High values for the reliability indices indicate more accurate predictions. (Note: for alignments with little variation in the sequences, the reliability indices adopt misleadingly high values.) PHD predictions tend to be relatively accurate for porins [18] ; however, for helical membrane proteins other programs ought to be used [18, 136, 5] .
How useful are the predictions? The prediction of secondary structure can be accurate enough to assist chain tracing. Furthermore, PHD predictions are being used as a starting point for modelling 3D structure and predicting function [137, 138, 139, 140, 123, 115, 141, 116, 142, 143] .
Confusion between strand and helix? PHD (as well as other methods) focuses on predicting hydrogen bonds. Consequently, occasionally strongly predicted (high reliability index) helices are observed as strands and vice versa ( Fig. 7 B).
Strong signal from secondary structure caps? The ends of helices and strands contain a strong signal. However, on average PHD predicts the core of helices and strands more accurately than the caps [20] . This seems to also hold for other methods.
Internal helices predicted poorly? Steven Benner has indicated that internal helices are difficult to predict [107, 24] . On average, this is not the case for PHD predictions [144] .
What about protein design and synthesised peptides? The PHD networks are trained on naturally evolved proteins. However, the predictions have been useful in some cases to investigate the influence of single mutations (e.g. for Chameleon [145, 146] , or for Janus [147] , Rost, unpublished). For short poly-peptides, users should bear in mind that the network input consists of 17 adjacent residues. Thus, shorter sequences may be dominated by the ends (which are treated as solvent by the current version of PHD).
70% correct implies 30% incorrect. The most accurate methods for predicting secondary structure reach sustained levels of about 70% accuracy. When interpreting predictions for a particular protein it is often instructive to mark the 30% of the residues you suspect to be falsely predicted.
Spread of prediction accuracy. An expected accuracy of 70% does NOT imply that for your protein U 70% of all residues are correctly predicted. Instead, values published for prediction accuracy are averaged over hundreds of unique proteins. An expected accuracy of 70±10% (one standard deviation) implies that, on average, for two thirds of all proteins between 60 and 80% of the residues will be predicted correctly ( Fig. 7 ). Thus, prediction accuracy can be higher than 80% or lower than 60% for your protein. Few methods supply well tested indices for the reliability of predictions ( Fig. 8 ; [18, 134] ). Such indices can help to reduce or increase your trust in a particular prediction.
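The 'two thirds' figure follows from the properties of a normal distribution, as the short calculation below shows. Treating per-protein accuracy as normally distributed is an assumption made only for this illustration; the observed distribution ( Fig. 7 ) need not be exactly normal. The function name is hypothetical.

```python
import math


def normal_fraction_between(mean: float, sd: float, low: float, high: float) -> float:
    """Fraction of a normal distribution with given mean and sd lying in [low, high]."""

    def cdf(value: float) -> float:
        return 0.5 * (1.0 + math.erf((value - mean) / (sd * math.sqrt(2.0))))

    return cdf(high) - cdf(low)


if __name__ == "__main__":
    # Expected accuracy of 70 +/- 10% (one standard deviation).
    print(round(normal_fraction_between(70.0, 10.0, 60.0, 80.0), 3))
    # Fraction of proteins expected below 60% under the same assumption.
    print(round(normal_fraction_between(70.0, 10.0, -math.inf, 60.0), 3))
```

Under this assumption about 68% of all proteins fall between 60 and 80%, and roughly one protein in six is predicted at less than 60% accuracy.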
Special classes of proteins. Prediction methods are usually derived from knowledge contained in proteins from subsets of current data bases. Consequently, they should not be applied to classes of proteins not included in these subsets, e.g., methods for predicting helices in globular proteins are likely to fail when applied to predict transmembrane helices. In general, results should be taken with caution for proteins with unusual features, such as proline-rich regions, unusually many cysteine bonds, or for domain interfaces.
Better alignments yield better predictions. Multiple alignment-based predictions are substantially more accurate than single sequence-based predictions. How many sequences do you need in your alignment for an improvement; and how sensitive are prediction methods to errors in the alignment? The more divergent sequences contained in the alignment, the better (two distantly related sequences often improve secondary structure predictions by several percentage points). Regions with few aligned sequences yield less reliable predictions. The sensitivity to alignment errors depends on the methods, e.g., secondary structure prediction is less sensitive to alignment errors than accessibility prediction.
Better + worse = even better? Today, several automatic services accomplish secondary structure predictions. Some users fall into the what-is-common-is-correct trap, i.e., they average over all prediction methods and consider identical regions as more reliable. Exceptionally, such a majority vote may be beneficial. However, frequently the result will be the worst-of-all prediction. Often, it is preferable to use reliability indices provided by some methods. Such indices answer the question: how reliably is the tryptophan at position 307 predicted in a surface loop? (Note: the correlation between such indices and prediction accuracy is sufficiently tested for a few methods, only.)
1D structure may or may not be sufficient to infer 3D structure.
Say you obtain as prediction for regular secondary structure:
helix-strand-strand-helix-strand-strand (H-E-E-H-E-E). Assume,
you find a protein of known structure with the same motif (H-E-E-H-E-E).
Can you conclude that the two proteins have the same fold? Yes
and no: your guess may be correct, but there are various ways
to realise the given motif by completely different structures.
For example, at least, 16 structurally unrelated proteins contain
the secondary structure motif 'H-E-E-H-E-E'.
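A motif string of this kind can be derived from a three-state prediction by listing the regular segments in sequence order, as in the following sketch. The function name `segment_motif` is hypothetical; note that the segment lengths and the loops between segments are discarded, which is one reason why the same motif can be realised by unrelated structures.

```python
from itertools import groupby


def segment_motif(states: str) -> str:
    """Reduce a three-state string to the order of its helix and strand segments.

    Non-regular segments (L) are dropped; segment lengths are ignored.
    """
    return "-".join(symbol for symbol, _ in groupby(states) if symbol in "HE")


if __name__ == "__main__":
    prediction = "LLHHHHLLEEELLEEEELHHHHLEEELLEEEL"
    print(segment_motif(prediction))
```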