| Title: | Predicting simplified features of protein structure |
| Author: | Dariusz Przybylski & Burkhard Rost |
| Quote: | In: Bioinformatics - From Genomes To Therapies. Thomas Lengauer (ed.), Wiley-VCH: Weinheim, 2007, in Press. |
Predicting simplified features of protein structure
| 1 | Dept. of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA |
| 2 | Columbia University Center for Computational Biology and Bioinformatics (C2B2), 1130 St. Nicholas Avenue Rm 802, New York, NY 10032, USA |
| 3 | North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA |
| * | Corresponding author: darek@rostlab.org URL http://www.rostlab.org/ Tel: +1-212-851-4669 |
This article is published in (T Lengauer (ed.) Bioinformatics – From Genomes to Therapies, Wiley-VCH, Weinheim, 2007, pages) © copyright Wiley-VCH (2007). Wiley-VCH is the only authorized source. All copying of this article including placing on third party websites requires the written permission of the copyright owner.
Protein structures are determined much more slowly than sequences. At the end of 2005, about 30,000 experimentally determined protein three-dimensional (3D) structures were in public databases [1]. At the same time, almost 40 million genes were known [2], along with approximately 1.5 million verified [3] protein sequences. This gap between structure and sequence continues to grow: despite successful efforts at large-scale structure determination (structural genomics [4, 5]), the rate of new structures (thousands per year) continues to increase much more slowly than the rate of new sequences (many millions per year). Moreover, experimental structure determination has been largely or entirely unsuccessful for important classes such as cell membrane proteins.
Reliable and comprehensive computations of 3D structures are not yet possible. In principle, we could compute 3D structures from sequences using basic physical principles [6]. However, the complexity of the problem far exceeds today's computational resources. Speeding up molecular dynamics by a factor of a thousand appears to be an objective within reach of Schroedinger Inc. While this would undoubtedly yield important insights into the problem, it may still not bring reliable predictions of 3D structures from sequence. Even given infinite CPU resources, another serious obstacle is raised by the minute energy differences between native and unfolded structures (~1 kcal/mol). This minute difference, along with the uncertainty in estimating the constants needed for calculations based on first principles, makes it very difficult to find an approximate approach that is both simple and sufficiently accurate. Although we cannot model structure from sequence alone, comparative modeling yields rather accurate predictions based on sequence homology to proteins of known structure [7]. Such modeling is based on the fact that proteins with similar sequences usually have similar structures. Assume we know the structure for K, and that we want to predict the structure for U that is sequence-similar to K. Comparative modeling simply predicts U to have the same structure as K and models the structure of U based on the known backbone of K. However, for the majority of protein sequences no sufficiently detailed structural information is available or computable.
Predictions of simplified aspects of 3D structure are often very successful. In the absence of experimental or predicted 3D structures, many researchers concentrate on simplifying the problem and predicting particular structural features. One of the first well-defined problems was the prediction of protein secondary structure. Progress in this field has been steady, and current secondary structure predictions are useful to many biological applications. Techniques that were developed in the context of secondary structure prediction were successfully applied to the prediction of many other aspects of protein structure, such as solvent accessibility, inter-residue contact maps, disordered regions, and domain organization, and were specialized for distinctive cases such as the transmembrane regions of proteins.
Regular secondary structure formation is mostly a local process. 3D structures exhibit extensive local conformational regularities known as regular secondary structure. These local structures (most importantly helices and sheets) can be described as ordered arrangements of a polypeptide chain without reference to amino acid type or actual 3D conformations. They are stabilized primarily by hydrogen bonds formed between the atoms present in the polypeptide backbone but interactions with solvent and other protein atoms also play an important role. It is believed that the formation of secondary structure is an important step toward folding. Identifying the rules for packing the elements of secondary structure against each other would afford the derivation of a very limited number of possible stable conformations. Unfortunately, the formation of secondary structure is not entirely a local process. Thus, a perfect prediction of secondary structure without knowledge of non-local information is unlikely. Note that secondary structure can be written in a string of assignments for each residue, i.e. it is essentially a one-dimensional feature of protein 3D structure. (Unfortunately, some authors are lured into misusing the term 2D structure, possibly in response to a misunderstanding of the word secondary.)
Secondary structures can be somewhat flexible. Regular secondary structure is a striking, macroscopically visible aspect of 3D structure. However, secondary structures are not rigid. Calculations and experiments indicate that structural shifting occurs, especially in surface regions. The adoption of a particular structure may depend on many environmental factors. This is illustrated by the fact that the secondary structure states sometimes differ among various crystals of the same protein, as well as among various NMR models, by as much as 5-15 percent. This variability constrains the upper limit of what we can expect from prediction methods: arguably levels of about 90% (percentage of residues predicted correctly in any of the three states helix, strand, other). While many residues can be confidently classified into one of the secondary structure types, there are also those for which classification is ambiguous. This problem is especially evident at the termini of secondary structure elements; it is just another aspect of the observation that protein structures are dynamic objects. Historically, assignments were carried out through visual inspection by experimentalists. That approach introduced a human-based inconsistency. Some 20 years ago, this inconsistency was first addressed by an objective, automatic assignment method (DSSP, see below). Many such methods followed; they all apply criteria consistently across all proteins, but they often differ from each other.
Automatic assignments of secondary structure. The first assignments of protein secondary structure were carried out by Pauling and others [8] even before experimental 3D structures of proteins became available. They were based on intra-backbone hydrogen bonds. One of the first and most popular automatic methods, the Dictionary of Secondary Structure of Proteins (DSSP) [9], used a similar approach. The DSSP method calculates the interaction energy between backbone atoms based on an electrostatic model [9]. It assigns a hydrogen bond if the interaction energy is below a chosen threshold (-0.5 kcal/mol). The structure assignments are defined such that visually appealing and unbroken structures are formed from groups of hydrogen bonds. Another popular automatic assignment method, the STRuctural IDEntification method (STRIDE [10]), uses phi-psi torsion angles and an empirically derived hydrogen bond energy. The parameters used by this method are optimized to reproduce visual assignments provided by experimentalists determining 3D structures, so in effect the method averages out human bias. The method DEFINE [11] assigns secondary structure using C-alpha coordinates. The assignment is carried out through comparison of observed C-alpha distances with those derived from ideal secondary structures. If the distances are within set discrepancy limits, then the secondary structure is assigned. The method P-Curve [12] makes assignments based on geometrical analysis of protein curvature. It uses representations of standard structural motifs based on differential geometry and, through a set of geometrical transformations, tries to match these motifs with those found in known 3D structures. P-Curve assignments differ significantly from those based on hydrogen bonds and/or phi-psi torsion angles. The above three assignment methods agree for only about two-thirds of all residues [13].
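To make the hydrogen bond criterion concrete, the following is a minimal sketch of an electrostatic energy test of the kind DSSP applies. The text only gives the threshold (-0.5 kcal/mol); the partial charges (0.42e and 0.20e), the dimensional factor (332) and the four-distance form of the energy are assumptions taken from the commonly cited description of DSSP, and the function names are hypothetical.

```python
import math

# ASSUMED constants (commonly cited for DSSP, not stated in the text above):
# partial charges in units of the electron charge and a factor that converts
# e^2/Angstrom into kcal/mol.
Q1, Q2, FACTOR = 0.42, 0.20, 332.0
HBOND_THRESHOLD = -0.5  # kcal/mol, the threshold quoted in the text


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a backbone C=O and an N-H group.

    Each argument is an (x, y, z) coordinate in Angstrom.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * FACTOR * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h, threshold=HBOND_THRESHOLD):
    """Assign a hydrogen bond if the energy is below the threshold."""
    return hbond_energy(c, o, n, h) < threshold


# Idealized collinear geometry: C=O ... H-N with an O...H distance of 1.9 Angstrom.
C, O, H, N = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0), (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(C, O, N, H), 2), is_hbond(C, O, N, H))
```

With this idealized geometry the energy comes out near -3 kcal/mol, well below the threshold; moving the N-H group 10 Angstrom further away brings the energy close to zero and the bond is no longer assigned.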
There are various reasons for disagreements; the most important one may simply be that secondary structure is dynamic, i.e. that there simply is no such thing as a secondary structure state. This problem is reflected in the DSSPcont method that introduces continuous secondary structure assignments [14] . The continuum results from calculations of weighted averages of DSSP assignments that are based on various hydrogen bond energy thresholds. As a result, each protein residue is assigned with likelihoods of all secondary structure states. Residues that have a higher probability for a single state appear to also be more rigid according to NMR measurements of motions on time scales important for protein function [14] . Other, more application-oriented approaches to defining local structures are possible. For example one may try to define a new secondary structure alphabet with a goal of improving fold recognition algorithms [15] . The numerical values of prediction accuracy presented in this chapter are based on the most widely used DSSP assignment. Evaluations based on STRIDE tend to yield higher values, and no state-of-the-art prediction method has been evaluated on P-Curve.
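The idea of a continuous assignment can be sketched as a weighted average over several discrete assignments of the same chain. This is only an illustration of the averaging step: the actual thresholds and weights used by DSSPcont are not given in the text, so the three assignment strings and the weights below are hypothetical.

```python
def continuous_assignment(assignments, weights):
    """Per-residue state likelihoods from several discrete assignments.

    assignments: equal-length strings, e.g. one DSSP run per H-bond threshold.
    weights: one non-negative weight per assignment (hypothetical values).
    Returns a list with one {state: likelihood} dict per residue.
    """
    if not assignments or len(assignments) != len(weights):
        raise ValueError("need one weight per assignment")
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("all assignments must have the same length")
    total = float(sum(weights))
    result = []
    for i in range(length):
        likelihoods = {}
        for assignment, weight in zip(assignments, weights):
            state = assignment[i]
            likelihoods[state] = likelihoods.get(state, 0.0) + weight / total
        result.append(likelihoods)
    return result


# Three hypothetical assignments of a four-residue stretch at different thresholds.
for residue in continuous_assignment(["HHHC", "HHCC", "HCCC"], [1, 2, 1]):
    print(residue)
```

Residues on which all assignments agree receive a likelihood of 1.0 for a single state, while residues at the helix end receive mixed likelihoods.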
Reduction to three secondary structure states. DSSP distinguishes eight different states: three types of helical structures - alpha-helix ('H', 4-residue period), pi-helix ('I', 5-residue period), and 3₁₀-helix ('G', 3-residue period); extended beta-sheet ('E'); beta-bridge ('B'); turn ('T'); bend ('S') and other non-regular states (blank). Of those, alpha-helix and beta-strand (Fig. 1) comprise more than 50% of all protein residues. Some prediction methods attempt to predict all eight states. However, a widely used strategy is to map the eight states into three major classes - helical, extended and other (often imprecisely referred to as non-regular, coil, or turn). Different maps are possible, but the most popular one (which incidentally is the most difficult to predict [16]) is the following: [G, H, I] - helical ('h'); [E, B] - extended ('e'); [T, S, blank] - non-regular (coil) (' '). An alternative translation that results in seemingly higher prediction accuracies is sometimes used: [H] - helical; [E] - extended; [G, I, T, S, blank] - non-regular.
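The most popular eight-to-three mapping described above can be written as a simple lookup table. This is a minimal sketch; the function name and the option to pass a different mapping are illustrative choices, not part of DSSP.

```python
# The popular mapping from the text: [G, H, I] -> 'h', [E, B] -> 'e',
# [T, S, blank] -> ' ' (non-regular).
POPULAR_MAP = {
    "G": "h", "H": "h", "I": "h",
    "E": "e", "B": "e",
    "T": " ", "S": " ", " ": " ",
}


def reduce_to_three_states(dssp_string, mapping=POPULAR_MAP):
    """Translate a string of eight-state DSSP codes into three states."""
    try:
        return "".join(mapping[code] for code in dssp_string)
    except KeyError as err:
        raise ValueError(f"unknown DSSP code: {err.args[0]!r}") from None


print(repr(reduce_to_three_states("  HHHHGGGTT EEEEB S")))
```

Passing a different dictionary as `mapping` is enough to switch to another translation, which matters because accuracies obtained with different mappings are not directly comparable.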
Performance has many aspects relating to many different measures. Depending on the application, there are various views as to what constitutes a high-quality prediction. On the one hand, it is important to correctly predict the secondary structure state for each residue (per-residue accuracy); on the other hand, it may be more relevant to predict the coarse-grained presence of e.g. a helix than all residues in the helix (segment-based accuracy). Accordingly, many measures have been used to assess prediction quality: simple percentages of per-residue accuracy (eqn. 1), Matthews correlation coefficients, percentage of confusion between strand and helix states [17] (eqn. 2); simple segment-based measures such as the number of correctly predicted segments, the average ratio of predicted to observed segment lengths, the difference between the distributions of predicted and observed segment lengths [18]; or the more elaborate and widely used segment overlap score SOV [19, 20] (eqn. 3). These are only some of the measures that have been applied. In this chapter, we focus on two measures for per-residue accuracy, namely the percentage QK (eqn. 1) and the BAD score (eqn. 2), and one measure for per-segment accuracy, namely SOV.
Per-residue percentage accuracy: QK. Perhaps the most intuitive and simplest measure for performance is the average percentage of correctly predicted states. For a protein composed of L residues and for K possible secondary structure states, the per-residue prediction accuracy QK is defined as:
| $Q_K = 100 \cdot \frac{1}{L} \sum_{i=1}^{K} C_i$ | (Eq. 1) |
where Ci is the number of residues correctly predicted in secondary structure state i. For a three-state alphabet this translates into a Q3 measure. The average accuracy can be computed as an average per protein or an average per residue in which case the number of all residues is used for L.
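The definition above, together with the two ways of averaging over a data set, can be sketched as follows. The function names are hypothetical, and states are compared as single characters (one per residue).

```python
def q_k(observed, predicted):
    """Percentage of residues predicted in their observed state (Eq. 1)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def mean_per_protein(pairs):
    """Average of the per-protein accuracies; every protein counts equally."""
    return sum(q_k(o, p) for o, p in pairs) / len(pairs)


def mean_per_residue(pairs):
    """Accuracy over all residues pooled; L is the total number of residues."""
    total_length = sum(len(o) for o, _ in pairs)
    return sum(q_k(o, p) * len(o) for o, p in pairs) / total_length


# A short, perfectly predicted protein and a longer, half-correct one.
data = [("HHHH", "HHHH"), ("EEEEEEEE", "EEEECCCC")]
print(mean_per_protein(data), mean_per_residue(data))
```

In this toy case the per-protein average is 75% while the per-residue average is about 67%, because the longer, poorly predicted protein carries more weight when residues are pooled.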
Per-residue confusion between regular elements: BAD. Not all secondary structure prediction mistakes are equal. For instance, when using secondary structure predictions to model 3D structure, confusing helix and extended (strand) is more detrimental than confusing regular with non-regular states. The percentage of such 'bad' predictions constitutes the BAD score. If L is a total number of amino-acid residues in a protein, and Bh (Be) the number of helical (strand) residues predicted in strand (helix) state then the BAD score is expressed as:
| $BAD = 100 \cdot \frac{B_h + B_e}{L}$ | (Eq. 2) |
Two predictions with equal Q3 and/or SOV scores can have very different BAD scores.
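The following sketch computes the BAD score and shows two predictions with the same three-state accuracy but very different BAD scores. The state letters 'H', 'E' and 'C' and the toy sequences are assumptions made for illustration.

```python
def bad_score(observed, predicted, helix="H", strand="E"):
    """Percentage of residues with helix and strand confused (Eq. 2)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(observed, predicted))
    b_h = sum(o == helix and p == strand for o, p in pairs)   # helix predicted as strand
    b_e = sum(o == strand and p == helix for o, p in pairs)   # strand predicted as helix
    return 100.0 * (b_h + b_e) / len(observed)


def q3(observed, predicted):
    """Three-state per-residue accuracy, used here only for comparison."""
    return 100.0 * sum(o == p for o, p in zip(observed, predicted)) / len(observed)


observed = "HHHHEEEECC"
confused = "HHEEEEHHCC"   # errors are helix/strand swaps
cautious = "HHCCEECCCC"   # errors are regular states predicted as coil
for prediction in (confused, cautious):
    print(q3(observed, prediction), bad_score(observed, prediction))
```

Both predictions reach 60% in Q3, yet the first has a BAD score of 40% and the second a BAD score of 0%.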
Per-segment prediction accuracy: SOV. Regular secondary structure elements are built of continuous stretches of residues belonging to the same state, e.g. most helices are about 10 residues long. It can be argued that mis-predicting 2 residues at either end of a helix is not an important mistake (note: 2+2 out of 10 means 60% accuracy). In contrast, only predicting 60% of the helices in a protein is a severe problem. Such realities are reflected in segment-based measures. The most widely used is the Segment Overlap (SOV) measure [19, 20] :
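| $SOV = 100 \cdot \frac{1}{N} \sum_{i=1}^{K} \sum_{(s_{obs}, s_{pred})} \frac{minov(s_{obs}, s_{pred}) + d(s_{obs}, s_{pred})}{maxov(s_{obs}, s_{pred})} \cdot len(s_{obs})$ | (Eq. 3) |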
where K is the number of different secondary structure types; the second summation is over all overlapping secondary structure segments of observed sobs and predicted spred secondary structure of the same type; minov is the number of positions at which the segments overlap; maxov is the number of overlapping positions plus the number of remaining residues from each segment of the given pair; len(sobs) is the length of a reference secondary structure segment (observed experimentally); N is the total number of overlapping segment pairs of the same type; d(sobs,spred) is the accepted variation between segments that assures a ratio of 1.0 when the variations between sobs and spred are minor. One can easily envision two different secondary structure predictions that have the same Q3 and different SOV scores. For example, if instead of an observed long helix of length n one prediction consists of a shorter helix of length m and the second prediction comprises two short helices of combined length equal to m (other residues predicted as coil), then the Q3 scores of both predictions are going to be the same while the SOV scores are going to be different.
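The example of one shorter helix versus two short helices can be checked with a small segment overlap calculation. This is a sketch under stated assumptions: the allowed variation d and the normalization are not spelled out in the text, so the code assumes the commonly cited formulation in which d is the minimum of (maxov - minov), minov, and half the length of each segment (rounded down), and in which the normalization sums the lengths of observed segments (once per overlapping pair, plus observed segments without any overlapping partner). Function names are hypothetical.

```python
from itertools import groupby


def segments(states):
    """Split a state string into (state, start, end) runs; end is exclusive."""
    result, position = [], 0
    for state, run in groupby(states):
        length = len(list(run))
        result.append((state, position, position + length))
        position += length
    return result


def sov(observed, predicted):
    """Segment overlap score in percent (ASSUMED d and normalization, see lead-in)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    predicted_segments = segments(predicted)
    total, norm = 0.0, 0
    for state, o_start, o_end in segments(observed):
        o_len = o_end - o_start
        partners = [(s, e) for st, s, e in predicted_segments
                    if st == state and s < o_end and o_start < e]
        if not partners:
            norm += o_len          # unmatched observed segment still counts
            continue
        for p_start, p_end in partners:
            minov = min(o_end, p_end) - max(o_start, p_start)
            maxov = max(o_end, p_end) - min(o_start, p_start)
            d = min(maxov - minov, minov, o_len // 2, (p_end - p_start) // 2)
            total += (minov + d) / maxov * o_len
            norm += o_len
    return 100.0 * total / norm


observed   = "CCHHHHHHHHHHCC"   # one helix of length 10
one_helix  = "CCCCHHHHHHCCCC"   # one helix of length 6
two_helix  = "CCHHHCCCCHHHCC"   # two helices of combined length 6
for prediction in (one_helix, two_helix):
    correct = sum(o == p for o, p in zip(observed, prediction))
    print(correct, round(sov(observed, prediction), 1))
```

Both predictions get 10 of 14 residues right, but under these assumptions the single shorter helix scores about 86 in SOV while the fragmented prediction scores 50.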
Generic problems. In this section we describe problems with the evaluation of prediction methods that are entirely generic, i.e. valid for all prediction methods. As much as many ideas and concepts have been introduced to predict secondary structure and have then been used for other purposes, many of the mistakes in comparing methods have also been unraveled first and most clearly for the example of secondary structure predictions. Secondary structure prediction methods may be the only example of publications with claims to performance accuracy that survived more than a decade. (To put this into perspective: our section focuses on methods for which performance has - on average - been unusually well estimated; nevertheless, the only other field that we review for which ANY estimate survived five years was the prediction of accessibility, and the vast majority of publications in that field heavily over-estimated performance!)
Numbers often cannot be compared between two different publications. Prediction methods are often published with estimates of performance that are supported by cross-validation experiments. However, the term cross-validation and the related term jack-knife are by no means sufficiently well-defined to guarantee that an estimate is sound. In fact, most publications make some serious mistake, as is demonstrated by the simple fact that very few estimates for performance have survived. One problem is the overlap between training and testing sets. It is trivial to reach very high performance by training on proteins that are very similar to those in the testing set. There are various strategies that deal with the similarity problem [21, 20]. Another issue is that of using the performance on the test set to choose some parameters, e.g. by reporting full cross-validation results for N different parameters and then concluding that the best of those N is the performance of the final method. Instead, performance estimates should always be based on a data set that was not used in ANY step of the development. However, even if we had two publications that both used cross-validation correctly, we still cannot necessarily compare the numbers published by both directly. Firstly, both have to have used the same standard of truth (here: the same assignment method, e.g. DSSP, and the same conversion of the eight DSSP states into three prediction classes). Secondly, they both have to have been based on identical test sets. Often, the test sets used by developers are not representative and differ from each other. Proteins vary in their structural complexity, and such variation is correlated with prediction difficulty. We could argue that test sets should be frozen (and this has indeed been done in many cases). Such a set should be sufficiently large to allow proper evaluation of statistical differences among methods.
Although a sine qua non, this freezing strategy does not suffice: data sets in biology change constantly, almost always more recent sets are more reliable and representative. Therefore, we also need evaluations based on sets that are as recent as possible. One way of merging these two demands is by carrying out two tests: one on a frozen set used by others, and the other on a more recent set. As an aside, it is not necessary to use n-fold cross validation experiments with the largest possible n. The exact value of n is not important as long as the test set is not misused for adjusting a method's parameters and it is representative of the entire structure space.
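The rule that the final estimate must come from data not used in any development step can be sketched as a three-way split: parameters are chosen on a validation set, and the test set is touched exactly once. This is a minimal illustration; the split fractions, the `evaluate` callback and the toy scores are hypothetical, and the sketch assumes that proteins similar to each other have already been removed.

```python
import random


def three_way_split(protein_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle identifiers and split them into train, validation and test sets.

    ASSUMPTION: redundancy reduction has been done beforehand, so no protein
    in one set is sequence-similar to a protein in another set.
    """
    ids = sorted(protein_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_validation = int(fractions[1] * len(ids))
    return (ids[:n_train],
            ids[n_train:n_train + n_validation],
            ids[n_train + n_validation:])


def select_and_report(candidates, evaluate, validation, test):
    """Pick the best setting on the validation set, then report on the test set."""
    best = max(candidates, key=lambda setting: evaluate(setting, validation))
    return best, evaluate(best, test)


train, validation, test = three_way_split([f"p{i}" for i in range(10)])

# Toy accuracies for three hypothetical window sizes.
validation_scores = {11: 60.0, 13: 65.0, 17: 62.0}
test_scores = {11: 59.0, 13: 63.0, 17: 61.0}


def evaluate(setting, proteins):
    return validation_scores[setting] if proteins == validation else test_scores[setting]


print(select_and_report([11, 13, 17], evaluate, validation, test))
```

The number to publish is the test-set value of the selected setting (63.0 in the toy case), not the best validation value (65.0).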
Appropriate comparisons of methods require large, "blind" data sets. One solution to the problem of comparing methods is to use a sufficiently large test set composed of proteins that were neither used for, nor are similar to any protein used for, the development of any method. This idea was first realized in the field of structure prediction through the CASP (Critical Assessment of Structure Prediction) experiments, in which various prediction methods are tested over the course of a few months on sequences of proteins whose 3D structure is unknown at the time of the prediction ("blind" prediction). Those experiments evaluate fully automatic methods as well as human experts (see also Chapter 11 for a more detailed description of CASP and CAFASP). For a variety of reasons, CASP cannot be based on sufficiently large, representative data sets. Servers that automatically evaluate methods whenever new data become available address this shortcoming. Such servers base their comparisons on thousands of test cases rather than on tens (as does CASP). Two such servers exist, EVA and LiveBench. EVA [22] continuously evaluates automatic prediction methods (servers), providing results based on large, statistically significant and consequently more representative data sets. One of its principles is to facilitate comparisons on identical sets and to render comparisons on different sets very difficult. Another principle is to never distinguish in rank between two methods if the difference in their performance is not statistically significant. Both principles are in stark contrast to what most CASP assessors did.
First generation: single-residue statistics. First attempts to correlate amino-acid residue frequency with secondary structure type can be traced to correlating the content of certain amino acids (e.g., Proline) with the content of alpha-helix [23] . This was done even before the first crystallographic structures were available [24, 25] . Attempts to correlate the content of all amino acids with the content of alpha-helix and beta-strand opened the field of secondary structure prediction [26, 27] . The early methods were usually based on single-residue statistics obtained from very limited data sets of known protein structures. As such they were not very accurate ( Fig. 2 ) and in addition their accuracy was overestimated at the time.
Fig. 1 : Ribbon diagram of protein secondary structure. Secondary structures are local arrangements of the protein backbone without reference to the amino acid type or the three-dimensional conformation. They are stabilized by hydrogen bonds between atoms of the main chain (backbone). Very roughly, secondary structures can be classified into 3 classes: helical (H), extended (E) (strand) and loopy (other) L. The figure contains a schematic representation of E2 DNA-Binding Domain [176] (Protein Data Bank [1] code 1a7g).
Second generation: segment statistics. As the number of experimentally determined protein structures grew it became possible to estimate propensities for secondary structure based on consecutive segments of residues. Various numbers of adjacent residues (typically 11-21) were considered in assigning secondary structure to a central residue of a segment. Many different algorithms were applied but they did not achieve per-residue prediction accuracies higher than slightly above 60% ( Fig. 2 ). Reports of higher accuracies were due to small data sets and did not hold for long. The main approaches used were i) statistical information, ii) physicochemical properties, iii) sequence patterns, iv) artificial neural networks, v) graph theory, vi) expert rules, vii) nearest-neighbor algorithms; and hybrid approaches of various algorithms.
Fig. 2 : Three-state per-residue accuracy of various prediction methods. Included are only those methods for which we could run independent tests. Unfortunately, for most old methods this was not possible. However, for each method we had independent results from PHD (3rd generation, 1993) [37, 104, 38] available. We normalized the differences between data sets by simply compiling levels of accuracy with respect to PHD. For comparison, we added the expected accuracy of a random prediction (RAN), and the best currently possible prediction accuracy achieved through comparative modeling of a close homologue (PDB98). The methods were: C+F Chou & Fasman (1st generation, 1974) [177, 178] ; Lim (1st, 1974) [179] ; GORI (1st, 1978) [180] ; Schneider (2nd, 1989) [181] ; ALB (2nd, 1983) [182] ; GORIII (2nd, 1987) [183] ; COMBINE (2nd, 1996) [184] ; S83 (2nd, 1983) [185] ; LPAG (3rd, 1993) [186] ; NSSP (3rd, 1994) [187] ; PHDpsi (3rd, 2001) [52] ; JPred2 (3rd, 2000) [51] ; SSpro (3rd, 1999) [188] ; PROF (3rd, 2001) [108] ; PSIPRED (3rd, 1999) [39] .
Third generation: evolutionary information. Proteins with similar sequences adopt similar structures [28, 29] . In fact, proteins can change more than 70% of their residues without altering the basic fold [30, 31, 32, 20] . However, the vast majority of possible sequences supposedly do not adopt globular structures, at all. Rather, the exact substitution pattern of which residues can be changed and how, is indicative of particular structural details. Consequently, the evolutionary information contained in sequence alignments can aid structure prediction. In particular this approach improves prediction of beta-strands. For the first and second generation of prediction methods beta-strand prediction was particularly bad (often only slightly better than random). The pioneering method that used alignment information was proposed in the 1970's [33] . The first approaches were based on visual gathering of information from sequence alignments. In one of the first automatic algorithms making use of alignment information [34, 35] the final secondary structure prediction was an average over all predictions compiled for each sequence in the alignment. The first method that succeeded in significantly improving performance by automatically using alignment information was PHD [36, 37, 38] ( Fig. 3 ). This method used a residue profile extracted from a multiple sequence alignment as an input to the artificial neural network. Many other methods used artificial neural networks [39, 40, 41] but various other algorithms were also applied successfully [17, 10, 42, 43, 44, 45, 46] including support vector machines (SVM) [47, 48] , HMMs [49] , nearest neighbor algorithms [43] .
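How a residue profile from a multiple sequence alignment can become the input for a window-based predictor can be sketched as follows. This is a simplified illustration, not the encoding of PHD or any other published method: the plain column frequencies, the window of 13 residues, the zero padding at the chain ends and the function names are all assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_profile(alignment):
    """Column-wise amino acid frequencies; gaps and unknown letters are ignored."""
    length = len(alignment[0])
    if any(len(sequence) != length for sequence in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for i in range(length):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for sequence in alignment:
            if sequence[i] in counts:
                counts[sequence[i]] += 1
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile


def window_input(profile, center, half_width=6):
    """Flatten the profile columns in a window around one residue.

    Positions beyond the chain ends are padded with zeros (an assumption).
    """
    padding = [0.0] * len(AMINO_ACIDS)
    features = []
    for i in range(center - half_width, center + half_width + 1):
        features.extend(profile[i] if 0 <= i < len(profile) else padding)
    return features


alignment = ["ACD", "ACE", "GCD", "A-D"]   # toy alignment, first row is the query
profile = residue_profile(alignment)
print(profile[0][AMINO_ACIDS.index("A")], len(window_input(profile, center=0)))
```

Each residue is thus described by 13 x 20 numbers that reflect which substitutions were observed in its sequence neighborhood, rather than by the single amino acid of the query sequence.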
Fig. 3 : Using evolutionary information to predict secondary structure. Starting from a sequence of unknown structure (SEQUENCE) the following steps are required to feed evolutionary information into the PROFsec neural networks (upper right): (1, 2) a database search for homologs through iterated PSI-BLAST [189] (protocol from [52] ), (3) a decision for which proteins will be considered as homologs, (4) a reduction of redundancy (purge too many too similar proteins), and (5) a final refinement, and extraction of the resulting multiple alignment. Numbers 1-5 illustrate where users of the PredictProtein server [38, 105] can interact to improve prediction accuracy without changes made to the actual prediction method PROFsec.
Recent improvements of third generation methods. PHD tore down what once was a magical wall of 70% accuracy. The mark has been put much higher since. The first significant improvement was achieved by training neural networks on more diverse sequence alignments [39]. The alignments were generated by a new alignment method, PSI-BLAST [50]. It has been shown that a major improvement can be achieved by using previous types of neural networks with PSI-BLAST alignments [51]. Interestingly, it was also shown that a significant part of the improvement was simply due to the growth of sequence databases, which resulted in more diverse profiles [52]. In general, the more divergent the alignment, the better the prediction that can be obtained. The input quality also depends on alignment quality. This is especially important for divergent homologous proteins, where alignment methods tend to make many mistakes. Yet another simple source of improvement is the growth of the database of protein structures [1]. Apart from improvements in alignments, much research pursues the development of more sophisticated and accurate algorithms. Those include new network architectures or learning techniques [41, 15, 53, 54, 55], SVMs [48] and many others.
Meta-predictors improve somewhat. Different methods often make different mistakes. As long as those errors are not purely systematic, combining any number of methods can lead to improvements in prediction accuracy [56]. For example, the PHD method utilized this observation by combining differently trained neural networks. Various implementations of a similar concept were used in many other methods [57, 51, 58]. Alternatively, or in addition, different methods can be combined [59, 16, 60, 61, 62, 63, 64]. Overall, combinations of independent methods tend to top the single best method. However, it is probably not beneficial to use all of the available prediction methods in a meta-method. For example, averaging over all methods evaluated by the EVA evaluation server [22, 65] decreased accuracy relative to the best individual methods (Rost, unpublished). It is not straightforward to decide whether or not to include a given method [64]. Concepts that weigh each individual method based on its accuracy and 'entropy' [58] appear successful only for large numbers of methods. More rigorous studies of the optimal combination may provide a better picture. An interesting approach resulted from attempts to improve meta-methods by developing new methods that are algorithmically different from the methods already used [66, 67]. Recently, it has been observed that optimizing meta-servers to achieve the highest per-residue prediction accuracy is not always beneficial when the final predictions are used in various applications [68]. Another issue that was first introduced for secondary structure prediction is the measurement of the reliability of a prediction. To make an extreme point: a method that has 50% accuracy but that always correctly identifies in which of the cases it is right and in which it errs (before knowing the answer) is more useful than a method with 75% accuracy and no notion about which 25% of the residues are wrong. State-of-the-art methods reliably estimate the reliability of a prediction. This is not the case for any of the existing meta-methods.
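The simplest way to combine methods is a per-residue majority vote, sketched below. Published meta-predictors use more elaborate weighting schemes; the vote, the tie-breaking rule (the state proposed by the earliest method in the list wins) and the function name are assumptions made for illustration.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote over equal-length prediction strings.

    Ties are broken in favor of the method listed first (an assumption),
    so methods should be ordered by how much they are trusted.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        result.append(next(state for state in column if counts[state] == top))
    return "".join(result)


print(consensus(["HHEC", "HEEC", "CHEC"]))
```

Each of the three toy predictions makes at most one mistake at a different position, and the vote recovers "HHEC"; errors shared by most methods would of course survive the vote.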
Average predictions have good quality. Today's best methods reach average levels of almost 78% in Q3 ( eqn. 1) [22, 69] . They are able to accurately predict most segments (SOV scores around 76%). In addition the confusion between helices and strands is low (BAD score of less than 3%).
Prediction accuracy varies among proteins. The standard deviation of three-state per-residue accuracy computed on a per-protein basis is about 13% [22, 69] (Fig. 4). Thus, some proteins are predicted very well (above 90%) while others are predicted very badly (even below 40% accuracy). Relatively large deviations are also found when prediction quality is assessed by other measures. The standard deviation of the SOV score is about 15%, and that of the BAD score about 5%. In particular, proteins having no sequence homologs (no alignment input) are poorly predicted. This is an important issue for the applicability of secondary structure predictions, since badly predicted secondary structure is not very valuable.
Fig. 4 : Expected variation of prediction accuracy for PROFsec. (A) Three-state per-residue (Q3) and segment overlap (SOV) accuracies; (B) percentage of BAD predictions, i.e., residues either predicted in helix and observed in strand, or predicted in strand and observed in helix. Given: distributions (over 1198 unique protein chains), averages, and one standard deviation.
Reliability of prediction correlates with accuracy. For the user interested in a particular protein U, the fact that the prediction accuracy varies from protein to protein implies a rather unfortunate message: the accuracy for U could be lower than 40%, or it could be higher than 90% ( Fig. 4 ). Is there any way to provide an estimate at which end of the distribution the accuracy for U is likely to be? Indeed, many methods provide numerical estimates of the expected quality of their predictions through so called reliability indices. Those indices correlate with accuracy. In other words, residues with higher reliability index are predicted with higher accuracy [36, 37, 38] . Thus, the reliability index offers an excellent tool to focus on some key regions predicted at high levels of expected accuracy. Furthermore, the reliability index averaged over an entire protein correlates with the overall prediction accuracy for this protein ( Fig. 5 ).
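The practical use of a reliability index can be sketched by restricting the evaluation to residues at or above a threshold and reporting both the accuracy and the fraction of residues that remain. The integer scale from 0 to 9, the toy values and the function name are assumptions; the text does not specify how any particular method computes its index.

```python
def accuracy_at_reliability(observed, predicted, reliability, threshold):
    """Accuracy and coverage (both in percent) for residues with index >= threshold.

    Returns (None, 0.0) if no residue reaches the threshold.
    """
    if not (len(observed) == len(predicted) == len(reliability)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    kept = [(o, p) for o, p, r in zip(observed, predicted, reliability)
            if r >= threshold]
    if not kept:
        return None, 0.0
    accuracy = 100.0 * sum(o == p for o, p in kept) / len(kept)
    coverage = 100.0 * len(kept) / len(observed)
    return accuracy, coverage


observed    = "HHHHEEEE"
predicted   = "HHHCEEHH"
reliability = [9, 8, 7, 2, 9, 8, 1, 3]   # hypothetical indices on a 0-9 scale
for threshold in (0, 5):
    print(threshold, accuracy_at_reliability(observed, predicted, reliability, threshold))
```

In the toy case, raising the threshold from 0 to 5 increases the accuracy from 62.5% to 100% while the coverage drops from 100% to 62.5%, which is the trade-off a user makes when focusing on reliably predicted regions.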
Fig. 5 : Prediction quality correlates with reliability indices. (A) Average three-state per-residue accuracy and BAD score at different reliability index thresholds (averaged over entire protein) as predicted by PROFsec [108] ; (B) Corresponding values of standard deviation.
Is it understandable why certain proteins are predicted poorly? It is not easy to anticipate the performance of a secondary structure prediction method from overall structural features of proteins. However, prediction accuracy is correlated with alignment quality: poor alignments (i.e., non-informative and/or falsely aligned residues) result in inaccurate predictions. Another interesting observation is that BAD predictions, i.e., confusions of helix and strand, are frequently observed in regions that are stabilized by long-range interactions. Furthermore, helices and strands that are confused despite a high reliability index often have functional properties, or are correlated to disease states (Rost, unpublished data). Regions predicted with equal propensity in two different states often correlate with 'structural switches'.
Better database searches. Initially, three groups independently applied secondary structure predictions for fold recognition, i.e., the detection of structural similarities between proteins of unrelated sequences [70, 71, 72] . A few years later, almost every other fold recognition/threading method has adopted this concept [73, 74, 75, 76, 77, 78, 79, 80, 81, 82] . Two recent methods extended the concept by not only refining the database search, but by actually refining the quality of the alignment through an iterative procedure [83, 84] . A related strategy has been employed to improve predictions and alignments for membrane proteins [85] . It has also been indicated that prediction mistakes tend to correlate among structurally related proteins [86] and that alignments based on purely predicted secondary structure have comparable quality with those based on matching predicted and observed states. Thus predicted secondary structure may prove useful in searching sequence databases.
1D predictions assist in the prediction of higher dimensional structure. Secondary structure predictions are now accurate enough to be used as input for methods that target the prediction of higher order aspects of protein structure automatically. To just list a few successful applications: Contact map predictions [87] have recently improved the level of accuracy significantly; an important contribution was the inclusion of secondary structure predictions [55] . They also help in the prediction of folding rates [88, 89] . Secondary structure predictions have also become a popular first step toward predicting 3D structure. Ortiz et al. [90] successfully use secondary structure predictions as one component of their 3D structure prediction method. Eyrich et al. [91, 92] minimized the energy of arranging predicted rigid secondary structure segments. Lomize et al. [93] also started from secondary structure segments. Chen et al. [94] suggested using secondary structure predictions to reduce the complexity of molecular dynamics simulations. Levitt et al. [95, 96] combined secondary structure-based simplified presentations with a particular lattice simulation attempting to enumerate all possible folds.
Predicted secondary structure helps annotating function. Secondary structure predictions are also useful to annotate/predict protein function. For example, secondary structure predictions have been used successfully in completely automatic predictions of sub-cellular localization [97] . A more typical use of secondary structure prediction is in aiding experts in finding similarities among proteins with insignificant sequence similarity. In this way functional annotation is sometimes transferred from one protein to another [98] .
Secondary structure based classifications in the context of genome analysis. Proteins can be classified into families based on predicted and observed secondary structure [99, 100] . However, such procedures have been limited to a very coarse-grained grouping only sometimes useful for inferring function. Nevertheless, predictions of membrane helices and coiled-coil regions are crucial for genome analysis. More than one fifth of all eukaryotic proteins appear to have regions longer than 60 residues apparently lacking any regular secondary structure [101] . Most of these regions were not of low-complexity, i.e. not composition-biased. Surprisingly, these regions appeared evolutionarily as conserved as all other regions in the respective proteins. This application of secondary structure prediction may aid in classifying proteins, and in separating domains, possibly even in identifying particular functional motifs.
Regions likely to undergo structural change predicted successfully. Prions and prion-like proteins appear to aggregate through the transition of a regular secondary structure: what is usually a helical region switches to a strand that becomes the root of aggregation in the case of disease mutants. The reliability of the PHD secondary structure predictions combined with experimental evidence gave the first hint where this expected transition might occur [102] . Interestingly, it is still difficult to actually observe the strand in structures of even the mutant prion, while state-of-the-art prediction methods always predict the region with an observed helix to be in a strand. This example casts some light on the importance of transitions and the usefulness of predictions to capture such transitions. Young, Kirshenbaum, Dill & Highsmith [103] have pushed this observation further by unraveling an impressive correlation between local secondary structure predictions and global conditions. The authors monitor regions for which secondary structure prediction methods give equally strong preferences for two different states. Such regions are processed combining simple statistics and expert rules. The final method has been tested on 16 proteins known to undergo structural rearrangements, and on a number of other proteins (one of those was a prion). The authors report no false positives, and identify most known structural switches. Subsequently, the group applied the method to the myosin family identifying putative switching regions that were not known before, but appeared reasonable candidates [103] . This method is remarkable in two ways: (1) it is a very general method using predictions of protein structure to predict some aspects of function, and (2) it illustrates that predictions may be useful even when structures are known (as in the case of the myosin family). 
While the method is tailored to catch more subtle changes than occur in prions, there is some evidence that amyloid aggregation is also captured to some extent.
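The first step described above, monitoring residues for which a prediction method gives nearly equal preference to two states, can be illustrated with a short Python sketch. It is not the method of [103], which additionally combines statistics and expert rules; the per-residue output format, the margin of 0.1, and all names are assumptions made for this example.

```python
def ambivalent_positions(state_scores, margin=0.1):
    """Return indices of residues whose two best-scoring states differ by <= margin.

    state_scores: one dict per residue mapping a state label to its score.
    """
    flagged = []
    for index, scores in enumerate(state_scores):
        best, second = sorted(scores.values(), reverse=True)[:2]
        if best - second <= margin:
            flagged.append(index)
    return flagged


# Hypothetical network outputs for three residues.
toy_scores = [
    {"H": 0.80, "E": 0.10, "L": 0.10},  # confident helix
    {"H": 0.45, "E": 0.42, "L": 0.13},  # helix and strand almost tied
    {"H": 0.20, "E": 0.30, "L": 0.50},  # moderately confident loop
]

print(ambivalent_positions(toy_scores))
```

In practice one would look for contiguous stretches of flagged residues rather than isolated positions.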
Special classes of proteins. Prediction methods are usually derived from knowledge contained in proteins from subsets of current databases. Consequently, they should not be applied to classes of proteins not included in these subsets, e.g., methods for predicting helices in globular proteins are likely to fail when applied to predict transmembrane helices. In general, results should be taken with caution for proteins with unusual features, such as proline-rich regions, unusually many cysteine bonds, or for domain interfaces.
Better alignments yield better predictions. Multiple alignment-based predictions are substantially more accurate than single sequence-based predictions. How many sequences are needed in the alignment for an improvement, and how sensitive are prediction methods to errors in the alignment? The more divergent the sequences contained in the alignment, the better (two distantly related sequences often improve secondary structure predictions by several percentage points). Regions with few aligned sequences yield less reliable predictions. The sensitivity to alignment errors depends on the method, e.g., secondary structure prediction is less sensitive to alignment errors than solvent accessibility prediction.
Internet services are widely available. Programs for the prediction of secondary structure available as internet services have mushroomed since the first prediction service, PredictProtein, went online in 1992 [104, 105]. The META-PredictProtein server [106] enables users to access a number of the best prediction methods through one single interface. Unfortunately, not all methods available have been sufficiently tested, and some are not very accurate. This problem is addressed by the EVA server, which evaluates prediction servers continuously and automatically [22, 69]. In general, prediction accuracy is significantly superior if predictions are based on multiple alignments [107, 44, 38].
Interactive services. The PHD/PROF prediction methods are automatically available via the internet service PredictProtein [106] . Users have the choice between the fully automatic procedure taking the query sequence through the entire cycle, or expert intervention into the generation of the alignment. Indeed, without spending much time users typically can improve prediction accuracy easily by choosing 'good' alignments. A few of the state-of-the-art methods are also available to run locally. Note however that one crucial step is the generation of appropriate alignments; usually this is not done for you when you run the prediction method locally!
Servers. The following servers are publicly available (most links given by the EVA server): PROFsec [108] , PHDsec [104] , PHDpsi [52] , PSIPRED [39] , SSPRO [109] , PORTER [54] , SABLE [53] , SAM-T02 [49] , Jpred [51] , APSSP, JUFO [110] , PROF [40] , YASPIN [111] .
Transmembrane proteins are an extremely important class of proteins. Approximately 15-30% of all proteins are estimated to contain transmembrane regions [112, 113]. Those proteins are responsible for the communication between the cell and its surroundings and are of great importance to biomedicine. The cell membrane environment, composed of a lipid bilayer, is very different from that found in most cellular compartments. The transmembrane segments of proteins tend to be hydrophobic, which enables them to remain within a membrane by avoiding the solvent present at both boundaries. The special features of transmembrane protein sequences serve as the basis for identifying them by computational methods. As in the case of globular proteins, the transmembrane segments form regular secondary structures and can be assigned to two broad classes: those composed entirely of helices and those composed of strands (despite ardent searches and putative evidence, we still do not have any proof for the existence of a mixture of the two). The vast majority of all membrane proteins appear to be of the helical type [114]. An important characteristic of transmembrane proteins is the orientation of membrane segments with respect to the N-terminus of a protein, often referred to as the topology. Usually, the successful prediction of transmembrane segments requires proper identification of transmembrane regions in sequence, actual prediction of the secondary structure, and deciphering the topology. It is very difficult to experimentally determine 3D structures for transmembrane proteins. Despite considerable advances over the last decade, we still have experimental structures or theoretical models for supposedly fewer than 10% of, e.g., all human membrane proteins (Punta, Liu & Rost, unpublished). Useful predictions of structural and functional aspects are therefore highly needed.
Prediction methods. Although all known transmembrane regions consist of regular secondary structure, most secondary structure prediction methods developed for non-membrane proteins fail to predict membrane regions correctly. Furthermore, very few methods have been developed for proteins with beta-strands in the membrane. The first and most basic methods for helical membrane regions focused on the identification of transmembrane segments based simply on residue hydrophobicity [115]. A simple Kyte-Doolittle hydrophobicity plot [115] can provide much information on the presence of such transmembrane segments. It was also observed that positively charged residues are more abundant on the cytoplasmic (inside) side of the membrane (the 'positive-in' rule). This led to the development of a method that predicted positions of helices and the topology of helical membrane proteins [116]. Next, neural networks were applied to better identify transmembrane helices and differentiate between membrane and non-membrane proteins [117]. Among other approaches were Hidden Markov Model methods attempting to match the sequence to the predefined "grammar" of transmembrane proteins [118] (see Chapter 3 for basics on Hidden Markov Models), and many others [119, 120]. Recently, groups have begun to venture into the development of methods that predict membrane regions with beta strands [121, 122, 123, 114].
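The hydrophobicity-plot idea can be sketched in a few lines of Python. The hydropathy values are those of the Kyte-Doolittle scale; the window of 19 residues and the cut-off of 1.6 are commonly quoted conventions for spotting candidate membrane helices and should be treated as assumptions here, as should the function names and the toy sequence. The sketch only flags hydrophobic windows; it does not predict topology.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(sequence, window=19):
    """Average hydropathy of every full window along the sequence.

    Raises KeyError for non-standard residue letters.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(scores[start:start + window]) / window
            for start in range(len(scores) - window + 1)]


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Start positions of windows whose average hydropathy reaches the threshold."""
    return [start for start, average in enumerate(window_averages(sequence, window))
            if average >= threshold]


# Toy sequence: a poly-leucine stretch flanked by charged residues.
toy_sequence = "DDDDDKKKKK" + "L" * 19 + "KKKKKDDDDD"
print(hydrophobic_windows(toy_sequence))
```

As noted in the performance discussion, purely hydrophobicity-based schemes of this kind misclassify many globular proteins, so such a plot is a first look rather than a prediction method.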
Performance. Estimates for the accuracy in predicting membrane regions are extremely problematic because there are so few high-resolution structures available. Consequently, all methods in the past were evaluated by also using low-resolution information from biochemical experiments that provide some evidence for the location of transmembrane regions. Unfortunately, such experiments can be more inaccurate than prediction methods [124] . This was one of the reasons why the performance of prediction methods had been significantly over-estimated by the end of the last millennium [125, 124] . It now appears that the best prediction methods correctly predict all membrane helices for about 50-70% of all proteins; very few methods avoid the confusion between very hydrophobic signal peptides and membrane proteins; and the best methods falsely identify membrane helices in about 10% of all non-membrane proteins [125, 124] . However, results can be far worse, e.g. most hydrophobicity-based methods misclassify over 50% (!) of all globular proteins as containing membrane helices [124] . Over-estimates in publications are also a very serious problem: even over the last few years, methods have been published in prominent journals with estimated levels of above 95% accuracy that failed to reach significantly above 50% and misclassified over 30% of the globular proteins [124] . Note also that there are a few top methods available at the moment; all of these have their own strengths and weaknesses, i.e. there is no single one best method. Predictions of beta-barrel membrane regions currently appear to be more accurate than those for helical membrane regions, however, this may likely turn out to be an over-estimate caused by the fact that we have too limited experimental information.
Servers. Many more methods are available than those listed here; however, the listed methods have sustained many evaluations. Helical membrane proteins: PHDhtm [117], SOSUI [119], TopPred [116], TMHMM [118], DAS [120]; beta-barrel membrane regions: ProfTMB [114].
Solvent accessibility helps to distinguish structurally important from functionally important residues. In 3D structures of globular proteins some residues are buried deep inside while others are located on the surface and thus are more exposed to the surrounding solvent. Residues that are more exposed to solvent are also more accessible to other biological agents and consequently are much more likely to be involved in functional interactions that require spatial accessibility, such as enzymatic activity, DNA-binding, signal transduction and others. On the other hand, buried residues are much more likely to play important roles in stabilizing structures of proteins. Thus, a good distinction between exposed and buried residues can be very useful to distinguish residues that are important for function (conserved and exposed) from those that are important for structure (conserved and buried).
Measuring solvent accessibility. Solvent accessibility is usually measured in terms of the surface area accessible to water molecules. The values can range from 0 Å² for entirely buried residues to around 300 Å² for the largest residues on the surfaces of proteins. A measure that is not dependent on the size of the amino-acid residue is the relative solvent accessibility, expressed as the percentage of the residue surface that is exposed to solvent. It appears that among homologous proteins the relative solvent accessibility is less conserved than secondary structure [126]. In addition, the solvent accessibility of protein residues is strongly influenced by non-local interactions, where residues located far apart along a protein sequence can be in spatial proximity, resulting in mutual screening from solvent. Thus, predicting solvent accessibility appears to be more difficult than predicting secondary structure. It was also shown that among evolutionarily related proteins of similar structure, buried residues (<10% accessible surface area) tend to be much more highly conserved than highly exposed residues (>60%) [126]. Thus, for methods that use evolutionary information derived from alignments of related proteins, it should be easier to closely predict accessibility for buried residues than for exposed ones. A simplified approach is to try to distinguish between residues below a certain solvent accessibility threshold ('buried') and those above it ('exposed'). There is no biophysical reason to choose one threshold over another, and different researchers often choose different thresholds (7%, 9%, 16% and 25% are used). On average about half of all protein residues have more than 25% of their surfaces exposed.
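The conversion from absolute to relative accessibility and the two-state simplification can be written down directly. In the Python sketch below the maximal accessible area of a residue type is passed in by the caller, because the reference values differ between studies; the function names, the default threshold of 25%, and the example numbers are assumptions for illustration.

```python
def relative_accessibility(accessible_area, max_area):
    """Relative solvent accessibility in percent, capped at 100.

    accessible_area: observed accessible surface of the residue.
    max_area: reference maximum for that residue type (same units, caller-supplied).
    """
    if max_area <= 0:
        raise ValueError("max_area must be positive")
    return min(100.0, 100.0 * accessible_area / max_area)


def two_state(relative_percent, threshold=25.0):
    """Label a residue 'exposed' if above the threshold, otherwise 'buried'."""
    return "exposed" if relative_percent > threshold else "buried"


# The same residue can change class when the threshold changes.
value = relative_accessibility(40.0, 200.0)  # 20% relative accessibility
for threshold in (7.0, 9.0, 16.0, 25.0):
    print(threshold, two_state(value, threshold))
```

The loop makes the point of the paragraph concrete: reported two-state accuracies are comparable only when the same threshold has been used.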
Best methods combine evolutionary information with machine-learning. Some of the methods that predict secondary structure also have the capability of predicting solvent accessibility, since essentially the same basic concepts apply to building a solvent accessibility predictor. For example PHDacc [126] and PROFacc [108] methods, which are part of the PredictProtein [104, 105] server use the same sequence profile input as do their respective secondary structure prediction counterparts (PHDsec and PROFsec). They use a neural network that assigns relative solvent accessibility into one of the 10 states corresponding to squares of relative solvent accessibility (state 10 corresponds to a range 81% to 100% of solvent accessibility). This 10-state scheme can be converted to a two-state scheme or to a prediction in terms of actual value of the exposed surface. Another well known method is Jpred [51] . It is also a server that predicts both secondary structure and solvent accessibility. The method uses alignments generated by HMM's and PSI-BLAST as input to a neural network. The output of predictions from two different networks is combined to give a final relative solvent accessibility. Many other variations and similar approaches have been attempted which include various types of neural networks [51, 127, 41, 128] , Support Vector Machines (SVM's) [129] , Bayesian networks [130] , information-theoretic approaches [131] and simple baseline approaches [132] . Most recently the relation between secondary structure and accessibility was explored to develop methods that combine both predictions explicitly to improve each one [128, 108] .
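One way to read the 10-state scheme is that state k covers relative accessibilities from (k-1)² to k² percent, so that state 10 spans 81% to 100% as stated above. The Python sketch below implements this reading; the treatment of the interval boundaries and the function names are assumptions, not details taken from PHDacc or PROFacc.

```python
import math


def ten_state(relative_percent):
    """Map a relative accessibility (0..100 %) to a state 1..10 on a square-root scale."""
    if not 0 <= relative_percent <= 100:
        raise ValueError("relative accessibility must lie between 0 and 100")
    # Integer square root of the truncated percentage; 100% is folded into state 10.
    return min(10, math.isqrt(int(relative_percent)) + 1)


def state_range(state):
    """Lower and upper bound (in percent) of the interval covered by a state."""
    if not 1 <= state <= 10:
        raise ValueError("state must lie between 1 and 10")
    return (state - 1) ** 2, state ** 2


for value in (0, 8, 16, 50, 81, 100):
    print(value, ten_state(value), state_range(ten_state(value)))
```

The square-root spacing gives fine resolution for buried residues and coarse resolution for exposed ones; collapsing the states at a chosen percentage yields a two-state prediction.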
Performance. Unlike the prediction of secondary structure that is continuously assessed and monitored on identical data sets, methods for the prediction of solvent accessibility are not. Given that different groups use widely different data sets and different conventions to convert actual values of solvent accessibility into prediction states, it is impossible to compare and reasonably summarize levels of performance. However, two-state predictions (either buried or exposed) are predicted at levels >75% accuracy. Whatever values you read, note that advanced methods are significantly more accurate than simple methods based on simple features such as hydrophobicity, polarity, or simple statistics.
Servers. PROFacc [108] , PHDacc [126] , SABLE [128] , Jpred [51] , ACCpro [133] .
2D predictions may be a step toward 3D structures. Directly predicting 3D structure still fails. Predictions of 1D aspects of protein structure, such as secondary structure and solvent accessibility, provide very valuable information. However, 1D predictions are far too simplified. There is a path seemingly in between these two extremes (1D/3D), namely, the prediction of inter-residue distances. In fact, 3D structures can be reconstructed more or less completely from 2D distance maps. The catch is that distance maps are as hard to guess as 3D coordinates. As a consequence, existing methods try to solve the simplified problem of predicting contact maps, where two residues are considered to be in contact if they are located within a certain spatial cut-off distance (this results in a binary classification of residue pairs, contact/non contact pairs).
Measuring performance. There is no widely accepted threshold for the maximal distance between two residues that are considered as in contact. While the smallest physically possible distance could be agreed upon, the limit beyond which the interaction between two residues can be considered negligible is more difficult to define. However, a distance of 8 Å between Cβ atoms is the most widely used threshold for the evaluation of the performance of these prediction methods. The output of contact prediction programs is generally a list of residue pairs, ranked according to some internal confidence score. Usually, only contacts between pairs that exceed a minimal sequence separation are evaluated. Although many different thresholds have been used, minimal separations of 6 and 24 sequence positions are most common for prediction of medium- and long-range contacts, respectively. These parameters are important as the task becomes more difficult with increasing separation (this tendency levels off for separations >20).
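The definition of an observed contact map used in such evaluations can be sketched as follows, assuming one representative atom per residue (the Cβ atom), a distance cut-off of 8 Å, and a minimal sequence separation. The coordinates are an artificial hairpin-like toy chain and the function name is hypothetical.

```python
import math


def contact_pairs(coordinates, cutoff=8.0, min_separation=6):
    """List residue pairs (i, j), j - i >= min_separation, closer than the cut-off.

    coordinates: one (x, y, z) tuple per residue, e.g. the C-beta position.
    """
    pairs = []
    for i in range(len(coordinates)):
        for j in range(i + min_separation, len(coordinates)):
            if math.dist(coordinates[i], coordinates[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


# Toy chain: four residues along x, then four residues running back at y = 5.
toy_coordinates = (
    [(3.8 * i, 0.0, 0.0) for i in range(4)]
    + [(3.8 * (7 - i), 5.0, 0.0) for i in range(4, 8)]
)

print(contact_pairs(toy_coordinates))
print(contact_pairs(toy_coordinates, min_separation=24))
```

Increasing `min_separation` removes the easy, near-diagonal pairs, which is why reported accuracies drop for long-range contacts.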
Prediction methods. One line of methods was based upon the observation that evolutionary pressure on maintaining protein structure would sometimes require correlation in the mutations of amino acid residues that are in spatial proximity to each other. In principle, such patterns of correlation could be discerned in multiple alignments of protein sequences. Some of the early contact prediction methods indeed used only correlated mutations computed from multiple sequence alignments [134, 135]. The currently best methods also make use of other protein features, such as evolutionary profiles of the nearest neighbors of the residue pair being predicted, sequence separation, secondary structure and solvent accessibility predictions. Further improvement of predictions was achieved through machine learning techniques such as neural networks that use [136, 137, 138] or do not use [139, 55] correlated mutations, Hidden Markov Models [140, 141], Support Vector Machines [142] and genetic programming [143].
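To make the notion of correlated mutations concrete, the Python sketch below scores pairs of alignment columns by their mutual information. Mutual information is used here only as a simple stand-in; the methods cited above may rely on different correlation measures, and the toy alignment and function names are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(column_a, column_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(column_a) != len(column_b) or not column_a:
        raise ValueError("columns must be non-empty and of equal length")
    total = len(column_a)
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    return sum(
        (joint / total) * math.log2(joint * total / (count_a[a] * count_b[b]))
        for (a, b), joint in count_ab.items()
    )


def columns(alignment):
    """Transpose a list of aligned sequences into a list of columns."""
    return ["".join(column) for column in zip(*alignment)]


# Toy alignment: positions 0 and 1 change together, position 2 is invariant.
toy_alignment = ["AKL", "AKL", "DEL", "DEL"]
cols = columns(toy_alignment)
print(mutual_information(cols[0], cols[1]))
print(mutual_information(cols[0], cols[2]))
```

A fully conserved column carries no co-variation signal, which is one reason why correlated-mutation analysis needs sufficiently diverse alignments.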
Performance and applications. Because the prediction of non-local contacts is difficult, progress in the field had been slow until recently, when two promising new methods entered the CASP6 competition in 2004. When the L/2 highest-scoring predictions are considered (L being the length of the protein), the accuracy of state-of-the-art methods is around 30% for sequence separations of at least 6, and around 20% for sequence separations of at least 24. Though predicted contact maps are not very accurate, they are nevertheless better on average than the contact maps obtained from the best de novo predictions of three-dimensional structures [65]. As a result, automatically predicted contact maps were successfully used in the prediction of three-dimensional protein structures [135, 90, 144].
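The evaluation on the top L/2 predictions can be sketched in Python as follows, assuming that a method returns a confidence score per residue pair and that the observed contacts are known. The function name, the data layout, and the toy scores are hypothetical.

```python
def top_fraction_accuracy(scored_pairs, observed_contacts, length,
                          fraction=0.5, min_separation=6):
    """Accuracy (%) of the highest-scoring length*fraction predicted pairs.

    scored_pairs: dict mapping (i, j) with i < j to a confidence score.
    observed_contacts: set of (i, j) pairs, i < j, in contact in the structure.
    """
    eligible = [(score, pair) for pair, score in scored_pairs.items()
                if pair[1] - pair[0] >= min_separation]
    eligible.sort(key=lambda item: item[0], reverse=True)
    selected = eligible[:max(1, int(length * fraction))]
    if not selected:
        raise ValueError("no predicted pair satisfies the sequence separation")
    hits = sum(pair in observed_contacts for _, pair in selected)
    return 100.0 * hits / len(selected)


# Toy protein of 10 residues: the 5 best eligible predictions are evaluated.
toy_scores = {
    (0, 9): 0.95, (1, 8): 0.90, (2, 8): 0.70, (0, 6): 0.60,
    (3, 9): 0.55, (1, 7): 0.20, (4, 6): 0.99,  # (4, 6) is too close in sequence
}
toy_observed = {(0, 9), (1, 8), (1, 7)}

print(top_fraction_accuracy(toy_scores, toy_observed, length=10))
```

Fixing the number of evaluated pairs relative to L keeps the measure comparable between proteins of different length and between methods that output different numbers of predictions.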
Servers. PROFcon [55] , CORNET [136] , CMAPpro [109] , GPCPRED [143] , Hamilton's server [138] .
Local mobility, rigidity and disorder are all features that relate to function. In crystal structures of proteins, the uncertainty of atomic positions can be represented by B-factors (Debye-Waller factors) [145]. B-factors represent the combined effects of thermal variation and static disorder. In general, the higher the B-factor of a residue, the higher its flexibility. Further, it has been demonstrated that many proteins and protein regions lack a unique three-dimensional structure [146]. Those regions are often characterized as an ensemble of rapidly changing alternative structures with differing backbone torsion angles. Estimates indicate that a substantial fraction of all proteins (as much as 25%) may contain disordered regions or be entirely disordered [147, 148, 101, 149]. Many important functional interactions, such as cell-cycle regulation, signal transduction, gene expression, and chaperone action, are associated with proteins containing very flexible and disordered regions. Determination of these regions also plays an important role in structural genomics, since such regions can be a source of problems in protein expression, purification and crystallization.
Measuring flexibility and disorder. Protein flexibility can be derived from normalized B-factors [150] . Characterization of disordered regions can be provided by many experimental techniques, but in particular by NMR spectroscopy. Regions of protein X-ray structures without atomic coordinates are often considered as intrinsically disordered regions. Successful predictions should be able to simply indicate intrinsically disordered regions, or in case of protein flexibility to assign accurate normalized B-factors to protein residues.
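B-factors from different crystal structures are not directly comparable, which is why they are normalized per protein before use. The Python sketch below applies a z-score normalization; this particular choice (including the use of the population standard deviation) is an assumption for illustration and is not necessarily the normalization used in [150].

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Return per-residue B-factors as z-scores within one protein chain."""
    if not b_factors:
        raise ValueError("at least one B-factor is required")
    average = mean(b_factors)
    spread = pstdev(b_factors)
    if spread == 0:
        # All residues equally mobile: no residue stands out.
        return [0.0] * len(b_factors)
    return [(value - average) / spread for value in b_factors]


# Hypothetical per-residue B-factors (e.g. of the C-alpha atoms).
toy_b_factors = [10.0, 20.0, 30.0]
print(normalize_b_factors(toy_b_factors))
```

Residues with clearly positive normalized values are the relatively flexible ones within their own structure.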
Prediction methods. Methods predicting regions of low compositional complexity in protein sequences (SEG [151] and CAST [152]) can be considered the first methods predicting disordered regions in proteins. However, the correlation between low-complexity regions and disorder is far from perfect: low-complexity regions are highly repetitive in their amino-acid composition, but many of them have well-defined three-dimensional structures [153]. There are methods that attempt to predict whether entire proteins are in 'natively unfolded' configurations based on hydrophobicity and charge information derived from sequences [154]. Disordered regions can be predicted based on a disorder propensity assigned to each amino acid [155]. Other methods use machine learning algorithms such as neural networks [156, 157, 155] or support vector machines [149]. The NORSp method [158] predicts extended non-regular secondary structure segments that often correlate with disorder. B-factors have also been predicted by methods using artificial neural networks [159] or support vector regression [160]. The prediction accuracy of those methods has not yet been experimentally verified on a large scale.
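The notion of low compositional complexity can be illustrated with a sliding-window Shannon entropy. This Python sketch is a deliberately simplified stand-in and does not reproduce the SEG or CAST algorithms; the window length, the entropy cut-off, and the function names are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the amino-acid composition of a segment."""
    if not segment:
        raise ValueError("segment must not be empty")
    total = len(segment)
    return sum((count / total) * math.log2(total / count)
               for count in Counter(segment).values())


def low_complexity_windows(sequence, window=12, max_entropy=1.0):
    """Start positions of windows whose composition entropy is at most max_entropy."""
    return [start for start in range(len(sequence) - window + 1)
            if shannon_entropy(sequence[start:start + window]) <= max_entropy]


# Toy sequence: a diverse stretch followed by a poly-glutamine run.
toy_sequence = "ACDEFGHIKLMN" + "Q" * 12
print(low_complexity_windows(toy_sequence))
```

As the text cautions, a low-entropy window is only a hint: many compositionally biased regions adopt well-defined structures.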
Servers. PROFbval [159] , PONDR [156] , DISOPRED [157] , DISOPRED2 [149] , GlobPlot [155] , NORSp [158] , FoldIndex [161] , DisEMBL [155] .
Independent folding units. The visual inspection of 3D structures of large proteins often reveals compact structural subunits referred to as protein domains. Such domains are assumed to often constitute units that fold independently. Studies indicate that some proteins can be viewed as combinatorial arrangements of protein domains that are genetically mobile. Often, the structural domains are associated with particular biological functions. Knowledge of the domain organization of proteins of unknown three-dimensional structure can help experimental and computational attempts to elucidate their structure and function. Recent analyses of sequence-structure families suggest that over two-thirds of all proteins have more than one domain and that most domains span about 100 residues [162].
Prediction methods. The prediction of the domain organization is a challenging problem if we do not know the 3D structure (and even if we know the structure, automatic domain assignment methods disagree much more than secondary structure assignment methods). Many sequence-based methods predict domains that are significantly shorter than actual structural domains [163]. The first automatic prediction methods, such as ProDom [164], attempted to determine domains based on 'boundaries' in multiple alignments of protein sequences. This approach often results in fragmentation of actual structural domains since sequence conservation often does not extend over entire domains. In a similar approach, domain constraints can also be obtained from sequence alignment databases such as BLOCKS [165]. Attempts to explicitly elongate sequence alignments were also made [166]. Other automatic prediction methods apply concepts from protein structure prediction [167] or try to derive domains from predicted contact maps [168]. There are methods that use statistics of domain size distributions [169] or a statistical approach toward combining various sources of information [170]. Some of the methods use artificial neural networks [171, 172]. Others explore alternative ways of using sequence alignment information [173] or alignments of predicted secondary structure elements [174]. The most accurate methods (e.g. CHOP [162]) simply use sequence homology to proteins with known domain assignments. The downside of such methods is their low coverage, i.e., they often do not find domains. None of these more recent methods has yet been experimentally verified on a large scale.
Servers. CHOP (homology-based) [162] , CHOPnet [175] , ProDom (homology-based) [164] , DOMAINATION [166] , SnapDRAGON [167] , DomSSEA [174] .
We are grateful to Kaz Wrzeszczynski, Marco Punta and Avner Schlessinger (all from Columbia University) for valuable input. Thanks to Volker Eyrich (Schroedinger Inc.) and Ingrid Koh (Columbia) for their help in setting up the EVA server. The work of DP and BR was supported by the grant RO1-LM07329-01 from the National Library of Medicine (NLM). Last, not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.
| 1. | Berman, H. M., Westbrook,J., Feng, Z., Gillliland, G., Bhat, T. N. et al. (2000). The Protein Data Bank.Nucleic Acids Research, 28,235-242. |
| 2. | Benson, D. A.,Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. (2003).GenBank. Nucleic Acids Research, 31,23-27. |
| 3. | Bairoch, A. & Apweiler,R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBLin 2000. Nucleic Acids Research, 28,45-48. |
| 4. | Rost, B. (1998). Marryingstructure and genomics. Structure, 6,259-263. |
| 5. | Norvell, J. C. &Machalek, A. Z. (2000). Structural genomics programs at the US NationalInstitute of General Medical Sciences. Nat Struct Biol, 7 Suppl, 931. |
| 6. | Anfinsen, C. B. (1973).Principles that govern the folding of protein chains. Science, 181, 223-230. |
| 7. | Liu, J. & Rost, B.(2002). Target space for structural genomics revisited. Bioinformatics, 18, 922-33. |
| 8. | Pauling, L. & Corey, R.B. (1951). Configurations of Polypeptide Chains with Favored OrientationsAround Single Bonds: Two New Pleated Sheets. Proceedings of the NationalAcademy of Sciences, 37,729-740. |
| 9. | Kabsch, W. & Sander, C.(1983). Dictionary of protein secondary structure: pattern recognition ofhydrogen bonded and geometrical features. Biopolymers, 22, 2577-2637. |
| 10. | Frishman, D. & Argos,P. (1995). Knowledge-based protein secondary structure assignment. Proteins:Structure, Function, and Genetics, 23,566-579. |
| 11. | Richards, F. M. &Kundrot, C. E. (1988). Identification of structural motifs from proteincoordinate data: secondary structure and first-level supersecondary structure. Proteins, 3, 71-84. |
| 12. | Sklenar, H., Etchebest, C.& Lavery, R. (1989). Describing protein structure: a general algorithmyielding complete helicoidal parameters and a unique overall axis. Proteins:Structure, Function, and Genetics, 6,46-60. |
| 13. | Colloc'h, N., Etchebest,C., Thoreau, E., Henrissat, B. & Mornon, J. P. (1993). Comparison of threealgorithms for the assignment of secondary structure in proteins: theadvantages of a consensus assignment. Protein Eng, 6, 377-82. |
| 14. | Andersen, C. A. F.,Palmer, A. G., Brunak, S. & Rost, B. (2002). Continuum secondary structurecaptures protein flexibility. Structure,10, 175-184. |
| 15. | Karchin, R., Cline, M., Mandel-Gutfreund, Y. & Karplus, K. (2003). Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins: Structure, Function, and Genetics, 51, 504-514. |
| 16. | Cuff, J. A. & Barton, G. J. (1999). Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Genetics, 34, 508-519. |
| 17. | Defay, T. & Cohen, F. E. (1995). Evaluation of current techniques for ab initio protein structure prediction. Proteins: Structure, Function, and Genetics, 23, 431-445. |
| 18. | Rost, B. & Sander, C. (1993). Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proceedings of the National Academy of Sciences, 90, 7558-7562. |
| 19. | Rost, B., Sander, C. & Schneider, R. (1994). Redefining the goals of protein secondary structure prediction. Journal of Molecular Biology, 235, 13-26. |
| 20. | Zemla, A., Venclovas, C., Fidelis, K. & Rost, B. (1999). A modified definition of SOV, a segment-based measure for protein secondary structure prediction assessment. Proteins: Structure, Function, and Genetics, 34, 220-223. |
| 21. | Hobohm, U., Scharf, M., Schneider, R. & Sander, C. (1992). Selection of representative protein data sets. Protein Science, 1, 409-17. |
| 22. | Eyrich, V., Martí-Renom, M. A., Przybylski, D., Fiser, A., Pazos, F. et al. (2001). EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics, 17, 1242-1243. |
| 23. | Szent-Györgyi, A. G. & Cohen, C. (1957). Role of proline in polypeptide chain configuration of proteins. Science, 126, 697. |
| 24. | Kendrew, J. C., Dickerson, R. E., Strandberg, B. E., Hart, R. J., Davies, D. R. et al. (1960). Structure of myoglobin: a three-dimensional Fourier synthesis at 2 Å resolution. Nature, 185, 422-427. |
| 25. | Perutz, M. F., Rossmann, M. G., Cullis, A. F., Muirhead, G., Will, G. et al. (1960). Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5 Å resolution, obtained by X-ray analysis. Nature, 185, 416-422. |
| 26. | Blout, E. R., de Lozé, C., Bloom, S. M. & Fasman, G. D. (1960). Dependence of the conformation of synthetic polypeptides on amino acid composition. Journal of American Chemical Society, 82, 3787-3789. |
| 27. | Blout, E. R. (1962). The dependence of the conformation of polypeptides and proteins upon amino acid composition. In Polyamino Acids, Polypeptides, and Proteins (Stahman, M., ed.), pp. 275-279, Univ. of Wisconsin Press, Madison. |
| 28. | Chothia, C. & Lesk, A. M. (1986). The relation between the divergence of sequence and structure in proteins. EMBO Journal, 5, 823-826. |
| 29. | Sander, C. & Schneider, R. (1991). Database of homology-derived structures and the structural meaning of sequence alignment. Proteins: Structure, Function, and Genetics, 9, 56-68. |
| 30. | Benner, S. A. & Gerloff, D. (1991). Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: a prediction of the structure of the catalytic domain of protein kinases. Adv. Enzyme Regul., 31, 121-181. |
| 31. | Abagyan, R. A. & Batalov, S. (1997). Do aligned sequences share the same fold? Journal of Molecular Biology, 273, 355-368. |
| 32. | Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D. et al. (1998). Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology, 284, 1201-1210. |
| 33. | Dickerson, R. E., Timkovich, R. & Almassy, R. J. (1976). The cytochrome fold and the evolution of bacterial energy metabolism. Journal of Molecular Biology, 100, 473-491. |
| 34. | Maxfield, F. R. & Scheraga, H. A. (1979). Improvements in the prediction of protein topography by reduction of statistical errors. Biochemistry, 18, 697-704. |
| 35. | Zvelebil, M. J., Barton, G. J., Taylor, W. R. & Sternberg, M. J. E. (1987). Prediction of protein secondary structure and active sites using alignment of homologous sequences. Journal of Molecular Biology, 195, 957-961. |
| 36. | Rost, B. & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology, 232, 584-599. |
| 37. | Rost, B. & Sander, C. (1994). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins: Structure, Function, and Genetics, 19, 55-72. |
| 38. | Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266, 525-539. |
| 39. | Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 292, 195-202. |
| 40. | Ouali, M. & King, R. D. (2000). Cascaded multiple classifiers for secondary structure prediction. Protein Science, 9, 1162-1176. |
| 41. | Pollastri, G., Przybylski, D., Rost, B. & Baldi, P. (2002). Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics, 47, 228-235. |
| 42. | Mehta, P. K., Heringa, J. & Argos, P. (1995). A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%. Protein Science, 4, 2517-2525. |
| 43. | Salamov, A. A. & Solovyev, V. V. (1995). Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignment. Journal of Molecular Biology, 247, 11-15. |
| 44. | Di Francesco, V., Garnier, J. & Munson, P. J. (1996). Improving protein secondary structure prediction with aligned homologous sequences. Protein Science, 5, 106-113. |
| 45. | Riis, S. K. & Krogh, A. (1996). Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments. Journal of Computational Biology, 3, 163-183. |
| 46. | Levin, J. M. (1997). Exploring the limits of nearest neighbour secondary structure prediction. Protein Engineering, 10, 771-6. |
| 47. | Hua, S. & Sun, Z. (2001). Support vector machine approach for protein subcellular localization prediction. Bioinformatics, 17, 721-728. |
| 48. | Ward, J. J., McGuffin, L. J., Buxton, B. F. & Jones, D. T. (2003). Secondary structure prediction with support vector machines. Bioinformatics, 19, 1650-5. |
| 49. | Karplus, K., Karchin, R., Draper, J., Casper, J., Mandel-Gutfreund, Y. et al. (2003). Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins: Structure, Function, and Genetics, 53, 491-496. |
| 50. | Altschul, S. F., Madden, T. L., Schaeffer, A. A., Zhang, J., Zhang, Z. et al. (1997). Gapped Blast and PSI-Blast: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389-3402. |
| 51. | Cuff, J. A. & Barton, G. J. (2000). Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins: Structure, Function, and Genetics, 40, 502-511. |
| 52. | Przybylski, D. & Rost, B. (2002). Alignments grow, secondary structure prediction improves. Proteins: Structure, Function, and Bioinformatics, 46, 195-205. |
| 53. | Adamczak, R., Porollo, A. & Meller, J. (2005). Combining prediction of secondary structure and solvent accessibility in proteins. Proteins, 59, 467-75. |
| 54. | Pollastri, G. & McLysaght, A. (2005). Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics, 21, 1719-20. |
| 55. | Punta, M. & Rost, B. (2005). PROFcon: novel prediction of long-range contacts. Bioinformatics, 21, 2960-8. |
| 56. | Hansen, L. K. & Salamon, P. (1990). Neural Network Ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 993-1001. |
| 57. | Chandonia, J. M. & Karplus, M. (1999). New methods for accurate prediction of protein secondary structure. Proteins: Structure, Function, and Genetics, 35, 293-306. |
| 58. | Petersen, T. N., Lundegaard, C., Nielsen, M., Bohr, H., Bohr, J. et al. (2000). Prediction of protein secondary structure at 80% accuracy. Proteins: Structure, Function, and Genetics, 41, 17-20. |
| 59. | Cuff, J. A., Clamp, M. E., Siddiqui, A. S., Finlay, M. & Barton, G. J. (1998). JPred: a consensus secondary structure prediction server. Bioinformatics, 14, 892-893. |
| 60. | Guermeur, Y., Geourjon, C., Gallinari, P. & Deleage, G. (1999). Improved performance in protein secondary structure prediction by inhomogeneous score combination. Bioinformatics, 15, 413-421. |
| 61. | Selbig, J., Mevissen, T. & Lengauer, T. (1999). Decision tree-based formation of consensus protein secondary structure prediction. Bioinformatics, 15, 1039-1046. |
| 62. | King, R. D., Ouali, M., Strong, A. T., Aly, A., Elmaghraby, A. et al. (2000). Is it better to combine predictions? Protein Engineering, 13, 15-19. |
| 63. | Rost, B. & Sander, C. (2000). Third generation prediction of secondary structures. Methods Mol Biol, 143, 71-95. |
| 64. | Albrecht, M., Tosatto, S. C., Lengauer, T. & Valle, G. (2003). Simple consensus procedures are effective and sufficient in secondary structure prediction. Protein Eng, 16, 459-62. |
| 65. | Eyrich, V. A., Przybylski, D., Koh, I. Y., Grana, O., Pazos, F. et al. (2003). CAFASP3 in the spotlight of EVA. Proteins, 53 Suppl 6, 548-60. |
| 66. | Kloczkowski, A., Ting, K. L., Jernigan, R. L. & Garnier, J. (2002). Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins: Structure, Function, and Genetics, 49, 154-166. |
| 67. | Sen, T. Z., Jernigan, R. L., Garnier, J. & Kloczkowski, A. (2005). GOR V server for protein secondary structure prediction. Bioinformatics. |
| 68. | McGuffin, L. J. & Jones, D. T. (2003). Benchmarking secondary structure prediction for fold recognition. Proteins: Structure, Function, and Genetics, 52, 166-175. |
| 69. | Koh, I. Y., Eyrich, V. A., Marti-Renom, M. A., Przybylski, D., Madhusudhan, M. S. et al. (2003). EVA: Evaluation of protein structure prediction servers. Nucleic Acids Res, 31, 3311-5. |
| 70. | Rost, B. (1995). TOPITS: Threading One-dimensional Predictions Into Three-dimensional Structures. In Third International Conference on Intelligent Systems for Molecular Biology (Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T. et al., eds.), pp. 314-321, Menlo Park, CA: AAAI Press, Cambridge, England. |
| 71. | Fischer, D. & Eisenberg, D. (1996). Fold recognition using sequence-derived properties. Protein Science, 5, 947-955. |
| 72. | Russell, R. B., Copley, R. R. & Barton, G. J. (1996). Protein fold recognition by mapping predicted secondary structures. Journal of Molecular Biology, 259, 349-365. |
| 73. | Ayers, D. J., Gooley, P. R., Widmer-Cooper, A. & Torda, A. E. (1999). Enhanced protein fold recognition using secondary structure information from NMR. Protein Science, 8, 1127-1133. |
| 74. | de la Cruz, X. & Thornton, J. M. (1999). Factors limiting the performance of prediction-based fold recognition methods. Protein Science, 8, 750-759. |
| 75. | Di Francesco, V., Munson, P. J. & Garnier, J. (1999). FORESST: fold recognition from secondary structure predictions of proteins. Bioinformatics, 15, 131-140. |
| 76. | Hargbo, J. & Elofsson, A. (1999). Hidden Markov models that use predicted secondary structures for fold recognition. Proteins: Structure, Function, and Genetics, 36, 68-76. |
| 77. | Jones, D. T. (1999). GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. Journal of Molecular Biology, 287, 797-815. |
| 78. | Jones, D. T., Tress, M., Bryson, K. & Hadley, C. (1999). Successful recognition of protein folds using threading methods biased by sequence similarity and predicted secondary structure. Proteins: Structure, Function, and Genetics, 37, 104-111. |
| 79. | Koretke, K. K., Russell, R. B., Copley, R. R. & Lupas, A. N. (1999). Fold recognition using sequence and secondary structure information. Proteins: Structure, Function, and Genetics, 37, 141-148. |
| 80. | Ota, M., Kawabata, T., Kinjo, A. R. & Nishikawa, K. (1999). Cooperative approach for the protein fold recognition. Proteins, 37, 126-132. |
| 81. | Panchenko, A., Marchler-Bauer, A. & Bryant, S. H. (1999). Threading with explicit models for evolutionary conservation of structure and sequence. Proteins: Structure, Function, and Genetics, Suppl 3, 133-140. |
| 82. | Kelley, L. A., MacCallum, R. M. & Sternberg, M. J. (2000). Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol, 299, 499-520. |
| 83. | Heringa, J. (1999). Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput Chem, 23, 341-64. |
| 84. | Jennings, A. J., Edge, C. M. & Sternberg, M. J. (2001). An approach to improving multiple alignments of protein sequences using predicted secondary structure. Protein Eng, 14, 227-31. |
| 85. | Ng, P. C., Henikoff, J. G. & Henikoff, S. (2000). PHAT: a transmembrane-specific substitution matrix. Predicted hydrophobic and transmembrane. Bioinformatics, 16, 760-6. |
| 86. | Przybylski, D. & Rost, B. (2004). Improving fold recognition without folds. J Mol Biol, 341, 255-69. |
| 87. | Baldi, P., Pollastri, G., Andersen, C. A. & Brunak, S. (2000). Matching protein beta-sheet partners by feedforward and recurrent neural networks. ISMB, 8, 25-36. |
| 88. | Ivankov, D. N. & Finkelstein, A. V. (2004). Prediction of protein folding rates from the amino acid sequence-predicted secondary structure. Proc Natl Acad Sci U S A, 101, 8942-4. |
| 89. | Punta, M. & Rost, B. (2005). Protein folding rates estimated from contact predictions. Journal of Molecular Biology, 348, 507-512. |
| 90. | Ortiz, A. R., Kolinski, A., Rotkiewicz, P., Ilkowski, B. & Skolnick, J. (1999). Ab initio folding of proteins using restraints derived from evolutionary information. Proteins: Structure, Function, and Genetics, Suppl 3, 177-185. |
| 91. | Eyrich, V. A., Standley, D. M., Felts, A. K. & Friesner, R. A. (1999). Protein tertiary structure prediction using a branch and bound algorithm. Proteins: Structure, Function, and Genetics, 35, 41-57. |
| 92. | Eyrich, V. A., Standley, D. M. & Friesner, R. A. (1999). Prediction of protein tertiary structure to low resolution: performance for a large and structurally diverse test set. Journal of Molecular Biology, 288, 725-742. |
| 93. | Lomize, A. L., Pogozheva, I. D. & Mosberg, H. I. (1999). Prediction of protein structure: the problem of fold multiplicity. Proteins: Structure, Function, and Genetics, Suppl 3, 199-203. |
| 94. | Chen, C. C., Singh, J. P. & Altman, R. B. (1999). Using imperfect secondary structure predictions to improve molecular structure computations. Bioinformatics, 15, 53-65. |
| 95. | Samudrala, R., Xia, Y., Huang, E. & Levitt, M. (1999). Ab initio protein structure prediction using a combined hierarchical approach. Proteins: Structure, Function, and Genetics, Suppl 3, 194-198. |
| 96. | Samudrala, R., Huang, E. S., Koehl, P. & Levitt, M. (2000). Constructing side chains on near-native main chains for ab initio protein structure prediction. Protein Engineering, 13, 453-457. |
| 97. | Nair, R. & Rost, B. (2003). Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins, 53, 917-30. |
| 98. | Whisstock, J. C. & Lesk, A. M. (2003). Prediction of protein function from protein sequence and structure. Quarterly Reviews of Biophysics, 36, 307-340. |
| 99. | Gerstein, M. & Levitt, M. (1997). A structural census of the current population of protein sequences. Proceedings of the National Academy of Sciences, 94, 11911-11916. |
| 100. | Przytycka, T., Aurora, R. & Rose, G. D. (1999). A protein taxonomy based on secondary structure. Nature Structural Biology, 6, 672-682. |
| 101. | Liu, J., Tan, H. & Rost, B. (2002). Loopy proteins appear conserved in evolution. J Mol Biol, 322, 53-64. |
| 102. | Prusiner, S. B., Scott, M. R., DeArmond, S. J. & Cohen, F. E. (1998). Prion protein biology. Cell, 93, 337-48. |
| 103. | Kirshenbaum, K., Young, M. & Highsmith, S. (1999). Predicting allosteric switches in myosins. Protein Science, 8, 1806-1815. |
| 104. | Rost, B., Sander, C. & Schneider, R. (1994). PHD - an automatic server for protein secondary structure prediction. CABIOS, 10, 53-60. |
| 105. | Rost, B. (2000). PredictProtein - internet prediction service. |
| 106. | Eyrich, V. & Rost, B. (2000). The META-PredictProtein server. |
| 107. | Barton, G. J. (1995). Protein secondary structure prediction. Current Opinion in Structural Biology, 5, 372-376. |
| 108. | Rost, B. (2005). How to use protein 1D structure predicted by PROFphd. In The Proteomics Protocols Handbook (Walker, J. E., ed.), pp. 875-901, Humana, Totowa NJ. |
| 109. | Pollastri, G. & Baldi, P. (2002). Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Bioinformatics, 18 Suppl 1, S62-70. |
| 110. | Meiler, J., Mueller, M., Zeidler, A. & Schmaeschke, F. (2001). Generation and evaluation of dimension-reduced amino acid parameter representation by artificial neural networks. Journal of Molecular Modelling, 7, 360-369. |
| 111. | Lin, K., Simossis, V. A., Taylor, W. R. & Heringa, J. (2005). A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics, 21, 152-9. |
| 112. | Liu, J. & Rost, B. (2001). Comparing function and structure between entire proteomes. Protein Science, 10, 1970-1979. |
| 113. | Melen, K., Krogh, A. & von Heijne, G. (2003). Reliability measures for membrane protein topology prediction algorithms. Journal of Molecular Biology, 327, 735-744. |
| 114. | Bigelow, H. R., Petrey, D. S., Liu, J., Przybylski, D. & Rost, B. (2004). Predicting transmembrane beta-barrels in proteomes. Nucleic Acids Res, 32, 2566-77. |
| 115. | Kyte, J. & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology, 157, 105-132. |
| 116. | von Heijne, G. (1992). Membrane protein structure prediction. Journal of Molecular Biology, 225, 487-494. |
| 117. | Rost, B., Casadio, R. & Fariselli, P. (1996). Refining neural network predictions for helical transmembrane proteins by dynamic programming. In Fourth International Conference on Intelligent Systems for Molecular Biology (States, D., Agarwal, P., Gaasterland, T., Hunter, L. & Smith, R. F., eds.), pp. 192-200, Menlo Park, CA: AAAI Press, St. Louis, M.O., U.S.A. |
| 118. | Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology, 305, 567-580. |
| 119. | Hirokawa, T., Boon-Chieng, S. & Mitaku, S. (1998). SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics, 14, 378-379. |
| 120. | Cserzo, M., Eisenhaber, F., Eisenhaber, B. & Simon, I. (2004). TM or not TM: transmembrane protein prediction with low false positive rate using DAS-TMfilter. Bioinformatics, 20, 136-7. |
| 121. | Gromiha, M. M., Majumdar, R. & Ponnuswamy, P. K. (1997). Identification of membrane spanning beta strands in bacterial porins. Protein Engineering, 10, 497-500. |
| 122. | Diederichs, K., Freigang, J., Umhau, S., Zeth, K. & Breed, J. (1998). Prediction by a neural network of outer membrane beta-strand protein topology. Protein Science, 7, 2413-2420. |
| 123. | Jacoboni, I., Martelli, P. L., Fariselli, P., De Pinto, V. & Casadio, R. (2001). Prediction of the transmembrane regions of beta-barrel membrane proteins with a neural network-based predictor. Protein Science, 10, 779-787. |
| 124. | Chen, C. P., Kernytsky, A. & Rost, B. (2002). Transmembrane helix predictions revisited. Protein Sci, 11, 2774-91. |
| 125. | Möller, S., Croning, D. R. & Apweiler, R. (2001). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17, 646-653. |
| 126. | Rost, B. & Sander, C. (1994). Conservation and prediction of solvent accessibility in protein families. Proteins, 20, 216-26. |
| 127. | Ahmad, S. & Gromiha, M. M. (2002). NETASA: neural network based prediction of solvent accessibility. Bioinformatics, 18, 819-24. |
| 128. | Adamczak, R., Porollo, A. & Meller, J. (2004). Accurate prediction of solvent accessibility using neural networks-based regression. Proteins, 56, 753-67. |
| 129. | Kim, H. & Park, H. (2004). Prediction of protein relative solvent accessibility with support vector machines and long-range interaction 3D local descriptor. Proteins, 54, 557-62. |
| 130. | Thompson, M. J. & Goldstein, R. A. (1996). Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes. Proteins, 25, 38-47. |
| 131. | Naderi-Manesh, H., Sadeghi, M., Arab, S. & Moosavi Movahedi, A. A. (2001). Prediction of protein surface accessibility with information theory. Proteins, 42, 452-9. |
| 132. | Richardson, C. J. & Barlow, D. J. (1999). The bottom line for prediction of residue solvent accessibility. Protein Eng, 12, 1051-4. |
| 133. | Pollastri, G., Baldi, P., Fariselli, P. & Casadio, R. (2002). Prediction of coordination number and relative solvent accessibility in proteins. Proteins, 47, 142-53. |
| 134. | Gobel, U., Sander, C., Schneider, R. & Valencia, A. (1994). Correlated mutations and residue contacts in proteins. Proteins, 18, 309-17. |
| 135. | Olmea, O., Rost, B. & Valencia, A. (1999). Effective use of sequence correlation and conservation in fold recognition. Journal of Molecular Biology, 293, 1221-1239. |
| 136. | Olmea, O. & Valencia, A. (1997). Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Folding & Design, 2, S25-S32. |
| 137. | Fariselli, P., Olmea, O., Valencia, A. & Casadio, R. (2001). Prediction of contact maps with neural networks and correlated mutations. Protein Engineering, 14, 835-843. |
| 138. | Hamilton, N., Burrage, K., Ragan, M. A. & Huber, T. (2004). Protein contact prediction using patterns of correlation. Proteins, 56, 679-84. |
| 139. | Pollastri, G., Baldi, P., Fariselli, P. & Casadio, R. (2001). Improved prediction of the number of residue contacts in proteins by recurrent neural networks. Bioinformatics, 17, S234-242. |
| 140. | Bystroff, C. & Shao, Y. (2002). Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics, 18, S54-S61. |
| 141. | Shao, Y. & Bystroff, C. (2003). Predicting interresidue contacts using templates and pathways. Proteins: Structure, Function, and Genetics, 53, 497-502. |
| 142. | Zhao, Y. & Karypis, G. (2003). Clustering in life sciences. Methods Mol Biol, 224, 183-218. |
| 143. | MacCallum, R. M. (2004). Striped sheets and protein contact prediction. Bioinformatics, 20 Suppl 1, I224-I231. |
| 144. | Skolnick, J., Zhang, Y., Arakaki, A. K., Kolinski, A., Boniecki, M. et al. (2003). TOUCHSTONE: a unified approach to protein structure prediction. Proteins, 53 Suppl 6, 469-79. |
| 145. | Creighton, T. (1992). Proteins: structures and molecular properties. W.H. Freeman. |
| 146. | Vucetic, S., Obradovic, Z., Vacic, V., Radivojac, P., Peng, K. et al. (2005). DisProt: a database of protein disorder. Bioinformatics, 21, 137-40. |
| 147. | Romero, P., Obradovic, Z., Kissinger, C. R., Villafranca, J. E., Garner, E. et al. (1998). Thousands of proteins likely to have long disordered regions. Pac Symp Biocomput, 437-48. |
| 148. | Dunker, A. K., Brown, C. J., Lawson, J. D., Iakoucheva, L. M. & Obradovic, Z. (2002). Intrinsic disor