
Title: Transmembrane helix predictions revisited
Author: CP Chen, A Kernytsky & B Rost
Quote: Protein Science, 2002, 11, 2774-91

Transmembrane helix predictions revisited

Chien Peter Chen 1, Andrew Kernytsky 1 & Burkhard Rost 1, 2, 3,*

1 CUBIC, Dept. of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
2 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA
3 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
* Corresponding author:  email = rost@columbia.edu URL http://cubic.bioc.columbia.edu/  Tel: +1-212-305-3773, fax: +1-212-305-7932

 

This article is published in (Protein Science, issue, 2002 and pages) © copyright Cold Spring Harbor Laboratory Press (2002). CSHL Press is the only authorised source. All copying of this article including placing on another website requires the written permission of the copyright owner.

 

 

Abstract

Methods that predict membrane helices have become increasingly useful in the context of analysing entire proteomes, as well as in everyday sequence analysis. Here, we analysed 27 advanced and simple methods in detail. To resolve contradictions in previous works and to re-evaluate transmembrane helix prediction algorithms, we introduced an analysis that distinguished between performance on redundancy-reduced high- and low-resolution data sets, established thresholds for significant differences in performance, and implemented both per-segment and per-residue analysis of membrane helix predictions. While some of the advanced methods performed better than others, we showed in a thorough bootstrapping experiment based on various measures of accuracy that no method performed consistently best. In contrast, most simple hydrophobicity scale-based methods were significantly less accurate than any advanced method as they over-predicted membrane helices and confused membrane helices with hydrophobic regions outside of membranes. In contrast, the advanced methods usually distinguished correctly between membrane helical and other proteins. Nonetheless, few methods reliably distinguished between signal peptides and membrane helices. We could not verify a significant difference in performance between eukaryotic and prokaryotic proteins. Surprisingly, we found that proteins with more than five helices were predicted at a significantly lower accuracy than proteins with five or fewer. The important implication is that structurally unsolved multi-spanning membrane proteins, which are often important drug targets, will remain problematic for transmembrane helix prediction algorithms. Overall, by establishing a standardized methodology for transmembrane helix prediction evaluation, we have resolved differences among previous works and presented novel trends that may impact the analysis of entire proteomes.

Abbreviations used

A-Cid: normalised hydrophobicity scale for alpha-proteins [1]
Av-Cid: normalised average hydrophobicity scale [1]
Ben-Tal: hydrophobicity scale representing the free energy of transferring an amino acid from water into the centre of the hydrocarbon region of a lipid bilayer [2]
BIG: non-identical merger of SWISS-PROT [3] + TrEMBL [3] + PDB [4]
BLAST: fast sequence alignment method [5]
Bull-Breese: Bull-Breese hydrophobicity scale [6]
DSSP: program assigning secondary structure [7]
Eisenberg: normalised consensus hydrophobicity scale [8]
EM: solvation free energy [9]
EVA: server automatically evaluating structure prediction methods [10, 11]
Fauchere: hydrophobic parameter pi from the partitioning of N-acetyl-amino-acid amides [12]
GES: hydrophobicity property [13, 14]
Heijne: transfer free energy to lipophilic phase [15]
HMM: Hidden Markov Model
HMMTOP: Hidden Markov model predicting transmembrane helices [16]
Hopp-Woods: Hopp-Woods hydrophilicity value [17]
KD: Kyte-Doolittle hydropathy index [18]
Lawson: transfer free energy [19]
Levitt: hydrophobic parameter [20]
MaxHom: dynamic programming algorithm for conservation weight based multiple sequence alignment [21]
MEMSAT: dynamic-programming based prediction of transmembrane helices [22]
META-PP: internet service providing access to a variety of bioinformatics tools through one single interface [23]
Nakashima: normalised composition of membrane proteins [24]
PDB: Protein Data Bank of experimentally determined 3D structures of proteins [25, 4]
PHDhtm: profile based neural network prediction of transmembrane helices [26, 27]
PHDpsihtm: divergent profile (PSI-BLAST) based neural network prediction of transmembrane helices [27, 28]
PSI-BLAST: position specific iterated database search [29]
Radzicka: transfer free energy from 1-octanol to water [30]
Roseman: solvation corrected side-chain hydropathy [31]
SignalP: signal peptide prediction [32]
SOSUI: hydrophobicity and amphiphilicity based transmembrane helix prediction [33]
SPLIT: transmembrane helix prediction [34]
Sweet: optimal matching hydrophobicity [35]
SWISS-PROT: data base of protein sequences [3]
TM: transmembrane
TMAP: alignment-based prediction of transmembrane helices [36]
TMH: transmembrane helix
TMHMM: Trans-Membrane prediction using cyclic Hidden Markov Models [37, 38]
TMpred: prediction of transmembrane helices [39]
TopPred2: hydrophobicity-based membrane helix prediction [40, 41]
TrEMBL: translation of the EMBL-nucleotide database coding DNA to protein sequences [3]
Wolfenden: hydration potential [42]
WW: Wimley-White hydrophobicity scale-based method [43, 44, 45, 46]
 
Terminology used:
 
 
advanced prediction methods: all methods that do not exclusively use a hydrophobicity scale
simple prediction methods: membrane prediction methods exclusively based on hydrophobicity scales
 
Abbreviations used for formulas:
 
 
htm: transmembrane helix
T: transmembrane helix
L: non-transmembrane helix

 


Introduction

Helical membrane proteins challenge bioinformatics. Membrane proteins are crucial for survival. They constitute key components for cell-cell signalling, mediate the transport of ions and solutes across the membrane, and are crucial for recognition of self [47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57]. Furthermore, the pharmaceutical industry preferentially targets membrane-bound receptors [58, 59, 60, 61, 62]. Despite their great biological and medical importance, we still have very little experimental information about their 3D structures: less than 1% of the proteins of known structure are membrane proteins. Fortunately, it is relatively easy to identify the location of membrane helices through low-resolution experiments. An expert-curated list of low-resolution experiments maintained by Steffen Möller and colleagues [63] considers information from C-terminal fusions with indicator proteins [64, 65, 66, 67] and from antibody-binding studies [66, 68, 69, 70, 71]. Nevertheless, we only have low-resolution experimental information for fewer than 500 helical membrane proteins, and PDB [4] contains fewer than 50 sequence-unique protein chains with high-resolution helical membrane structures (Methods). These numbers contrast with the more than 7000 helical membrane proteins expected in humans alone [72, 38, 73]. Thus, bioinformatics is challenged to help bridge the gap between the information we want and the information we have.

Published estimates for membrane helix prediction questioned by recent analyses. Recently, a few groups have questioned the estimated levels of performance for membrane helix prediction methods. Möller, Croning & Apweiler analysed 14 prediction methods that did not use alignment information on a set of 188 proteins with experimentally known helices [63, 74]. They also applied the prediction methods to globular proteins and to signal peptides. The results suggested the following conclusions. (1) The best prediction method (TMHMM) correctly predicts all membrane helices for 52-69% of all proteins tested. (2) The best distinction between globular and membrane helical proteins reaches levels of over 97% for the globular proteins tested (TMHMM and SOSUI). (3) On a set of 34 signal and transit peptide proteins, the best methods reached 98% (PHDhtm) to 100% (ALOM2) accuracy in distinguishing these from membrane helices. (4) The best simple hydrophobicity index (Kyte-Doolittle, KD [18]) correctly predicted all helices for 44% of all the proteins in a set for which HMMTOP [16] reached only 43% accuracy. Another recent analysis, by Ikeda et al., was based on a set of 145 sequence-unique proteins [75]. The authors tested 10 prediction methods not using alignment information on their data set. In contrast to Möller et al., the authors found that HMMTOP was not only much better than the KD hydrophobicity index but also the most accurate prediction method, correctly predicting all membrane helices for about 68% of all proteins. Averaging over all 10 methods, the authors found the resulting consensus prediction about ten percentage points more accurate than the best single method. The authors also claimed that prediction accuracy is higher for prokaryotes than for eukaryotes. They speculated that they found different levels of accuracy than Möller et al. because the two studies used different percentages of prokaryotic proteins in their data sets.
Jayasinghe, Hristova and White analysed four prediction methods on two different sets of proteins with known membrane helix locations: (a) on 150 high-resolution structures from PDB and (b) on 242 low-resolution proteins [76]. The authors found that the results for the high- and low-resolution sets differed only marginally and reported that the best methods (PHDhtm and HMMTOP) correctly predict more than 93-97% of all helices. This group has also proposed a method based on a novel entropy-based hydrophobicity scale, the Wimley-White scale (WW), which is claimed to correctly predict 99% of all membrane helices [77]. One major problem of hydrophobicity-based methods appears to be the poor distinction between membrane and globular proteins [78, 22, 79, 27, 77, 74].

Problems with previous analyses. Previous analyses were limited in various ways. (1) Neither Möller et al. nor Ikeda et al. distinguished between performance on high- and low-resolution data sets, although performance appeared to differ between the two [76]. (2) Neither Möller et al. nor Jayasinghe et al. reduced the redundancy in data sets that results from many copies of very similar proteins. However, such bias is known to create problems when estimating the performance of prediction methods [80, 79, 27, 81]. (3) Neither Möller et al. nor Ikeda et al. tested any method based on alignment information, although such methods are known to be more accurate [80, 82, 83, 79, 26, 84]. (4) No group explored per-residue along with per-segment based measures for prediction accuracy. Instead, all groups focused on one particular definition of prediction accuracy; no two groups applied the same definition. (5) No group established levels for significant differences between methods. This makes it impossible to conclude whether or not differences between any two methods are relevant. In general, levels of significant differences typically depend on the data sets and the scores used [10, 85, 86]. (6) Only Möller et al. tested proteins with signal peptides; however, their analysis was restricted to a small set of 34 proteins with known signal peptides. (7) No group analysed more than 14 prediction methods. (8) Generally, prediction accuracy differs significantly between proteins used to develop a method and proteins never seen by a method [87, 88, 89]. For membrane proteins, this effect is very difficult to estimate because few high-resolution structures of membrane proteins are added over the course of a year. While Möller et al. tried to estimate this effect by analysing only proteins not used for developing a method, they did not rule out that the proteins tested in the category ‘not known to the method’ were similar to proteins used for development. Surprisingly, Möller et al. found most methods to perform better on proteins not used for development. Given how prediction methods are developed, it is very unlikely that this result holds in general. Either the differences are not significant, or the data sets were not representative (or both).

To resolve these limitations and to standardise comparisons of membrane helix prediction performance, we carried out an analysis that distinguished between performance on redundancy-reduced high- and low-resolution data sets, established thresholds for significant differences in performance by introducing a bootstrap experiment, and implemented both per-segment and per-residue analysis of membrane helix predictions. Additionally, we analysed more methods than previous studies (8 publicly available advanced prediction methods and 19 different hydrophobicity scales). In particular, we included alignment-based prediction methods. Furthermore, we tested membrane helix prediction methods on a large, representative set of 1418 unique signal peptides and 616 unique globular protein folds taken from SCOP [90]. While we confirmed many previous findings, our results differed greatly in detail from previous publications.
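
The idea of using a bootstrap experiment to attach an error margin to an accuracy score can be illustrated with a minimal sketch. It assumes that a score is the mean of per-protein values (for example 1 if all helices of a protein were predicted correctly, 0 otherwise) and that proteins are resampled with replacement; the function name, the number of resamples and the seed are hypothetical choices and are not taken from the Methods.

```python
import random
from statistics import mean, stdev


def bootstrap_standard_error(per_protein_scores, n_resamples=1000, seed=0):
    """Estimate the standard error of a mean score by resampling proteins.

    per_protein_scores: one number per protein (assumed here to be 0 or 1).
    n_resamples, seed: illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(per_protein_scores)
    resampled_means = []
    for _ in range(n_resamples):
        # Draw n proteins with replacement and record the mean score.
        sample = [per_protein_scores[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(mean(sample))
    # The spread of the resampled means approximates the standard error.
    return stdev(resampled_means)


if __name__ == "__main__":
    scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # made-up example data
    print(round(bootstrap_standard_error(scores), 3))
```

With an error estimate of this kind, two methods would only be ranked when their scores differ by more than the chosen multiple of the standard error.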


Results


Accuracy in predicting membrane helices

Prediction methods not significantly less accurate than low-resolution experiments! We compared the membrane annotations for 13 proteins for which we had both low-resolution and high-resolution data available. While about 94-96% of the helices agreed between the two experimental methods, all helices overlapped between the two for only 11 of the 13 proteins (Table 1). Also, the two methods agreed on only 82% of all residue assignments (Q2, Table 1). A detailed comparison of the percentage of identically assigned membrane helical residues confirmed that for most cases, the differences arose from the longer segments observed in the high-resolution data ( < ). Assuming that the high-resolution data were correct, we can interpret the low-resolution data as an experimental prediction of transmembrane helices. Surprisingly, most prediction methods performed as well as the low-resolution experiments (Table 1). In fact, for almost every measure of accuracy, we could find one method that numerically agreed more with the high-resolution data than the low-resolution experiments did. However, given the small size of the data set, this statement ignores the error margins in the estimates of accuracy.
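
To make the per-residue comparison concrete, the following sketch computes a two-state agreement of the Q2 type between two annotations of the same protein. It assumes that each annotation is a string with one state per residue, using T for transmembrane helix and L for everything else; the function names and the example strings are hypothetical.

```python
def q2(observed, predicted):
    """Percentage of residues assigned to the same state in both strings."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("annotations must be non-empty and of equal length")
    agree = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * agree / len(observed)


def tm_residue_coverage(observed, predicted):
    """Return (percent of observed T residues that are predicted T,
    percent of predicted T residues that are observed T)."""
    both = sum(o == "T" and p == "T" for o, p in zip(observed, predicted))
    n_obs = observed.count("T")
    n_prd = predicted.count("T")
    pct_of_observed = 100.0 * both / n_obs if n_obs else 0.0
    pct_of_predicted = 100.0 * both / n_prd if n_prd else 0.0
    return pct_of_observed, pct_of_predicted


if __name__ == "__main__":
    high_res = "LLTTTTTTLL"  # made-up reference annotation
    low_res = "LLLTTTTLLL"   # made-up shorter segment for the same helix
    print(q2(high_res, low_res))
    print(tm_residue_coverage(high_res, low_res))
```

In the made-up example, the shorter segment lowers the fraction of reference helix residues that are recovered, while every residue it does assign to the helix is correct.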

Simple hydrophobicity-based predictions were less accurate than advanced methods. Of the methods that only used hydrophobicity scales for prediction, none detected all membrane helices correctly for more than 70% of the high-resolution proteins (Table 2, Qok). However, most methods correctly identified more than 90% of all observed membrane helices (Table 2). In fact, measured by this score alone, most simple hydrophobicity-based methods appeared more accurate than many advanced prediction methods, but this success was achieved by over-predicting membrane helices (Table 2: < ). Encouragingly, over 80% of the helices predicted by most methods were correct (Table 2). Unfortunately, the real problem with the simple methods was that they did not correctly predict the non-membrane regions, as apparent in levels of less than 70% correctly predicted residues (Table 2, Q2). Note that we implemented all simple hydrophobicity scales by using the algorithm proposed by the White group [77]. To ensure that this choice optimised, or at least did not penalise, membrane protein prediction for some hydrophobicity scales, we also tested the thresholds suggested in the original publications for the GES and KD scales [18, 13]. Interestingly, the originally proposed thresholds decreased prediction accuracy (Table 1App, Appendix).
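
For readers unfamiliar with hydrophobicity-scale predictions, the sketch below shows a generic sliding-window scan with the Kyte-Doolittle (KD) hydropathy values. It is not the algorithm of the White group [77] used for the results reported here. The window length of 19 and the threshold of 1.6 are assumptions made for illustration, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_hydrophobic_segments(sequence, window=19, threshold=1.6):
    """Return 0-based, half-open (start, end) segments covered by windows
    whose mean hydropathy is at least `threshold`.

    window and threshold are illustrative assumptions. Residues that are
    not in KD raise a KeyError.
    """
    n = len(sequence)
    covered = [False] * n
    for start in range(0, n - window + 1):
        mean_h = sum(KD[aa] for aa in sequence[start:start + window]) / window
        if mean_h >= threshold:
            for i in range(start, start + window):
                covered[i] = True
    # Merge runs of covered residues into segments.
    segments = []
    i = 0
    while i < n:
        if covered[i]:
            j = i
            while j < n and covered[j]:
                j += 1
            segments.append((i, j))
            i = j
        else:
            i += 1
    return segments


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 20 + "D" * 10  # made-up sequence
    print(predict_hydrophobic_segments(toy))
```

Because every window above the threshold is reported in full, a scan of this kind tends to extend segments beyond the hydrophobic core and cannot tell a membrane helix from any other hydrophobic stretch, which is consistent with the over-prediction described above.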

Most advanced predictions were correct. All advanced prediction methods correctly identified all helices for most high-resolution proteins (Table 2, Qok). In contrast, the two methods that we found to predict the orientation of the helices, i.e. the topology, correctly most often were TopPred2 and HMMTOP2 (Table 2, TOPO). Note that HMMTOP2 was developed using all 36 high-resolution chains for which we compiled the results. On the other hand, TopPred2 used only four of the 36 chains when it was developed. All methods tested correctly predicted more than 70% of the residues in either of the two states, TMH (T) and non-TMH (N, Table 2 Q2). However, all methods significantly under-predicted residues in membrane helices ( < , Table 2).

No single advanced method best by all scores. The set of 36 high-resolution proteins was small enough to require extreme caution in ranking methods based on numerical differences. When comparing pairwise ranks of the methods according to various scores, we found that no advanced method performed consistently best, and none consistently worst ( Fig. 1 ). Interestingly, TMHMM1 and TopPred2 appeared to be the most 'representative' methods in that the scores for these methods were most often indistinguishable from all other advanced methods in pairwise comparisons. In contrast, DAS appeared to be most 'unique' in that it was often better and often worse than all other methods. Three methods were clearly more often worse than better: WW (5 times better / 30 times worse), PRED-TMR (6/23), and SOSUI (7/26). Three methods were clearly more often better than worse: HMMTOP2 (21 times better / 1 time worse), PHDpsihtm08 (27/2), and PHDhtm08 (20/6).




Fig. 1: Pairwise comparison of methods. For all high-resolution results compiled in Table 2, we show the pairwise comparison for eight different scores and nine methods. Differences by more than one (two) standard error(s) are marked by one (two) arrow(s). Empty boxes indicate that the difference between the respective scores of the two methods is not significant. For example, DAS is two standard errors better than WW in terms of the number of correctly predicted proteins (Qok), while HMMTOP2 is two standard errors better than DAS in terms of the overall per-residue accuracy (Q2). The lower table summarises the respective counts of pair-comparisons for which a particular method is better or worse than the others. TopPred2 and TMHMM1 appear to be the most 'neutral' methods (44 and 46 times indistinguishable) while DAS seems the most 'unique' method in that it is often better than the others and equally often worse. Note: only DAS, PHDhtm08, PHDpsihtm07, and TopPred2 did not use most of the proteins tested to optimise prediction accuracy; thus, the results for all the other methods are likely to be over-estimates.





 

Performance on low-resolution data set: distinct differences. The low-resolution set was considerably larger (165 proteins) than the high-resolution set (36 chains). Nevertheless, we could still not find any method that performed consistently better than all the others ( Table 3 ). Most methods reached better per-segment scores for the high- than for the low-resolution data. The opposite was the case for per-residue scores as they were consistently higher for the low-resolution proteins. Most surprising may be the significant differences between the two data sets in terms of the percentage of proteins for which all helices were correctly predicted for the old methods DAS and TopPred2 (Qok in Table 2 and Table 3 ). Even more stunning was the extremely poor performance of most simple methods using only hydrophobicity scales for the prediction. Interestingly, for the hydrophobicity scales, the two newest ones (WW and Ben-Tal) performed overall best on the data from low-resolution experiments.

Most errors were under- or over-predictions of one TMH. The good news was that all methods predicted the number of membrane helices correctly for most proteins ( Fig. 2 ). However, this number differed significantly between the high- (71%) and the low-resolution data (56%). The majority of deviations were to predict one helix too few or one too many (68% for high; 64% for low-resolution, Fig. 2 centre). Interestingly, the errors were rather symmetric for the low-resolution set, while they were substantially asymmetric for the high-resolution data. We could not find any significant correlation between the number of membrane helices and the errors of a particular method (data not shown). However, this may be largely due to the few high-resolution structures in our data set.
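
The kind of tally behind these percentages can be sketched as follows. The sketch assumes that each protein contributes one pair (number of predicted helices, number of observed helices); the function names and the example pairs are hypothetical and do not reproduce the data of this study.

```python
from collections import Counter


def helix_count_differences(pairs):
    """Count how often each difference (predicted - observed) occurs."""
    return Counter(predicted - observed for predicted, observed in pairs)


def summarise_differences(diffs):
    """Return the percentage of proteins with the correct number of helices
    and the percentage of all errors that are off by exactly one helix."""
    total = sum(diffs.values())
    exact = diffs[0]
    errors = total - exact
    off_by_one = diffs[1] + diffs[-1]
    return {
        "percent_exact": 100.0 * exact / total,
        "percent_errors_off_by_one": 100.0 * off_by_one / errors if errors else 0.0,
    }


if __name__ == "__main__":
    # Made-up (predicted, observed) helix counts for six proteins.
    pairs = [(7, 7), (6, 7), (2, 1), (4, 4), (10, 12), (3, 3)]
    diffs = helix_count_differences(pairs)
    print(sorted(diffs.items()))
    print(summarise_differences(diffs))
```

Negative differences correspond to under-prediction and positive differences to over-prediction, so the symmetry of the distribution can be read directly from the counts.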




Fig. 2: Over- and under-prediction of membrane helices. All methods (top panel): For all methods and all proteins in the high- and low-resolution sets, the difference between the number of membrane helices predicted and observed is shown. Although the two distributions appear rather similar, the higher symmetry in the low-resolution graph hides that the percentages with no difference are quite different: 71% for the high-resolution data and 56% for the low-resolution data. The inset (centre) underlines the observation that the majority of errors were due to under- or over-predicting one helix.





 

Accuracy lower for proteins with > 5 TMHs. For proteins with five or fewer membrane helices, the average over all advanced methods exceeded 80% (Qok, eqn. 4) for the high-resolution data and 60% for the low-resolution data (Fig. 3). However, prediction accuracy dropped significantly, to values of 33-36%, for proteins with more than five helices (Fig. 3). Why are proteins with ≤ 5 TMHs so different from proteins with ≥ 6 TMHs? Answers to this question remain speculative.




Fig. 3: Proteins with many helices predicted less accurately. We binned the results for all advanced methods according to the number of observed membrane helices such that the three classes contained similar numbers of proteins (x-axis). Accuracy (y-axis) is measured in terms of the percentage of proteins for which all helices are correctly predicted (Qok). For both the high- and the low-resolution data, proteins with more than five membrane helices were predicted at significantly lower levels of accuracy.


Most proteins and most helices correctly predicted by one of the methods. None of the high-resolution helices was consistently mis-predicted by all programs. However, this may reflect that the more recent methods used all these proteins for training. In contrast, three transmembrane helices from three proteins of the low-resolution set were not identified by any of the methods. (1) The C4-dicarboxylate transport protein from Rhizobium meliloti (SWISS-PROT identifier dcta_rhime; helix from residues 282-300, sequence ALPGLMNKMEKAGCKRSVV) has a relatively hydrophobic sequence, but it has a polar stretch of residues, NKMEK, in the middle of the helix. The gene fusion constructs were not always created with the reporter gene present in the predicted loop regions [91]. In some cases, the reporter gene was present in the predicted membrane regions. This is a problem because it may alter the topological placement of the reporter gene with respect to the membrane. In addition, gene fusion constructs were not made for each loop region because reporter genes were introduced at random. Hence, not every loop was tested for its topological placement, and the untested loops included those for helix 282-300. The experimental evidence for this membrane helix (282-300) was therefore weak, at best. (2) The Haemolysin Secretion ATP-Binding Protein from Escherichia coli (hlyb_ecoli, residues 38-51, sequence GTGLGLTSWLLAAK) is an integral membrane protein. However, the particular membrane helix missed appears very short. The other seven membrane helices of hlyb_ecoli are at least 20 residues long. Nevertheless, some authors have claimed that membrane-spanning helices may be as short as 10 residues [92]. The experimental evidence for hlyb_ecoli had similar problems as that for dcta_rhime: the experimentalists found it difficult to identify membrane-spanning regions through predictions [93]. This was due to the high proportion of hydrophilic residues in the N-terminal portion of HlyB. Consequently, the authors did not know where to insert their reporter gene, which in this case was β-lactamase. Thus, they inserted the reporter gene at random. Additionally, topological models identify the short stretch as loop [93, 94]. (3) Like all other problematic cases, the Mitochondrial brown fat uncoupling protein 1 from Rattus norvegicus (ucp1_rat, residues 178-194, sequence PNLMRNVIINCTELVTY) has transmembrane regions that contain many polar residues. For this protein the experimentalists stated that their data did not suffice to strongly conclude that residues 178-194 are in a membrane helix [95].

No significant difference in performance for prokaryotic and eukaryotic proteins. We compared the performance of each method for eukaryotic and prokaryotic proteins. Most methods did not perform consistently better on one of the two groups for both the high- and low-resolution data (Table 4; ∆Qok). In fact, the trends differed greatly between the two data sets, and for different measures of prediction accuracy. While prokaryotic proteins were predicted more accurately in terms of per-segment measures for the high-resolution data set, the opposite was the case for most methods when compared on the low-resolution set. Only four methods had a similar trend in Qok: PRED-TMR predicted eukaryotic proteins more accurately; SOSUI, TopPred2 and WW predicted prokaryotic proteins more accurately for both sets. However, none of the values exceeded two times the estimated error, i.e. none was statistically very significant. All methods predicted topology (∆TOPO) better for the prokaryotic proteins in the high-resolution set and for the eukaryotic proteins in the low-resolution set. When measuring prediction accuracy in terms of per-residue performance (∆Q2), we could not find any significant difference between prokaryotic and eukaryotic proteins; all methods did slightly better for eukaryotic proteins for both high- and low-resolution data. Nevertheless, given the lack of a consistent direction of the difference and the lack of statistical significance, our data did not support the previously published conclusion that either prokaryotic or eukaryotic proteins were predicted more accurately.

Accuracy of distinguishing between membrane and other proteins

Few false positives: Best methods found few membrane helices in globular proteins. Most advanced methods correctly distinguished between membrane and globular proteins (Table 5). The best methods confused the two types of proteins for less than four percent of all globular proteins tested (Table 5). DAS had the highest error rate of the advanced methods (16% false positives), which was surprising given that DAS tended to under-predict residues in membrane helices. In contrast to the advanced methods, the simple methods distinguished only poorly between membrane and globular proteins. The two exceptions were the old scale from Wolfenden [42] and the new one from Ben-Tal [2]. The latter also predicted membrane proteins rather accurately (Table 2 and Table 3). However, most simple methods found membrane helices in over 90% of all the globular proteins.

>>>Table 5<<<

Few false negatives. Most methods find all membrane proteins. While most hydrophobicity scales detected membrane helices in over 90% of the globular proteins, they detected all membrane proteins as such. The exceptions were the two scales that were best in rejecting globular proteins: Wolfenden and Ben-Tal (Table 5). Similarly, PHDhtm08 mis-classified only 2% of the globular proteins, but also missed about 20% of the membrane proteins. The only methods that mis-classified less than 10% of the globular proteins and overlooked less than 10% of the membrane proteins were SOSUI, TMHMM1, PHDpsihtm, PRED-TMR and HMMTOP2 (Table 5).

Signal peptides falsely predicted to be membrane helices by most methods. Even the advanced methods had high error rates for signal peptides ( Table 6 ). In fact, one of the most accurate rejections of signal peptides was achieved by the simple method solely using the Wolfenden [96] hydrophobicity scale (26% errors). Many of the false predictions were at the very beginning of the respective secreted proteins. Thus, we tested the following simple expert rule: 'delete all membrane helices predicted between 5-10 residues after an N-terminal Methionine'. For PHDpsihtm08 this reduced the falsely predicted signal peptides from 322 (23%) to 146 (10%). Encouragingly, when we applied the same rule to the set of membrane proteins, no helix was removed by this rule. For three out of the 1418 signal peptides, PHDpsihtm08 incorrectly predicted two transmembrane helices.
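
The expert rule can be sketched as a simple post-processing filter. The wording of the rule leaves the exact window open, so the sketch adopts one possible reading: if the sequence starts with Methionine, every predicted helix that begins within the first ten residues is discarded. The function name, the 1-based inclusive coordinates and the cut-off parameter are assumptions made for illustration.

```python
def drop_n_terminal_helices(sequence, helices, max_start=10):
    """Discard predicted helices that start close to an N-terminal Met.

    sequence: one-letter amino acid string.
    helices: list of (start, end) tuples, 1-based and inclusive (assumed).
    max_start: helices starting at or before this position are removed
               (assumed reading of the rule, not a value from the Methods).
    """
    if not sequence.startswith("M"):
        # The rule is only applied to sequences with an N-terminal Met.
        return list(helices)
    return [(start, end) for start, end in helices if start > max_start]


if __name__ == "__main__":
    seq = "M" + "A" * 99  # made-up sequence
    predicted = [(6, 24), (40, 60)]
    print(drop_n_terminal_helices(seq, predicted))
```

A filter of this kind removes putative signal peptides at the cost of also removing any genuine membrane helix that starts at the very N-terminus, so it should be checked against known membrane proteins, as was done above.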

>>>Table 6<<<

 

 

Discussion


Confirming previous analyses

Some methods correctly distinguish globular from helical membrane proteins. Previous analyses showed that simple hydrophobicity-based methods have problems distinguishing between helical transmembrane and globular proteins [78, 22, 79, 77, 74]. In general, we confirmed this finding (Table 5). However, the Wolfenden and the Ben-Tal scales were clearly exceptional in this respect. Both performed on par with the best advanced methods, which predict membrane helices in at most 3% of all globular proteins (Table 5). Interestingly, these levels of accuracy are similar to the performance of the same methods six years ago [97, 27]. This finding confirms that the globular proteins added to PDB over the last decade are not radically different from the structures that we knew before [80, 98]. Möller and colleagues published significantly more pessimistic estimates for the confusion between globular and membrane proteins [74]. While our estimates were based entirely on proteins of known structure, those from Möller et al. were based on proteins of unknown structure. Thus, we see two possible reasons for the difference between the two estimates. (1) Proteins in PDB differ from proteins in SWISS-PROT in their average length by almost a factor of two since structural biologists often have to truncate the proteins to obtain high-resolution structures. We might argue that the truncated regions are more likely to be confused with membrane helices than the regions for which structure is determined. (2) Many of the proteins used by Möller and colleagues may in fact contain membrane helices or signal peptides (for which the error is higher, Table 6). We suspect that the truth lies somewhere between the two extremes. Hence, our estimates for the confusion between globular and membrane proteins may be slightly optimistic.

Most methods confuse signal peptides and membrane helices. Möller et al. tested prediction methods on 34 signal and transit peptides. They found that most methods incorrectly predicted these regions to contain membrane helices. We tested all 27 methods on 1418 sequence-unique signal peptides. Our results confirmed the previously uncovered trends (Table 6). However, the larger set that we used revealed that TMHMM1, which is one of the best methods in this respect, confuses over 30% of the signal peptides with membrane helices rather than < 10% as previously estimated [74]. Most simple methods based only on hydrophobicity scales confused more than 90% of all the signal peptides with membrane helices (exception: Wolfenden scale, Table 6). The good news was that the error could be reduced by experts who discard all membrane helices predicted closer than ten residues to an N-terminal Methionine. In this best-case scenario, PHDhtm and PHDpsihtm falsely predicted only about 10% of the signal peptides as membrane helices. Possibly, combinations of membrane-optimised and signal-peptide-optimised programs could reduce this error rate.

Most methods identify most membrane helices. We confirmed [75, 76, 74] that many methods correctly predict most membrane helices ( Fig. 2 ). We also found the most common mistake to be the under- or over-prediction of a single transmembrane helix. However, our results differed in detail from previous analyses (see below).


 


Resolving differences in previous analyses

Some methods are better - none is clearly best. Evaluations of membrane prediction methods are sometimes based on different definitions for performance accuracy. A particular example of the latter is to count a prediction of one long helix as correct although it stretches over two observed helices and thus misses the break in between the two. Another misleading standard procedure is to only report values covering one side of the coin, i.e. only the values of correctly predicted as percentage of observed or vice versa. Here, we carefully evaluated all methods on identical data sets and compiled all reasonable scores for prediction accuracy. To simplify the complexity, we focused in our report on a relatively limited number of scores. Another problem with many previous analyses is that authors have not estimated the error associated with a particular score. For example, from Table 1 we may conclude that HMMTOP2 is much better than TopPred2 when applying any measure for prediction accuracy. Although the numbers differed greatly, a thorough bootstrap experiment revealed that the performance of the two methods was indeed indistinguishable. We compared the methods in a pairwise manner for each score of the high-resolution data set ( Fig. 1 ). Some methods appeared more accurate than others. However, no method(s) performed consistently better than all others by more than one standard error ( Fig. 1 ). Our estimates of error margins explained the numerical differences found between three analyses [75, 76, 74] .

Simple hydrophobicity-based methods less accurate than advanced methods. Möller et al. suggested that simple hydrophobicity scale-based methods predict membrane helices almost as accurately as the best advanced methods [74]. We could not confirm this proposition. In contrast, we found that the best advanced methods were significantly more accurate than the best hydrophobicity scale-based methods, both in terms of per-segment and per-residue accuracy (Table 2 and Table 3). The only possible exception may be the per-residue performance of the Ben-Tal scale for the low-resolution data (Table 3). However, we did confirm that, due to over-prediction, a few hydrophobicity scale-based methods identify the observed membrane helices at a level of accuracy similar to that of advanced methods (Table 2 and Table 3). Jayasinghe et al. found that the WW hydrophobicity scale-based method that they introduced outperformed even the best advanced methods (“We find that (the) WW scale … identifies TM helices of membrane proteins with an accuracy greater than 99 %” [77]). We could also not confirm this finding, no matter which definition of prediction accuracy we compared. Nevertheless, the major problem with simple hydrophobicity-based methods is their failure on globular proteins (Table 5) and signal peptides (Table 6). In fact, the error of hydrophobicity scales depends on the length of the protein. For example, the high-resolution chains had an average length of around 215 residues, whereas low-resolution proteins were, on average, about 420 residues long. While hydrophobicity scales correctly predicted all helices in 28-65% of the short proteins (Table 2), they did so for only 5-29% of the long proteins (Table 3). In particular, the scale that performed best on the high-resolution set (KD) dropped in accuracy from 65% (high) to 13% (low), while the scale that performed most poorly on the short proteins in the high-resolution data (Wolfenden) became best for the long proteins in the low-resolution data. The Wolfenden scale also performed relatively well on globular proteins (Table 5) and on signal peptides (Table 6). The price for the lack of over-prediction is a low accuracy in detecting membrane helices (under-prediction). Overall, the most successful hydrophobicity scale appeared to be the Ben-Tal scale, which is based on the free energy of transferring an amino acid from water into the centre of the hydrocarbon region of a lipid bilayer [2]. It out-performed the Wolfenden scale for membrane proteins and for globular proteins, and it bested all other scales for the low-resolution set. Simple hydrophobicity scales obviously have tremendous importance for sequence analysis. However, to use them as the only criterion to predict membrane helices appears to be a bad idea.

Incorrect ranking by per-segment accuracy depends on definition of score. As discussed above, any attempt to rank prediction methods should account for the standard error in the estimated level of accuracy. A particular illustration of this finding is that different definitions of the accuracy in correctly predicting all helices (eqn. 4) would slightly alter the ranks. For example, DAS scored worst amongst all advanced methods when an overlap of at least nine residues was required to consider a helix correctly predicted (definition introduced by [74]), while it appeared to be the third-best of all advanced methods when we applied the definition introduced by Ikeda et al. [75] (Appendix, Table 1 App). When giving different ranks only for significant differences, this apparent contradiction was resolved. Most averages were relatively insensitive to whether we required an overlap of 3 or 9 residues between predicted and observed helix (Table 1 App, Qok3 and Qok9). However, contrary to what has been claimed previously, some methods had lower averages when requiring nine overlapping residues. Similarly, for most methods the average scores did not change considerably when using the definition of Ikeda et al. (Qok11Centre in Table 1 App). However, while the score was lower for most methods for which it differed from the other two, for a few it was actually higher. These were methods that tended to under-predict helices. Overall, the dependence of ranking on the definition of the score used underscored the need to standardise evaluations.

Similar prediction accuracy for prokaryotic and eukaryotic membrane proteins. Ikeda et al. [75] found that prediction methods are consistently worse at predicting membrane proteins from eukaryotes than those from prokaryotes. We could not verify this finding. Both for the high- and for the low-resolution data sets, we found that some methods reached slightly higher levels on one than on the other ( Table 4 ). However, the differences were not significant.


 


Novel findings

Low-resolution experiments not much more accurate than prediction methods. The low-resolution experiments differed substantially in their assignments of membrane helices from high-resolution experiments. In fact, for a small subset of 13 high-resolution chains, many prediction methods appeared to be as correct, or as incorrect, as previously deposited low-resolution experiments (Table 1). This problem was also reflected in the substantial differences between the numerical scores for some of the methods. For example, DAS, TopPred2 and the PHDhtm series used partial information about 9 of the 36 high-resolution chains for development. For these methods, the scores on the 27 cross-validated high-resolution chains were similar to those for the 36 high-resolution chains (data not shown). However, the per-segment scores for the low-resolution sets differed from those for the high-resolution sets (Table 2 and Table 3, in particular Qok). There are two possible explanations for this: either the low-resolution set contains 'new motifs', or the low-resolution experiments over- or under-assign many helices. Such errors could result in a particularly poor performance in terms of predicting all TM helices correctly. In fact, for the set of 13 proteins for which we had low- and high-resolution experiments, Qok was low (84%, Table 1) for the low-resolution experiments. Furthermore, the observation that DAS, TopPred2 and the PHDhtm series got higher per-residue scores on the low-resolution data than on the high-resolution data suggested that the low-resolution assignments might not reflect completely new membrane motifs. Thus, the performance of these cross-validated methods may be correctly estimated by the high-resolution data set (Table 2).

Problems with topology assignments by low-resolution data. The topologies of two proteins were incorrectly assigned by the low-resolution experiments ( Table 1 ). These two proteins were (1) PDB: 1EHK:B / SWISS-PROT: COX2_THETH and (2) PDB: 1EUL:A / SWISS-PROT: ATA2_RABIT. (1) 1EHK:B has one membrane helix and the N-terminus is in the periplasm. Thus, PDB annotates the topology IN. In contrast, SWISS-PROT (release 34) annotates COX2_THETH with topology OUT despite experimental data suggesting otherwise [99] . Note that the latest SWISS-PROT release still annotates COX2_THETH as OUT. (2) The second pair is more complicated: the old SWISS-PROT release 20 entry for ATCA_RABIT was annotated with 10 membrane helices with topology IN, whereas the PDB structure 1EUL:A has 10 membrane helices with topology OUT. In contrast, the latest SWISS-PROT release for ATA2_RABIT annotates 10 helices, but still assigns the topology as IN according to antibody studies [100] . However, this experimentally determined topology may be incorrect due to non-specific antibodies for the N-terminus epitope. Indeed, the experimentalists noted that the antibody against the N-terminus was only immuno-reactive to the 1-243 N-terminal fragment rather than specific to the N-terminal twelve residues. At the same time, they argued that this antiserum can correctly locate the epitope for residues 1-12 [101] . They suggested that the N-terminus is cytoplasmic, but for other cytosolic loops, the authors observed enhanced antibody reactivities. Additionally, the N-terminus may be OUT because after solubilisation with C12E6, proteolysis did not drastically increase reactivity of antiserum 1-12. Furthermore, anti-sera to epitopes on all loop regions of ATA2_RABIT were not tested. Therefore, it would be useful to acquire information of the location of the other loops in ATA2_RABIT to verify the topological orientation of this protein.

All prediction methods missed only helices with weak experimental evidence. None of the helices in the high-resolution set and only three in the low-resolution set were missed by all advanced methods. As described above (Results), the experiments done for these three proteins were not fully convincing in terms of the assignments of transmembrane helices and topology. This observation suggests implementing a consensus prediction of membrane helices. The potential success of such an approach has been initially tried out by a couple of authors [102, 75] . However, these two initial attempts have focused only on advanced methods. Although advanced methods are more accurate than simple hydrophobicity-based methods, they tend to under-predict transmembrane helices, especially for high-resolution structures ( Table 2 ). Advanced methods could thus serve as a specificity filter for a consensus method. Using both advanced and simple methods could help to verify low-resolution experimental results from proteolysis and gene fusion.

Not all membrane proteins identified. The only advanced method that predicted all known helical membrane proteins to contain at least one helix was DAS ( Table 5 false negatives). However, the flip-side of the same coin was that DAS also performed poorly on globular proteins ( Table 5 false positives). The other extreme was PHDhtm based on conventional pairwise alignments that performed well in rejecting globular proteins while also missing almost one-fifth of the membrane proteins with the default parameters. Obviously, there is a trade-off between predicting too many globular as membrane proteins, and too many membrane as globular proteins. Possibly the best compromise was achieved by SOSUI and TMHMM that missed 6% of the membrane proteins while incorrectly predicting membrane helices in about 1% of all globular proteins. PHDhtm based on PSI-BLAST profiles (PHDpsihtm) reached a similar compromise: 8% of the membrane proteins were missed, and 2% of all globular proteins mis-predicted. Nevertheless, the problem of missing membrane proteins underlines once again that we need better methods that correctly distinguish between globular and membrane proteins.

Dependence of prediction accuracy on number of helices. We did not find any significant difference in the performance between proteins with one and proteins with many membrane helices. In contrast, proteins with five or fewer membrane helices (≤ 5) were predicted more accurately than proteins with more (> 5, Fig. 2 B). While we could label the difference as significant, we failed to come up with any reasonable explanation for this finding. Readers may speculate that the numerical differences we observe between 6TM and 7TM proteins could be explained by the overabundance of transporters with buried charged residues. However, the number of proteins in each category was too small to validate such a fine-grained distinction.

 

 

Conclusions

We also over-estimated the performance. Although we spent considerable effort on comparing prediction methods, our comparisons suffered from one crucial problem: we did not have cross-validation data available for all methods. In fact, the only methods for which we had cross-validated results were DAS, PHDhtm, PHDpsihtm, TopPred2, and most of the simple methods using only hydrophobicity scales. Although the overall scores for the advanced methods did not differ substantially between the sets of 27 cross-validated and 36 non-cross-validated high-resolution chains (data not shown), they did differ markedly between the nine chains used for development and the 27 cross-validated chains. This seemingly contradictory result is explained by the simple fact that most high-resolution proteins were not used in the development of these methods. In contrast, the newer prediction methods PRED-TMR, SOSUI, TMHMM, and WW used most and HMMTOP2 used all of the high-resolution chains for development. In fact, we observed two trends: (1) newer methods were slightly better than older ones (HMMTOP2 was clearly more accurate than HMMTOP1 when tested on a small subset of the data), and (2) methods based on alignments were superior to those based on single sequences. Indeed, when switching from using MaxHom alignments against SWISS-PROT as input to PHDhtm to using PSI-BLAST alignments against all known sequences (BIG, PHDpsihtm), prediction accuracy increased considerably.

Most methods get most membrane helices, but the type of membrane protein is often wrong. The most common mistake was the under- or over-prediction of one transmembrane helix. This appears encouraging in terms of prediction methods, in general. However, membrane predictions are so important in the context of analysing entire proteomes because the number and orientation of the helices typically reveal aspects of function. In fact, only the very best methods predict all helices and the topology more often correctly than not. We may rightfully argue that current methods are still not good enough. Because both the number of helices and their orientation can easily be altered by engineering [103, 104, 105, 106], the task at hand is, however, not an easy one. These experiments along with our analysis of the conservation of transmembrane helices strongly argue against the view that the number and orientation of membrane helices constitute a 'solid reality written into the sequence'. Rather, single residue exchanges can alter these macroscopic features. Thus, correct predictions require a precision typically not achieved. Perhaps current methods have reached the maximum possible level of accuracy, and the chapter of 'simply' predicting the location and orientation of membrane proteins is closed. With the recent high-resolution structures challenging common assumptions and our current analysis highlighting the number of urgent problems with prediction methods, we strongly doubt this. Therefore, we contend that the issues elucidated in this investigation have reopened the field rather than closed it.

 

Methods

Data sets

High-resolution data sets for membrane proteins. We started with a total set of 105 chains from helical membrane proteins for which a high-resolution structure was deposited in PDB [4] . We identified these as helical membrane proteins according to the excellent up-to-date collection of membrane proteins at blanco.biomol.uci.edu [76] .

Low-resolution data sets for membrane proteins. We used an expert-curated set of 165 helical membrane proteins that was collected by Stefan Möller and colleagues [63] . For all these proteins good low-resolution experimental evidence about localisation was available. For the comparison between high-resolution and low-resolution data, we used the annotations we found about transmembrane helix location in old SWISS-PROT versions released prior to the publication of the high-resolution structures.

High-resolution data set for globular proteins. The EVA server [10] continuously maintains a sequence-unique subset of PDB proteins. We used the version from Jul 2001 with 1852 representative protein chains. From that set we first removed all membrane proteins. Then we removed all proteins that were similar to one representative in a SCOP super-family [107, 108] . Representatives were taken to be the longest proteins in the respective super-family. This procedure yielded a final set of 616 globular protein chains.

Data set of proteins with known signal peptides. Henrik Nielsen and colleagues at the CBS in Copenhagen keep an up-to-date list of experimentally known signal peptides at their Web site (www.cbs.dtu.dk/ftp/signalp/readme). This group also spent considerable effort at defining thresholds for what constitutes redundancy in sets of signal peptides [109, 32] . We downloaded a set of 1418 sequence unique signal peptides from a total list of 2845.

Sequence-unique subsets reduce bias. Many of the proteins for which we have information about TM regions are similar to one another. If we want to analyse prediction methods or simple features such as TM length, this bias is problematic. In order to reduce the bias in the set of membrane proteins, we first have to generate all-against-all alignments that capture the bias existing in that set. Then, we have to choose the maximal subset that fulfils the constraint that no pair in that subset is sequence similar. Technically, we accomplished this objective in the following way. First, a pairwise BLAST [5] aligned all membrane proteins against each other. Second, the resulting pairs were filtered applying the HSSP-threshold (value J = 0, below) such that all remaining pairs were likely to have similar structures. Third, the resulting families were sorted by number of members and length. Fourth, all pairs were clustered with a simple greedy algorithm starting with the largest and longest families [110]. Note that the threshold chosen roughly translated to 'no pair with more than 33% sequence identity over more than 100 residues aligned'. In particular, we used the following formula to compile the distance DIST from the HSSP-curve HSSP_PIDE [111]:

   (Eq. 1)

where PIDE is the percentage pairwise sequence identity (ignoring gaps and insertions). This procedure yielded 36 proteins in the high-resolution set, and 165 proteins in the low-resolution set.
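The greedy selection described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the program used for the paper: the function name `greedy_unique_subset` and the input format are hypothetical, and the list of similar pairs is assumed to have already been produced by the BLAST and HSSP-threshold steps.

```python
def greedy_unique_subset(lengths, similar_pairs):
    """Pick representatives so that no two chosen proteins are similar.

    lengths       -- dict mapping protein identifier to sequence length
    similar_pairs -- iterable of (id_a, id_b) pairs above the similarity
                     threshold (assumed to come from the alignment step)
    """
    neighbours = {protein: set() for protein in lengths}
    for a, b in similar_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Largest families first; ties are broken in favour of longer proteins.
    order = sorted(
        lengths,
        key=lambda protein: (len(neighbours[protein]), lengths[protein]),
        reverse=True,
    )

    chosen, excluded = [], set()
    for protein in order:
        if protein in excluded:
            continue
        chosen.append(protein)
        excluded.add(protein)
        excluded |= neighbours[protein]  # drop everything similar to it
    return chosen


# Toy example with hypothetical identifiers.
toy_lengths = {"a": 300, "b": 250, "c": 200, "d": 400}
toy_pairs = [("a", "b"), ("a", "c")]
representatives = greedy_unique_subset(toy_lengths, toy_pairs)
```

A greedy pass of this kind guarantees that no two selected proteins are similar, but it does not guarantee that the selected subset is the largest possible one.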

 

Programs tested

Building multiple alignments. Two different alignment schemes were explored: (1) the dynamic programming method MaxHom [21], and (2) profile-based PSI-BLAST [29]. The particular protocol for finding similarities with PSI-BLAST applied the usual precautions to avoid drift and pollution [112, 28]. Searches were restricted to three iterations, and the iteration parameter (H-value) was set to 10^-10. The search databases were SWISS-PROT [3] and BIG (= SWISS-PROT [3] + TrEMBL [3] + PDB [4]). To explore the conservation of membrane helices, we filtered all MaxHom alignments according to various distances J (eqn. 1).

Advanced prediction methods. We referred to prediction methods as 'advanced' when they implement more than 'simple' hydrophobicity scales. We tested the following programs: DAS, HMMTOP (version 2), PHDhtm, PHDpsihtm, PRED-TMR, SOSUI, TMHMM (version 2), and TopPred2. TopPred2 averages the GES-scale of hydrophobicity [13] using a trapezoid window [40, 113] . PHDhtm combines a neural network using evolutionary information with a dynamic programming optimisation of the final prediction [79, 27] . DAS optimises the use of hydrophobicity plots [41] . SOSUI [33] uses a combination of hydrophobicity and amphiphilicity preferences to predict membrane helices. TMHMM is the most advanced – and seemingly most accurate - current method to predict membrane helices [37] . It embeds a number of statistical preferences and rules into a Hidden Markov model to optimise the prediction of the localisation of membrane helices and their orientation (note: similar concepts are used for HMMTOP [16] ). PRED-TMR uses a standard hydrophobicity analysis with emphasis on detecting the ends and beginnings of membrane helices [114] .

Simple methods exclusively based on hydrophobicity scales. We also implemented our in-house prediction methods that simply used various hydrophobicity scales for prediction. In particular, we tested the following scales. A-Cid: normalised hydrophobicity scale for alpha-proteins [1] , Av-Cid: normalised average hydrophobicity scale [1] , Ben-Tal: Hydrophobicity scale representing free energy of transfer of an amino acid from water into the centre of the hydrocarbon region of a model lipid bilayer [2] , Bull-Breese: Bull-Breese hydrophobicity scale [6] , Eisenberg: normalised consensus hydrophobicity scale [8] , EM: Solvation free energy [9] , Fauchere: hydrophobic parameter pi from the partitioning of N-acetyl-amino-acid amides [12] , GES: hydrophobicity property [13] , Heijne: transfer free energy to lipophilic phase [15] , Hopp-Woods: Hopp-Woods hydrophilicity value [17] , KD: Kyte-Doolittle hydropathy index [18] , Lawson: transfer free energy [19] , Levitt: Hydrophobic parameter [20] , Nakashima: normalised composition of membrane proteins [24] , Radzicka: transfer free energy from 1-octanol to water [30] , Roseman: solvation corrected side-chain hydropathy [31] , Sweet: optimal matching hydrophobicity [35] , Wolfenden: hydration potential [42] , and WW: Wimley-White scale [77] . Replacing the WW scale with each of the above-mentioned hydrophobicity indices, we used the WW algorithm to evaluate the predictive performance of each index.
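To illustrate what a prediction based purely on a hydrophobicity scale looks like, the following sketch applies a plain sliding-window average to the Kyte-Doolittle (KD) index. It is explicitly not the WW algorithm used for the evaluation above; the function name `predict_segments`, the window of 19 residues and the threshold of 1.6 are conventional choices for the KD scale that are assumed here for illustration only.

```python
# Kyte-Doolittle hydropathy index (scale "KD" in the list above).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_segments(sequence, scale=KD, window=19, threshold=1.6):
    """Return 0-based inclusive (start, end) segments covered by at least one
    window whose mean scale value reaches the threshold.

    Only the 20 standard residues are handled; any other letter raises KeyError.
    """
    n = len(sequence)
    flagged = [False] * n
    for i in range(n - window + 1):
        mean = sum(scale[aa] for aa in sequence[i:i + window]) / window
        if mean >= threshold:
            for j in range(i, i + window):
                flagged[j] = True

    # Collapse runs of flagged residues into segments.
    segments, start = [], None
    for pos, hit in enumerate(flagged):
        if hit and start is None:
            start = pos
        elif not hit and start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, n - 1))
    return segments


# Toy sequence: a 25-residue leucine stretch flanked by lysines.
toy_segments = predict_segments("K" * 10 + "L" * 25 + "K" * 10)
```

Any sufficiently hydrophobic stretch is flagged by such a window, regardless of whether it lies in a membrane, which is in line with the over-prediction on globular proteins and signal peptides reported above.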

 

Measuring accuracy

Measuring per-segment accuracy. The ultimate goal of prediction methods obviously is to correctly predict all residues. Assume a protein with 10 membrane helices of 20 residues each; method A predicts 10 helices but gets the five residues at each end of each helix wrong and method B misses four helices but gets the ends for the other six entirely right. Which method is better? Possibly, many readers would favour method A. This problem is captured in using two different scores measuring prediction accuracy in the field of globular secondary structure prediction: per-residue scores and per-segment scores [80, 115] . While globular secondary structure segments are – on average – rather short (helices ~10 residues, strands ~ 5 residues), membrane helices are rather long. Consequently, the problem of evaluating the per-segment accuracy allows a more coarse-grained measure than required for globular secondary structure prediction [115, 116] . There are two separate issues to address when defining a helix to be predicted correctly. The first concerns counting the same helix twice. We used the following simple concept of 'correctly predicted segment':

In particular, the observed helix O2 is NOT correctly predicted, since P1 overlaps already with O1. Similarly, P2 is counted as correct with respect to O3, while P3 is not. The second issue concerns the minimal overlap required between the observed and predicted helix. If not stated otherwise, we required a minimal overlap of 3 residues, following the definitions previously used in many other publications [40, 22, 82, 117, 79, 36, 27, 37] . Möller et al. used a similar procedure [74] , however, they required an overlap of at least 9 rather than 3 residues. Other groups required a minimal overlap of 1 residue (e.g. [41, 16] ). Jayasinghe required an overlap of 9 [76] and 3 [77] residues, however, in both publications, they counted the same predicted helix twice, thus yielding 100% accuracy for the overlap between O1/P1 and O2/P2 in the above sketch. Yet another measure was introduced by Ikeda et al. [75] : helices were considered as correctly predicted if the centres of the predicted and the observed helix overlapped by at least 11 residues. The different measures are illustrated in the following example for a prediction (T=Transmembrane):

observed: --TTTTTTTTTTTTTTTTTTTT---------TTTTTTTTTTTTTTTTTTT-
predict 1:----------------------TTTTTTTTTT-------------------
predict 2:------------------TTTTTTTTTTTTTTTT-----------------
predict 3:--------------TTTTTTTTTTTTTTTTTTTT-----------------
predict 4:------------TTTTTTTTTTTTTTTTTTTTT------------------

Jayasinghe et al. (2001a) evaluates prediction 1 as 0% accurate and 2-4 as 100% accurate (two helices correct); Jayasinghe et al. (2001b) gives 1-2 = 0% and 3-4 = 100%; Tusnady & Simon (1998) gives 1-4 = 50% (one helix right, one not); [74] gives 1-2 = 0% and 3-4 = 50%; [75] gives 1-3 = 0% and 4 = 50%; the score that we refer to in this manuscript gives 1 = 0% and 2-4 = 50%. For comparison, we also provided a few other scores in the Supplementary Material (note that we, however, did not count helices twice in any of those definitions).
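The rule that a predicted helix may validate at most one observed helix, combined with a minimal overlap, can be sketched as below. The function names, the 1-based inclusive `(start, end)` representation and the greedy N- to C-terminal matching order are assumptions made for this illustration; the text does not specify how ties were resolved.

```python
def overlap(seg_a, seg_b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]) + 1)


def count_correct_helices(observed, predicted, min_overlap=3):
    """Count observed helices matched by a predicted helix.

    Each predicted helix is used at most once, so one long predicted helix
    spanning two observed helices counts as one correct helix, not two.
    """
    used = set()
    correct = 0
    for obs in observed:
        for index, prd in enumerate(predicted):
            if index not in used and overlap(obs, prd) >= min_overlap:
                used.add(index)
                correct += 1
                break
    return correct


# Toy example: two observed helices, predicted either as one long helix
# or as two separate helices.
observed = [(3, 22), (32, 50)]
n_long = count_correct_helices(observed, [(5, 45)])
n_split = count_correct_helices(observed, [(4, 21), (30, 48)])
```

Raising `min_overlap` from 3 to 9 reproduces the stricter criterion of [74] within the same function.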

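With this concept, we can compile the percentage of correctly predicted transmembrane helices:

  Qhtm%obs = 100 × (number of correctly predicted TM helices in data set) / (number of TM helices observed in data set)  (Eq. 2)

Qhtm%obs estimates the likelihood that an actual membrane helix is correctly predicted. While this score can also be compiled for a single protein, it would be misleading to compile the score for each protein in a data set and then to average over all proteins. Rather, the number should be compiled by 'pooling' all membrane helices from an entire data set. Over-predictions are measured by the corresponding score:

  Qhtm%prd = 100 × (number of correctly predicted TM helices in data set) / (number of TM helices predicted in data set)  (Eq. 3)

Qhtm%prd estimates the likelihood that a predicted TM helix is correctly predicted. These two scores are merged into a score that describes for which percentage of the proteins all TM segments are correctly predicted:

  Qok = (100 / Nprot) × Σi δi, with δi = 1 if Qhtm%obs = Qhtm%prd = 100 for protein i, and δi = 0 otherwise  (Eq. 4)

Thus, Qok becomes 100 if and only if for all proteins in the set both Qhtm%obs and Qhtm%prd reach 100%. Finally, we need to evaluate the accuracy of predicting the topology correctly:

  TOPO = 100 × (number of proteins with correctly predicted topology) / Nprot  (Eq. 5)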
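The pooling of helices over an entire data set, as opposed to averaging per-protein percentages, can be sketched as follows. The function name `per_segment_scores` and the per-protein tuple format are assumptions for this illustration; the counts of correct helices are taken as given.

```python
def per_segment_scores(proteins):
    """Compile pooled per-segment scores for a data set.

    proteins -- list of (n_observed, n_predicted, n_correct) tuples,
                one tuple per protein (hypothetical input format)
    Returns (percentage of observed helices correctly predicted,
             percentage of predicted helices that are correct,
             Qok: percentage of proteins with all helices correct).
    """
    total_observed = sum(obs for obs, _, _ in proteins)
    total_predicted = sum(prd for _, prd, _ in proteins)
    total_correct = sum(cor for _, _, cor in proteins)

    pct_of_observed = 100.0 * total_correct / total_observed
    pct_of_predicted = 100.0 * total_correct / total_predicted

    # A protein counts for Qok only if nothing is missed or over-predicted.
    n_ok = sum(1 for obs, prd, cor in proteins if obs == prd == cor)
    q_ok = 100.0 * n_ok / len(proteins)
    return pct_of_observed, pct_of_predicted, q_ok


# Toy data set of four proteins.
toy_proteins = [(2, 2, 2), (4, 3, 3), (1, 2, 1), (3, 3, 3)]
toy_scores = per_segment_scores(toy_proteins)
```

In the toy set, nine of ten helices are found and nine of ten predictions are correct, yet only half of the proteins have all helices right, which shows why Qok is much harsher than the pooled percentages.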

Measuring per-residue accuracy. While the per-segment scores capture most of what experts would intuitively consider as important features of TMH prediction methods, we also need to monitor a number of per-residue scores that evaluate how accurately particular residues are predicted. In particular, the example of P2 and P3 above would yield 0 for all per-segment scores although the predictions somehow capture important information. The simplest per-residue score is the two-state per-residue accuracy Q2 that measures the percentage of residues predicted correctly in either of the two states T (membrane helix) or N (not membrane):

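  Q2 = 100 × (number of residues correctly predicted in state T or N) / (number of residues)  (Eq. 6)

Typically, most residues in membrane proteins are in globular regions [73]. Thus, non-membrane residues tend to dominate Q2. This problem can be overcome by simply measuring the percentage of residues correctly predicted in membrane segments:

  Q2T%obs = 100 × (number of residues correctly predicted in TM helices) / (number of residues observed in TM helices)  (Eq. 7)

Similar to the per-segment scores, over-predictions can be captured by the corresponding score:

  Q2T%prd = 100 × (number of residues correctly predicted in TM helices) / (number of residues predicted in TM helices)  (Eq. 8)

Q2N%obs and Q2N%prd are the corresponding percentages for non-membrane residues. Finally, we monitored the Matthews correlation index [118] that attempts to capture both over- and under-prediction of residues in transmembrane helices by one single score. This index is defined as follows:

  MCC = (pT × nT − uT × oT) / sqrt((pT + uT) × (pT + oT) × (nT + uT) × (nT + oT))  (Eq. 9)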

where pT is the number of residues correctly predicted as membrane helix (TMH), nT is the number of residues correctly predicted as non-TMH, while uT and oT are the number of residues under- and over-predicted, respectively.
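The per-residue scores can be compiled from two aligned strings of states, as in the sketch below. The function name `per_residue_scores`, the dictionary keys and the convention of returning 0 for the correlation when its denominator vanishes are assumptions for this illustration; the counts pT, nT, uT and oT follow the definitions given above.

```python
import math


def per_residue_scores(observed, predicted):
    """Compile per-residue scores from strings over T (membrane helix)
    and N (not membrane) of equal length."""
    pT = nT = uT = oT = 0
    for obs, prd in zip(observed, predicted):
        if obs == "T" and prd == "T":
            pT += 1          # correctly predicted membrane residue
        elif obs != "T" and prd != "T":
            nT += 1          # correctly predicted non-membrane residue
        elif obs == "T":
            uT += 1          # under-predicted (missed) membrane residue
        else:
            oT += 1          # over-predicted membrane residue

    denominator = math.sqrt((pT + uT) * (pT + oT) * (nT + uT) * (nT + oT))
    return {
        "q2": 100.0 * (pT + nT) / (pT + nT + uT + oT),
        "q2t_obs": 100.0 * pT / (pT + uT),
        "q2t_prd": 100.0 * pT / (pT + oT),
        "mcc": (pT * nT - uT * oT) / denominator if denominator else 0.0,
    }


# Toy example: the predicted helix is shifted by one residue.
toy_result = per_residue_scores("NNTTTTTTNN", "NNNTTTTTTN")
```

The sketch assumes that at least one membrane residue is both observed and predicted; otherwise the two membrane-specific percentages are undefined.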

Estimating error for per-residue accuracy: standard error. For globular proteins, prediction accuracy varies considerably between different proteins [119, 26] . The corresponding distributions can be approximated by Gaussian distributions. Thus, we can estimate the standard error of score Q by the simple rule-of-thumb:

   (Eq. 10)

where s is the standard deviation for score Q based on a data set of Nprot-large proteins. This set has to be sufficiently large to actually observe a normal distribution. Assume that we only have a much smaller data set of Nprot-set proteins, we can then still approximate the standard error by using the standard deviation compiled over the large data set. While this concept is easy to apply to evaluations of globular prediction methods [10, 85] , for the situation of membrane proteins, we simply do not have a sufficient number of high-resolution structures to 'once and for all' estimate s. There is no 'clean' solution to this problem. Here, we used the following approximation:

    ( Eq. 11 )

that is, we used the maximal possible standard error. Assume that s = 20 for a set of 13 proteins, s = 10 for a set of 36 proteins, and s = 15 for a set of 27 proteins. Then we used s = 20 for the first, and s = 15 for the other two.

Estimating error for per-segment accuracy: bootstrap experiment. The above concept to estimate the error in evaluating performance is not applicable for the per-segment scores, since these are not distributed normally. To illustrate the problem for the topology prediction: scores can be 1 (correct topology) or 0 (incorrect) for one protein. The score TOPO (eqn. 5) averages over all proteins and hence provides one single final value, rather than a distribution. One way to still estimate the error in such a situation is the bootstrap experiment [120, 121]. The principal procedure is the following (Fig. Sketch2). (1) Assume we have a set of N=36 proteins, each with correct or incorrect topology. (2) Choose a random subset of K<N proteins, and compile the average (TOPO) over these K proteins. (3) Repeat M times and estimate the error based on the resulting distribution of averages. In other words, the bootstrapping experiment attempts to estimate how sensitively a score depends on the particular data set chosen. Albeit often surprisingly powerful, bootstrapping is a more coarse-grained approximation. In particular, we used the following parameters to estimate errors for per-segment scores: M = 100 (100 random picks), and K = int(N/2), i.e. for each random pick we chose half of the proteins available in the respective sets. Finally, we applied the same approximation as depicted in Eq. 11, i.e. reported a rather conservative estimate for the error.
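The three steps of the bootstrap experiment can be sketched as follows, with M = 100 picks and K = int(N/2) as stated above. The function name `bootstrap_error`, the fixed random seed and the use of the standard deviation of the M averages as the error estimate are assumptions for this illustration; the text only says that the error is estimated from the resulting distribution.

```python
import random
import statistics


def bootstrap_error(outcomes, n_picks=100, seed=0):
    """Estimate how strongly a per-protein 0/1 score depends on the data set.

    outcomes -- one value per protein: 1 (e.g. topology correct) or 0
    n_picks  -- M, the number of random subsets
    Returns (mean of the subset averages, their standard deviation), in percent.
    """
    rng = random.Random(seed)
    k = len(outcomes) // 2                  # K = int(N/2)
    averages = []
    for _ in range(n_picks):
        subset = rng.sample(outcomes, k)    # K proteins, drawn without replacement
        averages.append(100.0 * sum(subset) / k)
    return statistics.mean(averages), statistics.stdev(averages)


# Toy example: 36 proteins, 24 of them with correctly predicted topology.
toy_mean, toy_error = bootstrap_error([1] * 24 + [0] * 12)
```

Drawing subsets without replacement follows the description above; the classical bootstrap instead resamples N items with replacement, which would be a small change to the sampling line.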

Ranking methods. Given methods A and B evaluated on a set with N proteins, when can we conclude that the performance of A (Q(A)) is significantly better than that of B (Q(B))? The error estimates provide an answer to this question: We cannot distinguish between A and B if:

  ∆Q = Q(A) - Q(B) ≤ SE(Q)  (Eq. 12)

Thus, we can rank only if A and B differ by more than the error. For example, when a method correctly predicts 75% of the residues in a test set of 16 proteins with a standard deviation of 10%, a difference relative to another method that is smaller than 2.5% (i.e., ∆Q = 10/sqrt(16)) is not significant. Thus, we cannot distinguish between two methods that predict correctly 75% and 73% of all residues, respectively. We used this estimate to rank methods in the following way. Assume four methods have accuracy levels of A=75, B=73, C=71, and D=68. D can be distinguished from all other methods (∆Q > 2.5 to all). Hence, it ranks last. C can be distinguished from A (∆Q = 4 > 2.5). However, A cannot be distinguished from B (∆Q = 2 < 2.5), and B cannot be distinguished from C (∆Q=2 < 2.5). This situation results in a dilemma that has four different possible solutions: (I) A, B and C get the same rank ascertaining that no two methods are ranked differently that cannot be distinguished. (II) A and B get rank 1, and C rank 2 assuring that no two methods are ranked equally that can be distinguished. (III) A gets rank 1, B rank 2 and C rank 3, ignoring that we cannot distinguish between A and B, nor between B and C. (IV) Do not rank. None of these solutions is 'correct'. Here, we applied solutions (IV) and (I). For the example given solution (I) implied that A, B, and C ranked first; D ranked second. However, this simplification ignored another intrinsically insurmountable problem: What if method A is significantly better than method B in terms of Q2 and significantly worse in terms of Qok? Occasionally, the following ad-hoc solution is presented to such a problem: rank all methods on all scores and compile averages over ranks (Tables 3 and 5).
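The worked example above, including ranking solution (I), can be reproduced with a short sketch. The function names `standard_error` and `rank_methods` are hypothetical; the standard error is taken as the standard deviation divided by the square root of the number of proteins, as in the numerical example (10/sqrt(16) = 2.5).

```python
import math


def standard_error(standard_deviation, n_proteins):
    """Rule-of-thumb standard error of a per-protein score."""
    return standard_deviation / math.sqrt(n_proteins)


def rank_methods(scores, error):
    """Solution (I): methods connected by a chain of indistinguishable
    differences (difference <= error) share the same rank.

    scores -- dict mapping method name to accuracy
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    rank = 0
    previous = None
    for name, accuracy in ordered:
        # A new rank starts only where the gap to the next-better method
        # exceeds the error.
        if previous is None or previous - accuracy > error:
            rank += 1
        ranks[name] = rank
        previous = accuracy
    return ranks


error = standard_error(10.0, 16)
ranks = rank_methods({"A": 75, "B": 73, "C": 71, "D": 68}, error)
```

Because consecutive gaps are compared after sorting, A and C end up with the same rank although they differ by more than the error, which is exactly the compromise that solution (I) accepts.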


Electronic versions of data

All data sets and a few additional results are available through our Web site at: 
http://cubic.bioc.columbia.edu/papers/2002_htm_eval/data and http://cubic.bioc.columbia.edu/papers/2002_htm_eval/appendix/.

 

Acknowledgements

Thanks to Jinfeng Liu (Columbia) for computer assistance and the collection of genome data sets; to Jinfeng Liu and Dariusz Przybylski (Columbia) for providing preliminary information and programs. Particular thanks to Volker Eyrich (Columbia) for making the META-PredictProtein server available! The work of BR was supported by the grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health. Last but not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.
