
Title: DSSPcont: continuous secondary structure assignments for proteins
Author: Phil Carter & Burkhard Rost
Quote: Nucl Acids Res, 2003, 31, 3293-3295



 

DSSPcont: continuous secondary structure assignments for proteins

Phil Carter 1, 2, 4, Claus A. F. Andersen 1, 5 & Burkhard Rost 1, 2, 3

1 CUBIC, Dept. of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
2 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA
3 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
4 Structural Bioinformatics Group, Department of Biological Sciences, Imperial College, London, UK
5 BASF AG, Carl-Bosch-Straße 38, 67056 Ludwigshafen, Germany, claus.andersen@basf-ag.de

This article is published in (Nucleic Acids Research, issue, 2003 and pages) © copyright Oxford University Press (2003). OUP is the only authorised source. All copying of this article including placing on another website requires the written permission of the copyright owner.

 




 


Abstract

The DSSP program automatically assigns each residue to one of eight secondary structure states from the three-dimensional co-ordinates of a protein structure. However, discrete assignments are incomplete in that they cannot capture the continuum of thermal fluctuations. Therefore, DSSPcont (http://cubic.bioc.columbia.edu/services/DSSPcont) introduces a continuous assignment of secondary structure that replaces 'static' by 'dynamic' states. Technically, the DSSPcont assignment for a particular residue is the weighted average over ten discrete DSSP assignments, each computed with a different hydrogen bond threshold. The continuous assignments have two important features: (1) They reflect the structural variations due to thermal fluctuations as detected by NMR spectroscopy. (2) They reproduce, from one single model, the structural variation between many NMR models. Therefore, functionally important variation can be extracted from a single X-ray structure using the continuous assignment procedure.

 

Key words: protein secondary structure assignment, protein motion, protein structure comparison, protein function.

 

From discrete to continuous secondary structure assignment. The automatic assignment of protein secondary structure from three-dimensional co-ordinates of protein structures is an important and, in principle, simple bioinformatics tool. Assignments are used to visualise structures, to speed up computationally expensive structural comparisons, and to improve sequence searches. Secondary structure is more conserved than sequence information. Statistics about secondary structure occurrence can be incorporated into a profile used for homology searches [1, 2]. This can yield improved accuracy over standard search tools using sequence-based information alone [3, 4, 5, 1, 6]. Hence, secondary structure assignments are important to assure the optimal yield of experimental structures and to select targets for structural genomics well. Although a conceptually simple task, the assignment of secondary structure is not always well defined [2]. In fact, assignments vary between different NMR models of the same protein and between X-ray structures of homologues [7]. Previously, we argued that such differences are not a problem of the assignment scheme, but rather carry important information if adequately processed. Indeed, the variations between different NMR models correlate with thermal disorder [7]. The DSSP program developed by Kabsch and Sander [8] identifies secondary structure as described by Pauling and colleagues for three helix types and two extended sheet types [9, 10]. DSSP has become the standard in the field. DSSPcont constitutes a relatively straightforward extension of DSSP by adding continuous assignments (Fig. 1). Because the continuous assignment of secondary structure reproduces the observed variation between high-quality NMR models, it also correlates with mobility related to protein function [7].
Thus, continuous secondary structure assignments can recognise conformational variations from a single X-ray structure and thereby may assist predictions of functionally important residues. More generally, they may help to pave the way to automatically generating valid hypotheses from protein structures. Finally, the continuous assignment appeared to describe the ends of regular secondary structure segments (helices and strands) more accurately than discrete assignments. Often these caps carry important information about function and structure. Hence, the continuum may sharpen the tools that already profit from discrete assignments.

Algorithm used to generate DSSPcont. We assigned a continuum of secondary structure by running DSSP with nine different hydrogen bond thresholds (from -1.0 kcal/mol to -0.2 kcal/mol) [7]. To score a given weighting scheme, we used the different models reported in NMR structure ensembles and calculated the average difference between single model assignments and the mean assignment. The best weighting scheme consequently ensured that the assignment extracted as much information as possible from a single given NMR model. The 100 best weighting schemes were all similar for helix {GHI}, strand {EB} and other {LST}. This similarity indicated that the weighting scheme had a well-defined, stable global optimum. The most dominant weights were found close to the default DSSP hydrogen bond threshold of -0.5 kcal/mol. The weight for the -0.2 kcal/mol threshold was consistently low, while the adjacent threshold at -0.3 kcal/mol was consistently high. This prompted us to insert another threshold at -0.25 kcal/mol. To fine-tune the weighting scheme, we performed a simple gradient descent optimisation for 50, 100, 150 and 211 proteins. The DSSPcont assignment is therefore constructed by applying nine hydrogen bond thresholds from -0.2 kcal/mol in steps of 0.1 down to -1.0 kcal/mol, and in addition the tenth value of -0.25 kcal/mol. The result of the averaging procedure is that a residue is no longer assigned a single 'state'; rather, the continuous secondary structure of a residue is characterised by a vector with propensities for the eight different DSSP states. More flexible residues have high propensities for more than one 'state', while 'more frozen' residues have non-zero values only for one particular state. This implies in particular that DSSPcont distinguishes well-defined from ragged helix/strand caps. Furthermore, DSSPcont distinguishes non-regular states that are flexible from those that are not.
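The averaging step described above can be illustrated with a minimal Python sketch. It is not the DSSPcont implementation: the function name continuous_assignment, the dictionary-based input, and the uniform default weights are assumptions made for illustration only (the optimised weights are not given in this text). The input is assumed to be one DSSP state string per threshold, with loop written as L.

```python
# Illustrative sketch only; names and default weights are hypothetical.
DSSP_STATES = "GHITEBSL"  # the eight DSSP states, loop written as 'L'

# Nine thresholds from -0.2 to -1.0 kcal/mol in steps of 0.1, plus -0.25.
THRESHOLDS = sorted(
    [round(-0.1 * k, 2) for k in range(2, 11)] + [-0.25], reverse=True
)


def continuous_assignment(assignments, weights=None):
    """Average discrete DSSP strings into per-residue propensity vectors.

    assignments: dict mapping threshold -> string of DSSP states,
                 one character per residue (all strings equally long).
    weights:     dict mapping threshold -> weight. Defaults to uniform
                 weights, which is an assumption, not the published scheme.
    Returns a list with one dict (state -> propensity in [0, 1]) per residue.
    """
    thresholds = list(assignments)
    if weights is None:
        weights = {t: 1.0 for t in thresholds}
    lengths = {len(s) for s in assignments.values()}
    if len(lengths) != 1:
        raise ValueError("all assignment strings must have the same length")
    total = sum(weights[t] for t in thresholds)
    n_residues = lengths.pop()

    result = []
    for i in range(n_residues):
        vector = dict.fromkeys(DSSP_STATES, 0.0)
        for t in thresholds:
            # Each threshold votes for one state with its normalised weight.
            vector[assignments[t][i]] += weights[t] / total
        result.append(vector)
    return result


if __name__ == "__main__":
    # Toy input: residue 2 is H at loose thresholds and T at strict ones.
    toy = {t: ("HHL" if t >= -0.5 else "HTL") for t in THRESHOLDS}
    for position, vector in enumerate(continuous_assignment(toy), start=1):
        populated = {s: round(p, 2) for s, p in vector.items() if p > 0}
        print(position, populated)
```

In this toy input the first and third residues receive a single populated state ('frozen'), whereas the second residue splits its propensity between H and T, which is the kind of signal the text associates with flexible residues.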

Interface to Web site. The DSSPcont server can be accessed through a web interface for use on PDB formatted protein structures (http://cubic.bioc.columbia.edu/services/DSSPcont). Users may also access a DSSPcont database of pre-calculated assignments for all PDB [11] records; this database is updated weekly with all new PDB entries. The interface is very simple, requiring only the submission of a PDB identifier for the pre-calculated assignments. To run the DSSPcont algorithm on a user's own protein, a file containing the protein can be uploaded or the user can 'cut and paste' the protein description into the web interface. The DSSPcont assignments for all PDB entries have been integrated into the Sequence-Retrieval-System SRS [12]. This enables searches by "ID", "Compound Name", "Source", "Author Name", "Number of Residues", "Number of Chains", "Total Number of Disulphide Bridges", "Number of Intrachain Bridges", "Interchain Disulphide Bridges", "Protein Surface Accessibility", "Total Number of Hydrogen Bonds", "Number of Hydrogen Bonds in Parallel Bridges", and "Hydrogen Bonds in Antiparallel Bridges". The flat files for these DSSPcont assignments can be downloaded and used locally.

Output of DSSPcont. The algorithm simply adds columns for the continuous assignment (as percentages for each of the eight states distinguished by DSSP) to the DSSP format [8]. DSSP assigns eight states: 3₁₀-helix (represented by G), α-helix (H), π-helix (I), helix-turn (T), extended beta sheet (E), beta bridge (B), bend (S), and other/loop (L). Eight columns are added to the standard DSSP output, each representing one of these DSSP states, so that each residue in the protein is assigned a percentage for each of the eight states. In the example shown in Fig. 1 there are 23 NMR models for the 1c3y fragment. Fig. 1 (b) shows the DSSPcont assignments for model 1. DSSP assigns the state other/loop to residue 20, which is a valine. DSSPcont, however, is more detailed, assigning 68% to other/loop but also 32% to helix-turn. The core residues of the helix (24-28) are assigned as H by default DSSP, although the entire α-helix switches to a 3₁₀-helix when a hydrogen bond threshold of -1 kcal/mol is applied. A 'fuzzy' helix capping, as seen here, is common and was observed for approximately one in four N-caps and half the C-caps in our data sets. Dissecting the continuous assignment shows that a 0.1 kcal/mol looser hydrogen bond threshold in the default DSSP would extend the helix by one residue (residue 29). If the default threshold had instead been tightened by 0.2 kcal/mol, the helix would lose one residue (residue 28). A more detailed online explanation of the DSSPcont format can be found at http://cubic.bioc.columbia.edu/services/DSSPcont/DSSPcont.html.
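As a small illustration of how such per-residue percentages might be used downstream, the following Python sketch collapses an eight-state vector into the three classes helix {GHI}, strand {EB} and other {LST} named in the algorithm description, and lists the states that are populated. The dictionary representation, the function names and the cutoff parameter are assumptions for illustration; the sketch does not parse the actual DSSPcont file format.

```python
# Illustrative sketch only; function names and the cutoff are hypothetical.
THREE_CLASSES = {"helix": "GHI", "strand": "EB", "other": "LST"}


def to_three_classes(percentages):
    """Sum eight-state percentages into helix / strand / other."""
    return {
        name: sum(percentages.get(state, 0.0) for state in states)
        for name, states in THREE_CLASSES.items()
    }


def populated_states(percentages, cutoff=0.0):
    """Return the states whose percentage exceeds the cutoff.

    More than one populated state suggests a residue whose assignment
    changes with the hydrogen bond threshold.
    """
    return sorted(s for s, p in percentages.items() if p > cutoff)


if __name__ == "__main__":
    # Valine 20 from the 1c3y example: 68% other/loop, 32% helix-turn.
    val20 = {"L": 68.0, "T": 32.0}
    print(to_three_classes(val20))
    print(populated_states(val20))
```

Note that under this three-class grouping both populated states of valine 20 fall into 'other', so the variability of that residue is visible only in the full eight-state vector.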




Fig. 1: DSSPcont assignment for a 1c3y fragment. The variations between the secondary structure assignments for different NMR models of the same protein illustrate the impact of fluctuations on structure and highlight the difficulty of predicting protein structure. (a) The default DSSP assignments for all 23 models of the THP12-carrier protein (PDB identifier 1c3y (8)). The structure models were calculated using 13C/15N labelled protein and 3D/4D NMR spectroscopy with 13 NOEs per residue. (b) DSSPcont assignments for the first NMR model alone.





 



Acknowledgements

Thanks to Chris Sander (Sloan Kettering, New York) for the permission to use DSSP, to Gerrit Vriend (Nijmegen, Netherlands) for maintaining DSSP, and to Arthur G. Palmer (Columbia University) and Søren Brunak (Technical University of Denmark) for their invaluable contributions that were at the base of the scientific development of DSSPcont. Thanks to Jinfeng Liu (Columbia University) for computer assistance. The work was supported by the grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health (NIH). Last but not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.

References

1. Holm, L. & Sander, C. (1999). Protein folds and families: sequence and structure alignments. Nucleic Acids Research, 27, 244-247.
2. Andersen, C. A. F. & Rost, B. (2003). Automatic secondary structure assignment. In Structural bioinformatics (Bourne, P. & Weissig, H., eds.), pp. 339-361, John Wiley.
3. Rost, B. (1995). TOPITS: Threading One-dimensional Predictions Into Three-dimensional Structures. In Third International Conference on Intelligent Systems for Molecular Biology (Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T. et al., eds.), pp. 314-321, Menlo Park, CA: AAAI Press, Cambridge, England.
4. Fischer, D. & Eisenberg, D. (1996). Fold recognition using sequence-derived properties. Protein Science, 5, 947-955.
5. Rost, B., Schneider, R. & Sander, C. (1997). Protein fold recognition by prediction-based threading. Journal of Molecular Biology, 270, 471-480.
6. Jennings, A. J., Edge, C. M. & Sternberg, M. J. (2001). An approach to improving multiple alignments of protein sequences using predicted secondary structure. Protein Engineering, 14, 227-231.
7. Andersen, C. A., Palmer, A. G., Brunak, S. & Rost, B. (2002). Continuum secondary structure captures protein flexibility. Structure (Camb), 10, 175-84.
8. Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.
9. Pauling, L. & Corey, R. B. (1951). Configurations of Polypeptide Chains with Favored Orientations Around Single Bonds: Two New Pleated Sheets. Proc. Natl. Acad. Sci. USA, 37, 720-740.
10. Pauling, L., Corey, R. B. & Branson, H. R. (1951). Two Hydrogen-Bonded Helical Configurations of the Polypeptide Chain. Proc. Natl. Acad. Sci. USA, 37, 205-211.
11. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N. et al. (2000). The Protein Data Bank. Nucleic Acids Res, 28, 235-42.
12. Etzold, T. & Argos, P. (1993). SRS--an indexing and retrieval tool for flat file data libraries. Comput Appl Biosci, 9, 49-57.

Contact:    rost@columbia.edu Version:    Apr 5, 2003