
Title: Bioinformatics in structural genomics
Author: Burkhard Rost, Barry Honig and Alfonso Valencia
Quote: Bioinformatics, 2002, 897

This article is published in (Bioinformatics, 18, 2002, 897) © copyright Oxford University Press (2002). OUP is the only authorised source. All copying of this article including placing on another website requires the written permission of the copyright owner.


Bioinformatics in structural genomics

Burkhard Rost 1,2 ^, Barry Honig 2,3 & Alfonso Valencia 4

  1. Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street, New York, NY 10032, USA
  2. Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA
  3. Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
  4. Protein Design Group, CNB-CSIC, Cantoblanco, Madrid 28049, Spain

The goal of structural genomics initiatives is to significantly expand the structural “coverage” of sequence space. Several specific objectives fall within this broad definition: determining the detailed three-dimensional structure of at least one representative of each protein fold that occurs in nature; determining enough structures that all other proteins can be built by homology modelling; and determining enough structures that functional information can be inferred from models of all others. Projects are being launched at different paces and with very different scopes in the USA, Japan and Europe (Table). The initial phase has shifted focus slightly away from 'determining as many structures as possible' towards 'finding large-scale semi-automatic solutions for the immediate technical bottlenecks'. These bottlenecks currently appear to be protein expression, purification and crystallisation.

The different initiatives have adopted various approaches to selecting the experimental targets, analysing the resulting structures, and making optimal use of the structural information generated to learn about function. Many of the initial tasks of structural genomics may clearly profit from bioinformatics. However, the precise way in which bioinformatics is embedded into existing initiatives, and the share of resources allocated to bioinformatics activities, differ substantially between the efforts. For example, the Northeast Structural Genomics Consortium (NESG) relies entirely on bioinformatics to select the targets that are experimentally pursued. In contrast, the first European structural genomics project hardly explored the potential of bioinformatics.

The most obvious initial application of bioinformatics tools in structural genomics has been ranking the experimental targets. The task is to cluster all known proteins into families and to flag those families for which no high-resolution structure is yet available. However, computational biology is also required to make the most of the information gained by solving a single structure, for example by transferring structural information to homologues through comparative modelling and/or threading techniques. Another set of tools, which aims to profit from the wealth of structures added through structural genomics, predicts aspects of function and identifies functionally similar proteins. How functional specificity relates to protein structure is a very complex problem, and studying how structures adapt to different functions requires combining experimental and predictive methods. One example of such a combination is predicting the structural consequences of sequence variants, such as those found as single nucleotide polymorphisms (SNPs). Another is predicting protein-protein interactions (proteomics). Since the large-scale determination of large complexes of interacting proteins will not be part of the initial activities in structural genomics, we might aim for a coarse-grained picture of protein-protein interactions by combining structures determined in the context of structural genomics and interactions determined in the context of functional genomics with computational tools such as docking programs.
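The target-ranking task described above (cluster proteins into families, then flag families with no solved structure) can be illustrated with a minimal sketch. It assumes that pairwise similarity hits between proteins have already been computed by some sequence comparison tool and are supplied as pairs; the function names `cluster_families` and `rank_targets` are hypothetical. Clustering here is plain single-linkage via union-find, chosen for brevity; it is not the method of any particular consortium or of the papers in this issue.

```python
from collections import defaultdict


def cluster_families(proteins, similar_pairs):
    """Single-linkage clustering with union-find: proteins linked by any
    chain of similarity hits end up in the same family."""
    parent = {p: p for p in proteins}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two families

    families = defaultdict(set)
    for p in proteins:
        families[find(p)].add(p)
    return list(families.values())


def rank_targets(families, has_structure):
    """Return families with no member of known structure, largest first,
    and the fraction of families already covered by a structure."""
    uncovered = [f for f in families if not any(p in has_structure for p in f)]
    # Larger families first; ties broken by member names for determinism.
    uncovered.sort(key=lambda f: (-len(f), sorted(f)))
    coverage = 1 - len(uncovered) / len(families) if families else 0.0
    return uncovered, coverage


if __name__ == "__main__":
    proteins = ["A", "B", "C", "D", "E", "F"]
    pairs = [("A", "B"), ("B", "C"), ("D", "E")]  # hypothetical similarity hits
    families = cluster_families(proteins, pairs)
    targets, coverage = rank_targets(families, has_structure={"D"})
    print([sorted(f) for f in targets])  # → [['A', 'B', 'C'], ['F']]
    print(round(coverage, 3))  # → 0.333
```

Ranking uncovered families by size is one simple proxy for how much of sequence space a new structure would open to homology modelling; real pipelines weigh many further factors, and single-linkage clustering is known to chain unrelated proteins together when thresholds are loose.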

At a very early phase of the structural genomics projects, the Juan March Foundation (http://www.march.es) hosted a meeting on "Structural genomics and bioinformatics" in Madrid (March 2001). The meeting served as a forum to discuss some of the controversial issues at the interface of experimental structural genomics and bioinformatics, and it addressed the challenges that structural genomics poses for bioinformatics from two angles: (1) How can bioinformatics help structural genomics initiatives? (2) How can bioinformatics profit from the flood of new structures? The talks included reports from different experimental perspectives (CM Dobson, Oxford; JM Carazo, Madrid; CD Lima, New York; A McDermott, New York) and a representative collection of topics in bioinformatics and computational biology, including the analysis of sequence space (M Linial, Jerusalem; SI O'Donoghue, Heidelberg; B Rost, New York; C Sander, Boston), the distribution of protein structures and folds (L Holm, Cambridge; A Murzin, Cambridge; C Orengo, London), the current status of protein structure prediction methods in homology modelling (M Peitsch, Basel; A Sali, New York), threading (D Jones, London) and protein interactions (M Kanehisa, Kyoto; A Valencia, Madrid). In addition, a number of presentations addressed issues related to the prediction of protein function at various levels (T Gaasterland, New York; F Gago, Madrid; B Honig, New York; M Orozco, Barcelona; M Sippl, Salzburg; J Thornton, London). Almost all speakers are actively involved in one of the existing structural genomics initiatives.

For this issue of Bioinformatics, we selected five papers addressing the key problems in structural genomics mentioned above: (1) target selection (3 papers), (2) homology modelling, and (3) the structural basis of protein function. J. Liu & B. Rost present a re-estimate of the number of eukaryotic proteins that are targets for structural genomics ("Target space for structural genomics revisited"). E. Portugaly & M. Linial present a refined version of their original method to cluster protein sequence space through pair-relations ("Selecting targets for structural determination by navigating in a graph of protein families"). F. Abascal & A. Valencia describe a clustering scheme applicable to fine-grained classifications of protein families ("Clustering of proximal sequence space for the identification of protein families"). M. Peitsch describes how comparative modelling can extend the impact of a single experimental structure ("Use of protein models"). Finally, X. Fradera, X. De La Cruz, C. H. T. P. Silva, J. L. Gelpí, F. J. Luque and M. Orozco explore the adaptation of binding sites by comparing proteins bound to different substrates ("How dependent are binding sites on the bound ligand?").

Table: Current initiatives in structural genomics

Initiative (PI, country) | URL | Focus
USA (NIH) | http://www.nigms.nih.gov/funding/psi/psi_research_centers.html |
BSGC (S-H Kim, USA) | http://www.strgen.org | Mycoplasma pneumoniae
CESG (JL Markley, USA) | http://www.uwstructuralgenomics.org | Arabidopsis thaliana
JCSG (I Wilson, USA) | http://www.jcsg.org | Caenorhabditis elegans
MCSG (A Joachimiak, USA) | http://www.mcsg.anl.gov | Disease-related and 'easy' proteins
NESG (G Montelione, USA) | http://www.nesg.org | Eukaryotes
NYSGRC (SK Burley, USA) | http://www.nysgrc.org | Enzymes
SECSG (B-C Wang, USA) | http://secsg.org | Pyrococcus furiosus
SGPP (WGJ Hol, USA) | http://depts.washington.edu/sgpp | Pathogenic protozoa
TBSGC (T Terwilliger, USA) | http://www.doe-mbi.ucla.edu/TB/ | Mycobacterium tuberculosis

Non-US | |
PSF (U Heinemann, Germany) | | Homo sapiens
Spine (D Stuart, EU) | http://europa.eu.int/comm/research/press/2002/pr1803en.html#ann3 | 500 targets of medical interest
SRG (S Yokoyama, Japan) | http://www.rsgi.riken.go.jp/ | Thermus thermophilus, Mus musculus
Toronto (C Arrowsmith, Canada) | http://www.uhnres.utoronto.ca/proteomics/ | Bacteria, Archaea, Yeast
YSG (J Janin, France) | http://genomics.eu.org/ | Saccharomyces cerevisiae


Contact:    rost@columbia.edu Version:    Jul 12, 2002