Summary:
Our main goal is to predict important aspects of protein
structure and function using sequence information, evolutionary information and
results from other predictions. We apply whichever type of algorithm a problem
requires, from modern machine learning (neural networks, SVMs, tree-based
algorithms, Bayesian classifiers) to established statistical methods.
Protein prediction:
The lab's research is driven by a conviction that protein
and DNA sequences encode a significant core of information about the ultimate
structure and function of genetic material and its gene products. Research
goals of the lab involve using protein and DNA sequences along with evolutionary
information to predict aspects of the proteins relevant to the advance of
biomedical research. Examples include the prediction of coarse-grained aspects
of protein function such as the type of enzymatic activity (ECGO), of
interaction partners (ISIS, DISIS, PiNAT), of subcellular localization
(LOCtree, LOCnet, PredictNLS), and of the functional effects of point
mutations/SNPs (SNAP). We also predict disordered regions (NORSp, Ucon,
IUcon), membrane-spanning segments (PROF/PHDhtm), aspects of protein
secondary structure (PROF/PHD, DSSPcont), solvent accessibility (PROF/PHD),
and internal residue-residue contacts (PROFcon). Further methods identify
domain-like functional and structural subunits (CHOP, CHOPnet) and cluster
proteins into families (CHOP).
RNA:
We have also ventured outside the world of proteins, into
the amazingly large world of long non-coding RNAs. In particular, we developed
a method that distinguishes coding from non-coding regions (RIKEN). According
to our estimates, there are as many of these long non-coding regions in mouse
as there are proteins, even without considering the large universe of short
sRNAs!
Our research ranges from molecular details (ISIS, MD) to the
level of systems biology (PiNAT); it involves the identification of binding
sites and the prediction of roles by distinguishing cell cycle kinases from
other kinases. We have been developing de novo prediction methods as well as alignment methods for database comparisons
(AGAPE, ConsensusBLAST).
Comparative proteomics:
Most of our work aims at providing tools to annotate entire
genomes, i.e. the means for comparative genomics. Another significant research
focus is to help structural genomics projects determine protein structures on
a large scale more effectively and efficiently.
Availability:
We dedicate unusual amounts of resources to the maintenance
of internet servers that make the fruits of our research available to the
biomedical community at large. This includes PredictProtein, the first
Internet server for protein structure prediction, and META-PP, as well as more
recent databases and resources.