Appendix to: "Adaptation of protein surfaces to subcellular location"

Miguel A. Andrade£, Seán I. O'Donoghue§, & Burkhard Rost§

§ European Molecular Biology Laboratory, D-69012 Heidelberg, Germany; odonoghue@embl-heidelberg.de; rost@columbia.edu

£ European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom; andrade@ebi.ac.uk

The authors' names have been listed alphabetically.

Journal of Molecular Biology, in press


Contents of the Appendix


WWW:



Figure legends

Figure 1s: Prediction of subcellular location from structure

The figure shows the distribution of surface composition vectors for the proteins in the non-located data set. The surface composition vectors have been projected onto the plane defined by the three average surface composition vectors for the homology data set, i.e. the same plane as in Figure 3. Comparing the positions of these vectors to the clusters in Figure 3 allows us to predict the proteins' locations based on the homology data set. Only vectors that fall into the strongly predicted regions are shown. The vectors are marked with the PDB code of the corresponding structure; the codes are coloured to indicate the predicted subcellular location: nuclear (green), cytoplasmic (red), or extracellular (blue). The axes are labelled as in Figure 3.


Figure 2s: Surface composition vectors from the glycosylated data set

Surface composition vectors for the glycosylated data set projected onto the same plane as in Figure 3. The axes are labelled as in Figure 3. Vector positions are marked with a g (blue). All the glycosylated proteins are extracellular, but the vectors occur in all three regions of the projection, mostly in the cytoplasmic and extracellular regions. This is consistent with the previously proposed hypothesis that glycosylation is a general mechanism for altering the surface properties of proteins that initially evolved in the cell interior, so that they are adapted to the extracellular environment.


Figure 3s: Exposure distributions for each amino acid type

Relative frequencies (y-axis) at which each amino acid type occurs with a given relative solvent accessibility (x-axis). The distributions are plotted separately for nuclear (green), cytoplasmic (red), and extracellular (blue) proteins. The clearest differences are seen for Asp, Lys, Asn, and Arg.
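
A distribution of this kind can be computed as in the sketch below. It is only an illustration under stated assumptions: relative solvent accessibility is taken to be a fraction between 0 and 1, the number of bins is arbitrary, and the function name exposure_distributions is hypothetical. The binning used for the actual figure is not specified in this legend.

```python
from collections import defaultdict

def exposure_distributions(residues, n_bins=10):
    """Relative frequency of each accessibility bin, per amino acid type.

    residues: iterable of (amino_acid, relative_accessibility) pairs,
    with relative_accessibility assumed to be a fraction in [0, 1].
    """
    counts = defaultdict(lambda: [0] * n_bins)
    for aa, rel_acc in residues:
        clipped = min(max(rel_acc, 0.0), 1.0)
        # A value of exactly 1.0 is placed in the last bin.
        idx = min(int(clipped * n_bins), n_bins - 1)
        counts[aa][idx] += 1
    # Normalise each amino acid type separately so its bins sum to 1.
    return {aa: [c / sum(bins) for c in bins] for aa, bins in counts.items()}

# Made-up example values, not taken from the data sets in the paper.
example = [("LYS", 0.95), ("LYS", 0.85), ("LYS", 0.15), ("ASP", 0.5)]
dist = exposure_distributions(example)
```

Running the function once per location class (nuclear, cytoplasmic, extracellular) would give the three curves that are compared for each amino acid type.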


Figure 4s: Length distributions for each subcellular location

Distribution of polypeptide chain lengths for proteins from each of the three major location classes. Note that the scale of the counts (vertical axes) differs between the three plots.