1. Content of the Database
The NMP-db consists of two parts. The first part is the literature-based NMP-db itself, containing all nuclear matrix proteins (NMPs) that were originally found in PubMed. The second part is a database of homologues of the NMP-db proteins. This database is called NMP-db(hom) and contains proteins that have an HSSP-value of at least 55 to one of the proteins in the NMP-db.
The NMP-db holds information about the protein names, their organism and the cell type in which NM association was observed. Each entry also gives links to the respective PubMed abstracts.
Additionally, we provide information about predictions of secondary structure, solvent accessibility, coiled-coil regions and domain-architecture, as well as the sequence, links to PDB (database of 3D-structures), molecular weight, theoretical pI, links to SWISS-2DPAGE, OMIM, PEP and many other databases.
If available, we also list regions in a protein sequence that are known to be crucial for NM-targeting.
Finally, links to the S/MARt DB allow users to find DNA-regions that bind to NMPdb proteins.
2. Structure and Fields of the Database
The following fields can be accessed in the database:
ID: The database ID of a protein. An ID starts with the class of the protein (3 letters) followed by an underscore and the five-digit accession number of the protein. For NMP-db(hom) proteins, the ID starts with "HOM", followed by an underscore and the five-digit accession number of its homologue, then another underscore and a three-digit number.
AC: The accession number of the protein. This is a simple number for NMP-db proteins and H[number]_[number] for NMP-db(hom) proteins.
CL: The nuclear matrix class of the protein. Can be:
NUS: nuclear shell (also called nuclear lamina)
INM: internal nuclear matrix (part of the intermediate filament structure of the NM)
ASC: tightly associated with the nuclear matrix
MIX: only associated at certain times of the cell cycle or in certain cell types (depends mostly on protein modification)
UNK: unknown class (no information about nuclear-matrix association type available from paper)
DT: Creation date of the entry
UD: Dates of important updates of this entry
NA: Names for the protein (usually taken from SwissProt)
GN: Gene names for the protein (usually taken from SwissProt)
OS: Organism of the protein
CT: Cell type of the protein as found in the literature (only for NMP-db proteins)
P1: PubMed article IDs that refer directly to the described protein (matching organism and cell-type; only for NMP-db proteins)
P2: PubMed article IDs that refer generally to the described protein (not necessarily matching organism and/or cell-type; only for NMP-db proteins)
PN: Protein names that are used in the articles under P1 and P2 (some articles don't use any SwissProt names; only for NMP-db proteins)
SP: SWISSPROT/TrEMBL IDs (UniProt)
2D: SWISS-2DPAGE ID and spots on gels (Database of 2D Gel Electrophoresis Photographs)
SM: S/MARt DB: Database for DNA-regions that bind to the nuclear matrix (scaffold/matrix attachment regions S/MARs)
FN: Functional description of the protein (imported from SWISSPROT)
KW: Keywords describing the protein (imported from SWISSPROT)
PD: PDB IDs (Protein Data Bank)
GB: Genbank IDs
PE: PEP ID (Prediction of Entire Proteomes)
OM: OMIM ID (Online Mendelian Inheritance in Man)
CC: A comment (NMP-db(hom)-proteins have 'nuclear matrix by homology' as their first comment)
HL: ID of the homologue in the NMP-db (field exists only for NMP-db(hom)-proteins)
HS: HSSP-value for alignment between NMP-db(hom) protein and its NMP-db homologue (field exists only for NMP-db(hom)-proteins)
SS: Sequence source (source-database of the given sequence)
SI: Sequence information; number of amino acids and molecular weight
SQ: The protein sequence from its given source
SE: Prof-sec predictions for the sequence (secondary structure prediction; H=Helix; E=Strand; L=Loop)
SA: Prof-acc predictions for the sequence (solvent accessibility prediction; b=buried; i=intermediate; e=exposed)
HT: Prof-htm predictions for the sequence (Transmembrane alpha helix prediction; H=TM-Helix; L=Rest)
CO: Coils predictions for the sequence (Coiled-coils prediction; C=Coil-region)
NO: NORSp predictions for the sequence (Non-regular secondary structure; n=nors region)
DO: CHOP predictions for the sequence (Domain-architecture of the protein)
PT: PubMed IDs of papers mentioning the nuclear matrix targeting signal (NMTS) under NT
AT: Amino acids and domains of the nuclear matrix targeting signal (NMTS) under NT
NT: Sequence of the nuclear matrix targeting signal (NMTS) (if this sequence stretch is missing, the protein no longer localizes to the NM)
//: End of the entry
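The ID rules described under the ID field can be checked with two regular expressions. The following is only a minimal sketch: it assumes that the three-letter class prefix of an NMP-db ID is one of the codes listed under CL, and the function name classify_id and the example IDs are hypothetical, not taken from the database.

```python
import re

# Class codes as listed under the CL field (assumed to be the ID prefixes).
NMP_CLASSES = ("NUS", "INM", "ASC", "MIX", "UNK")

# NMP-db ID: class code, underscore, five-digit accession number.
NMP_ID = re.compile(r"(?P<cls>" + "|".join(NMP_CLASSES) + r")_(?P<acc>\d{5})")

# NMP-db(hom) ID: "HOM", underscore, five-digit accession number of the
# homologue, underscore, three-digit number.
HOM_ID = re.compile(r"HOM_(?P<acc>\d{5})_(?P<num>\d{3})")


def classify_id(entry_id):
    """Describe an ID, or return None if it matches neither pattern."""
    match = NMP_ID.fullmatch(entry_id)
    if match:
        return {
            "database": "NMP-db",
            "class": match.group("cls"),
            "accession": match.group("acc"),
        }
    match = HOM_ID.fullmatch(entry_id)
    if match:
        return {
            "database": "NMP-db(hom)",
            "homologue_accession": match.group("acc"),
            "number": match.group("num"),
        }
    return None


# Hypothetical IDs, used only to illustrate the two formats.
print(classify_id("INM_00042"))
print(classify_id("HOM_00042_003"))
```

Using fullmatch instead of match rejects IDs with trailing characters, which is usually what you want when validating identifiers.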
3. How to browse NMPdb
Instead of searching for a specific protein, you can browse through the NMPdb: go to the browse page and look for the feature (organism, molecular weight, etc.) that you would like to specify. Simply click on a link and you will get to a sorted list of all database entries that match the criterion you picked.
4. How to use the Advanced-Search
The advanced search lets you specify up to three search terms and the database fields in which to look for each of them. If you need fewer terms, leave the remaining query fields empty. The search terms can be connected by the following operators:
AND: the term that follows this operator has to occur in the specified field(s)
OR: either the term preceding or the term following this operator has to appear in the specified field(s)
AND NOT: the search term following this operator must not show up in the specified field(s)
For each search field you can additionally set whether you want it to be case sensitive.
Lastly, pick the database that you want to search in: NMP-db, NMP-db(hom) or both.
For example, if you want to search for all human proteins that bind to DNA, you could try the following search query:
Keywords: DNA-binding AND Organism: human (case-insensitive)
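To make the operator semantics concrete, the sketch below applies the same kind of query to entries held in memory. It is an illustration only and not the code behind the search page: it assumes that terms are matched as substrings, that operators are evaluated from left to right, and that each entry is a dictionary keyed by the two-character field codes (KW for keywords, OS for organism). The function names and the sample entries are hypothetical.

```python
def term_matches(entry, field, term, case_sensitive=False):
    """True if `term` occurs as a substring in the given field of `entry`."""
    value = entry.get(field, "")
    if not case_sensitive:
        value, term = value.lower(), term.lower()
    return term in value


def advanced_search(entries, first, rest=()):
    """Filter entries with one term plus up to two operator-linked terms.

    first: (field, term, case_sensitive)
    rest:  sequence of (operator, field, term, case_sensitive),
           where operator is "AND", "OR" or "AND NOT".
    Operators are applied left to right (an assumption of this sketch).
    """
    results = []
    for entry in entries:
        keep = term_matches(entry, *first)
        for operator, field, term, case_sensitive in rest:
            hit = term_matches(entry, field, term, case_sensitive)
            if operator == "AND":
                keep = keep and hit
            elif operator == "OR":
                keep = keep or hit
            elif operator == "AND NOT":
                keep = keep and not hit
            else:
                raise ValueError(f"unknown operator: {operator}")
        if keep:
            results.append(entry)
    return results


# Hypothetical entries, reduced to the fields used in the query.
entries = [
    {"ID": "A", "KW": "DNA-binding; Nuclear protein", "OS": "Homo sapiens (Human)"},
    {"ID": "B", "KW": "DNA-binding", "OS": "Mus musculus (Mouse)"},
    {"ID": "C", "KW": "Structural protein", "OS": "Homo sapiens (Human)"},
]

# Keywords: DNA-binding AND Organism: human (case-insensitive)
hits = advanced_search(entries, ("KW", "DNA-binding", False),
                       [("AND", "OS", "human", False)])
print([e["ID"] for e in hits])
```

Replacing "AND" with "AND NOT" in the same query would instead keep only the DNA-binding entry that is not from human.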
5. Download the Database
Go to the download-page and pick a database and a compression type (zip or tar.gz). Then download the files and uncompress them on your machine. The databases are simple ASCII-files.
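Once downloaded, the archive can also be unpacked and read from a script. The sketch below is a rough starting point under stated assumptions: the exact layout of the ASCII files is not described here, so the parser assumes that every line starts with a two-character field code followed by a colon and/or whitespace, and that each entry ends with a line starting with "//". The function names and the sample lines are hypothetical; adjust the parsing to the files you actually receive.

```python
import shutil
from collections import defaultdict


def unpack(archive_path, target_dir):
    """Unpack a .zip or .tar.gz archive; the format is inferred from the extension."""
    shutil.unpack_archive(archive_path, target_dir)


def parse_entries(lines):
    """Yield one dict per entry, mapping field code to a list of values.

    Assumed layout (not confirmed by the documentation): a two-character
    field code, then a colon and/or whitespace, then the value; "//" closes
    an entry. Repeated codes (e.g. several P1 lines) are collected in order.
    """
    entry = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("//"):
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
            continue
        if len(line.strip()) < 2:
            continue  # skip blank or too-short lines
        code, value = line[:2], line[2:].lstrip(": \t")
        entry[code].append(value)
    if entry:
        yield dict(entry)  # tolerate a missing final "//"


# Hypothetical sample lines in the assumed layout.
sample = [
    "ID: INM_00001\n",
    "OS: Homo sapiens\n",
    "P1: 123\n",
    "P1: 456\n",
    "//\n",
    "ID: NUS_00002\n",
    "//\n",
]
for record in parse_entries(sample):
    print(record["ID"][0], sorted(record))
```

To read a real file, pass an open file object to parse_entries, for example inside a `with open(path) as handle:` block after calling unpack.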
If you use the NMP-db, please cite the following paper in your publication:
Mika S., Rost B.
NMPdb: database of Nuclear-Matrix Proteins
Nucleic Acids Research (submitted)