CUBIC: NLProt / Index
The program
  • Submit form: submit a text to NLProt
  • License your copy of the command-line version of NLProt (Windows and Linux)
  • Read the help pages
  • Data used for developing NLProt
  • Brief description of the program
    NLProt is a tool for finding protein names in natural-language text. It is based on Support Vector Machines (SVMs) that are trained on contextual features of named entities in scientific language. In addition, simple filtering rules and a protein-name dictionary are used to improve performance.
    NLProt reached a precision (accuracy) of 70% at a recall (coverage) of 85% when run on the 166 most recent abstracts of EMBL and Cell (Nov/Dec 2003). When run from the command line, NLProt takes about 1 second per abstract.
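    To make the two figures above concrete, the following is a minimal sketch of how precision and recall are usually computed for a name tagger. It is not NLProt's evaluation code: the function name `precision_recall` and the toy name sets are hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of tagged names.

    precision = correct predictions / all predictions
    recall    = correct predictions / all names in the gold standard
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, NOT taken from the NLProt evaluation set.
gold = {"p53", "BRCA1", "cyclin D1", "Mdm2"}
predicted = {"p53", "BRCA1", "Mdm2", "Western blot", "PCR"}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    In this toy case three of the five predicted names are correct (precision 0.60) and three of the four true names are found (recall 0.75). A tagger tuned for high recall, as in the figures reported above, accepts more false positives in exchange for missing fewer names.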
    Contact
    E-Mail: mika@cubic.bioc.columbia.edu
    ©2008 rostlab.org
    1130 St. Nicholas Ave, 8th. floor - (212) 851-4669