CHOPPER
Intro
CHOP (Liu J & Rost B 2004 Proteins 55(3):678-688) is a method for dissecting proteins into domain-like fragments based on sequence homology. It was developed by Jinfeng Liu. It analyses a protein for homology to PDB domains (SCOP, CATH, and PrISM domains), Pfam domains, and SWISS-PROT proteins.
I have developed a related method, CHOPnet, which is one of the first de novo domain boundary prediction methods based on an artificial neural network (Liu J & Rost B 2004 Nucleic Acids Res 32: 3522-3530).
I have combined these two methods into a single package, 'profchop'.
References
- Liu J & Rost B (2004) CHOP proteins into structural domain-like fragments. Proteins, 55(3):678-688.
- Liu J & Rost B (2004) CHOP: Domain Dissection Based on Homology. Nucleic Acids Research, submitted.
Installation with aptitude (Debian, Ubuntu, etc.)
Software Installation
- If you have not already done so, add the rostlab repository to the list of your Synaptic package manager. This is how it's done: Debian_repository#sources.list.d
- aptitude update
- aptitude (search for the rostlab keyring, mark the package with '+', and press 'g' twice to install it)
- aptitude update (to find all rostlab packages available for installation)
- aptitude install profchop. Here is a step-by-step guide: Debian_repository#Installing_a_package_step_by_step
Running CHOPPER
Please see the CHOPPER man page:
man profchop
Availability/Web server
This program is currently available only as a standalone package from our Debian repository.
Help
Here is an excerpt from the README file in the package.
CHOPPER is a prediction method for protein domain boundaries. It has two components, a homology based method (CHOP) and a neural network approach (CHOPnet). CHOPPER takes a fasta sequence as input, and generates XML output of the domain prediction. It can also output an ASCII table, a HTML table, and a CASP DP format by -of option. The user has the option to turn off either CHOP (-nochop) or CHOPnet (-nochopnet). Using '-h' on the command line will give you the detailed options as shown below:

chopper.pl: running CHOP and CHOPnet for domain prediction
Usage: chopper.pl [options] -i in_file -o out_file
Opt:
  -h             print this help
  -i <file>      input file (REQUIRED)
  -o <file>      output file (REQUIRED)
  -of <string>   format of the output (xml|casp|txt|html), default=xml
  -keepxml       always keep XML output (default=TRUE)
  -id <string>   identifier of input protein (default: taken from input fasta)
  -(no)chop      run CHOP prediction (default=TRUE)
  -(no)chopnet   run CHOPnet prediction (default=TRUE)
  -(no)debug     print debug info (default=nodebug)
  -printconf     print all current options and exit
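The options listed above can be assembled programmatically. The following is a minimal sketch, not part of the package: the helper name `build_chopper_command` is hypothetical, the executable name `chopper.pl` is taken from the help text (the installed package may expose it under a different name, such as `profchop`), and the check that at least one predictor stays enabled is an assumption. It only builds the argument list; it does not run anything.

```python
from typing import List, Optional

# Output formats listed in the help text for the -of option.
OUTPUT_FORMATS = ("xml", "casp", "txt", "html")


def build_chopper_command(
    in_file: str,
    out_file: str,
    output_format: str = "xml",
    run_chop: bool = True,
    run_chopnet: bool = True,
    protein_id: Optional[str] = None,
    executable: str = "chopper.pl",  # assumption: name as printed in the help text
) -> List[str]:
    """Build an argument list using only the options documented in the help text."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    if not (run_chop or run_chopnet):
        # Assumption: disabling both predictors is treated as an error here.
        raise ValueError("at least one of CHOP and CHOPnet must be enabled")

    command = [executable, "-i", in_file, "-o", out_file, "-of", output_format]
    if protein_id is not None:
        command += ["-id", protein_id]
    if not run_chop:
        command.append("-nochop")
    if not run_chopnet:
        command.append("-nochopnet")
    return command


# Example: homology-based prediction only, written as an ASCII table.
example_command = build_chopper_command(
    "yol113w.fasta", "yol113w.txt", output_format="txt", run_chopnet=False
)
print(" ".join(example_command))
```

The resulting list can be passed to `subprocess.run(example_command, check=True)` on a machine where the package is installed.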
Input sequence
>YOL113W SKM1, Chr XV from 104325-106292
MKGVKKEGWISYKVDGLFSFLWQKRYLVLNDSYLAFYKSDKCNEEPVLSVPLTSITNVSR
IQLKQNCFEILRATDQKENISPINSYFYESNSKRSIFISTRTERDLHGWLDAIFAKCPLL
SGVSSPTNFTHKVHVGFDPKVGNFVGVPDSWAKLLQTSEITYDDWNRNSKAVIKALQFYE
DYNGLDTMQFNDHLNTSLDLKPLKSPTRYIINKRTNSIKRSVSRTLRKGKTDSILPVYQS
ELKPFPRPSDDDYKFTNIEDNKVREEGRVHVSKESTADSQTKQLGKKEQKVIQSHLRRHD
NNSTFRPHRLAPSAPATKNHDSKTKWHKEDLLELKNNDDSNEIIMKMKTVAIDVNPRPYF
QLVEKAGQGASGAVYLSKRIKLPQENDPRFLKSHCHRVVGERVAIKQIRLSEQPKKQLIM
NELLVMNDSRQENIVNFLEAYIIDDEELWVIMEYMEGGCLTDILDAVARSNTGEHSSPLN
ENQMAYIVKETCQGLKFLHNKKIIHRDIKSDNILLNSQGLVKITDFGFCVELTEKRSKRA
TMVGTPYWMAPEIVNQKGYDEKVDVWSLGIMLIEMIEGEPPYLNEDPLKALYLIANNGSP
KLRHPESVSKQTKQFLDACLQVNVESRASVRKLLTFEFLSMACSPEQLKVSLKWH
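Before running a prediction it can be useful to check that the input is a well-formed FASTA record. The sketch below is an illustration only: the helper `read_fasta` is hypothetical, the embedded record is truncated to the first 120 residues of the example above for brevity, and taking the first word of the header as the identifier is an assumption about how the default `-id` is derived.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# First two sequence lines (120 residues) of the example input; truncated for brevity.
EXAMPLE = """\
>YOL113W SKM1, Chr XV from 104325-106292
MKGVKKEGWISYKVDGLFSFLWQKRYLVLNDSYLAFYKSDKCNEEPVLSVPLTSITNVSR
IQLKQNCFEILRATDQKENISPINSYFYESNSKRSIFISTRTERDLHGWLDAIFAKCPLL
"""

records = list(read_fasta(EXAMPLE))
header, sequence = records[0]
identifier = header.split()[0]  # assumption: identifier is the first word of the header
print(identifier, len(sequence))
```

Applied to the full record, the sequence length should match the `# Length` value reported in the text output.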
TEXT output
Result of CHOP prediction (Jinfeng Liu & Burkhard Rost)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jinfeng Liu & Burkhard Rost Proteins. (2004) in press
________________________________________________________
# Query : yol113w
# Length : 655
Fragments  Homologue(region)    E_value    Method
---------- -------------------- ---------- --------------------
4-118      PF00169(1-92)        7.5e-19    HMMER/Pfam_ls
123-161    1eesB1(8-46)         7e-09      BLAST/prism_trim
162-363    NULL                 NULL       NULL
364-453    1f3mC1(5-76)         2e-15      BLAST/prism_trim
479-523    1erk_2(12-56)        4e-04      BLAST/prism_trim
548-633    1bygA3(10-94)        3e-04      BLAST/prism_trim
//
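The text output is easy to read back into a script. The following sketch is one possible way to do it, assuming the whitespace-separated four-column layout shown above and that `NULL` marks a fragment without a detected homologue; the names `Fragment` and `parse_chop_table` are hypothetical and not part of the package.

```python
import re
from typing import List, NamedTuple, Optional


class Fragment(NamedTuple):
    start: int
    end: int
    homologue: Optional[str]  # e.g. "PF00169(1-92)", or None for NULL
    e_value: Optional[float]
    method: Optional[str]     # e.g. "HMMER/Pfam_ls", or None for NULL


# A data row starts with a residue range such as "4-118", followed by three columns.
ROW = re.compile(r"^(\d+)-(\d+)\s+(\S+)\s+(\S+)\s+(\S+)$")


def parse_chop_table(text: str) -> List[Fragment]:
    """Extract fragment rows from CHOP text output, skipping headers and rulers."""
    fragments = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        start, end, homologue, e_value, method = match.groups()
        fragments.append(
            Fragment(
                int(start),
                int(end),
                None if homologue == "NULL" else homologue,
                None if e_value == "NULL" else float(e_value),
                None if method == "NULL" else method,
            )
        )
    return fragments


SAMPLE = """\
# Query : yol113w
# Length : 655
Fragments  Homologue(region)    E_value    Method
---------- -------------------- ---------- --------------------
4-118      PF00169(1-92)        7.5e-19    HMMER/Pfam_ls
123-161    1eesB1(8-46)         7e-09      BLAST/prism_trim
162-363    NULL                 NULL       NULL
364-453    1f3mC1(5-76)         2e-15      BLAST/prism_trim
479-523    1erk_2(12-56)        4e-04      BLAST/prism_trim
548-633    1bygA3(10-94)        3e-04      BLAST/prism_trim
//
"""

fragments = parse_chop_table(SAMPLE)
for fragment in fragments:
    print(fragment)
```

Rows where `homologue` is `None` correspond to regions for which no homologue was found.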
HTML output
Fragments | Homologue(region) | E value | Method | Database |
4-118 | PF00169 (1-92) | 7.5e-19 | HMMER | Pfam_ls |
123-161 | 1ees B1(8-46) | 7e-09 | BLAST | prism_trim |
162-363 | | | | |
364-453 | 1f3m C1(5-76) | 2e-15 | BLAST | prism_trim |
479-523 | 1erk_2(12-56) | 4e-04 | BLAST | prism_trim |
548-633 | 1byg A3(10-94) | 3e-04 | BLAST | prism_trim |
Information
CHOP has been applied to more than 60 completely-sequenced proteomes. Here are some statistics.
Organism | Number of proteins | Number of fragments | Chopped proteins | Single-domain Chopped proteins |
Aeropyrum pernix K1 | 2692 | 3931 | 979(36%) | 270(27%) |
Archaeoglobus fulgidus | 2394 | 4240 | 1583(66%) | 561(35%) |
Halobacterium sp. (strain NRC-1) | 2058 | 3757 | 1295(62%) | 383(29%) |
Methanosarcina acetivorans | 4532 | 8592 | 2611(57%) | 707(27%) |
Methanococcus jannaschii | 1762 | 3142 | 1196(67%) | 430(35%) |
Methanopyrus kandleri | 1683 | 2853 | 1019(60%) | 348(34%) |
Methanobacterium thermoautotrophicum | 1862 | 3388 | 1249(67%) | 427(34%) |
Pyrococcus abyssi | 1763 | 3273 | 1322(74%) | 451(34%) |
Pyrococcus furiosus | 2061 | 3647 | 1401(67%) | 468(33%) |
Pyrococcus horikoshii | 2063 | 3431 | 1163(56%) | 371(31%) |
Sulfolobus solfataricus | 943 | 1568 | 571(60%) | 182(31%) |
Sulfolobus tokodaii | 2826 | 4497 | 1520(53%) | 514(33%) |
Thermoplasma acidophilum | 1475 | 2633 | 1003(68%) | 332(33%) |
Thermoplasma volcanium | 1525 | 2686 | 1001(65%) | 316(31%) |
Aquifex aeolicus | 1522 | 3039 | 1202(78%) | 397(33%) |
Bacillus subtilis | 4093 | 7611 | 2632(64%) | 791(30%) |
Bifidobacterium longum | 1728 | 3676 | 1260(72%) | 259(20%) |
Borrelia burgdorferi | 850 | 1671 | 572(67%) | 168(29%) |
Brucella melitensis | 2059 | 3980 | 1394(67%) | 363(26%) |
Campylobacter jejuni | 1633 | 3001 | 1137(69%) | 382(33%) |
Caulobacter crescentus | 3737 | 7117 | 2466(65%) | 653(26%) |
Chlamydia pneumoniae | 1052 | 1938 | 632(60%) | 197(31%) |
Chlorobium tepidum | 2251 | 4182 | 1407(62%) | 411(29%) |
Chlamydia trachomatis | 894 | 1765 | 610(68%) | 183(30%) |
Clostridium acetobutylicum | 3843 | 7350 | 2515(65%) | 669(26%) |
Clostridium perfringens | 2723 | 5241 | 1869(68%) | 532(28%) |
Deinococcus radiodurans | 3102 | 5883 | 1996(64%) | 488(24%) |
Escherichia coli | 4280 | 8225 | 3044(71%) | 903(29%) |
Fusobacterium nucleatum | 2058 | 3742 | 1320(64%) | 407(30%) |
Haemophilus influenzae | 1708 | 3320 | 1312(76%) | 463(35%) |
Helicobacter pylori | 1549 | 2803 | 1033(66%) | 341(33%) |
Lactococcus lactis (subsp. lactis) | 2266 | 4153 | 1539(67%) | 484(31%) |
Leptospira interrogans | 4726 | 7569 | 1978(41%) | 501(25%) |
Listeria innocua | 2968 | 5454 | 2012(67%) | 638(31%) |
Listeria monocytogenes | 2845 | 5402 | 2065(72%) | 669(32%) |
Mycoplasma genitalium | 470 | 991 | 367(78%) | 107(29%) |
Mycobacterium leprae | 1605 | 3346 | 1109(69%) | 254(22%) |
Mycoplasma pneumoniae | 688 | 1386 | 484(70%) | 128(26%) |
Mycobacterium tuberculosis | 4186 | 8233 | 2566(61%) | 568(22%) |
Neisseria meningitidis | 2061 | 3688 | 1267(61%) | 386(30%) |
Oceanobacillus iheyensis | 3491 | 6540 | 2407(68%) | 748(31%) |
Pasteurella multocida | 2014 | 4057 | 1633(81%) | 556(34%) |
Pseudomonas aeruginosa | 5562 | 11058 | 4032(72%) | 1107(27%) |
Rickettsia conorii | 1374 | 2283 | 702(51%) | 241(34%) |
Rickettsia prowazekii | 834 | 1678 | 642(76%) | 223(34%) |
Staphylococcus aureus | 2622 | 4785 | 1752(66%) | 571(32%) |
Streptomyces coelicolor | 7889 | 15121 | 4888(61%) | 980(20%) |
Streptococcus pyogenes | 1845 | 3395 | 1230(66%) | 387(31%) |
Synechococcus elongatus | 2473 | 4928 | 1688(68%) | 458(27%) |
Synechocystis PCC6803 | 3166 | 6356 | 2123(67%) | 582(27%) |
Thermotoga maritima | 1844 | 3608 | 1391(75%) | 440(31%) |
Treponema pallidum | 1031 | 2104 | 693(67%) | 173(24%) |
Ureaplasma urealyticum | 611 | 1129 | 380(62%) | 104(27%) |
Vibrio cholerae | 2735 | 5521 | 1966(71%) | 561(28%) |
Xanthomonas campestris (pv. citri) | 4312 | 8271 | 2819(65%) | 740(26%) |
Xylella fastidiosa | 2766 | 4738 | 1389(50%) | 355(25%) |
Arabidopsis thaliana | 25528 | 61241 | 16619(65%) | 1716(10%) |
Caenorhabditis elegans | 20244 | 45427 | 11736(57%) | 1530(13%) |
Drosophila melanogaster | 14314 | 33601 | 8310(58%) | 850(10%) |
Saccharomyces cerevisiae | 6349 | 13334 | 3474(54%) | 466(13%) |
Homo sapiens | 36750 | 93619 | 22733(61%) | 3098(13%) |
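The percentage columns can be reproduced from the counts. This is an inference from the numbers rather than something the source states: for the rows checked below, the "Chopped proteins" percentage matches chopped / total proteins, the "Single-domain Chopped proteins" percentage matches single-domain / chopped proteins, and both appear to be truncated rather than rounded. The function name `chop_percentages` is hypothetical.

```python
from typing import Tuple


def chop_percentages(proteins: int, chopped: int, single_domain: int) -> Tuple[int, int]:
    """Return (chopped %, single-domain %) as truncated integer percentages.

    Assumption inferred from the table: the first percentage is relative to all
    proteins, the second is relative to the chopped proteins only.
    """
    chopped_pct = chopped * 100 // proteins
    single_domain_pct = single_domain * 100 // chopped
    return chopped_pct, single_domain_pct


# (organism, proteins, chopped, single-domain chopped) copied from the table.
ROWS = [
    ("Aeropyrum pernix K1", 2692, 979, 270),
    ("Escherichia coli", 4280, 3044, 903),
    ("Saccharomyces cerevisiae", 6349, 3474, 466),
    ("Homo sapiens", 36750, 22733, 3098),
]

for organism, proteins, chopped, single_domain in ROWS:
    print(organism, chop_percentages(proteins, chopped, single_domain))
```

For example, Escherichia coli gives 3044 / 4280 = 71.1% and 903 / 3044 = 29.7%, which the table lists as 71% and 29%.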
Questions
Please see the FAQ section or contact the maintainer