More challenges for machine learning protein protein interactions

From Rost Lab Open

Revision as of 20:16, 21 November 2014

Data set

All test set and cross-validation splits are available for download (http://www.rostlab.org/~hampt/ppichallenges/dataset_challenges.tar.gz). The FASTA files with all sequences used can be downloaded for human (http://www.rostlab.org/~hampt/ppichallenges/human.fasta) and yeast (http://www.rostlab.org/~hampt/ppichallenges/yeast.fasta).

  • In folder "negativeSimTest", sequence similarity between negative training and test PPIs has either been allowed (nonRed) or not (nonRed_noSim).
  • In folder "redundancyTest" there are three types of training PPI redundancy (nonRed, iaRed, seqRed) for each data set.

Each test was repeated 10 times from scratch, which gives 10 "split_" subfolders in each category. Each subfolder contains either 10 training sets (for cross-validation) or a single training set (for testing on new data). Each training set may have up to three test sets (C1-C3). Positive and negative PPIs are always stored in separate files.
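
To get an overview of the repetitions after unpacking the archive, a short script can list every "split_" subfolder together with the files it contains. The following is a minimal sketch, not part of the original data release. It assumes only that the repetition folders have names starting with "split_", as described above. The function name index_splits is hypothetical, and the exact nesting and file names inside the archive are not specified here, so the script discovers them instead of hard-coding them.

```python
from pathlib import Path


def index_splits(root):
    """List the files found in every "split_*" directory under root.

    Returns a dict that maps each split directory (path relative to root,
    POSIX-style) to a sorted list of file paths relative to that directory.
    The only assumption is the "split_" name prefix; file names and nesting
    are discovered rather than assumed.
    """
    root = Path(root)
    index = {}
    split_dirs = sorted(p for p in root.rglob("split_*") if p.is_dir())
    for split_dir in split_dirs:
        key = split_dir.relative_to(root).as_posix()
        index[key] = sorted(
            f.relative_to(split_dir).as_posix()
            for f in split_dir.rglob("*")
            if f.is_file()
        )
    return index
```

Calling index_splits on the directory where the archive was unpacked should yield one entry per repetition. If the layout matches the description above, each category would show 10 entries, which makes incomplete downloads easy to spot.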


Contact

For questions, please contact hampt@rostlab.org