More challenges for machine learning protein protein interactions

From Rost Lab Open
Revision as of 00:06, 5 December 2014

Availability

All test-set and cross-validation splits used in our analyses are available as a single archive (http://www.rostlab.org/~hampt/ppichallenges/dataset_challenges.tar.gz). The FASTA files with all sequences used can be downloaded for human (http://www.rostlab.org/~hampt/ppichallenges/human.fasta) and yeast (http://www.rostlab.org/~hampt/ppichallenges/yeast.fasta).
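Once downloaded, the sequence files can be loaded with a few lines of Python. The following is a minimal sketch, not part of the distributed data: the function name read_fasta is hypothetical, and it assumes that the first whitespace-delimited token of each header line is the protein identifier, which this page does not specify.

```python
def read_fasta(lines):
    """Parse FASTA text into a dict mapping identifier -> sequence.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' on each header line.
    """
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

For example, after saving human.fasta locally, `with open("human.fasta") as handle: sequences = read_fasta(handle)` would give a dictionary of all human sequences keyed by identifier.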

Folder structure and naming conventions

  • In folder "negativeSimTest", sequence similarity between negative training and test PPIs has either been allowed (subfolder "nonRed") or not (subfolder "nonRed_noSim").
  • In folder "redundancyTest" the three subfolders "nonRed", "iaRed" and "seqRed" correspond to the three kinds of redundancy amongst training PPIs.

Every analysis has been repeated 10 times from the start, resulting in 10 "split_" folders for every subfolder mentioned above. In each "split_" folder, there are either 10 training sets (for cross-validations) or only one (for tests on new data). For each training set, there may be up to 3 test sets (C1-C3). Positive and negative PPIs are always in separate files.
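The layout described above can be explored programmatically. The sketch below is only an illustration under stated assumptions: it assumes the archive has been extracted so that "negativeSimTest" and "redundancyTest" sit directly under a chosen root directory, and that the repetition folders have names beginning with "split_". The function name list_split_dirs is hypothetical, and the file names inside each split folder are not described on this page, so the code stops at the folder level.

```python
from pathlib import Path

# Folder and subfolder names as listed on this page.
TOP_LEVEL = {
    "negativeSimTest": ["nonRed", "nonRed_noSim"],
    "redundancyTest": ["nonRed", "iaRed", "seqRed"],
}


def list_split_dirs(root):
    """Map 'folder/subfolder' to the sorted list of its split_ directories.

    Assumption: `root` is the directory that directly contains
    'negativeSimTest' and 'redundancyTest' after extraction.
    Subfolders that are missing on disk are skipped.
    """
    root = Path(root)
    found = {}
    for analysis, variants in TOP_LEVEL.items():
        for variant in variants:
            base = root / analysis / variant
            if not base.is_dir():
                continue
            splits = sorted(
                p for p in base.iterdir()
                if p.is_dir() and p.name.startswith("split_")
            )
            found[f"{analysis}/{variant}"] = splits
    return found
```

Calling `list_split_dirs("path/to/extracted/archive")` and checking that each entry holds 10 directories is a quick way to confirm that the download and extraction were complete.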


Contact

For questions, please contact hampt@rostlab.org.