Distributed Data Mining Lab Course SoSe 18



Master Lab Course 10 P, IN2106




Dr. Lothar Richter, Dmitrii Nechaev


weekly meeting of 2 hours, time slot: Wednesday 13 - 15, room 01.09.034






Links to the wiki and to the slack channel have been added

There are two identical pre-meetings: Tue, Jan 30th, 2-3 pm and Thu, Feb 1st, 3-4 pm. Room FMI 01.09.034


This lab course will be highly exploratory and technically oriented, covering the following topics (among others). Since the final syllabus is still under development, the topics mentioned may still change:

  • Hadoop File System
  • Exploration and Comparison of Hadoop and/or Spark
  • Installation/Configuration
  • Installation, Configuration and Application of the MLlib framework
  • MapReduce
Topic | Short Description | Date | Presenter
HDFS | Hadoop File System: fundamental layer for data storage and distribution | May 2nd | Alli Kareem
Hadoop | Framework for the distributed processing of large data sets across clusters of computers using simple programming models such as map/reduce | |
Spark | A fast and general engine for large-scale data processing | |
Ambari | Cluster and configuration management | |
Chef | Simple task automation | |
Scala | Object-oriented functional programming language | |
Mesos/YARN/more | Schedulers and additional automation tools | |
MLlib | Spark's machine learning library | |
HBase, Cassandra, Hive | Big data storage solutions | |
GraphX, Giraph/Pregel | Graph mining systems | |
H2O, Zeppelin | Machine learning & predictive analytics | |
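
As a rough illustration of the map/reduce programming model named in the Hadoop entry above, the following is a minimal single-machine sketch in plain Python. It does not use Hadoop or Spark; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example, and the grouping step merely imitates what a framework would do between the map and reduce stages on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["spark and hadoop", "hadoop and hdfs"]
    print(word_count(sample))
```

In a distributed setting, the map and reduce functions would run in parallel on different nodes, with the input lines read from a distributed file system instead of an in-memory list.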



Open Topics: Storm, Kafka, Flink, Zookeeper, Docker(-Cloud)


  • Basic experience in Data Mining / Machine Learning
  • Sound Linux administration / command-line skills


  • the wiki can be reached via this link
  • the slack channel can be found here


Jan 30th / Feb 1st Premeeting