Distributed Data Mining Lab Course, Summer Semester 2018 (SoSe 18)

Type:

Master Lab Course 10 P, IN2106

ECTS:

10

Supervisors:

Dr. Lothar Richter, Dmitrii Nechaev

Rotation:

Weekly 2-hour meeting, Wednesdays 13:00-15:00, room 01.09.034

Rooms:

01.09.034 for the weekly meeting

Language:

English

Announcements:

Links to the wiki and to the Slack channel have been added.

There are two identical pre-meetings: Tue, Jan 30th, 2-3 pm, and Thu, Feb 1st, 3-4 pm, both in room FMI 01.09.034.

Content

This lab course will be highly exploratory and technically oriented. It covers the following topics, among others. Since the final syllabus is still under development, the topics listed here might change:

  • Hadoop File System
  • Exploration and Comparison of Hadoop and/or Spark
  • Installation/Configuration
  • Installation, Configuration and Application of the MLlib framework
  • MapReduce
Topic | Short Description | Date | Presenter
HDFS | Hadoop File System: Fundamental layer for data storage and distribution | May 2nd | Alli Kareem
Hadoop | Framework that allows the distributed processing of large data sets across clusters of computers using simple programming models such as map/reduce | |
Spark | A fast and general engine for large-scale data processing | |
Ambari | Cluster and Configuration Management | |
Chef | Simple Task Automation | |
Scala | Object-Oriented Functional Programming Language | |
Mesos/YARN/more | Schedulers and Additional Automation Tools | |
MLlib | Spark's Machine Learning Library | |
HBase, Cassandra, Hive | Big Data Storage Solutions | |
GraphX, Giraph/Pregel | Graph Mining Systems | |
H2O, Zeppelin | Machine Learning & Predictive Analytics | |
....

Open Topics: Storm, Kafka, Flink, ZooKeeper, Docker (Docker Cloud)
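The map/reduce programming model mentioned in the content list and in the Hadoop entry can be illustrated without a cluster. The following is a minimal single-machine Python sketch, not Hadoop's or Spark's actual API: the functions `map_phase`, `shuffle` and `reduce_phase` are hypothetical names that mimic the three stages on the classic word-count task. On Hadoop or Spark, the framework would run the map and reduce steps in parallel across the cluster and perform the shuffle (grouping by key) over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    pairs = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop/Spark)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key; for word count this is a sum."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    documents = ["Spark and Hadoop", "Hadoop stores data in HDFS"]
    counts = reduce_phase(shuffle(map_phase(documents)))
    print(counts["hadoop"])  # 2
```

Because each map call only looks at its own line and each reduce call only looks at one key's values, both stages can be split across machines independently, which is what makes the model attractive for distributed processing.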

Prerequisites

  • Basic experience in Data Mining / Machine Learning
  • Sound Linux administration and command-line skills

Resources

  • The wiki can be reached via this link
  • The Slack channel can be found here

Slides

Jan 30th / Feb 1st Pre-meeting