Distributed Data Mining Lab Course SoSe 21

Type:

Master Lab Course 6 P (IN2106, IN4176)

ECTS:

10

Supervisors

Dr. Lothar Richter

Rotation:

Weekly 2-hour meeting, time slot: Wednesday 13 - 15, online

Rooms:

online

Language:

English

Announcements:

The kick-off will be on Wednesday, April 21st, 1-3 pm. Details of the online meeting will be provided.

There are no more spots available. The course is fully booked.

The pre-meeting will take place online on Tue, Feb 2nd, at 4:00 pm.

Please join this call

Content

This lab course is highly exploratory and technically oriented. It covers the following topics, among others. Since the syllabus is continuously evolving, the listed topics might still change:

  • Hadoop File System
  • Exploration and Comparison of Hadoop, Spark and Dask
  • Installation/Configuration
  • Installation, Configuration and Application of the MLlib framework
  • MapReduce
  • Simple applications

Prerequisites

  • Basic experience in Data Mining / Machine Learning
  • Sound Linux administration / command-line skills
  • Good command of at least one of these programming languages: Java, Scala, Python

Resources

Slides

Data Sets