Lecture (2 weekly hours lecture + 1 weekly hour exercise, 4 ECTS credits) "Distributed Data Management" (Module Description), Course Number INF-24-53-V-7
- Level: Master
- Language: English
This course addresses fundamental concepts of distributed data management, with an emphasis on novel approaches and paradigms for managing Big Data. It combines system aspects and hands-on experience (e.g., Hadoop/HDFS) with fundamental algorithms and techniques (such as consistent hashing or Bloom filters). Topics include:
- Big Data, Cloud Computing
- MapReduce (Hadoop, HDFS, …)
- Various algorithms on top of MR
- NoSQL Stores (MongoDB, Amazon Dynamo, Riak, ...)
- (State Machine) Replication, Paxos
- (Eventual) Consistency Models
- Synopses: Bloom filters, Count-Min sketch, KMV, ...
- Distributed Data Stream Processing: STREAM, Storm, ...
- Gossip protocols, consistent hashing
Time and Location
Lecture:
- KIS entry
- Thursday, 15:30-17:00
- Room 42-110
- Begin: 20.04.2017
Exercise:
- KIS entry
- Wednesday, 13:45-15:15
- Room 46-110
- Begin: 26.04.2017
News
|Date|Announcement|
|---|---|
|20.04.2017|All news will be posted in OLAT from now on.|
|23.03.2017|Regulations for qualification to the final exam are posted. Please read carefully.|
|22.03.2017|Exercise room changed to 46-110; the time slot has also moved slightly.|
|16.02.2017|Website is online.|
Regulations for Qualification to the Final Exam
Please read carefully.
Students need to participate successfully in the exercise sessions, according to the regulations below, to be admitted to the final exam.
- There will be 6 exercise sheets.
- In the exercise sessions, the teaching assistant presents the solutions and answers questions.
- Attendance at the exercise sessions is not mandatory; still, we would be happy to see lively participation.
- Each sheet consists of 3 assignments, giving 18 assignments in total. Each assignment is worth one point.
- A student needs to reach a total of at least 13 points throughout the semester to qualify for the final exam.
- Solutions to exercise sheets have to be submitted in OLAT.
- Students may work alone or in groups of at most two; groups are fixed with the first submission. Each group member uploads the same solution individually to OLAT, with the names of both members on every sheet.
- Students must mark in OLAT which individual assignments they have solved correctly.
- Students may only mark an assignment as solved if they have completed more than ⅔ of it.
- If an assignment is not solved correctly to an extent of at least ⅔, no point is awarded for it. That is, if you did not complete more than ⅔ of an assignment, do not mark it at all.
- If a mark was obviously placed in a dishonest attempt to obtain a point without proper engagement with the assignment, for instance, if the marked assignment was not attempted at all or is clearly less than ⅔ solved, the entire sheet is graded with zero points.
- Copying solutions from other groups or from previously published solution sheets, if clearly identifiable, leads to the immediate disqualification of all groups involved from the course, regardless of the number of points earned otherwise.
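The admission rules above reduce to simple arithmetic: 6 sheets with 3 assignments each give 18 possible points, of which at least 13 are needed. The following Python sketch encodes one reading of these rules as a worked example. The verdict categories (`Verdict`) and the functions `sheet_points`, `total_points`, and `qualifies` are hypothetical illustrations only; they are not part of OLAT or any official grading tool, and the staff's judgment of each assignment is assumed as input.

```python
from enum import Enum


# Hypothetical staff verdict for one assignment; these categories are an
# illustrative reading of the regulations, not an official OLAT data model.
class Verdict(Enum):
    UNMARKED = "unmarked"    # not marked by the student: no point
    CORRECT = "correct"      # marked and solved correctly to at least 2/3: one point
    INCORRECT = "incorrect"  # marked, but less than 2/3 solved correctly: no point
    DISHONEST = "dishonest"  # marked although clearly not worked on: whole sheet scores zero


NUM_SHEETS = 6
ASSIGNMENTS_PER_SHEET = 3
POINTS_TO_QUALIFY = 13  # out of NUM_SHEETS * ASSIGNMENTS_PER_SHEET = 18


def sheet_points(verdicts: list[Verdict]) -> int:
    """Points earned on a single sheet (0 to 3)."""
    if len(verdicts) != ASSIGNMENTS_PER_SHEET:
        raise ValueError("each sheet has exactly 3 assignments")
    if Verdict.DISHONEST in verdicts:
        return 0  # one dishonest mark voids the entire sheet
    return sum(1 for v in verdicts if v is Verdict.CORRECT)


def total_points(sheets: list[list[Verdict]]) -> int:
    """Sum of sheet points over the semester."""
    if len(sheets) > NUM_SHEETS:
        raise ValueError("there are only 6 exercise sheets")
    return sum(sheet_points(s) for s in sheets)


def qualifies(sheets: list[list[Verdict]], copied: bool = False) -> bool:
    """Admission to the final exam: at least 13 points and no identified copying."""
    if copied:
        return False  # copying disqualifies regardless of points
    return total_points(sheets) >= POINTS_TO_QUALIFY


C, I, U, D = Verdict.CORRECT, Verdict.INCORRECT, Verdict.UNMARKED, Verdict.DISHONEST
example = [[C, C, C]] * 4 + [[C, I, U], [C, C, D]]
print(total_points(example), qualifies(example))  # prints "13 True"
```

In the example, four fully solved sheets give 12 points, the fifth sheet adds one, and the dishonest mark on the sixth sheet voids its two correct assignments, so the student ends with exactly the 13 points required.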
Slides are made available roughly 24 hours before each lecture; these early versions are still subject to change. The final version is uploaded after the lecture.
- Lecture 1: Introduction, Regulations, MapReduce Fundamentals (PDF)
- Lecture 2: SQL/Joins in MapReduce, Bloom Filter, Hadoop, Secondary Sort (PDF)
- Lecture 3: Graph Processing, N-Gram Mining, and Min-Hashing in MapReduce (PDF)
- Lecture 4: Min-Hashing in MapReduce, Pig and Hive, Spark (PDF)
- Lecture 5: Spark, NoSQL, Replication and Distributed Agreement (PDF)
- Lecture 6: Paxos, Lamport Timestamps, State Machine Replication (PDF)
- Lecture 7: CAP Theorem, BASE, Consistency Models, Vector Clocks (PDF)
- Lecture 8: Consistent Hashing, Rumor Spreading, Merkle Trees, Amazon Dynamo (PDF)
- Lecture 9: Data Stream Processing: Model, Sampling, Sketches (PDF)
- Lecture 10: STREAM, CQL, Storm (PDF)
- Lecture 11: Cloud Computing (PDF)