Information Retrieval and Data Mining (IRDM) SoSe 2015

Lecture (2V+1Ü, 4 ECTS-LP) "Information Retrieval and Data Mining" (Module Description), Course Number INF-24-52-V-7

  • Level:  Master
  • Language: English

 

 Time and Location

  • Lecture:
    • KIS entry
    • Wednesday, 10:00-11:30.
    • Room 42-110
    • Begin: 22.04.2015
  • Exercise:
    • KIS entry
    • Tuesday, 15:30-17:00.
    • Room 46-210
    • Begin: 05.05.2015

 

 News

 18.12.2015

Students who still need to take the IRDM exam can use the following Doodle poll to pick a time slot. The selection is first-come, first-served. Select only one slot, specify your name, and please note down the time and date you picked. You also need to register at the examination office.

13.08.2015

For scheduling re-exams, please contact Prof. Michel directly.

24.06.2015

We have allocated a few additional slots for oral exams in the week of August 10-14 to accommodate students with exceptional conflicts with the already posted standard slots. If you think that your case is exceptional, please contact Heike Neu via email, stating your conflicts, by the end of this week (i.e., by the end of June 28). We will decide on an individual basis whether or not to assign a new time slot.

24.06.2015

We have dropped assignments 2(b) and 3(b) from exercise sheet 5.

18.06.2015

Exam in German? If you want the oral exam to be held in German, please send an email to Heike Neu (neu@cs.....) saying so.

16.06.2015

Exam registration is now open. If you have already qualified or still have a chance to qualify for the exam, use this Doodle form to pick a slot for your IRDM exam. Select only one slot and specify your name. First come, first served. Write down the time and date of your slot before you click the save button. You also need to register at the examination office. This registration link is only for the IRDM lecture; there is a separate one for the DDM lecture on its website. Registration is possible until the end of June 2015.

15.06.2015

As already announced in the lecture, we will have ORAL EXAMS around the end of July / beginning of August 2015. Registration instructions will follow soon.

11.06.2015

There is a mistake in Sheet 4, Assignment 3(c). Please note that the hub and authority vectors should be compared against the PageRank stationary distribution determined in 2(a).

08.06.2015

There is no lecture on June 10.

01.06.2015

To clarify when to place a mark for an assignment that consists of multiple parts: please mark an assignment as "done" if and only if you have worked out a solution for each of its parts.

22.04.2015

Students who attended the Data Mining course last winter and took its exam can also take this course.

22.04.2015

The presentation date is incorrect in Sheet 1. The presentation of the first assignment sheet will take place on 05.05.2015.

08.04.2015

Regulations for qualifying for the final exam are posted below. Please read them carefully.

08.04.2015

To participate in this lecture, in particular in the exercises, you need to register for this lecture in the KIS tool (see the KIS link above).

 

 Regulations

 Please read carefully.

Students need to successfully participate in the exercise sessions, according to the regulations below, in order to be admitted to the final exam.

  • There will be 5 exercise sheets.
  • Each sheet consists of 3 assignments, which makes 15 assignments in total.
  • Each assignment is equivalent to one point.
  • A student needs to reach a total of at least 10 points throughout the semester to qualify for the final exam.
  • Solutions to exercise sheets do not have to be handed in.
  • Instead, at the beginning of each exercise session, the teaching assistant (TA) asks each student to mark on a sheet the assignments that he or she solved and can present.
  • Then, for each assignment, the TA selects students, among those who placed a mark for the respective assignment, to present the solution.
  • This selection is made solely at the discretion of the TA and does not require any justification.
  • If the presented solution is correct, at least to a large extent, the student retains the full point.
  • If the presented solution is wrong but it is apparent that the student has spent time on solving it, zero points are given for the assignment.
  • Otherwise, it must be assumed that the mark was placed in a dishonest attempt to obtain a point without proper engagement with the assignment, in which case the entire sheet is assessed with zero points.
  • For students who were not called to present, and to whom the above cases hence do not apply, each placed mark translates to one point.
  • This assessment, i.e., which of the three cases a student's performance falls into, is again made solely at the discretion of the TA.
  • Besides these 15 obligatory assignments, there may be additional optional assignments, to which the above regulations do not apply.
    Nonetheless, we would be happy to see lively participation in presenting and discussing their solutions.

People

Contents (tentative)

  • Boolean Information Retrieval (IR), TF-IDF, IR evaluation
  • Probabilistic IR, BM25
  • Hypothesis testing
  • Statistical language models, latent topic models
  • Relevance feedback, novelty & diversity
  • PageRank, HITS
  • Spam detection, social networks
  • Inverted lists
  • Index compression, top-k query processing
  • Frequent itemsets & association rules
  • Hierarchical, density-based, and co-clustering
  • Decision trees and Naive Bayes
  • Support vector machines 

Slides

  • Lecture 1: Introduction pdf
  • Lecture 2: Edit Distances, TF*IDF, Relevance pdf
  • Lecture 3: Probability theory recap, probabilistic retrieval models pdf
  • Lecture 4: Probabilistic retrieval models, Language Models pdf
  • Lecture 5: Language Models, Latent Topic Models pdf
  • Lecture 6: Latent Topic Models, Link Analysis: PageRank pdf
  • Lecture 7: Link Analysis: HITS, Personalized PageRank, SimRank, Click Graphs pdf
  • Lecture 8: Indexing, Compression pdf
  • Lecture 9: Compression, Query Processing pdf
  • Lecture 10: Query Processing, Data Mining: Frequent Itemset and Association Rule Mining pdf
  • Lecture 11: Frequent Itemset and Association Rule Mining, Clustering pdf
  • Lecture 12: Classification: Decision Trees, Naive Bayes Classifier, Support Vector Machines pdf

Errata

If you find potential mistakes in the slides, please contact us. The slides are kept up to date accordingly.

  • Lecture 3, slide 28: misplaced closing parenthesis in an equation.
  • Lecture 6, slide 18, point under Social Network Analysis: this is the definition of betweenness centrality, not closeness centrality.
  • Lecture 6, slide 45: corrected misaligned equations.
  • Lecture 3, slide 41: corrected table.

 

Exercise Sheets

  • Sheet 1 pdf
  • Sheet 2 pdf 
  • Sheet 3 pdf (Term-Document Matrix for Assignment 3 file)
  • Sheet 4 pdf 
  • Sheet 5 pdf 
  • Sheet 6 pdf 

 

Acknowledgements

The course material is based to a large extent on material by Klaus Berberich, Martin Theobald, Pauli Miettinen, and Gerhard Weikum, MPI Informatik, Saarbrücken.

Literature

  • Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008.
  • Larry Wasserman. All of Statistics, Springer, 2004.
  • Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
  • Anand Rajaraman and Jeffrey D. Ullman. Mining of Massive Datasets, Cambridge University Press, 2011.
  • Supplementary literature references will be given in the lecture.

 

(c) AG DBIS, TU Kaiserslautern, 2015