In this new project, offered for the first time in the winter semester 2014/15, we build a full-fledged Web search engine.
The core tasks are roughly the following:
- Implement an HTML Parser.
- Design and Implement a Web Crawler.
- Design the required database schema to store the contents of visited pages and the link structure.
- Write an SQL-based query processor to execute Google-style keyword queries.
- Devise/Create index structures to accelerate the querying performance.
- Implement alternate query processors using threshold algorithms.
- Realize alternate methods to compute the score of how well a document matches the query.
- For this, implement Google's PageRank algorithm and integrate it into the scoring model.
- Implement an HTML-based user interface and a Web service.
- Use the Web services of your fellow students to realize a meta search engine.
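To give a rough idea of the PageRank task listed above, the following is a minimal sketch of the standard power-iteration formulation, not the reference solution for this project. The function name `pagerank`, the dictionary-based link structure, the damping factor of 0.85, and the fixed number of iterations are all assumptions made for illustration; in the project itself the link structure would come from the crawler's database.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to
    (a hypothetical in-memory stand-in for the stored link structure).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}

        # Every page passes an equal share of its rank along each out-link.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Tiny example graph (hypothetical pages a-d).
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
example_ranks = pagerank(example_links)
```

In a scoring model, the resulting value could, for example, be combined with a text-based relevance score as a weighted sum; the choice of weights is left open here.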
Some illustrations of the first project run in the winter semester 2014/15
- A kickoff meeting will take place on Wednesday, October 29th, at 13:00 in room 36/336.
- In this meeting, we will discuss organizational aspects, provide pointers to reference material/literature, and hand out and discuss the first exercise sheet.
- Participation in this meeting is mandatory.
We will introduce the main concepts of the required techniques/tools when handing out the individual exercise sheets. In addition, the following are standard books for databases and information retrieval you might want to consult. We will also give specific pointers to Web sources during the semester.
- Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, 2008.
- Information Retrieval: Implementing and Evaluating Search Engines, by Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack.
- Datenbanksysteme: Eine Einführung (German), by Alfons Kemper and André Eickler.
- Database Management Systems, by Raghu Ramakrishnan and Johannes Gehrke.