In this project, we build a full-fledged Web search engine.
Details on the registration procedure can be found further below.
Prerequisites (beyond merely having attended courses on database systems and information retrieval):
- Substantial Java (1.7+) programming skills that go beyond tutorials; we expect that you have already written non-trivial code for at least one larger project.
- Well-founded practical skills in working with relational database systems, including installing one, and experience in writing non-trivial SQL queries. We will use PostgreSQL 9.x in this project.
- Practical knowledge of and experience with user-defined functions (UDFs).
- Experience with installing Apache Tomcat and with programming Servlets and/or JavaServer Pages (JSP).
- (Debian) Linux command-line skills: installing software, configuring Tomcat, etc.
If you lack any of the skills mentioned above, consider studying the relevant literature and tutorials before the project starts. Programming skills in particular are difficult to acquire on the fly during the project.
The core tasks of developing the Web search engine are roughly the following:
- Implement an HTML parser.
- Design and implement a Web crawler.
- Design the database schema required to store the contents of visited pages and the link structure.
- Write an SQL-based query processor that executes Google-style keyword queries.
- Devise index structures that improve query performance.
- Implement alternative query processors based on threshold algorithms.
- Implement alternative methods for scoring how well a document matches a query.
- In particular, implement Google's PageRank algorithm and integrate it into the scoring model.
- Implement an HTML-based user interface and a Web service.
- Use the Web services of your fellow students to build a meta search engine.
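To illustrate the PageRank task, here is a minimal Java sketch of the classic power-iteration formulation. It assumes the link graph has already been loaded into memory as adjacency lists (in the project it would come from the link structure stored in the database). The class name `PageRankSketch`, the method signature, the damping factor 0.85, and the choice to spread the rank of pages without out-links uniformly over all pages are illustrative assumptions, not requirements of the project. The code avoids lambdas so that it stays Java 1.7 compatible.

```java
import java.util.Arrays;

/**
 * Hypothetical PageRank sketch using power iteration.
 * outLinks[i] holds the indices of the pages that page i links to.
 */
public class PageRankSketch {

    public static double[] pageRank(int[][] outLinks, double damping,
                                    double epsilon, int maxIterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from the uniform distribution

        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;

            // Each page passes its rank in equal shares to the pages it links to.
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    danglingMass += rank[i]; // assumption: dangling rank is spread uniformly
                } else {
                    double share = rank[i] / outLinks[i].length;
                    for (int j : outLinks[i]) {
                        next[j] += share;
                    }
                }
            }

            // Apply the damping factor (random jump) and measure the L1 change.
            double delta = 0.0;
            for (int j = 0; j < n; j++) {
                double value = (1.0 - damping) / n + damping * (next[j] + danglingMass / n);
                delta += Math.abs(value - rank[j]);
                next[j] = value;
            }
            rank = next;
            if (delta < epsilon) {
                break; // converged
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        int[][] graph = { {1, 2}, {2}, {0} };
        double[] ranks = pageRank(graph, 0.85, 1e-10, 100);
        System.out.println(Arrays.toString(ranks));
    }
}
```

Running `main` prints one score per page; page 2, which receives links from both other pages, ends up with the highest score. The scores always sum to 1 because the rank of pages without out-links is redistributed rather than lost.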
Registration information published
Information on the Registration Process
- This project is offered in the winter semester 2017/18.
- The number of participants is limited.
- Registration is not done on a first-come, first-served basis.
- Registration deadline: September 30 (no extensions).
- Shortly after registration closes, we will let you know whether you have been given a place in the project.