In this project, we build a full-fledged Web search engine.
The registration deadline is the 17th of October.
Prerequisites (beyond merely having attended courses on database systems and information retrieval):
- Substantial Java (1.8+) programming skills that go beyond tutorials; we expect that you have already written non-trivial code for at least one larger project.
- Solid practical skills in working with relational database systems, including installing one, and experience writing non-trivial SQL queries. We will use a current version of PostgreSQL in this project.
- Practical knowledge of and experience with user-defined functions (UDFs).
- Experience installing Apache Tomcat and programming Servlets and/or JavaServer Pages (JSP).
- (Debian) Linux command-line skills: installing software, configuring Tomcat, ...
If you lack any of the skills mentioned above, we recommend consulting literature and tutorials before the project starts. Programming skills in particular are difficult to acquire on the fly during the project.
The core tasks in developing the search engine are roughly the following:
- Implement an HTML parser.
- Design and implement a web crawler.
- Design the required database schema to store the contents of visited pages and the link structure.
- Write an SQL-based query processor to execute Google-style keyword queries.
- Devise and create index structures to improve query performance.
- Implement alternative query processors using threshold algorithms.
- Implement alternative methods for scoring how well a document matches a query.
- As part of this, implement Google's PageRank algorithm and integrate it into the scoring model.
- Implement an HTML-based user interface and a Web service.
- Use the Web services of your fellow students to realize a meta search engine.
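As a rough orientation for the PageRank task, here is a minimal power-iteration sketch in Java. It assumes the crawled link graph has already been loaded into memory as adjacency lists of integer page ids (a hypothetical representation; in the project the links would live in the PostgreSQL schema), uses the commonly cited damping factor 0.85, and spreads the rank of pages without out-links uniformly over all pages. The class and method names are illustrative only, not a prescribed interface.

```java
import java.util.Arrays;

/** Hypothetical PageRank sketch using power iteration; not a prescribed project design. */
public final class PageRankSketch {

    /**
     * @param outLinks outLinks[i] lists the ids of the pages that page i links to
     * @param damping  damping factor d, commonly 0.85
     * @param maxIter  upper bound on the number of iterations
     * @param epsilon  stop once the L1 change between two iterations falls below this
     * @return rank vector whose entries sum to 1
     */
    public static double[] pageRank(int[][] outLinks, double damping, int maxIter, double epsilon) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from the uniform distribution

        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;

            // Each page passes its rank in equal shares to the pages it links to.
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    danglingMass += rank[i]; // no out-links: redistribute uniformly below
                } else {
                    double share = rank[i] / outLinks[i].length;
                    for (int j : outLinks[i]) {
                        next[j] += share;
                    }
                }
            }

            // Teleportation share plus the uniformly spread rank of dangling pages.
            double base = (1.0 - damping) / n + damping * danglingMass / n;
            double delta = 0.0;
            for (int i = 0; i < n; i++) {
                next[i] = base + damping * next[i];
                delta += Math.abs(next[i] - rank[i]);
            }
            rank = next;
            if (delta < epsilon) {
                break; // converged
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}; no page links to 3.
        int[][] graph = { {1, 2}, {2}, {0}, {2} };
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 100, 1e-10)));
    }
}
```

In the example graph, page 3 has no in-links and therefore only receives the teleportation share (1 - d)/n, while page 2, which collects links from three pages, ends up with the highest rank.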
Information on the Registration Process
- This project is offered in the winter semester 2021/22.
- The number of participants is limited.
- Registration is not done on a first-come, first-served basis.
- To register, download this JSON registration template, rename it to yourmatriculationnumber.json, edit it to contain your information, and send it as an email attachment to Damjan Gjurovski. Make sure the file is valid JSON and encoded in ASCII or UTF-8 (UTF-8 without a byte order mark). Please register and send the email from your official university email account (@cs.uni-kl.de, @rhrk.uni-kl.de, or @student.uni-kl.de).
- The registration deadline is October 17.
- Soon after registration closes, we will let you know whether you have been given a place in the project.
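The encoding requirement for the registration file can be checked locally before sending it. The following Java sketch (class name hypothetical) reads the file, reports whether it starts with the UTF-8 byte order mark EF BB BF, and verifies that the bytes decode as strict UTF-8; pure ASCII files pass this check as well, since ASCII is a subset of UTF-8. It does not check JSON syntax, because the Java SE standard library has no JSON parser; use a JSON validator of your choice for that.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper that checks the encoding of a registration file (not its JSON syntax). */
public final class RegistrationFileCheck {

    /** Returns true if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Returns true if the data decodes as strict UTF-8; plain ASCII also passes. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data)); // throws on invalid byte sequences
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("UTF-8 BOM present: " + hasUtf8Bom(data));
        System.out.println("valid UTF-8:       " + isValidUtf8(data));
    }
}
```

After compiling with `javac RegistrationFileCheck.java`, run `java RegistrationFileCheck yourmatriculationnumber.json`; a file that is ready to send should report no BOM and valid UTF-8.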