In this project, we build a full-fledged Web search engine.
Details on the registration procedure can be found further below.
Prerequisites (beyond having attended courses on database systems and information retrieval):
- Substantial Java programming skills (Java 1.8 or later) that go beyond tutorials; we expect that you have already written non-trivial code for at least one larger project.
- Well-founded practical skills in working with relational database systems, including installing one, and experience in writing non-trivial SQL queries. We will use a current version of PostgreSQL in this project.
- Practical knowledge of and experience with user-defined functions (UDFs).
- Experience with installing Apache Tomcat and with programming Servlets and/or JavaServer Pages (JSP).
- Linux (Debian) command-line skills: installing software, configuring Tomcat, ...
If you lack any of the skills listed above, consult literature and tutorials before the project starts. Programming skills in particular are difficult to acquire on the fly during the project.
The core tasks of developing the Web search engine are roughly the following:
- Implement an HTML parser.
- Design and implement a web crawler.
- Design the required database schema to store the contents of visited pages and the link structure.
- Write an SQL-based query processor to execute Google-style keyword queries.
- Devise and create index structures to improve query performance.
- Implement alternative query processors based on threshold algorithms.
- Implement alternative methods for scoring how well a document matches a query.
- As part of this, implement Google's PageRank algorithm and integrate it into the scoring model.
- Implement an HTML-based user interface and a Web service.
- Use the Web services of your fellow students to build a meta search engine.
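To give a rough idea of what the PageRank task involves, here is a minimal power-iteration sketch in Java. It is an illustration, not a prescribed solution. It assumes that the link graph fits in memory as an adjacency list with pages numbered 0..n-1, that pages without out-links ("dangling" pages) spread their rank uniformly over all pages, and that the commonly used damping factor 0.85 is chosen. The class `PageRank` and the method `compute` are hypothetical names.

```java
import java.util.Arrays;

/**
 * Minimal PageRank sketch using power iteration (hypothetical helper).
 * Assumptions: in-memory adjacency list, pages numbered 0..n-1,
 * dangling pages distribute their rank uniformly over all pages.
 */
public class PageRank {

    /**
     * @param outLinks outLinks[i] lists the pages that page i links to
     * @param damping  damping factor (0.85 is a common choice)
     * @param epsilon  stop once the L1 change between iterations is below this
     * @param maxIter  upper bound on the number of iterations
     * @return PageRank scores that sum to 1
     */
    public static double[] compute(int[][] outLinks, double damping, double epsilon, int maxIter) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from the uniform distribution

        for (int iter = 0; iter < maxIter; iter++) {
            // Total rank held by pages without out-links.
            double danglingMass = 0.0;
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    danglingMass += rank[i];
                }
            }

            // Every page receives the random-jump share plus its part of the dangling mass.
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * danglingMass / n);

            // Each page passes an equal share of its rank along every out-link.
            for (int i = 0; i < n; i++) {
                int degree = outLinks[i].length;
                if (degree == 0) {
                    continue;
                }
                double share = damping * rank[i] / degree;
                for (int target : outLinks[i]) {
                    next[target] += share;
                }
            }

            // L1 distance between consecutive iterations decides convergence.
            double delta = 0.0;
            for (int i = 0; i < n; i++) {
                delta += Math.abs(next[i] - rank[i]);
            }
            rank = next;
            if (delta < epsilon) {
                break;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Small example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0; page 3 has no out-links.
        int[][] links = { {1, 2}, {2}, {0}, {} };
        System.out.println(Arrays.toString(compute(links, 0.85, 1e-10, 100)));
    }
}
```

In the project itself, the link structure would come from the database tables filled by the crawler, and the resulting scores would be combined with the text-based match score. How to store the graph and how to weight the two scores are design choices.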
Information on the Registration Process
- It will be published soon.