Grade: 60 points.
PROJECT DESCRIPTION
In this assignment, you will reimplement the indexes from group assignment 1 using PyLucene. You will also reimplement the vector space model with cosine similarity to retrieve the top K documents (ranked retrieval) from the collection of documents provided as an attachment. The specifications are:
Part A: Exact Top K Vector Space Retrieval [10pts]
Part B: Cosine Similarity and Rocchio’s algorithm [40pts]
Part B: Experimental study [35pts]
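For reference, the exact top K retrieval with cosine similarity named in the project description can be sketched in plain Python, without PyLucene. This is only an illustration under stated assumptions: the function names (build_tfidf, cosine, top_k), the log-scaled TF-IDF weighting, and the pre-tokenized input format are all hypothetical choices, not requirements of the assignment.

```python
import heapq
import math
from collections import Counter


def build_tfidf(docs):
    """docs maps doc_id -> list of tokens. Returns (doc vectors, idf table).

    Assumed weighting: (1 + log10(tf)) * log10(N / df).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log10(n / count) for term, count in df.items()}
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: (1 + math.log10(count)) * idf[term]
            for term, count in tf.items()
        }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_tokens, vectors, idf, k):
    """Score every document (exact retrieval) and keep the k best.

    Returns a list of (score, doc_id) pairs, highest score first.
    Query terms that never occur in the collection are ignored.
    """
    tf = Counter(t for t in query_tokens if t in idf)
    query_vec = {t: (1 + math.log10(c)) * idf[t] for t, c in tf.items()}
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)
```

A call such as top_k(["cherry"], vectors, idf, 5) scores the whole collection, which is what makes the retrieval exact; an inexact variant would score only a subset of candidate documents.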
Extra Credit: Pseudo Relevance Feedback [20pts]
One of the challenges of user feedback is that the user may not be willing to provide it. In such cases, pseudo relevance feedback can be used. You will compare the performance of your user-feedback-based system from Part A against a pseudo feedback mechanism (implemented using PyLucene) in which the top 3 results of the system are assumed to be relevant. Run this system with the same queries from your experimental study in Part B. Compare its recall, precision, and MAP values against those of the system using user feedback.
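The pseudo feedback loop and the evaluation measures mentioned above can be sketched as follows. This is a library-independent illustration, not the required PyLucene implementation: the function names, the sparse dict vector format, and the Rocchio weights (alpha=1.0, beta=0.75, gamma=0.15, common textbook defaults) are assumptions made for the example.

```python
from collections import defaultdict


def rocchio(query_vec, relevant_vecs, nonrelevant_vecs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query toward the relevant centroid
    and away from the non-relevant centroid. Weights are assumed defaults."""
    new_query = defaultdict(float)
    for term, weight in query_vec.items():
        new_query[term] += alpha * weight
    for vec in relevant_vecs:
        for term, weight in vec.items():
            new_query[term] += beta * weight / len(relevant_vecs)
    for vec in nonrelevant_vecs:
        for term, weight in vec.items():
            new_query[term] -= gamma * weight / len(nonrelevant_vecs)
    # Negative term weights are conventionally dropped.
    return {term: w for term, w in new_query.items() if w > 0}


def pseudo_feedback(query_vec, ranked_doc_ids, doc_vectors, top_n=3):
    """Treat the top_n results as relevant; no documents are marked non-relevant."""
    assumed_relevant = [doc_vectors[d] for d in ranked_doc_ids[:top_n]]
    return rocchio(query_vec, assumed_relevant, [])


def precision_recall_at_k(ranked_doc_ids, relevant_ids, k):
    """Precision and recall over the first k results."""
    hits = sum(1 for d in ranked_doc_ids[:k] if d in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def average_precision(ranked_doc_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_doc_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

In this sketch, the expanded query returned by pseudo_feedback would be run against the index again, and the resulting ranking evaluated with the same relevance judgments used for the user feedback system, so that the two sets of recall, precision, and MAP values are directly comparable.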
Other instructions:
Attachments:
Submission
Deadline for submission: 11/04/2022 11:59 PM
Each team will also schedule a presentation with the instructor (Week of Nov. 7). Your presentation will last about 10 minutes. The instructor will email you to set up the presentations.
Anyone who misses the final presentation will not receive a grade for the assignment.
Late/re-submission