COMP814 Text Mining
yet2024-09-05 17:31:58

This assignment must be done in pairs, which also helps develop teamwork skills. If you are new to coding, it is recommended that you pair up with someone who has some coding skills. Only one person from the pair needs to submit the assignment. Include both authors' names and IDs as indicated in the IEEE template.

The assignment should be written up in a maximum of 12 pages, excluding references and appendices.

Objective

1.   To be able to carry out a typical text mining task based on an objective.

2.   To document the methodology and the findings in an appropriately formatted scientific paper suitable for publication in a conference. The format of the paper is given as a LaTeX template file.

Task Resources

You will be using models and code snippets that you developed as part of the labs in the Python environment. You will use the dataset provided on Canvas as a zipped file named AssignmentBlogData.zip.

Your dataset consists of a set of 19,320 XML-formatted text files. These files contain blogs collected from an anonymous blogging site, which have been annotated with various types of anonymised metadata. The metadata has been integrated into the filenames. The text in each of the files contains the blogs corresponding to a blogger (as described in the metadata), with blog dates ranging from approximately 2001 to 2004.
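As a starting point for working with the archive, the following is a minimal sketch of how the files could be listed and read directly from the zip without extracting it. The function names list_blog_files and read_blog_file are hypothetical, and the UTF-8 encoding is an assumption; the brief does not state the encoding, the folder layout inside the archive, or the metadata format in the filenames, so the sketch deliberately does not try to parse any of those.

```python
import zipfile
from pathlib import Path


def list_blog_files(zip_path):
    """Return the names of all .xml entries in the archive, sorted."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".xml")
        )


def read_blog_file(zip_path, name, encoding="utf-8"):
    """Return the raw text of one archive entry.

    The encoding is an assumption; bytes that cannot be decoded are
    replaced rather than raising an error.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(name).decode(encoding, errors="replace")


if __name__ == "__main__":
    # File name taken from the brief; adjust the path to where you saved it.
    zip_path = Path("AssignmentBlogData.zip")
    if zip_path.exists():
        names = list_blog_files(zip_path)
        print(len(names), "XML files found")
```

The sketch returns raw text rather than a parsed tree because the brief does not say whether every file parses cleanly with an XML parser. Comparing the printed count against the 19,320 files stated above is a quick sanity check on the download.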