This assignment must be done in pairs, which also helps develop teamwork skills. If you are new to coding, it is recommended that you pair up with someone who has some coding experience. Only one person from the pair needs to submit the assignment. Include both authors' names and IDs as indicated in the IEEE template.
The assignment should be written up in a maximum of 12 pages, excluding references and appendices.
Objectives
1. To be able to carry out a typical text mining task based on an objective.
2. To document the methodology and the findings in an appropriately formatted scientific paper suitable for publication in a conference. The format of the paper is given as a LaTeX template file.
Task Resources
You will be using models and code snippets that you developed as part of the labs in the Python environment. You will use the dataset provided on Canvas as a zipped file named AssignmentBlogData.zip.
Your dataset consists of a set of 19,320 XML-formatted text files. These files contain blogs collected from an anonymous blogging site and have been annotated with various types of anonymised metadata. The metadata has been integrated into the filenames. The text in each file contains the blogs corresponding to a single blogger (as described in the metadata), with blog dates ranging from approximately 2001 to 2004.
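As a starting point for exploring the data, the following is a minimal sketch of how the archive could be inspected from Python using only the standard library. The function names (list_blog_files, filename_fields, read_raw_text) are hypothetical helpers, not part of the lab code. The "." separator used to split the filename metadata is an assumption, as is the UTF-8 encoding; check a few real filenames and files before relying on either. XML parsing is deliberately left out because the internal tag structure of the files is not described here.

```python
import zipfile
from pathlib import PurePosixPath


def list_blog_files(zip_path):
    """Return the names of all .xml members stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name for name in archive.namelist()
            if name.lower().endswith(".xml")
        ]


def filename_fields(member_name, sep="."):
    """Split a file's base name (without the .xml suffix) into fields.

    ASSUMPTION: the metadata fields are separated by `sep`. The brief only
    says the metadata is integrated into the filenames, so inspect a few
    real names and adjust `sep` if needed.
    """
    stem = PurePosixPath(member_name).stem
    return stem.split(sep)


def read_raw_text(zip_path, member_name, encoding="utf-8"):
    """Read one file from the archive as text without parsing the XML.

    ASSUMPTION: the files are UTF-8 encoded. Undecodable bytes are replaced
    rather than raising an error, so a wrong guess does not stop the run.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member_name).decode(encoding, errors="replace")
```

A typical first check would be to call list_blog_files("AssignmentBlogData.zip"), confirm that the number of names returned matches the 19,320 files stated above, and print filename_fields for a handful of names to see which metadata each position holds.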