INF 510

This is the first of 3 “milestones” for the final project. Please read this entire document carefully.

This project can be done individually or in pairs (no groups larger than 2 are allowed). The
amount of effort for a pair project will be roughly double that of an individual project. You are on your
own to set up a group if you so desire (feel free to use the Piazza forum, etc.). Once you have
turned in this first assignment as a group, you must keep that group throughout the project.

In general, the focus of the project is to show you can acquire, model, store and process
multiple sources of data, and build reliable pipelines to do so. For this course, actual “analysis”
of the data is secondary; you’ll be expected to say something about the data, but your
conclusions are not the focus of the project.

Your project will be scored on a number of factors, including (but not limited to!) the
complexity and size of your datasets, the quality of your pipeline and modeling code, and the
writeup of your research statement and conclusions. It’s a sizable amount of work, but it’s your
chance to actually do something substantial with data, so I hope you have fun with it!

The project contains three “milestones”, outlined below. Each one will be turned in separately,
with the final submission being your final project. Note that the first two milestones are
submitted via our course website. The final submission is to be done via GitHub.

Briefly, the milestones are:

1) Data set and problem selection
2) Data acquisition and modeling infrastructure
3) Research conclusions and writeup