CMPUT206 Image Classification with CNN eClass
Datasets
You will be working with two datasets in this assignment – one for classification and one for sliding window detection.
NotMNIST-RGB:
This is a customized RGB version of the NotMNIST dataset. The original dataset contains 28x28 grayscale images showing ten letters of the alphabet from A to J in various fonts. This has been modified to add a random background and foreground colour to each image. There are 12000 images in the train set and 6700 in the test set.
The test set is not released. This simulates real-world conditions, where a trained model is deployed on previously unseen data and must therefore be trained to avoid overfitting to the training data.
There is no separate validation set, so the provided code lets you split the train set into training and validation subsets with a customizable ratio.
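As a rough illustration of what such a split involves, here is a minimal sketch that shuffles sample indices and cuts them at a chosen ratio. The function name `split_indices`, the `train_ratio` and `seed` parameters, and the 0.8 default are assumptions made for this example; the provided code may expose the ratio differently.

```python
import random


def split_indices(n_samples, train_ratio=0.8, seed=0):
    """Return (train_idx, val_idx): two disjoint index lists covering range(n_samples).

    Hypothetical helper for illustration, not the assignment's own API.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the split is reproducible and
    # both subsets draw from the whole train set.
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_samples * train_ratio))
    return indices[:n_train], indices[n_train:]


# 12000 is the size of the NotMNIST-RGB train set.
train_idx, val_idx = split_indices(12000, train_ratio=0.8)
print(len(train_idx), len(val_idx))  # 9600 2400
```

A larger validation share gives a more reliable estimate of how the model will do on the unreleased test set, at the cost of fewer images to train on.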
Example images from the dataset (both original and RGB) with corresponding labels:
To reiterate, you will only be working on the RGB version of the dataset.
Visualizations of all the images in the training dataset can be found here.
Images from the original NotMNIST dataset that were used for generating the RGB images are also provided.
NotMNIST-DL:
This is a simple object detection dataset constructed by placing two random images from the NotMNIST-RGB dataset at random locations on a black background within a 64x64 image.
To make the detection task even easier, the following two constraints are also observed:
1. the two images always show different letters, so the same letter never appears twice in the same image
2. the two images never overlap by more than 0.1 in terms of intersection over union (IOU)
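The overlap constraint can be checked with a few lines of code. The sketch below computes IOU for two axis-aligned boxes; the `(x_min, y_min, x_max, y_max)` box convention with exclusive maxima and the helper names `iou`, `box_at` and `placement_is_valid` are assumptions for this example, not part of the dataset tooling.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def box_at(x, y, size=28):
    """Box of a size x size patch whose top-left corner is at (x, y)."""
    return (x, y, x + size, y + size)


def placement_is_valid(box_a, box_b, max_iou=0.1):
    """True if the two patches satisfy the overlap constraint (IOU <= max_iou)."""
    return iou(box_a, box_b) <= max_iou


print(iou(box_at(0, 0), box_at(36, 36)))                 # 0.0 (disjoint)
print(placement_is_valid(box_at(0, 0), box_at(24, 0)))   # True  (IOU = 112/1456)
print(placement_is_valid(box_at(0, 0), box_at(22, 0)))   # False (IOU = 168/1400)
```

For two 28x28 patches on the same row, a horizontal offset of 24 pixels leaves a 4-pixel-wide overlap and an IOU of about 0.077, which is allowed; an offset of 22 pixels gives an IOU of 0.12, which is not.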
Sliding window detection tends to be slow, so the test set has only 500 images, constructed using 1000 random images from the NotMNIST-RGB test set.
We also provide a training set of 6000 images constructed from the entire NotMNIST-RGB training set, but you cannot use it directly for training.
To train on it, you first need to extract 28x28 patches from these images, as explained in the next section.
As with NotMNIST-RGB, there is no separate validation set and the test set is not released.
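To give a sense of why sliding window detection is slow, the sketch below enumerates the top-left corners of every 28x28 window inside a 64x64 image. The generator `window_positions` and its `stride` parameter are assumptions for this example; the patch extraction procedure actually required by the assignment is the one described in the next section.

```python
def window_positions(image_size=64, window=28, stride=1):
    """Yield (x, y) top-left corners of every window that fits inside the image."""
    last = image_size - window  # largest valid top-left coordinate
    for y in range(0, last + 1, stride):
        for x in range(0, last + 1, stride):
            yield x, y


n_dense = sum(1 for _ in window_positions(stride=1))
n_coarse = sum(1 for _ in window_positions(stride=4))
print(n_dense)         # 1369 windows per image (37 x 37)
print(n_dense * 500)   # 684500 windows over the 500 test images
print(n_coarse)        # 100 windows per image (10 x 10)

# With an image stored as a NumPy-style array of shape (64, 64, 3),
# the patch at (x, y) would be image[y:y + 28, x:x + 28].
```

At a stride of one pixel the classifier has to be evaluated 1369 times per image, which is why the detection test set is kept small. A larger stride reduces the cost but localizes the letters less precisely.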