Project Logs



Project Logs for Simple Shapes using Convolutional Neural Network project

Project Details

Introduction:
Pattern classification and recognition, a field of machine learning, has long been one of the most challenging tasks for a computer. Computers do not learn the way humans do: humans can generalize and draw on memory, and the workings of the human brain are too intricate to hard-code into a machine. Neural networks, however, loosely mirror how neurons interact with one another. Each weight in the network represents how strongly a neuron responds to its input. Neurons are connected within and between layers; when every neuron in one layer connects to every neuron in the next, the layer is called fully connected. As input data propagates forward through the hidden layers, each neuron computes a weighted sum of its inputs and passes it through an activation function, and the result is fed into the next layer. At the final output layer, which acts as the classifier by producing a probability for each class, the error is backpropagated using multivariable calculus (the chain rule) to update the weights.
Without these weight updates, the neural network would not be able to learn. One epoch describes a single forward and backward pass over the training set. Since this project classifies images of simple shapes (circle, triangle, rectangle and square), a convolutional neural network will be implemented. A convolutional neural network does not need hand-crafted feature extraction when preprocessing the input images: its layers act as a feature detector, reading raw pixels from the image and extracting features through learned filters and max pooling. The convolutional layers are then connected to a traditional fully connected neural network to perform the image classification.
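As a concrete illustration of the forward pass described above, here is a minimal numpy sketch of one fully connected hidden layer followed by a softmax output. The sizes and random weights are made up for the example; they are not the project's actual values.

```python
import numpy as np

def relu(z):
    # elementwise ReLU activation
    return np.maximum(0.0, z)

def softmax(z):
    # subtract the max for numerical stability, then normalize to probabilities
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)

x = rng.random(8)                          # toy input vector (e.g. flattened pixels)
W1 = rng.standard_normal((16, 8)) * 0.1    # hidden-layer weights
b1 = np.zeros(16)
W2 = rng.standard_normal((4, 16)) * 0.1    # output layer: one unit per shape class
b2 = np.zeros(4)

h = relu(W1 @ x + b1)      # weighted sum of inputs, passed through the activation
p = softmax(W2 @ h + b2)   # probabilities for circle/triangle/rectangle/square
```

The vector `p` is non-negative and sums to 1; its largest entry is the predicted class. Training would then backpropagate the error through `W2` and `W1` to update the weights.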

Problem, Solution and Enhancements:
There are many problems to solve in the simple shapes image classification project. The hardest part is getting the convolutional neural network to generalize well without overfitting or underfitting the model. Generalizing is hard because the input data is never perfect, and it is wasteful for the network to process background pixels that contain no part of the class object. The input data is created with Python's Cairo module: a single shape from each class is drawn at a random location, rotation and scale on a 300-by-300-pixel image.
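The Cairo generation step is not reproduced here, but the idea can be sketched with Pillow's ImageDraw as a stand-in for the Cairo drawing calls. The shape sizes, colors, and the omission of rotation are simplifications for illustration, not the project's actual parameters.

```python
import random
from PIL import Image, ImageDraw

def draw_random_shape(shape, size=300, seed=None):
    """Draw one shape at a random location and scale on a size-by-size canvas."""
    rng = random.Random(seed)
    img = Image.new("RGB", (size, size), (255, 255, 255))  # plain background
    d = ImageDraw.Draw(img)
    s = rng.randint(40, 120)          # random scale
    x = rng.randint(0, size - s)      # random location
    y = rng.randint(0, size - s)
    if shape == "circle":
        d.ellipse([x, y, x + s, y + s], fill=(0, 0, 0))
    elif shape == "square":
        d.rectangle([x, y, x + s, y + s], fill=(0, 0, 0))
    elif shape == "rectangle":
        d.rectangle([x, y, x + s, y + s // 2], fill=(0, 0, 0))
    elif shape == "triangle":
        d.polygon([(x, y + s), (x + s, y + s), (x + s // 2, y)], fill=(0, 0, 0))
    return img

img = draw_random_shape("triangle", seed=7)
```

Running this per class, with a fresh random seed per image, builds up a labeled dataset of single-shape images.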
The final solution for the unnecessary background data is to use the Sobel algorithm to detect edges, crop the image around them, and paste the crop onto a 200-by-200-pixel canvas with the same background color. This must be done in a separate step before the main convolutional neural network loader file, because I need to manually check that each data image is good. Good data means that the squares and rectangles are not shifted off to one side, that the triangles are not too small, and that the circles do not extend past the boundary, which would make the image indecipherable. The 200-by-200-pixel cropped images are then fed into the convolutional neural network without the extra preprocessing function in Keras' ImageDataGenerator.
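A minimal numpy sketch of that edge-detect-and-crop step follows. The edge threshold and the centering logic are assumptions for illustration, not the project's exact code, and the crop is assumed to fit inside the 200-by-200 canvas.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale array using the Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(img, 1, mode="edge")
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for i in range(3):          # accumulate the 3x3 neighborhood products
        for j in range(3):
            win = p[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)

def crop_to_canvas(img, out=200):
    """Crop to the bounding box of detected edges, centered on an out-by-out canvas."""
    edges = sobel_magnitude(img) > 1e-3          # assumed edge threshold
    ys, xs = np.nonzero(edges)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    canvas = np.full((out, out), img[0, 0])      # same background color as the input
    y0 = (out - crop.shape[0]) // 2
    x0 = (out - crop.shape[1]) // 2
    canvas[y0:y0 + crop.shape[0], x0:x0 + crop.shape[1]] = crop
    return canvas
```

The manual quality check described above still happens on the output images; this step only removes the redundant background around the shape.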
A typical CPU takes far too long to train a convolutional neural network of this size. The network will instead be trained on a Google Cloud virtual machine, because a typical job submitted to Google Cloud gives a memory error when the training input is around 8,000 images. Increasing the number of GPUs for the job would not help, because each GPU is assigned a fixed memory limit per Google Cloud job. A virtual machine also lets the user install Tensorflow from source, which matters because training is very time consuming, usually about two hours per run. Tensorflow built from source enables AVX instructions alongside GPU CUDA support for the Tesla K80, which speeds up the computation roughly three times compared to the stock Tensorflow build. Hyperparameter tuning of the dropout layers is an option on the virtual machine, using the project's Google Cloud storage bucket.

Summary:
During training, the convolutional neural network could not go below a loss of 0.38 or reach an accuracy above 80% in any epoch. Based on the documentation, the network looked overfitted, because the validation accuracy was lower than the training accuracy. I found that very peculiar and originally thought I was using the wrong network architecture or the wrong optimizer for the loss function. I tested many different architectures, activation functions and optimizers, but the loss still would not go below 0.38. My major mistake was assuming I had perfect data images because I had generated the data myself; I never checked whether the images were reliable. I had to manually check every training and validation image to make sure the data was clear enough for the network to process. After fixing the data images, the network was underfitting instead: in the first several epochs the validation accuracy was roughly twice the training accuracy and the validation loss was about half the training loss. This is shown in the training output documentation:
208/207 [==============================] - 62s - loss: 0.8218 - acc: 0.6593 - val_loss: 0.4303 - val_acc: 0.8694
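The rule of thumb applied above (compare training accuracy against validation accuracy) can be written down as a tiny helper. The 10% margin is an arbitrary tolerance chosen for illustration, not a standard value.

```python
def diagnose(train_acc, val_acc, margin=0.10):
    """Rough overfit/underfit check from one epoch's accuracies."""
    if train_acc > val_acc + margin:
        return "overfitting"      # memorizing the training data
    if val_acc > train_acc + margin:
        return "underfitting"     # model too weak or too heavily regularized
    return "balanced"

# The epoch logged above: acc 0.6593, val_acc 0.8694
print(diagnose(0.6593, 0.8694))  # -> underfitting
```

A single epoch is, of course, a weak signal; the pattern held across the first several epochs before I changed the architecture.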
I tried increasing the number of convolutional layers and reducing the dropout and maxpool layers, and that seems to work. The current convolutional neural network architecture is described in the chart below. It also trains on PNG data instead of JPG, because the compressed pickled files are much smaller (65 MB).

Adam optimizer, 200-by-200 images (16,000 for training, 1,500 for validation)

| Layer Type | Num of filters / units | Conv./pool size | Padding | Activation | Dropout after |
|------------|------------------------|-----------------|---------|------------|---------------|
| CONV_2D    | 64                     | (3,3)           | valid   | relu       |               |
| CONV_2D    | 64                     | (6,6)           | valid   | relu       |               |
| MAXPOOL    |                        | (6,6)           |         |            |               |
| CONV_2D    | 64                     | (6,6)           | valid   | relu       |               |
| MAXPOOL    |                        | (6,6)           |         |            |               |
| Flatten()  |                        |                 |         |            |               |
| Dense()    | 256                    |                 |         | relu       | Dropout(0.15) |
| Dense()    | 256                    |                 |         | relu       | Dropout(0.15) |
| Dense()    | 256                    |                 |         | relu       |               |
| Dense()    | 4                      |                 |         | softmax    |               |
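Read as a Keras model, the architecture chart above corresponds to something like the following sketch. The input shape (single-channel 200-by-200 images) and the loss function are assumptions; only the layer stack and the Adam optimizer come from the chart.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # three conv blocks with 64 filters each, valid padding, interleaved maxpools
    Conv2D(64, (3, 3), padding="valid", activation="relu", input_shape=(200, 200, 1)),
    Conv2D(64, (6, 6), padding="valid", activation="relu"),
    MaxPooling2D(pool_size=(6, 6)),
    Conv2D(64, (6, 6), padding="valid", activation="relu"),
    MaxPooling2D(pool_size=(6, 6)),
    Flatten(),
    # fully connected classifier head with light dropout
    Dense(256, activation="relu"),
    Dropout(0.15),
    Dense(256, activation="relu"),
    Dropout(0.15),
    Dense(256, activation="relu"),
    Dense(4, activation="softmax"),  # one output per shape class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

The softmax layer produces the per-class probabilities for circle, triangle, rectangle and square.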

Complete Description of tasks mentioned in the proposal but not accomplished with reasons:
Image classification and detection are two separate problems in terms of getting the code to work. I assumed they had very similar, buildable architectures, but they are completely different. This project only does image classification, not image detection, since detection requires a region-based convolutional neural network such as Fast R-CNN. I was also not able to get the model to generalize well, and I am still unsure why the model could not learn better. The final code also does not account for intersections between shapes, as stated in the interim status reports. This is because the training loss of the convolutional neural network would not go down, so I had to simplify my dataset to a single shape per image instead of two variations of the same shape in one image. This is the simplified version of the current dataset and its cropped version, which is the actual input data for the convolutional neural network.
Evaluation:

I think I did considerably well, considering that I had no prior knowledge of or experience in machine learning and convolutional neural networks. I had to learn a new programming language, Python 2.7, and Keras from scratch. I wrote a program to draw the simple shapes, and my own pickle files to store numpy arrays and to merge and load those arrays in preparation for the main convolutional neural network file. I did all of this myself, without anybody helping me with the code. One friend offered advice on multiplying the float value by 255 for the Image module to get the correct background color. Another friend who works with Tensorflow gave me a link to a tutorial on compressing image data when saving with pickle. I ended up using joblib to compress the data, since it does a great job dumping huge lists of numpy image arrays. That code is not copied; it is a reference within the save routine, which is roughly five lines of code. For the main convolutional neural network file, I also followed a tutorial on preparing the file for execution in the cloud: https://github.com/clintonreece/keras-cloud-ml-engine.
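The joblib save-and-load step mentioned above can be sketched in a few lines. The file path, array sizes and compression level here are illustrative choices, not the project's actual ones.

```python
import os
import tempfile

import joblib
import numpy as np

# a toy stand-in for the real list of image arrays
images = [np.zeros((200, 200), dtype=np.uint8) for _ in range(3)]

path = os.path.join(tempfile.gettempdir(), "shapes_data.pkl")

# compress=3 trades dump speed for file size; the level is an arbitrary choice
joblib.dump(images, path, compress=3)

restored = joblib.load(path)
assert all((a == b).all() for a, b in zip(images, restored))
```

Because the shape images are large, mostly uniform arrays, they compress very well, which is what keeps the pickled dataset around 65 MB.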
I learned that it is extremely difficult to get a convolutional neural network to generalize well during training. The network swings between overfitting and underfitting, and that balance is hard to get a grasp of, because there are so many factors to consider: the network architecture, the image input data, the loss function, the activation functions, and the number of filters and neurons per layer. I also got a bit sidetracked by trying to do too much in the project, such as classifying multi-output labels for intersections between shapes. I should have used programs to analyze the image data, but the network is too complicated to extract that kind of information from, because it is difficult to get at the gradients of the hidden layers.