Body Part Detection
Predict body part from an X-Ray
The main purpose of the body part detection model is to classify human body parts from an X-ray image. The model is designed to classify 7 human body parts (Elbow, Finger, Forearm, Hand, Humerus, Shoulder, Wrist).
Diving into the model:
We used the Keras Python library, running on top of TensorFlow, to train our model. The model uses the Sequential architecture, which lets us build a model layer by layer.
The ‘add()’ function is used to add layers to our model. The first 2 layers are Conv2D layers. These are convolution layers that deal with our input images, which are seen as 2-dimensional matrices. The number of nodes (filters) is 64 in the first layer and 32 in the second. These numbers can be adjusted to be higher or lower, depending on the size of the dataset.
The kernel size of a Keras convolution layer is defined as height x width. It is an integer or a tuple/list of 2 integers specifying the height and width of the 2D convolution window, that is, the size of the filter matrix for our convolution. So a kernel size of 3 means we will have a 3x3 filter matrix. An activation function is attached to each neuron in the network and determines whether the neuron should be activated (“fired”) or not, based on whether its input is relevant for the model’s prediction. The activation function used for the first 2 layers is ReLU, the Rectified Linear Unit, which outputs max(0, x).
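To make the kernel and activation ideas concrete, here is a minimal NumPy sketch of what a single 3x3 filter followed by ReLU does to a small single-channel image. It is an illustration only, not the Keras implementation; the 5x5 image and the filter values are made up for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2D `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the filter, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified Linear Unit: keep positive values, zero out the rest."""
    return np.maximum(x, 0.0)

# Hypothetical inputs: a 5x5 "image" and a 3x3 filter (kernel size 3).
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = relu(conv2d_valid(image, kernel))
suppressed = relu(conv2d_valid(image, -kernel))  # negative responses become 0

print(feature_map.shape)  # a 5x5 input and a 3x3 filter give a 3x3 output
```

A Conv2D layer with 64 filters learns 64 such filter matrices (one set per input channel) instead of using fixed values.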
Our first layer also takes in an input shape. This is the shape of each input image: (64, 64, 3), i.e. 64x64 pixels with 3 channels. In between the Conv2D layers and the dense layer, there is a ‘Flatten’ layer. Flatten serves as a connection between the convolution and dense layers by turning the 2D feature maps into a single 1D vector.
‘Dense’ is the output layer type we used. Dense is a standard, fully connected layer type used in many neural networks. We have 7 nodes in our output layer, one for each possible outcome (0–6). Our model is a multi-class classification model: multiclass (or multinomial) classification is the problem of classifying instances into one of three or more classes.
This is a seven-class classification. Along with the Dense and Flatten layers we also used MaxPooling2D layers. MaxPooling2D is the max pooling operation for 2D spatial data: it downsamples the input along its spatial dimensions (height and width) by taking the maximum value over the window defined by pool_size, separately for each channel.
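The layers described so far could be assembled roughly as follows. This is a sketch under assumptions rather than our exact training script: the filter counts, kernel size, input shape and 7-node output come from the description above, while the number and placement of the MaxPooling2D layers and the pool size of (2, 2) are assumed for illustration.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()

# First convolution layer: 64 filters, 3x3 kernel, ReLU, 64x64 RGB input.
model.add(Conv2D(64, kernel_size=3, activation="relu", input_shape=(64, 64, 3)))
# Assumption: one pooling layer after each convolution, 2x2 window.
model.add(MaxPooling2D(pool_size=(2, 2)))

# Second convolution layer: 32 filters, 3x3 kernel, ReLU.
model.add(Conv2D(32, kernel_size=3, activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the feature maps so they can feed the dense output layer.
model.add(Flatten())

# Output layer: 7 nodes, one per body part, with a softmax activation.
model.add(Dense(7, activation="softmax"))

model.summary()
```

Calling `model.summary()` prints the output shape and parameter count of every layer, which is a quick way to check that the stack matches what was intended.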
The activation of the output layer is ‘softmax’. Softmax makes the outputs sum up to 1 so they can be interpreted as probabilities. The model then makes its prediction based on which option has the highest probability.
Next, we need to compile our model. Compiling the model takes three parameters: optimizer, loss function and metrics.
The optimizer controls how the weights are updated, including the learning rate. We use ‘adam’ as our optimizer. Adam is generally a good optimizer for many cases; it adapts the learning rate throughout training.
The learning rate determines how fast the optimal weights for the model are calculated. A smaller learning rate may lead to more accurate weights (up to a certain point), but the time it takes to compute the weights will be longer.
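The trade-off can be seen on a toy problem. The sketch below uses plain gradient descent (not Adam) on a made-up one-parameter function, f(w) = (w - 3)^2, and counts how many updates each learning rate needs to get close to the minimum. The function, starting point and tolerance are all assumptions chosen for illustration.

```python
def steps_to_converge(learning_rate, start=0.0, target=3.0, tol=1e-3, max_steps=10_000):
    """Minimise f(w) = (w - target)**2 with plain gradient descent.

    Returns the number of updates needed to get within `tol` of the minimum.
    """
    w = start
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - target)   # derivative of (w - target)**2
        w -= learning_rate * grad   # gradient descent update
        if abs(w - target) < tol:
            return step
    return max_steps

fast = steps_to_converge(learning_rate=0.1)
slow = steps_to_converge(learning_rate=0.01)

# The smaller learning rate needs many more updates to reach the same tolerance.
print(fast, slow)
```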
We use ‘categorical_crossentropy’ as our loss function. This is the most common choice for multi-class classification. A lower loss indicates that the model is performing better. We used accuracy as our metric.
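As a small worked example of how softmax and categorical cross-entropy fit together, the sketch below turns seven raw scores into probabilities and computes the loss for a correct and an incorrect target. The scores are invented for illustration, and the functions are simplified stand-ins for what Keras computes internally.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))

# Made-up raw outputs of the 7-node output layer for one image.
scores = [0.2, 1.5, -0.3, 3.1, 0.0, 0.7, 2.2]
probs = softmax(scores)

# The prediction is the class index with the highest probability.
predicted_class = probs.index(max(probs))

# One-hot targets: the first matches the prediction, the second does not.
target_correct = [0, 0, 0, 1, 0, 0, 0]
target_wrong = [0, 0, 1, 0, 0, 0, 0]

loss_good = categorical_crossentropy(target_correct, probs)
loss_bad = categorical_crossentropy(target_wrong, probs)
print(predicted_class, round(loss_good, 3), round(loss_bad, 3))
```

The loss is small when the model puts high probability on the true class and grows as that probability shrinks, which is why a lower value means a better fit.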
Understanding the Data:
To train our model, we used MURA (musculoskeletal radiographs), a large dataset of bone X-rays and one of the largest public radiographic image datasets. It consists of images of 7 different parts of the human body. Using the train_test_split function we divided the whole dataset into train and test sets, with 80% of the data as train data and the remaining 20% as test data.
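An 80/20 split with scikit-learn's train_test_split can be sketched as below. The arrays are placeholders standing in for the loaded MURA images and labels, and the `random_state` and `stratify` arguments are assumptions added to make the split reproducible and class-balanced; they are not settings stated above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 blank 64x64 RGB "images" with labels 0..6.
X = np.zeros((100, 64, 64, 3), dtype=np.float32)
y = np.arange(100) % 7

# test_size=0.2 keeps 80% for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,  # assumption: fixed seed for reproducibility
    stratify=y,       # assumption: keep class proportions in both splits
)

print(X_train.shape, X_test.shape)
```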
Cross-Validation using GridSearchCV and RandomizedSearchCV:
Hyperparameter optimization is a big part of deep learning. Grid search and randomized search are model hyperparameter optimization techniques. In scikit-learn these techniques are provided by the GridSearchCV and RandomizedSearchCV classes respectively. Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class.
To use these wrappers we just need to define a function that creates and returns the Keras Sequential model, and then pass this function to the build_fn argument when constructing the KerasClassifier class.
When constructing the GridSearchCV and RandomizedSearchCV classes we need to provide a dictionary of hyperparameters to evaluate, in the param_grid argument for GridSearchCV and the param_distributions argument for RandomizedSearchCV. This is a map from each model parameter name to an array of values to try. We used a dictionary containing epochs=[10, 20, 25, 30, 35, 40], batch_size=[20, 30, 40, 50, 60] and optimizer=[‘SGD’, ‘RMSprop’, ‘Adagrad’, ‘Adadelta’, ‘Adam’, ‘Nadam’]. Among all the given values, the optimal hyperparameters selected were epochs=20, batch_size=60 and optimizer=‘Adam’.
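To give a feel for the size of this search space, the sketch below enumerates the combinations a grid search would visit and draws the kind of random subset a randomized search would try. It uses only the standard library and does not train anything; the subset size of 10 and the seed are arbitrary choices for illustration.

```python
import itertools
import random

param_grid = {
    "epochs": [10, 20, 25, 30, 35, 40],
    "batch_size": [20, 30, 40, 50, 60],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Nadam"],
}

# Grid search: every combination of the listed values.
names = list(param_grid)
grid = [dict(zip(names, values))
        for values in itertools.product(*param_grid.values())]
print(len(grid))  # 6 * 5 * 6 = 180 candidate settings

# Randomized search: only a random subset of those combinations.
rng = random.Random(0)          # arbitrary seed
sampled = rng.sample(grid, k=10)  # arbitrary subset size
for candidate in sampled[:3]:
    print(candidate)
```

Each candidate is then trained and scored with cross-validation, so the grid search costs 180 model fits per fold, while the randomized search trades completeness for a much smaller budget.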
Training the Model:
The way to feed our training data into the script is very easy: the Keras ImageDataGenerator expects the train and validation images to be structured into a train folder and a validation folder, in which the images are sorted per class into subfolders, each named after the image class it contains. Alternatively, we can read the images from the dataset path along with their corresponding target class names, and then divide the dataset into train and test data points using the train_test_split function.
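The second approach, reading image paths together with their class names, can be sketched with the standard library alone. The helper name `list_labelled_images` and the accepted file extensions are hypothetical; the only assumption about the data is the one-subfolder-per-class layout described above.

```python
from pathlib import Path

# Assumed layout (one subfolder per class):
#   <root>/Elbow/img_001.png
#   <root>/Wrist/img_002.png
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def list_labelled_images(root):
    """Return (image_path, class_name) pairs, using folder names as labels."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, class_dir.name))
    return samples
```

The resulting list of paths and labels can then be loaded into arrays and passed to train_test_split.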
To train, we will use the ‘fit()’ function on our model with the following parameters: training data (train_X), target data (train_y), validation data, and the number of epochs. For our validation data, we will use the test set provided to us in our dataset, which we have split into X_test and y_test.
The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point. After that point, the model will stop improving during each epoch. We chose epochs=20, the optimal value found by the hyperparameter search.
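A compile-and-fit call along these lines is sketched below. To keep it self-contained it trains on random placeholder arrays instead of MURA images and runs only 2 epochs (we used 20), so the numbers it prints are meaningless; the pooling layers are also an assumed arrangement. The variable names follow the ones used in the text.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

rng = np.random.default_rng(0)

def fake_split(n):
    """Random stand-in for n images and their one-hot labels (not real data)."""
    X = rng.random((n, 64, 64, 3), dtype=np.float32)
    y = to_categorical(rng.integers(0, 7, size=n), num_classes=7)
    return X, y

train_X, train_y = fake_split(60)
X_test, y_test = fake_split(20)

model = Sequential([
    Conv2D(64, kernel_size=3, activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),  # assumption: pooling after each conv
    Conv2D(32, kernel_size=3, activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(7, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(
    train_X, train_y,
    validation_data=(X_test, y_test),
    epochs=2,        # placeholder; the tuned value was 20
    batch_size=60,   # tuned value from the hyperparameter search
    verbose=0,
)

# One entry per epoch for each tracked quantity.
print(history.history["val_loss"], history.history["val_accuracy"])
```

The `history.history` dictionary holds the per-epoch training and validation loss and accuracy, which is what loss and accuracy curves are usually plotted from.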
We used cross-entropy as the loss function, and from the plot we can observe how the log-loss values fluctuate as the batch size changes. The training loss is always close to zero, but the validation loss was highest with batch_size=60. From the plot we can also observe that as the number of epochs increases, the validation accuracy increases and the validation loss decreases.
Running Inference:
A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It allows the visualization of the performance of an algorithm. We have plotted a confusion matrix for test data points.
We plotted this confusion matrix from the model's predictions on the test dataset, using heatmaps from the Seaborn library. From the diagonal elements of the confusion matrix we can conclude that almost all the samples of each class are predicted correctly, apart from a few outliers. The final accuracy of our model is 90.7% after fine-tuning it using cross-validation techniques.
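The link between the diagonal of a confusion matrix and accuracy can be shown with a minimal NumPy sketch. The labels below are invented for a 3-class toy case and are not our test results; the helper is a simplified stand-in for a library confusion-matrix function.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels for a 3-class toy example.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

A matrix like `cm` can be passed to Seaborn's `heatmap` function (for example with `annot=True`) to get the kind of plot described above.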