Handwritten digit recognition using CrysX-NN neural network trained on MNIST_Plus

In the last post, I showcased an app based on a neural network trained on the MNIST dataset using the CrysX-NN library. The app allowed the user to enter their own handwritten digits via mouse/touch screen. Since the model only employed fully connected layers without any kind of convolutional layer, the performance was sub-optimal, as one would expect. For instance, the recognition of 1 and 7 was extremely problematic. This was surprising, because the accuracy on the training set was more than 98%. This goes to show that the performance on the testing set may not translate that well to the real world.

However, the lack of a convolutional layer was not the only reason. As I had shown in one of my previous posts, handwritten digit recognition is not perfect, even when using a convolutional neural network, if the model is only trained on the MNIST training set. In that post, I made the model more robust by adding my own variations of handwritten digits to the MNIST training set. I called this extended dataset MNIST_Plus.

Therefore, I also trained a simple neural network without any convolutional layer on the MNIST_Plus dataset to see the real-world performance.

Similar to last time, the neural network has an input layer of size 784 (28×28 = 784 pixels), one hidden layer of size 256 with the ReLU activation function, and finally an output layer of size 10 (for the digits 0-9) with the Softmax activation function to get the probabilities for each digit.
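As a quick illustration of the computation this architecture performs (this is not the CrysX-NN internals; the weights below are random placeholders), here is a minimal NumPy sketch of the forward pass:

    import numpy as np

    # Forward pass for 784 -> 256 (ReLU) -> 10 (Softmax)
    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((784, 256))*0.01, np.zeros(256)
    W2, b2 = rng.standard_normal((256, 10))*0.01, np.zeros(10)

    x = rng.random(784)                    # a flattened 28x28 input image
    h = np.maximum(0, x @ W1 + b1)         # hidden layer with ReLU activation
    logits = h @ W2 + b2
    probs = np.exp(logits - logits.max())  # numerically stable Softmax
    probs /= probs.sum()                   # probabilities for the digits 0-9

    # Total trainable parameters: 784*256 + 256 + 256*10 + 10 = 203,530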

This model didn't show any significant improvement on the MNIST test set, but the real-world performance was now much better.

Below is the Streamlit app using the model described above, trained on MNIST_Plus. You can write a digit using mouse/touch input. The details of the model and the app are given below the demo.

Streamlit App

https://manassharma07-mnist-plus-mnist-plus-nn-crysx-app-fmvzzc.streamlitapp.com/

The app above is quite neat and also shows what the prediction process looks like.

The user's handwritten input is captured in a 200×200 pixel box, which is then converted to an image. Subsequently, image processing is done to find a rectangle that completely encompasses the digit blob. This rectangular crop of the user input is then processed further. First, it is converted to grayscale, then resized to a 22×22 image using bilinear interpolation. A padding of 3 pixels is then applied on all sides of the image to get a 28×28 image. This image still has pixel values in the range 0-255, so these are normalized by dividing by 255. The pixel values are then standardized using the mean and standard deviation of the MNIST_Plus training dataset. Finally, the processed image is passed to the neural network model to make a prediction.
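As a condensed sketch of this pipeline (the full app code is given further below; the constants 0.1307 and 0.3081 are the training mean and standard deviation used in the app):

    import numpy as np
    from PIL import Image

    def preprocess(cropped_digit: Image.Image) -> np.ndarray:
        """Condensed sketch of the app's preprocessing steps."""
        gs = cropped_digit.convert('L')                      # grayscale
        small = gs.resize((22, 22), Image.BILINEAR)          # bilinear resize to 22x22
        arr = np.asarray(small, dtype=np.float32) / 255.0    # normalize to [0, 1]
        arr = np.pad(arr, 3, 'constant', constant_values=0)  # 3 px padding -> 28x28
        arr = (arr - 0.1307) / 0.3081                        # standardize
        return arr.reshape(1, 784)                           # flatten for the network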

While it is still not perfect and slightly inferior to the CNN model, there is a big improvement over the previous post. For example, the prediction of the digit 1 is now improved compared to before; however, it still gets recognized as 2 if it has the short bottom line. The prediction is most accurate if 1 is drawn as just a vertical line or with the top serif. Other than that, it performs reasonably well, with a few misclassifications here and there.

There are two reasons for the poor real-world performance of the MNIST-trained model:
1. We don't know exactly what pre-processing steps were used by those who collected the dataset.
2. Some digits, like 6, don't have a lot of variation in the dataset.

Source code for the app

https://github.com/manassharma07/MNIST-PLUS/blob/main/mnist_plus_NN_crysx_app.py

from streamlit_drawable_canvas import st_canvas
import streamlit as st
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import cv2
from crysx_nn import network

@st.cache
def create_and_load_model():
    nInputs = 784 # No. of nodes in the input layer
    neurons_per_layer = [256, 10] # Neurons per layer (excluding the input layer)
    activation_func_names = ['ReLU', 'Softmax']
    nLayers = len(neurons_per_layer)
    batchSize = 32 # No. of input samples to process at a time for optimization
    # Create the crysx_nn neural network model 
    model = network.nn_model(nInputs=nInputs, neurons_per_layer=neurons_per_layer, activation_func_names=activation_func_names, batch_size=batchSize, device='CPU', init_method='Xavier') 
    # Load the preoptimized weights and biases
    model.load_model_weights('NN_crysx_mnist_plus_98.11_streamlit_weights')
    model.load_model_biases('NN_crysx_mnist_plus_98.11_streamlit_biases')
    return model

model = create_and_load_model()

st.write('# MNIST_Plus Digit Recognition')
st.write('## Using a `CrysX-NN` neural network model')

st.write('### Draw a digit in 0-9 in the box below')
# Specify canvas parameters in application
stroke_width = st.sidebar.slider("Stroke width: ", 1, 25, 9)

realtime_update = st.sidebar.checkbox("Update in realtime", True)

# st.sidebar.markdown("## [CrysX-NN](https://github.com/manassharma07/crysx_nn)")
st.sidebar.write('\n\n ## Neural Network Library Used')
# st.sidebar.image('logo_crysx_nn.png')
st.sidebar.caption('https://github.com/manassharma07/crysx_nn')
st.sidebar.write('## Neural Network Architecture Used')
st.sidebar.write('1. **Inputs**: Flattened 28x28=784')
st.sidebar.write('2. **Hidden layer** of size **256** with **ReLU** activation Function')
st.sidebar.write('3. **Output layer** of size **10** with **Softmax** activation Function')
st.sidebar.write('Training was done for 10 epochs with the Categorical Cross Entropy loss function.')
# st.sidebar.image('neural_network_visualization.png')

# Create a canvas component
canvas_result = st_canvas(
    fill_color="rgba(255, 165, 0, 0.3)",  # Fixed fill color with some opacity
    stroke_width=stroke_width,
    stroke_color='#FFFFFF',
    background_color='#000000',
    #background_image=Image.open(bg_image) if bg_image else None,
    update_streamlit=realtime_update,
    height=200,
    width=200,
    drawing_mode='freedraw',
    key="canvas",
)

# Do something interesting with the image data and paths
if canvas_result.image_data is not None:

    # st.write('### Image being used as input')
    # st.image(canvas_result.image_data)
    # st.write(type(canvas_result.image_data))
    # st.write(canvas_result.image_data.shape)
    # st.write(canvas_result.image_data)
    # im = Image.fromarray(canvas_result.image_data.astype('uint8'), mode="RGBA")
    # im.save("user_input.png", "PNG")
    
    
    # Get the numpy array (4-channel RGBA, shape (200, 200, 4))
    input_numpy_array = np.array(canvas_result.image_data)
    
    
    # Get the RGBA PIL image
    input_image = Image.fromarray(input_numpy_array.astype('uint8'), 'RGBA')
    input_image.save('user_input.png')
    
    # Convert it to grayscale
    input_image_gs = input_image.convert('L')
    input_image_gs_np = np.asarray(input_image_gs.getdata()).reshape(200,200)
    all_zeros = not np.any(input_image_gs_np)
    if not all_zeros:
        # st.write('### Image as a grayscale Numpy array')
        # st.write(input_image_gs_np)
        
        # Create a temporary image for opencv to read it
        input_image_gs.save('temp_for_cv2.jpg')
        image = cv2.imread('temp_for_cv2.jpg', 0)
        # Start creating a bounding box
        height, width = image.shape
        x,y,w,h = cv2.boundingRect(image)


        # Create a new blank image 10 px larger and paste the ROI at its center
        ROI = image[y:y+h, x:x+w]
        mask = np.zeros([ROI.shape[0]+10, ROI.shape[1]+10])
        # With a symmetric 10 px margin, the centered offsets are 5 px on each side
        y = mask.shape[0]//2 - ROI.shape[0]//2
        x = mask.shape[1]//2 - ROI.shape[1]//2
        mask[y:y+h, x:x+w] = ROI
        output_image = Image.fromarray(mask) # mask has values in [0-255] as expected
        # Resizing with the default resampling filter changes the range of pixel values
        # (some become negative), so we explicitly specify the filter:
        compressed_output_image = output_image.resize((22,22), Image.BILINEAR) # PIL.Image.NEAREST also works well

        tensor_image = np.array(compressed_output_image.getdata())/255.
        tensor_image = tensor_image.reshape(22,22)
        # Pad 3 pixels on all sides to get a 28x28 image
        tensor_image = np.pad(tensor_image, (3,3), "constant", constant_values=(0,0))
        # Standardize after padding so the padded pixels are treated like the background
        tensor_image = (tensor_image - 0.1307) / 0.3081
        # Shape of tensor_image is now (28,28)
        



        # Saving directly with PIL gives a noisy image because the standardized values
        # are no longer in the [0,255] range, so we use matplotlib to save it instead
        plt.imsave('processed_tensor.png', tensor_image.reshape(28,28), cmap='gray')



        ### Compute the predictions
        output_probabilities = model.predict(tensor_image.reshape(1,784).astype(np.float32))
        prediction = np.argmax(output_probabilities)

        # Indices of the 3 most probable digits and their certainties in %
        top_3_candidates = output_probabilities[0].argsort()[-3:][::-1]
        top_3_certainties = output_probabilities[0, top_3_candidates]*100

        st.write('### Prediction') 
        st.write('### '+str(prediction))

        st.write('MNIST_Plus Dataset (with more handwritten samples added by me) available as PNGs at: https://github.com/manassharma07/MNIST-PLUS/tree/main/mnist_plus_png')

        st.write('## Breakdown of the prediction process:') 

        st.write('### Image being used as input')
        st.image(canvas_result.image_data)

        st.write('### Image as a grayscale Numpy array')
        st.write(input_image_gs_np)

        st.write('### Processing steps:')
        st.write('1. Find the bounding box of the digit blob and use that.')
        st.write('2. Convert it to size 22x22.')
        st.write('3. Pad the image with 3 pixels on all the sides to get a 28x28 image.')
        st.write('4. Normalize the image to have pixel values between 0 and 1.')
        st.write('5. Standardize the image using the mean and standard deviation of the MNIST_Plus training dataset.')

        st.write('### Processed image')
        st.image('processed_tensor.png')



        st.write('### Prediction') 
        st.write(str(prediction))
        st.write('### Certainty')    
        st.write(str(output_probabilities[0,prediction]*100) +'%')
        st.write('### Top 3 candidates')
        st.write(str(top_3_candidates))
        st.write('### Certainties %')
        st.write(str(top_3_certainties))


st.write('### Code used for training the neural network: [Jupyter Notebook](https://github.com/manassharma07/crysx_nn/blob/main/examples/NN_MNIST_plus_from_raw_png_crysx.ipynb)')    

Pretrained Neural Network weights and biases for the CrysX-NN model

Weights: https://github.com/manassharma07/MNIST-PLUS/blob/main/NN_crysx_mnist_plus_98.11_streamlit_weights.npz

Biases: https://github.com/manassharma07/MNIST-PLUS/blob/main/NN_crysx_mnist_plus_98.11_streamlit_biases.npz

You can download them and load them in your Python code using:

    model.load_model_weights('NN_crysx_mnist_plus_98.11_streamlit_weights')
    model.load_model_biases('NN_crysx_mnist_plus_98.11_streamlit_biases')
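Note that a model with the same architecture must be created before loading, and the file names are passed without the `.npz` extension (as in the app code above). Here is a minimal, self-contained sketch, assuming the two downloaded files are in the working directory:

    import numpy as np
    from crysx_nn import network

    # Recreate the architecture the weights were trained for
    model = network.nn_model(nInputs=784, neurons_per_layer=[256, 10],
                             activation_func_names=['ReLU', 'Softmax'],
                             batch_size=32, device='CPU', init_method='Xavier')

    # Load the pretrained parameters (file names without the .npz extension)
    model.load_model_weights('NN_crysx_mnist_plus_98.11_streamlit_weights')
    model.load_model_biases('NN_crysx_mnist_plus_98.11_streamlit_biases')

    # Predict on a single preprocessed image flattened to 784 float32 values
    probabilities = model.predict(np.zeros((1, 784), dtype=np.float32))
    print('Predicted digit:', np.argmax(probabilities))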

Code used for training the model

https://github.com/manassharma07/crysx_nn/blob/main/examples/NN_MNIST_plus_from_raw_png_crysx.ipynb

Details of the Neural Network

Optimizer: Stochastic Gradient Descent (SGD)
Learning rate: 0.3
Number of epochs: 10
Batch size: 200
Loss function: Categorical Cross Entropy (CCE)

Code snippet for creation and training of Neural network

    from crysx_nn import mnist_utils as mu
    import numpy as np

    # Download the MNIST_orig and MNIST_Plus datasets (may take up to 5 min)
    mu.downloadMNIST()

    ## Load the MNIST_Plus training dataset into memory (may take up to 5 min)
    path = 'MNIST-PLUS-PNG/mnist_plus_png'
    trainData, trainLabels = mu.loadMNIST(path_main=path, train=True, shuffle=True)

    ## Normalize within the range [0,1.0]
    trainData = trainData/255 # Normalize

    trainData_mean = trainData.mean()
    trainData_std = trainData.std()

    ## Standardize the data so that it has mean 0 and variance 1
    trainData = (trainData - trainData_mean) / trainData_std

    ## Convert labels to one-hot vectors
    trainLabels = mu.one_hot_encode(trainLabels, 10)

    ## Flatten the input numpy arrays (nSamples,28,28)->(nSamples, 784)
    trainData = trainData.reshape(trainData.shape[0], 784)

    ## Let us create a NN using CrysX-NN now
    nInputs = 784 # No. of nodes in the input layer
    neurons_per_layer = [256, 10] # Neurons per layer (excluding the input layer)
    activation_func_names = ['ReLU', 'Softmax']
    nLayers = len(neurons_per_layer)
    nEpochs = 10
    batchSize = 200 # No. of input samples to process at a time for optimization

    from crysx_nn import network
    model = network.nn_model(nInputs=nInputs, neurons_per_layer=neurons_per_layer, activation_func_names=activation_func_names, batch_size=batchSize, device='CPU', init_method='Xavier') 

    model.lr = 0.3

    ## Check the model details
    model.details()
    model.visualize()

    ## Optimize/Train the network
    inputs = trainData.astype(np.float32)
    outputs = trainLabels.astype(np.float32)
    # Run optimization
    # model.optimize(inputs, outputs, lr=0.3,nEpochs=nEpochs,loss_func_name='CCE', miniterEpoch=1, batchProgressBar=True, miniterBatch=100)
    # To get accuracies at each epoch
    model.optimize(inputs, outputs, lr=0.3,nEpochs=nEpochs,loss_func_name='CCE', miniterEpoch=1, batchProgressBar=True, miniterBatch=100, get_accuracy=True)

    ## Error at each epoch
    print(model.errors)

    ## Accuracy at each epoch
    print(model.accuracy)

    ## Save model weights and biases
    # Save weights
    model.save_model_weights('NN_crysx_mnist_plus_98.11_weights')
    # Save biases
    model.save_model_biases('NN_crysx_mnist_plus_98.11_biases')

    ## Load model weights and biases from files
    model.load_model_weights('NN_crysx_mnist_plus_98.11_weights')
    model.load_model_biases('NN_crysx_mnist_plus_98.11_biases')

    ## Test data set
    path = 'MNIST-PLUS-PNG/mnist_plus_png'
    testData, testLabels = mu.loadMNIST(path_main=path, train=False, shuffle=True)


    ## Normalize within the range [0,1.0]

    testData = testData/255. # Normalize

    ## Standardize the data so that it has mean 0 and variance 1
    # Important: use the mean and std of the *training* data
    testData = (testData - trainData_mean) / trainData_std


    ## Convert labels to one-hot vectors
    testLabels = mu.one_hot_encode(testLabels, 10)


    ## Flatten the input numpy arrays (nSamples,28,28)->(nSamples, 784)
    testData = testData.reshape(testData.shape[0], 784)

    ## Performance on Test data
    # Convert to float32 arrays
    inputs = testData.astype(np.float32)
    outputs = testLabels.astype(np.float32)
    
    predictions, error, accuracy = model.predict(inputs, outputs, loss_func_name='CCE', get_accuracy=True)
    print('Error:',error)
    print('Accuracy %:',accuracy*100)
