In the last post, I showcased an app based on a neural network, trained on the MNIST dataset using the CrysX-NN library. The app allowed the user to enter their own handwritten digits via mouse/touch screen. Since, the model only employed neural networks without any kind of convolutional layer, the performance was sub-optimal, as one would expect. For instance, the recognition of 1 and 7 was extremely problematic. This was surprising, because the accuracy on the training set was more 98%. This goes to show that the performance on the testing set may not translate that well to the real-world.
However, the lack of a convolutional layer was not the only reason. As I had shown in one of my previous posts, handwritten digit recognition is not perfect, even when using a convolutional neural network, if the model was only trained on the MNIST training set. In that post, I had made the model more robust by adding my own variations of handwritten digits to the MNIST training set. I called this extended MNIST dataset as MNIST_Plus.
Therefore, I also tried to train a simple neural network without any convolutional layer on the MNIST_Plus dataset, and see the real-world performance.
Similar to the last time, the neural network has an input layer of size 784 (28×28 pixels = 784 pixels), 1 hidden layer of size 256 with the ReLU activation function, and finally an output layer of size 10 (for 0-9 digits) with the Softmax activation function to get the probabilities for each digit.

This model didn’t show any significant improvement on the testing MNIST dataset but the real-world performance was now much better.
Below is the streamlit app using the model described above trained on MNIST_Plus. You can write a digit using the mouse/touch input. The details of the model and the app are given below the demo.
Streamlit App
https://mnist-plus-crysx.streamlit.app/
The app above is quite neat and also shows what the prediction process looks like.
The user’s handwritten input is captured in a 200×200 pixel box, which is then converted to an image. Subsequently, image processing is done to find a rectangle that completely encompasses the digit blob. Then this rectangular crop of the user input is further processed. Firstly it is converted to grayscale, then resized to a 22×22 image using BILINEAR interpolation. Then a padding of 3 pixels is applied on all the sides of the image to get a 28×28 image. This image still has pixel values in the range of 0-255. These are then normalized by dividing by 255. Then the pixel values are standardized by using the mean and standard deviation of the training MNIST_Plus dataset. Finally, the processed image is passed to the CNN model to make a prediction.
While it still not perfect and slightly inferior to the CNN model, there is a big improvement over the previous post. For example, the prediction of the digit 1 is not improved compared to before, however, it gets recognized as 2 if it has the short bottom line. The prediction is most accurate, if 1 is just drawn as a vertical line or with the top serif. Other than that it performs reasonably well with a few misclassification here and there.
There are two reasons for the bad performance of the MNIST trained model in real-world.
1. We don’t exactly know a lot of pre=processing steps used by those who collected it.
2. Some digits like 6, don’t have a lot of variations in the dataset.
Source code for the app
https://github.com/manassharma07/MNIST-PLUS/blob/main/mnist_plus_NN_crysx_app.py
from streamlit_drawable_canvas import st_canvas
import streamlit as st
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import cv2
from crysx_nn import network
@st.cache
def create_and_load_model():
nInputs = 784 # No. of nodes in the input layer
neurons_per_layer = [256, 10] # Neurons per layer (excluding the input layer)
activation_func_names = ['ReLU', 'Softmax']
nLayers = len(neurons_per_layer)
batchSize = 32 # No. of input samples to process at a time for optimization
# Create the crysx_nn neural network model
model = network.nn_model(nInputs=nInputs, neurons_per_layer=neurons_per_layer, activation_func_names=activation_func_names, batch_size=batchSize, device='CPU', init_method='Xavier')
# Load the preoptimized weights and biases
model.load_model_weights('NN_crysx_mnist_plus_98.11_streamlit_weights')
model.load_model_biases('NN_crysx_mnist_plus_98.11_streamlit_biases')
return model
model = create_and_load_model()
st.write('# MNIST_Plus Digit Recognition')
st.write('## Using a <code>CrysX-NN</code> neural network model')
st.write('### Draw a digit in 0-9 in the box below')
<h1>Specify canvas parameters in application</h1>
stroke_width = st.sidebar.slider("Stroke width: ", 1, 25, 9)
realtime_update = st.sidebar.checkbox("Update in realtime", True)
<h1>st.sidebar.markdown("## <a href="https://github.com/manassharma07/crysx_nn">CrysX-NN</a>")</h1>
st.sidebar.write('\n\n ## Neural Network Library Used')
<h1>st.sidebar.image('logo_crysx_nn.png')</h1>
st.sidebar.caption('https://github.com/manassharma07/crysx_nn')
st.sidebar.write('## Neural Network Architecture Used')
st.sidebar.write('1. <strong>Inputs</strong>: Flattened 28x28=784')
st.sidebar.write('2. <strong>Hidden layer</strong> of size <strong>256</strong> with <strong>ReLU</strong> activation Function')
st.sidebar.write('3. <strong>Output layer</strong> of size <strong>10</strong> with <strong>Softmax</strong> activation Function')
st.sidebar.write('Training was done for 10 epochs with Binary Cross Entropy Loss function.')
<h1>st.sidebar.image('neural_network_visualization.png')</h1>
<h1>Create a canvas component</h1>
canvas_result = st_canvas(
fill_color="rgba(255, 165, 0, 0.3)", # Fixed fill color with some opacity
stroke_width=stroke_width,
stroke_color='#FFFFFF',
background_color='#000000',
#background_image=Image.open(bg_image) if bg_image else None,
update_streamlit=realtime_update,
height=200,
width=200,
drawing_mode='freedraw',
key="canvas",
)
<h1>Do something interesting with the image data and paths</h1>
if canvas_result.image_data is not None:
<pre><code># st.write('### Image being used as input')
# st.image(canvas_result.image_data)
# st.write(type(canvas_result.image_data))
# st.write(canvas_result.image_data.shape)
# st.write(canvas_result.image_data)
# im = Image.fromarray(canvas_result.image_data.astype('uint8'), mode=&quot;RGBA&quot;)
# im.save(&quot;user_input.png&quot;, &quot;PNG&quot;)
# Get the numpy array (4-channel RGBA 100,100,4)
input_numpy_array = np.array(canvas_result.image_data)
# Get the RGBA PIL image
input_image = Image.fromarray(input_numpy_array.astype('uint8'), 'RGBA')
input_image.save('user_input.png')
# Convert it to grayscale
input_image_gs = input_image.convert('L')
input_image_gs_np = np.asarray(input_image_gs.getdata()).reshape(200,200)
all_zeros = not np.any(input_image_gs_np)
if not all_zeros:
# st.write('### Image as a grayscale Numpy array')
# st.write(input_image_gs_np)
# Create a temporary image for opencv to read it
input_image_gs.save('temp_for_cv2.jpg')
image = cv2.imread('temp_for_cv2.jpg', 0)
# Start creating a bounding box
height, width = image.shape
x,y,w,h = cv2.boundingRect(image)
# Create new blank image and shift ROI to new coordinates
ROI = image[y:y+h, x:x+w]
mask = np.zeros([ROI.shape[0]+10,ROI.shape[1]+10])
width, height = mask.shape
# print(ROI.shape)
# print(mask.shape)
x = width//2 - ROI.shape[0]//2
y = height//2 - ROI.shape[1]//2
# print(x,y)
mask[y:y+h, x:x+w] = ROI
# print(mask)
# Check if centering/masking was successful
# plt.imshow(mask, cmap='viridis')
output_image = Image.fromarray(mask) # mask has values in [0-255] as expected
# Now we need to resize, but it causes problems with default arguments as it changes the range of pixel values to be negative or positive
# compressed_output_image = output_image.resize((22,22))
# Therefore, we use the following:
compressed_output_image = output_image.resize((22,22), Image.BILINEAR) # PIL.Image.NEAREST or PIL.Image.BILINEAR also performs good
tensor_image = np.array(compressed_output_image.getdata())/255.
tensor_image = tensor_image.reshape(22,22)
# Padding
tensor_image = np.pad(tensor_image, (3,3), &quot;constant&quot;, constant_values=(0,0))
# Normalization should be done after padding i guess
tensor_image = (tensor_image - 0.1307) / 0.3081
# st.write(tensor_image.shape)
# Shape of tensor image is (1,28,28)
# st.write('### Processing steps:')
# st.write('1. Find the bounding box of the digit blob and use that.')
# st.write('2. Convert it to size 22x22.')
# st.write('3. Pad the image with 3 pixels on all the sides to get a 28x28 image.')
# st.write('4. Normalize the image to have pixel values between 0 and 1.')
# st.write('5. Standardize the image using the mean and standard deviation of the MNIST_plus dataset.')
# The following gives noisy image because the values are from -1 to 1, which is not a proper image format
# im = Image.fromarray(tensor_image.reshape(28,28), mode='L')
# im.save(&quot;processed_tensor.png&quot;, &quot;PNG&quot;)
# So we use matplotlib to save it instead
plt.imsave('processed_tensor.png',tensor_image.reshape(28,28), cmap='gray')
# st.write('### Processed image')
# st.image('processed_tensor.png')
# st.write(tensor_image.detach().cpu().numpy().reshape(28,28))
### Compute the predictions
output_probabilities = model.predict(tensor_image.reshape(1,784).astype(np.float32))
prediction = np.argmax(output_probabilities)
top_3_probabilities = output_probabilities[0].argsort()[-3:][::-1]
ind = output_probabilities[0].argsort()[-3:][::-1]
top_3_certainties = output_probabilities[0,ind]*100
st.write('### Prediction')
st.write('### '+str(prediction))
st.write('MNIST_Plus Dataset (with more handwritten samples added by me) available as PNGs at: https://github.com/manassharma07/MNIST-PLUS/tree/main/mnist_plus_png')
st.write('## Breakdown of the prediction process:')
st.write('### Image being used as input')
st.image(canvas_result.image_data)
st.write('### Image as a grayscale Numpy array')
st.write(input_image_gs_np)
st.write('### Processing steps:')
st.write('1. Find the bounding box of the digit blob and use that.')
st.write('2. Convert it to size 22x22.')
st.write('3. Pad the image with 3 pixels on all the sides to get a 28x28 image.')
st.write('4. Normalize the image to have pixel values between 0 and 1.')
st.write('5. Standardize the image using the mean and standard deviation of the MNIST_Plus training dataset.')
st.write('### Processed image')
st.image('processed_tensor.png')
st.write('### Prediction')
st.write(str(prediction))
st.write('### Certainty')
st.write(str(output_probabilities[0,prediction]*100) +'%')
st.write('### Top 3 candidates')
# st.write(top_3_probabilities)
st.write(str(top_3_probabilities))
st.write('### Certainties %')
# st.write(top_3_certainties)
st.write(str(top_3_certainties))
</code></pre>
st.write('### Code used for training the neural network: <a href="https://github.com/manassharma07/crysx_nn/blob/main/examples/NN_MNIST_plus_from_raw_png_crysx.ipynb">Jupyter Notebook</a>')<br />
Pretrained Neural Network weights and biases for the CrysX-NN model
Biases: https://github.com/manassharma07/MNIST-PLUS/blob/main/NN_crysx_mnist_plus_98.11_streamlit_biases.npz
You can download it and load it in your python code using:
model.load_model_weights('NN_crysx_mnist_plus_98.11_weights')
model.load_model_biases('NN_crysx_mnist_plus_98.11_biases')
Code used for training the model
https://github.com/manassharma07/crysx_nn/blob/main/examples/NN_MNIST_plus_from_raw_png_crysx.ipynb
Details of the Neural Network
Optimizer: Stochastic Gradient Descent
Learning Rate = 0.3
Number of epochs = 10
Batch size = 200
Loss function: Categorical Cross Entropy loss
Code snippet for creation and training of Neural network
from crysx_nn import mnist_utils as mu
import numpy as np
<pre><code># Download MNIST_orig and MNIST_plus datasets (May take upto 5 min)
mu.downloadMNIST()
## Load the training dataset from MNIST_orig in memory (May take upto 5 min)
path = 'MNIST-PLUS-PNG/mnist_plus_png'
trainData, trainLabels = mu.loadMNIST(path_main=path, train=True, shuffle=True)
## Normalize within the range [0,1.0]
trainData = trainData/255 # Normalize
trainData_mean = trainData.mean()
trainData_std = trainData.std()
## Standardize the data so that it has mean 0 and variance 1
trainData = (trainData - np.mean(trainData)) / np.std(trainData)
## Convert labels to one-hot vectors
trainLabels = mu.one_hot_encode(trainLabels, 10)
## Flatten the input numpy arrays (nSamples,28,28)-&gt;(nSamples, 784)
trainData = trainData.reshape(trainData.shape[0], 784)
## Let us create a NN using CrysX-NN now
nInputs = 784 # No. of nodes in the input layer
neurons_per_layer = [256, 10] # Neurons per layer (excluding the input layer)
activation_func_names = ['ReLU', 'Softmax']
nLayers = len(neurons_per_layer)
nEpochs = 10
batchSize = 200 # No. of input samples to process at a time for optimization
from crysx_nn import network
model = network.nn_model(nInputs=nInputs, neurons_per_layer=neurons_per_layer, activation_func_names=activation_func_names, batch_size=batchSize, device='CPU', init_method='Xavier')
model.lr = 0.3
## Check the model details
model.details()
model.visualize()
## Optimize/Train the network
inputs = trainData.astype(np.float32)
outputs = trainLabels.astype(np.float32)
# Run optimization
# model.optimize(inputs, outputs, lr=0.3,nEpochs=nEpochs,loss_func_name='CCE', miniterEpoch=1, batchProgressBar=True, miniterBatch=100)
# To get accuracies at each epoch
model.optimize(inputs, outputs, lr=0.3,nEpochs=nEpochs,loss_func_name='CCE', miniterEpoch=1, batchProgressBar=True, miniterBatch=100, get_accuracy=True)
## Error at each epoch
print(model.errors)
## Accuracy at each epoch
print(model.accuracy)
## Save model weights and biases
# Save weights
model.save_model_weights('NN_crysx_mnist_plus_98.11_weights')
# Save biases
model.save_model_biases('NN_crysx_mnist_plus_98.11_biases')
## Load model weights and biases from files
model.load_model_weights('NN_crysx_mnist_plus_98.11_weights')
model.load_model_biases('NN_crysx_mnist_plus_98.11_biases')
## Test data set
path = 'MNIST-PLUS-PNG/mnist_plus_png'
testData, testLabels = mu.loadMNIST(path_main=path, train=False, shuffle=True)
## Normalize within the range [0,1.0]
testData = testData/255. # Normalize
## Standardize the data so that it has mean 0 and variance 1
# Use the mean and std of training data **********
testData = (testData - trainData_mean) / trainData_std
## Convert labels to one-hot vectors
testLabels = mu.one_hot_encode(testLabels, 10)
## Flatten the input numpy arrays (nSamples,28,28)-&gt;(nSamples, 784)
testData = testData.reshape(testData.shape[0], 784)
## Performance on Test data
# Convert to float32 arrays
inputs = testData.astype(np.float32)
outputs = testLabels.astype(np.float32)
predictions, error, accuracy = model.predict(inputs, outputs, loss_func_name='CCE', get_accuracy=True)
print('Error:',error)
print('Accuracy %:',accuracy*100)
</code></pre>
I’m a physicist specializing in computational material science with a PhD in Physics from Friedrich-Schiller University Jena, Germany. I write efficient codes for simulating light-matter interactions at atomic scales. I like to develop Physics, DFT, and Machine Learning related apps and software from time to time. Can code in most of the popular languages. I like to share my knowledge in Physics and applications using this Blog and a YouTube channel.

