# Understanding LeNet (LeCun, 1998)

As an attempt to better understand Convolutional Neural Networks (CNN/ConvNet), I was advised to read the section about LeNet-5 in the original paper and figure out where every number comes from 🤔

###### Input layer

The input of this neural network is an image of size `32*32` pixels, where each pixel is represented by an input neuron. Pixel values `0..255` are normalized to `-0.1..1.175` so that the mean is roughly 0 and the variance roughly 1.
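This normalization is just a linear rescaling; a minimal sketch (the function name is mine, the endpoint constants `-0.1` and `1.175` are from the paper):

```python
def normalize(pixel):
    """Linearly map a pixel value in 0..255 to the range -0.1..1.175."""
    lo, hi = -0.1, 1.175
    return pixel / 255 * (hi - lo) + lo
```

For example, `normalize(0)` gives `-0.1` (background) and `normalize(255)` gives `1.175` (foreground).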

The hidden layers (C1 up to F6) use the hyperbolic tangent (`tanh`) as activation function (the paper uses a scaled variant, `1.7159 * tanh(2x/3)`).

###### Convolution layer 1 (C1)

Local receptive fields of size `5*5` are chosen, so the shared weights (the kernel) are also `5*5` for each feature map. Each kernel has `1` bias, and `6` feature maps are required, so the number of trainable parameters (weights & biases) is

``trainable params = (weight * input maps + bias) * feature maps = (5 * 5 * 1 + 1) * 6 = 156 ``

Since the size of each feature map is `28*28`,

``connections = (input + bias) * feature maps * feature map size = (5 * 5 + 1) * 6 * 28 * 28 = 122304 ``

The feature map size `28*28` is a consequence of the stride-`1` sliding of the `5*5` kernel: adjacent receptive fields intentionally overlap by `4` rows/columns, so each output dimension is `32 - 5 + 1 = 28`.
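These counts are easy to verify in code; a small sketch (variable names are mine, not from the paper):

```python
kernel = 5 * 5           # shared weights per input map
in_maps, out_maps = 1, 6
map_size = 28 * 28       # (32 - 5 + 1) ** 2

c1_params = (kernel * in_maps + 1) * out_maps        # +1 for the bias
c1_connections = (kernel + 1) * out_maps * map_size  # each output unit: 25 inputs + 1 bias
```

This reproduces `156` parameters and `122304` connections.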

###### Subsampling layer 2 (S2)

In this layer the kernel size is `2*2` and weights are shared. Unlike C1, the receptive fields do not overlap, and only `1` weight and `1` bias are used per feature map (the four inputs are summed, multiplied by the weight, and the bias is added). Since this layer outputs `6` feature maps (the same as the input maps),

``trainable params = (weight + bias) * feature maps = (1 + 1) * 6 = 12 ``

Since the size of each feature map is `14*14`,

``connections = (input + bias) * feature maps * feature map size = (2 * 2 + 1) * 6 * 14 * 14 = 5880``

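The same check for S2 (again, names are mine):

```python
pool = 2 * 2        # non-overlapping 2*2 pooling window
maps = 6
map_size = 14 * 14  # 28 / 2 in each dimension

s2_params = (1 + 1) * maps                     # one weight + one bias per map
s2_connections = (pool + 1) * maps * map_size  # 4 inputs + 1 bias per output unit
```

This reproduces `12` parameters and `5880` connections.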
###### Convolutional layer 3 (C3)

The C3 layer is similar to C1, except that there is more than one input map, and each output feature map is connected to a different subset of the input maps. The `16` feature maps of size `10*10` are arranged as follows:

• The first `6` feature maps are each connected to `3` contiguous input maps (overlapping by 2 maps)
• The next `6` feature maps are each connected to `4` contiguous input maps (overlapping by 3 maps)
• The next `3` feature maps are each connected to `4` discontinuous input maps (overlapping by 1 map)
• The last feature map is connected to all `6` input maps

Hence,

```
trainable params = (weight * input maps + bias) * feature maps

1st group = (5 * 5 * 3 + 1) * 6 = 456
2nd group = (5 * 5 * 4 + 1) * 6 = 606
3rd group = (5 * 5 * 4 + 1) * 3 = 303
4th group = (5 * 5 * 6 + 1) * 1 = 151

all groups = 456 + 606 + 303 + 151 = 1516
```

then,

```
connections = (input + bias) * feature maps * feature map size
            = trainable params * feature map size
            = 1516 * 10 * 10 = 151600
```

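The group arrangement above can be checked with a short loop (the `groups` list encodes the connection table; names are mine):

```python
kernel = 5 * 5
map_size = 10 * 10
# (input maps per feature map, number of feature maps) for each group
groups = [(3, 6), (4, 6), (4, 3), (6, 1)]

c3_params = sum((kernel * in_maps + 1) * n for in_maps, n in groups)
c3_connections = c3_params * map_size  # every parameter is reused at each of the 10*10 positions
```

This reproduces `1516` parameters and `151600` connections.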
###### Subsampling layer 4 (S4)

Similar to S2, except the number of feature maps is `16` (the same as the input maps), and each of them is `5*5` pixels. Hence,

``trainable params = (weight + bias) * feature maps = (1 + 1) * 16 = 32 ``

and,

``connections = (input + bias) * feature maps * feature map size = (2 * 2 + 1) * 16 * 5 * 5 = 2000``

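The S4 arithmetic, in the same style as the earlier sketches:

```python
s4_params = (1 + 1) * 16                  # one weight + one bias per map
s4_connections = (2 * 2 + 1) * 16 * 5 * 5  # 4 inputs + 1 bias per unit, 16 maps of 5*5
```

This reproduces `32` parameters and `2000` connections.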

###### Convolutional layer 5 (C5)

The last convolutional layer is similar to C3, except the number of feature maps is `120`, and each of them is connected to all `16` input maps. Since the input maps are `5*5` and the kernel is `5*5`, each feature map is `1*1`. Hence,

``trainable params = (weight * input maps + bias) * feature maps = (5 * 5 * 16 + 1) * 120 = 48120 ``

then,

```
connections = (input + bias) * feature maps * feature map size
            = trainable params * feature map size
            = 48120 * 1 * 1 = 48120
```

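And the C5 check (names are mine):

```python
c5_params = (5 * 5 * 16 + 1) * 120  # full 5*5*16 kernel + bias per feature map
c5_connections = c5_params * 1 * 1  # feature maps are 1*1, so connections equal parameters
```

This reproduces `48120` for both counts.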
###### Fully-connected layer (F6)

This layer is just a standard fully-connected layer with `84` output neurons. Hence,

``trainable params = connections = (input + bias) * output = (120 + 1) * 84 = 10164``

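For a fully-connected layer there is no weight sharing, so parameters and connections coincide:

```python
f6_params = (120 + 1) * 84  # 120 inputs + 1 bias, for each of the 84 neurons
```

This reproduces `10164`.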

###### Output layer

Finally, the output layer consists of `10` Euclidean Radial Basis Function (RBF) units, one per class. Each RBF unit computes the squared Euclidean distance between its `84`-dimensional input vector and a fixed parameter vector.
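As a final sanity check: the RBF parameter vectors are chosen by hand and kept fixed in the paper, so summing only the six trainable layers above gives exactly `60,000` parameters:

```python
layer_params = {
    "C1": 156, "S2": 12, "C3": 1516,
    "S4": 32, "C5": 48120, "F6": 10164,
}
total = sum(layer_params.values())  # 60000
```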

And we are done! 🍻