Calculating the shape of layers have always been a hard thing for me, today I found this:
For any convolutional layer, the output feature maps will have the specified depth (a depth of 10 for 10 filters in a convolutional layer) and the dimensions of the produced feature maps (width/height) can be computed as the input image width/height, W, minus the filter size, F, divided by the stride, S, all + 1. The equation looks like:
output_dim = (W-F)/S + 1, for an assumed padding size of 0. You can find a derivation of this formula, here.
For a pool layer with a size 2 and stride 2, the output dimension will be reduced by a factor of 2. Read the comments in the code below to see the output size for each layer.
So for example for an input of 28 x 28 pixels in grayscale (1,28,28) where 1 is the channels (since it is grayscale it is only 1) when applying one convolutional layer like this:
nn.Conv2d(1, 10, 3) where
nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding) so 1 channel in, 10 out and a kernel of 3 x 3 then the output layer will have a shape of (10,26,26)
10 because that is what we specified as output channel and 26 because it is the result of the formula
output_dim = (W-F)/S + 1 that being translated into
=((28-3)/1)+1 (28 pixels – 3 of the kernel (or filter) / by the stride which we didn’t specify so by default that is 1 and then + 1) = 26
If then we apply a MaxPooling layer with a kernel of 2 x 2 and stride of 2 x 2 then the formula is easier just divide it by 2!
So if we apply this
nn.MaxPool2d(2, 2) then we just need to divide the previous result
26 that is
26 / 2 = 13 that will give us a layer of shape
(10,13,13) 10 since we keep the same number of outputs so we still take 10 as number of inputs.
If at any moment the formula yields a result with decimals, the number will be rounded down (just get rid of the decimal part)