Convolutional Neural Networks 
  Neil D. Lawrence
  2025-09-30 
 
From Deep Networks to CNNs 
- Review: Deep networks, chain rule
- Today: How convolutional networks exploit these concepts
 
Chain Rule for Layered CNN Architecture 
 
CNN Chain Rule Overview 
- Layered Architecture: Each layer implements forward(), backward(), parameters
- Spatial Operations: Convolution, pooling, flattening require specialized gradients
- Composition: CNN built by composing ConvolutionalLayer, MaxPoolingLayer, FlattenLayer
- Gradient Flow: Each layer computes its own gradients independently
- Verification: Finite difference testing ensures mathematical correctness
 
CNN Layer Types and Their Gradients 
- ConvolutionalLayer: \(\frac{\partial L}{\partial \mathbf{X}}\), \(\frac{\partial L}{\partial \filters}\), \(\frac{\partial L}{\partial \biases}\)
- MaxPoolingLayer: \(\frac{\partial L}{\partial \mathbf{X}}\) (no parameters)
- FlattenLayer: \(\frac{\partial L}{\partial \mathbf{X}}\) (no parameters)
- FullyConnectedLayer: Standard neural network gradients
- LinearLayer: Linear transformation gradients
 
Convolutional Layer Forward Pass 
- Input: \(\mathbf{X}\) of shape \((B, C_{in}, H, W)\)
- Filters: \(\filters\) of shape \((C_{out}, C_{in}, K_h, K_w)\)
- Output: \(\outputMatrix\) of shape \((B, C_{out}, H_{out}, W_{out})\)
- Operation: \(\outputMatrix[b,c,h,w] = \sum_{i,j,k} \mathbf{X}[b,k,h+i,w+j] \cdot \filters[c,k,i,j] + \biases[c]\)
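As a concrete reference for these shapes, here is a minimal NumPy sketch of the forward pass (direct loops, valid padding, stride 1; the function and variable names are illustrative, not the course implementation):

```python
import numpy as np

def conv_forward(X, filters, biases):
    """Naive valid convolution: X is (B, C_in, H, W), filters is
    (C_out, C_in, K_h, K_w), biases is (C_out,)."""
    B, C_in, H, W = X.shape
    C_out, _, K_h, K_w = filters.shape
    H_out, W_out = H - K_h + 1, W - K_w + 1
    Y = np.zeros((B, C_out, H_out, W_out))
    for b in range(B):
        for c in range(C_out):
            for h in range(H_out):
                for w in range(W_out):
                    patch = X[b, :, h:h + K_h, w:w + K_w]
                    Y[b, c, h, w] = np.sum(patch * filters[c]) + biases[c]
    return Y
```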
 
Convolutional Layer Backward Pass 
- Output gradient: \(\frac{\partial L}{\partial \outputMatrix}\) from next layer
- Filter gradient: \(\frac{\partial L}{\partial \filters[c,k,i,j]} = \sum_{b,h,w} \frac{\partial L}{\partial \outputMatrix[b,c,h,w]} \cdot \mathbf{X}[b,k,h+i,w+j]\)
- Bias gradient: \(\frac{\partial L}{\partial \biases[c]} = \sum_{b,h,w} \frac{\partial L}{\partial \outputMatrix[b,c,h,w]}\)
- Input gradient: \(\frac{\partial L}{\partial \mathbf{X}[b,k,h,w]} = \sum_{c,i,j} \frac{\partial L}{\partial \outputMatrix[b,c,h-i,w-j]} \cdot \filters[c,k,i,j]\)
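Continuing the sketch above, the backward pass implements exactly these three summations (again an illustrative sketch, not the course's ConvolutionalLayer):

```python
def conv_backward(X, filters, dY):
    """Gradients of the naive convolution above. dY is (B, C_out, H_out, W_out).
    Returns dX, dfilters, dbiases matching the summations on the slide."""
    B, C_in, H, W = X.shape
    C_out, _, K_h, K_w = filters.shape
    _, _, H_out, W_out = dY.shape
    dX = np.zeros_like(X)
    dfilters = np.zeros_like(filters)
    dbiases = dY.sum(axis=(0, 2, 3))  # sum over batch and spatial positions
    for b in range(B):
        for c in range(C_out):
            for h in range(H_out):
                for w in range(W_out):
                    g = dY[b, c, h, w]
                    dfilters[c] += g * X[b, :, h:h + K_h, w:w + K_w]
                    dX[b, :, h:h + K_h, w:w + K_w] += g * filters[c]
    return dX, dfilters, dbiases
```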
 
Max Pooling Layer Gradients 
- Forward: \(\outputMatrix[b,c,h,w] = \max_{i,j \in \poolRegion} \mathbf{X}[b,c,h \cdot \stride + i, w \cdot \stride + j]\)
- Backward: \(\frac{\partial L}{\partial \mathbf{X}[b,c,h,w]} = \begin{cases} \frac{\partial L}{\partial \outputMatrix[b,c,h_{out},w_{out}]} & \text{if } (h,w) \text{ was the max in pool region} \\ 0 & \text{otherwise} \end{cases}\)
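A matching sketch of max pooling: the forward pass caches the argmax positions so the backward pass can route each output gradient to the single input that produced the maximum (the pool size and stride defaults are assumptions):

```python
def maxpool_forward(X, pool=2, stride=2):
    """Max pooling over pool x pool regions; also returns argmax positions."""
    B, C, H, W = X.shape
    H_out, W_out = (H - pool) // stride + 1, (W - pool) // stride + 1
    Y = np.zeros((B, C, H_out, W_out))
    argmax = np.zeros((B, C, H_out, W_out, 2), dtype=int)
    for h in range(H_out):
        for w in range(W_out):
            region = X[:, :, h * stride:h * stride + pool, w * stride:w * stride + pool]
            flat = region.reshape(B, C, -1)
            idx = flat.argmax(axis=2)
            Y[:, :, h, w] = flat.max(axis=2)
            argmax[:, :, h, w, 0] = h * stride + idx // pool
            argmax[:, :, h, w, 1] = w * stride + idx % pool
    return Y, argmax

def maxpool_backward(dY, argmax, X_shape):
    """Route each output gradient back to the input position that was the max."""
    dX = np.zeros(X_shape)
    B, C, H_out, W_out = dY.shape
    for b in range(B):
        for c in range(C):
            for h in range(H_out):
                for w in range(W_out):
                    i, j = argmax[b, c, h, w]
                    dX[b, c, i, j] += dY[b, c, h, w]
    return dX
```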
 
Flatten Layer Gradients 
- Forward: \(\outputMatrix = \mathbf{X}.reshape(B, -1)\)
- Backward: \(\frac{\partial L}{\partial \mathbf{X}} = \frac{\partial L}{\partial \outputMatrix}.reshape(\inputShape)\)
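The flatten layer has no parameters; both directions are just a reshape. A minimal sketch:

```python
def flatten_forward(X):
    """Reshape (B, C, H, W) to (B, C*H*W), remembering the original shape."""
    return X.reshape(X.shape[0], -1), X.shape

def flatten_backward(dY, input_shape):
    """Backward pass is the inverse reshape; no values change."""
    return dY.reshape(input_shape)
```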
 
CNN Chain Rule Implementation 
- Layer-wise: Each layer computes its own gradients independently
- Composition: LayeredNeuralNetwork coordinates gradient flow between layers
- Spatial awareness: Gradients preserve spatial structure through convolution
- Parameter updates: Each layer manages its own parameter gradients
 
Multi-Path Gradient Flow in CNNs 
- Convolutional path: \(\mathbf{X} \rightarrow \convOutput \rightarrow \poolOutput \rightarrow \flattenOutput\)
- Parameter paths: \(\filters \rightarrow \convOutput\), \(\biases \rightarrow \convOutput\)
- Spatial paths: Each spatial location has independent gradient flow
- Channel paths: Each output channel has independent gradient computation
 
Activation Function Integration 
- ReLU in convolution: \(\frac{\partial L}{\partial \convOutput} = \frac{\partial L}{\partial \activationOutput} \odot \frac{\partial \phi}{\partial \convOutput}\)
- ReLU gradient: \(\frac{\partial \reluFunction}{\partial x} = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}\)
- Spatial activation: Applied element-wise across all spatial locations
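A sketch of the element-wise gate this describes; the same two functions apply identically to convolutional and dense pre-activations:

```python
def relu_forward(Z):
    """Element-wise ReLU."""
    return np.maximum(Z, 0.0)

def relu_backward(dA, Z):
    """Pass the gradient only where the pre-activation was positive."""
    return dA * (Z > 0)
```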
 
CNN Gradient Verification 
- Finite differences: Compare analytical vs numerical gradients
- Spatial testing: Verify gradients at different spatial locations
- Channel testing: Verify gradients for different output channels
- End-to-end testing: Verify complete CNN gradient flow
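For example, the filter gradient of the convolution sketch above can be checked against central finite differences. The scalar loss, step size and test data here are arbitrary choices for illustration, not the course's finite_difference_gradient utility:

```python
def loss(Y):
    # Simple scalar loss so the gradient check is end to end: L = 0.5 * sum(Y^2).
    return 0.5 * np.sum(Y ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 8, 8))
filters = rng.standard_normal((4, 3, 3, 3))
biases = rng.standard_normal(4)

Y = conv_forward(X, filters, biases)
dX, dfilters, dbiases = conv_backward(X, filters, dY=Y)  # dL/dY = Y for this loss

eps = 1e-5
numerical = np.zeros_like(filters)
for idx in np.ndindex(filters.shape):
    perturbed = filters.copy()
    perturbed[idx] += eps
    plus = loss(conv_forward(X, perturbed, biases))
    perturbed[idx] -= 2 * eps
    minus = loss(conv_forward(X, perturbed, biases))
    numerical[idx] = (plus - minus) / (2 * eps)

print("max filter-gradient error:", np.max(np.abs(numerical - dfilters)))
```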
 
Layered CNN Architecture Benefits 
- Modularity: Each layer type has clear gradient responsibilities
- Testability: Each layer can be tested independently
- Composability: Layers can be combined in any order
- Educational: Clear separation makes learning easier
- Maintainability: Easy to add new layer types
 
Complete CNN Gradient Flow 
- Input: \(\mathbf{X}\) (batch of images)
- Convolution: \(\convOutput = \reluFunction(\convolution(\mathbf{X}, \filters) + \biases)\)
- Pooling: \(\poolOutput = \maxPool(\convOutput)\)
- Flatten: \(\flattenOutput = \flatten(\poolOutput)\)
- Dense: \(\denseOutput = \reluFunction(\flattenOutput \cdot \denseWeights + \denseBiases)\)
- Output: \(\outputMatrix = \denseOutput \cdot \outputWeights + \outputBiases\)
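Composing the sketches above gives the whole forward pass in a few lines (the dense-layer shapes are whatever the flattened feature size dictates; names are illustrative):

```python
def cnn_forward(X, filters, biases, W_dense, b_dense, W_out, b_out):
    """Forward pass matching the flow on the slide; returns the cache
    needed for the backward chain."""
    Z_conv = conv_forward(X, filters, biases)   # convolution + bias
    A_conv = relu_forward(Z_conv)               # element-wise ReLU
    A_pool, argmax = maxpool_forward(A_conv)    # spatial down-sampling
    A_flat, pool_shape = flatten_forward(A_pool)  # (B, features)
    Z_dense = A_flat @ W_dense + b_dense
    A_dense = relu_forward(Z_dense)
    Y = A_dense @ W_out + b_out                 # final linear output
    cache = (X, Z_conv, A_conv, argmax, pool_shape, A_flat, Z_dense, A_dense)
    return Y, cache
```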
 
CNN Gradient Chain Rule 
- Output to dense: \(\frac{\partial L}{\partial \denseOutput} = \frac{\partial L}{\partial \outputMatrix} \cdot \outputWeights^\top\)
- Dense to flatten: \(\frac{\partial L}{\partial \flattenOutput} = \frac{\partial L}{\partial \denseOutput} \cdot \denseWeights^\top\)
- Flatten to pool: \(\frac{\partial L}{\partial \poolOutput} = \frac{\partial L}{\partial \flattenOutput}.reshape(\poolShape)\)
- Pool to conv: \(\frac{\partial L}{\partial \convOutput} = \maxPoolGradient(\frac{\partial L}{\partial \poolOutput})\)
- Conv to input: \(\frac{\partial L}{\partial \mathbf{X}} = \convolutionGradient(\frac{\partial L}{\partial \convOutput}, \filters)\)
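The backward chain simply runs the same steps in reverse, each stage consuming the gradient produced by the one after it. A sketch matching the forward pass above:

```python
def cnn_backward(dY, cache, filters, W_dense, W_out):
    """Backward chain in reverse order of the forward pass above."""
    X, Z_conv, A_conv, argmax, pool_shape, A_flat, Z_dense, A_dense = cache
    # Output layer
    dW_out = A_dense.T @ dY
    db_out = dY.sum(axis=0)
    dA_dense = dY @ W_out.T
    # Dense layer (through its ReLU)
    dZ_dense = relu_backward(dA_dense, Z_dense)
    dW_dense = A_flat.T @ dZ_dense
    db_dense = dZ_dense.sum(axis=0)
    dA_flat = dZ_dense @ W_dense.T
    # Flatten -> pooling -> convolution
    dA_pool = flatten_backward(dA_flat, pool_shape)
    dA_conv = maxpool_backward(dA_pool, argmax, A_conv.shape)
    dZ_conv = relu_backward(dA_conv, Z_conv)
    dX, dfilters, dbiases = conv_backward(X, filters, dZ_conv)
    return dX, dfilters, dbiases, dW_dense, db_dense, dW_out, db_out
```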
 
Parameter Gradients in CNNs 
- Filter gradients: \(\frac{\partial L}{\partial \filters[c,k,i,j]} = \sum_{b,h,w} \frac{\partial L}{\partial \convOutput[b,c,h,w]} \cdot \mathbf{X}[b,k,h+i,w+j]\)
- Bias gradients: \(\frac{\partial L}{\partial \biases[c]} = \sum_{b,h,w} \frac{\partial L}{\partial \convOutput[b,c,h,w]}\)
- Dense gradients: Standard neural network parameter gradients
- Output gradients: Final layer parameter gradients
 
Implementation Verification 
- Layer testing: Each layer tested independently with finite differences
- Composition testing: Complete CNN tested end-to-end
- Spatial testing: Gradients verified at different spatial locations
- Channel testing: Gradients verified for different output channels
- Parameter testing: All parameter gradients verified numerically
 
Educational Benefits 
- Clear separation: Each layer type has distinct gradient responsibilities
- Modular learning: Students can understand each component independently
- Visual debugging: Easy to see where gradients flow and where they don’t
- Mathematical rigor: All gradients verified with finite differences
- Practical implementation: Code directly maps to mathematical theory
 
Summary 
- Layered design: Each CNN layer type has specialized gradient computation
- Spatial awareness: Gradients preserve spatial structure through convolution
- Modular testing: Each layer can be verified independently
- Educational clarity: Clear separation of concerns makes learning easier
- Mathematical rigor: All gradients verified with finite differences
- Practical implementation: Code directly implements the mathematical theory
 
Code Mapping 
- ConvolutionalLayer: Implements spatial convolution with filter and bias gradients
- MaxPoolingLayer: Implements max pooling with sparse gradient distribution
- FlattenLayer: Implements spatial-to-vector conversion with shape preservation
- LayeredNeuralNetwork: Coordinates gradient flow between all layer types
- Gradient testing: Comprehensive finite difference verification
 
Verification with Our Implementation 
- Gradient testing: Use finite_difference_gradient to verify analytical gradients
- Spatial verification: Check gradients at different spatial locations
- Channel verification: Verify gradients for different output channels
- End-to-end testing: Complete CNN gradient flow verification
- Parameter testing: All parameter gradients verified numerically
 
Simple CNN Implementation 
 
Explore Different Image Patterns 
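The notebook cell for this slide is not reproduced here; the following is a plausible sketch of the kind of synthetic patterns such a cell might explore (simple edges and a cross that small filters can distinguish):

```python
def make_patterns(size=8):
    """Simple synthetic binary patterns a small CNN can be asked to distinguish."""
    vertical = np.zeros((size, size)); vertical[:, size // 2] = 1.0
    horizontal = np.zeros((size, size)); horizontal[size // 2, :] = 1.0
    cross = np.clip(vertical + horizontal, 0, 1)
    return {"vertical": vertical, "horizontal": horizontal, "cross": cross}

patterns = make_patterns()
for name, img in patterns.items():
    print(name)
    print(img.astype(int))
```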
 
Create and Test Convolutional Layer 
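Again the original cell is not shown; the sketch below wraps the earlier forward/backward functions in a small class so it can be created and tested in the spirit of the course's ConvolutionalLayer (the class name, constructor arguments and initialisation are assumptions):

```python
class SimpleConvLayer:
    """Illustrative stand-in for ConvolutionalLayer, built from the functions
    sketched above; the real class interface may differ."""
    def __init__(self, in_channels, out_channels, kernel_size, rng=None):
        rng = rng or np.random.default_rng(0)
        scale = 1.0 / np.sqrt(in_channels * kernel_size * kernel_size)
        self.filters = scale * rng.standard_normal(
            (out_channels, in_channels, kernel_size, kernel_size))
        self.biases = np.zeros(out_channels)

    def forward(self, X):
        self.X = X                                   # cache input for backward
        return conv_forward(X, self.filters, self.biases)

    def backward(self, dY):
        dX, self.dfilters, self.dbiases = conv_backward(self.X, self.filters, dY)
        return dX

layer = SimpleConvLayer(in_channels=1, out_channels=2, kernel_size=3)
X = patterns["cross"][None, None, :, :]              # shape (1, 1, 8, 8)
Y = layer.forward(X)
print(Y.shape)                                        # (1, 2, 6, 6)
```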
 
Build Complete CNN with Layered Architecture 
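A sketch of the composition pattern this heading refers to: the network only needs to chain forward calls in order and backward calls in reverse. Pooling, flatten and dense layers would be wrapped in the same forward/backward interface; the class name and interface details here are assumed, not the course's LayeredNeuralNetwork:

```python
class TinyLayeredCNN:
    """Illustrative layer composition: chain forward calls, reverse for backward."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, X):
        for layer in self.layers:
            X = layer.forward(X)
        return X

    def backward(self, dY):
        for layer in reversed(self.layers):
            dY = layer.backward(dY)
        return dY
```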
 
Train CNN for Image Classification 
CNN Training Progress for image classification
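A minimal gradient-descent loop built from the NumPy sketches above, standing in for the notebook's training cell (the toy data, target rule, learning rate and architecture sizes are all arbitrary choices for illustration):

```python
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 1, 8, 8))                    # toy "images"
labels = (X[:, 0, :, 4].sum(axis=1) > 0).astype(float)    # arbitrary target rule

filters = 0.1 * rng.standard_normal((2, 1, 3, 3)); biases = np.zeros(2)
W_dense = 0.1 * rng.standard_normal((2 * 3 * 3, 8)); b_dense = np.zeros(8)
W_out = 0.1 * rng.standard_normal((8, 1)); b_out = np.zeros(1)
lr = 0.05

for epoch in range(50):
    Y, cache = cnn_forward(X, filters, biases, W_dense, b_dense, W_out, b_out)
    err = Y[:, 0] - labels
    loss_value = 0.5 * np.mean(err ** 2)
    dY = (err / len(err))[:, None]                         # gradient of the mean squared error
    grads = cnn_backward(dY, cache, filters, W_dense, W_out)
    dX, dfilters, dbiases, dW_dense, db_dense, dW_out, db_out = grads
    filters -= lr * dfilters; biases -= lr * dbiases
    W_dense -= lr * dW_dense; b_dense -= lr * db_dense
    W_out -= lr * dW_out; b_out -= lr * db_out
    if epoch % 10 == 0:
        print(f"epoch {epoch}: loss {loss_value:.4f}")
```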
 
 
Visualise CNN Feature Maps 
CNN feature maps showing how the network learns to detect different patterns
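A possible visualisation cell, assuming matplotlib and the toy network from the previous sketch (the real notebook figure may differ):

```python
import matplotlib.pyplot as plt

# Show the post-ReLU feature maps of the toy filters for a single input image.
Y, cache = cnn_forward(X[:1], filters, biases, W_dense, b_dense, W_out, b_out)
A_conv = cache[2]                                          # activations after conv + ReLU

fig, axes = plt.subplots(1, 1 + A_conv.shape[1], figsize=(8, 3))
axes[0].imshow(X[0, 0], cmap="gray"); axes[0].set_title("input")
for c in range(A_conv.shape[1]):
    axes[1 + c].imshow(A_conv[0, c], cmap="viridis")
    axes[1 + c].set_title(f"feature map {c}")
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()
```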
 
 
Test Different CNN Architectures 
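One way to compare configurations is to vary the number of filters and the kernel size and inspect parameter counts and pooled feature shapes; the specific choices below are illustrative only:

```python
# Compare a few filter/kernel configurations on one toy image.
for n_filters, kernel in [(2, 3), (4, 3), (4, 5)]:
    f = 0.1 * np.random.default_rng(0).standard_normal((n_filters, 1, kernel, kernel))
    b = np.zeros(n_filters)
    pooled = maxpool_forward(relu_forward(conv_forward(X[:1], f, b)))[0]
    n_params = f.size + b.size
    print(f"{n_filters} filters, {kernel}x{kernel} kernel -> "
          f"conv parameters: {n_params}, pooled feature shape: {pooled.shape[1:]}")
```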
 
Benefits of the New CNN Architecture 
 
Compare CNN with Traditional Neural Networks