Convolutional Neural Networks

Neil D. Lawrence

From Deep Networks to CNNs

  • Review: Deep networks, chain rule
  • Today: How convolutional networks exploit these concepts

Chain Rule for Layered CNN Architecture

CNN Chain Rule Overview

  • Layered Architecture: Each layer implements forward(), backward(), parameters
  • Spatial Operations: Convolution, pooling, flattening require specialized gradients
  • Composition: CNN built by composing ConvolutionalLayer, MaxPoolingLayer, FlattenLayer
  • Gradient Flow: Each layer computes its own gradients independently
  • Verification: Finite difference testing ensures mathematical correctness
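A minimal sketch of this interface, with illustrative names (the course code's exact signatures may differ):

    class Layer:
        """Minimal layer interface: forward() caches whatever backward() needs."""

        def forward(self, X):
            raise NotImplementedError

        def backward(self, grad_output):
            # Map dL/d(output) to dL/d(input); store any parameter gradients.
            raise NotImplementedError

        @property
        def parameters(self):
            return []  # layers without parameters return an empty list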

CNN Layer Types and Their Gradients

  • ConvolutionalLayer: \(\frac{\partial L}{\partial \mathbf{X}}\), \(\frac{\partial L}{\partial \filters}\), \(\frac{\partial L}{\partial \biases}\)
  • MaxPoolingLayer: \(\frac{\partial L}{\partial \mathbf{X}}\) (no parameters)
  • FlattenLayer: \(\frac{\partial L}{\partial \mathbf{X}}\) (no parameters)
  • FullyConnectedLayer: Standard neural network gradients
  • LinearLayer: Linear transformation gradients

Convolutional Layer Forward Pass

  • Input: \(\mathbf{X}\) of shape \((B, C_{in}, H, W)\)
  • Filters: \(\filters\) of shape \((C_{out}, C_{in}, K_h, K_w)\)
  • Output: \(\outputMatrix\) of shape \((B, C_{out}, H_{out}, W_{out})\)
  • Operation: \(\outputMatrix[b,c,h,w] = \sum_{i,j,k} \mathbf{X}[b,k,h+i,w+j] \cdot \filters[c,k,i,j] + \biases[c]\)
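A naive loop-based NumPy sketch of this operation (stride 1, no padding; illustrative names rather than the course's ConvolutionalLayer):

    import numpy as np

    def conv_forward(X, filters, biases):
        # X: (B, C_in, H, W), filters: (C_out, C_in, K_h, K_w), biases: (C_out,)
        B, C_in, H, W = X.shape
        C_out, _, K_h, K_w = filters.shape
        H_out, W_out = H - K_h + 1, W - K_w + 1
        Y = np.zeros((B, C_out, H_out, W_out))
        for b in range(B):
            for c in range(C_out):
                for h in range(H_out):
                    for w in range(W_out):
                        patch = X[b, :, h:h + K_h, w:w + K_w]
                        # Sum over input channels and kernel positions, add bias.
                        Y[b, c, h, w] = np.sum(patch * filters[c]) + biases[c]
        return Y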

Convolutional Layer Backward Pass

  • Output gradient: \(\frac{\partial L}{\partial \outputMatrix}\) from next layer
  • Filter gradient: \(\frac{\partial L}{\partial \filters[c,k,i,j]} = \sum_{b,h,w} \frac{\partial L}{\partial \outputMatrix[b,c,h,w]} \cdot \mathbf{X}[b,k,h+i,w+j]\)
  • Bias gradient: \(\frac{\partial L}{\partial \biases[c]} = \sum_{b,h,w} \frac{\partial L}{\partial \outputMatrix[b,c,h,w]}\)
  • Input gradient: \(\frac{\partial L}{\partial \mathbf{X}[b,k,h,w]} = \sum_{c,i,j} \frac{\partial L}{\partial \outputMatrix[b,c,h-i,w-j]} \cdot \filters[c,k,i,j]\)
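The same three gradients as a loop-based sketch, continuing conv_forward above:

    def conv_backward(grad_Y, X, filters):
        # grad_Y is dL/dY from the next layer, shape (B, C_out, H_out, W_out).
        B, C_in, H, W = X.shape
        C_out, _, K_h, K_w = filters.shape
        H_out, W_out = grad_Y.shape[2], grad_Y.shape[3]
        grad_X = np.zeros_like(X)
        grad_filters = np.zeros_like(filters)
        # Bias gradient: sum dL/dY over batch and spatial positions.
        grad_biases = grad_Y.sum(axis=(0, 2, 3))
        for b in range(B):
            for c in range(C_out):
                for h in range(H_out):
                    for w in range(W_out):
                        g = grad_Y[b, c, h, w]
                        # Filter gradient accumulates the matching input patch.
                        grad_filters[c] += g * X[b, :, h:h + K_h, w:w + K_w]
                        # Input gradient scatters the filter back onto the patch.
                        grad_X[b, :, h:h + K_h, w:w + K_w] += g * filters[c]
        return grad_X, grad_filters, grad_biases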

Max Pooling Layer Gradients

  • Forward: \(\outputMatrix[b,c,h,w] = \max_{i,j \in \poolRegion} \mathbf{X}[b,c,h \cdot \stride + i, w \cdot \stride + j]\)
  • Backward: \(\frac{\partial L}{\partial \mathbf{X}[b,c,h,w]} = \begin{cases} \frac{\partial L}{\partial \outputMatrix[b,c,h_{out},w_{out}]} & \text{if } (h,w) \text{ was the max in its pool region} \\ 0 & \text{otherwise} \end{cases}\)
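A sketch of both passes; the forward pass caches the argmax locations so the backward pass can route each gradient to the winning position (square window, illustrative names):

    def maxpool_forward(X, pool=2, stride=2):
        B, C, H, W = X.shape
        H_out, W_out = (H - pool) // stride + 1, (W - pool) // stride + 1
        Y = np.zeros((B, C, H_out, W_out))
        argmax = np.zeros((B, C, H_out, W_out, 2), dtype=int)
        for h in range(H_out):
            for w in range(W_out):
                region = X[:, :, h * stride:h * stride + pool,
                                 w * stride:w * stride + pool]
                flat = region.reshape(B, C, -1)
                Y[:, :, h, w] = flat.max(axis=2)
                idx = flat.argmax(axis=2)
                argmax[:, :, h, w, 0] = idx // pool  # row within the window
                argmax[:, :, h, w, 1] = idx % pool   # column within the window
        return Y, argmax

    def maxpool_backward(grad_Y, argmax, X_shape, pool=2, stride=2):
        # Sparse gradient: only the max position in each window receives it.
        grad_X = np.zeros(X_shape)
        B, C, H_out, W_out = grad_Y.shape
        for b in range(B):
            for c in range(C):
                for h in range(H_out):
                    for w in range(W_out):
                        i, j = argmax[b, c, h, w]
                        grad_X[b, c, h * stride + i, w * stride + j] += grad_Y[b, c, h, w]
        return grad_X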

Flatten Layer Gradients

  • Forward: \(\outputMatrix = \mathbf{X}.reshape(B, -1)\)
  • Backward: \(\frac{\partial L}{\partial \mathbf{X}} = \frac{\partial L}{\partial \outputMatrix}.reshape(\inputShape)\)
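Both directions are pure reshapes; a minimal sketch:

    def flatten_forward(X):
        # Collapse channel and spatial axes; remember the shape for backward.
        return X.reshape(X.shape[0], -1), X.shape

    def flatten_backward(grad_Y, input_shape):
        # Gradient values are unchanged; only the shape is restored.
        return grad_Y.reshape(input_shape)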

CNN Chain Rule Implementation

  • Layer-wise: Each layer computes its own gradients independently
  • Composition: LayeredNeuralNetwork coordinates gradient flow between layers
  • Spatial awareness: Gradients preserve spatial structure through convolution
  • Parameter updates: Each layer manages its own parameter gradients
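The coordination amounts to a forward loop and a reversed backward loop; an illustrative sketch of the idea behind LayeredNeuralNetwork (the course class may differ in detail):

    class LayeredNetwork:
        def __init__(self, layers):
            self.layers = layers

        def forward(self, X):
            for layer in self.layers:
                X = layer.forward(X)
            return X

        def backward(self, grad_output):
            # Chain rule: each layer maps dL/d(output) to dL/d(input).
            for layer in reversed(self.layers):
                grad_output = layer.backward(grad_output)
            return grad_output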

Multi-Path Gradient Flow in CNNs

  • Convolutional path: \(\mathbf{X}\rightarrow \convOutput \rightarrow \poolOutput \rightarrow \flattenOutput\)
  • Parameter paths: \(\filters \rightarrow \convOutput\), \(\biases \rightarrow \convOutput\)
  • Spatial paths: Each spatial location has independent gradient flow
  • Channel paths: Each output channel has independent gradient computation

Activation Function Integration

  • ReLU in convolution: \(\frac{\partial L}{\partial \convOutput} = \frac{\partial L}{\partial \activationOutput} \odot \frac{\partial \phi}{\partial \convOutput}\)
  • ReLU gradient: \(\frac{\partial \reluFunction}{\partial x} = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}\)
  • Spatial activation: Applied element-wise across all spatial locations
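As a sketch, the element-wise pair (applied identically at every spatial location and channel):

    def relu_forward(Z):
        return np.maximum(Z, 0.0)

    def relu_backward(grad_A, Z):
        # Gradients pass only where the pre-activation was positive.
        return grad_A * (Z > 0)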

CNN Gradient Verification

  • Finite differences: Compare analytical vs numerical gradients
  • Spatial testing: Verify gradients at different spatial locations
  • Channel testing: Verify gradients for different output channels
  • End-to-end testing: Verify complete CNN gradient flow
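A central-difference checker in the spirit of the finite_difference_gradient helper used later (this sketch may differ from the course's actual function), followed by a numerical check of the convolution sketch above:

    def finite_difference_gradient(f, X, eps=1e-5):
        # f is a scalar-valued loss; perturb each entry of X in turn.
        grad = np.zeros_like(X, dtype=float)
        for idx in np.ndindex(X.shape):
            orig = X[idx]
            X[idx] = orig + eps
            f_plus = f(X)
            X[idx] = orig - eps
            f_minus = f(X)
            X[idx] = orig  # restore
            grad[idx] = (f_plus - f_minus) / (2.0 * eps)
        return grad

    # Verify the convolution input gradient numerically.
    X = np.random.randn(2, 1, 5, 5)
    F = np.random.randn(3, 1, 3, 3)
    b = np.random.randn(3)
    loss = lambda X_: conv_forward(X_, F, b).sum()
    grad_X, _, _ = conv_backward(np.ones((2, 3, 3, 3)), X, F)
    assert np.allclose(grad_X, finite_difference_gradient(loss, X), atol=1e-5)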

Layered CNN Architecture Benefits

  • Modularity: Each layer type has clear gradient responsibilities
  • Testability: Each layer can be tested independently
  • Composability: Layers can be combined in any order
  • Educational: Clear separation makes learning easier
  • Maintainability: Easy to add new layer types

Complete CNN Gradient Flow

  • Input: \(\mathbf{X}\) (batch of images)
  • Convolution: \(\convOutput = \reluFunction(\convolution(\mathbf{X}, \filters) + \biases)\)
  • Pooling: \(\poolOutput = \maxPool(\convOutput)\)
  • Flatten: \(\flattenOutput = \flatten(\poolOutput)\)
  • Dense: \(\denseOutput = \reluFunction(\flattenOutput \cdot \denseWeights + \denseBiases)\)
  • Output: \(\outputMatrix = \denseOutput \cdot \outputWeights + \outputBiases\)
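Chaining the earlier sketches gives this forward pass in code; W1, b1, W2 and b2 are illustrative names for the dense and output parameters:

    def cnn_forward(X, F, b, W1, b1, W2, b2):
        Z_conv = conv_forward(X, F, b)             # convolution plus bias
        A_conv = relu_forward(Z_conv)              # element-wise ReLU
        A_pool, argmax = maxpool_forward(A_conv)   # max pooling
        A_flat, pool_shape = flatten_forward(A_pool)
        Z_dense = A_flat @ W1 + b1                 # fully connected layer
        A_dense = relu_forward(Z_dense)
        # A full implementation would cache Z_conv, argmax, pool_shape and
        # Z_dense here for the backward pass.
        return A_dense @ W2 + b2                   # linear output layer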

CNN Gradient Chain Rule

  • Output to dense: \(\frac{\partial L}{\partial \denseOutput} = \frac{\partial L}{\partial \outputMatrix} \cdot \outputWeights^\top\)
  • Dense to flatten: \(\frac{\partial L}{\partial \flattenOutput} = \frac{\partial L}{\partial \denseOutput} \cdot \denseWeights^\top\)
  • Flatten to pool: \(\frac{\partial L}{\partial \poolOutput} = \frac{\partial L}{\partial \flattenOutput}.reshape(\poolShape)\)
  • Pool to conv: \(\frac{\partial L}{\partial \convOutput} = \maxPoolGradient(\frac{\partial L}{\partial \poolOutput})\)
  • Conv to input: \(\frac{\partial L}{\partial \mathbf{X}} = \convolutionGradient(\frac{\partial L}{\partial \convOutput}, \filters)\)
  • Activation factors: the element-wise ReLU gradients (previous slide) are applied before the dense-to-flatten and conv-to-input steps
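The same chain as code, reversing cnn_forward with the ReLU factors written explicitly (intermediates assumed cached from the forward pass):

    def cnn_backward(grad_out, X, F, W1, W2, Z_conv, argmax, pool_shape, Z_dense):
        grad_A_dense = grad_out @ W2.T                    # output to dense
        grad_Z_dense = relu_backward(grad_A_dense, Z_dense)
        grad_A_flat = grad_Z_dense @ W1.T                 # dense to flatten
        grad_A_pool = flatten_backward(grad_A_flat, pool_shape)
        grad_A_conv = maxpool_backward(grad_A_pool, argmax, Z_conv.shape)
        grad_Z_conv = relu_backward(grad_A_conv, Z_conv)  # pool to conv
        # Conv to input, plus the filter and bias gradients as a side effect.
        return conv_backward(grad_Z_conv, X, F)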

Parameter Gradients in CNNs

  • Filter gradients: \(\frac{\partial L}{\partial \filters[c,k,i,j]} = \sum_{b,h,w} \frac{\partial L}{\partial \convOutput[b,c,h,w]} \cdot \mathbf{X}[b,k,h+i,w+j]\)
  • Bias gradients: \(\frac{\partial L}{\partial \biases[c]} = \sum_{b,h,w} \frac{\partial L}{\partial \convOutput[b,c,h,w]}\)
  • Dense gradients: Standard neural network parameter gradients
  • Output gradients: Final layer parameter gradients

Implementation Verification

  • Layer testing: Each layer tested independently with finite differences
  • Composition testing: Complete CNN tested end-to-end
  • Spatial testing: Gradients verified at different spatial locations
  • Channel testing: Gradients verified for different output channels
  • Parameter testing: All parameter gradients verified numerically

Educational Benefits

  • Clear separation: Each layer type has distinct gradient responsibilities
  • Modular learning: Students can understand each component independently
  • Visual debugging: Easy to see where gradients flow and where they don’t
  • Mathematical rigor: All gradients verified with finite differences
  • Practical implementation: Code directly maps to mathematical theory

Summary

  • Layered design: Each CNN layer type has specialized gradient computation
  • Spatial awareness: Gradients preserve spatial structure through convolution
  • Modular testing: Each layer can be verified independently
  • Educational clarity: Clear separation of concerns makes learning easier
  • Mathematical rigor: All gradients verified with finite differences
  • Practical implementation: Code directly implements the mathematical theory

Code Mapping

  • ConvolutionalLayer: Implements spatial convolution with filter and bias gradients
  • MaxPoolingLayer: Implements max pooling with sparse gradient distribution
  • FlattenLayer: Implements spatial-to-vector conversion with shape preservation
  • LayeredNeuralNetwork: Coordinates gradient flow between all layer types
  • Gradient testing: Comprehensive finite difference verification

Verification with Our Implementation

  • Gradient testing: Use finite_difference_gradient to verify analytical gradients
  • Spatial verification: Check gradients at different spatial locations
  • Channel verification: Verify gradients for different output channels
  • End-to-end testing: Complete CNN gradient flow verification
  • Parameter testing: All parameter gradients verified numerically

Simple CNN Implementation

Explore Different Image Patterns

Create and Test Convolutional Layer

Test Max Pooling Layer

Test Flatten Layer

Test CNN Gradient Flow

Build Complete CNN with Layered Architecture

Train CNN for Image Classification

Visualise CNN Feature Maps

Test Different CNN Architectures

Benefits of the New CNN Architecture

Compare CNN with Traditional Neural Networks

Thanks!
