Week 5: Convolutional Neural Networks

[jupyter][google colab][reveal]

Neil D. Lawrence

Abstract:

This lecture builds on deep neural networks to explore …

ML Foundations Course Notebook Setup

[edit]

We install some bespoke code for creating and saving plots, as well as for loading data sets.

%%capture
%pip install notutils
%pip install git+https://github.com/lawrennd/ods.git
%pip install git+https://github.com/lawrennd/mlai.git
import notutils
import pods
import mlai
import mlai.plot as plot

From Deep Networks to CNNs

[edit]

In our previous lectures, we explored how composing layers of basis functions creates deep neural networks, and we examined the chain rule and automatic differentiation that make training these networks possible. We have also seen how structured data can be handled by architectures such as convolutional, graph and recurrent neural networks. Today we look in detail at the first of these: the convolutional neural network, one of the earliest architectural innovations in deep learning.

Chain Rule for Layered CNN Architecture

[edit]

Our layered CNN architecture provides a clean separation of concerns in which each layer type implements its own gradient computation. This makes the chain rule computation more modular and easier to follow than in monolithic CNN implementations.

Each layer type in our CNN architecture has specific gradient computations. The convolutional layer is the most complex, requiring gradients for filters, biases, and input. Pooling and flattening layers are simpler but still require careful handling of spatial dimensions.

The convolutional forward pass applies filters across the input image, computing dot products between filter weights and local image regions. This creates feature maps that detect specific patterns in the input.
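
For illustration, here is a minimal numpy sketch of that idea for a single channel and a single filter (a naive loop, not the mlai implementation):

import numpy as np

def conv2d_single(image, kernel):
    """Naive valid cross-correlation of a 2D image with a 2D kernel (sketch only)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # dot product between the filter weights and the local image patch
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out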

The convolutional backward pass requires careful indexing to ensure gradients flow correctly through the spatial dimensions. The filter gradient accumulates contributions from all spatial locations where the filter was applied.
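
The corresponding filter gradient can be sketched in the same style: each output-position gradient multiplies the image patch it was computed from, and the contributions are accumulated (again a single-channel sketch, not the mlai code):

import numpy as np

def conv2d_filter_gradient(image, kernel_shape, grad_output):
    """Accumulate d(loss)/d(kernel) for the naive convolution above (sketch only)."""
    kH, kW = kernel_shape
    grad_kernel = np.zeros((kH, kW))
    for i in range(grad_output.shape[0]):
        for j in range(grad_output.shape[1]):
            # every spatial location where the filter was applied contributes
            grad_kernel += grad_output[i, j] * image[i:i+kH, j:j+kW]
    return grad_kernel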

Max pooling gradients are sparse - only the maximum positions in each pooling region receive gradients. This creates a natural form of attention where only the most important features contribute to the gradient flow.
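
A single-channel sketch of that routing, assuming the input divides evenly into pooling regions:

import numpy as np

def maxpool_backward(x, grad_output, pool=2):
    """Send each output gradient to the argmax of its pooling region (sketch only)."""
    grad_x = np.zeros_like(x)
    for i in range(grad_output.shape[0]):
        for j in range(grad_output.shape[1]):
            patch = x[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
            r, c = np.unravel_index(np.argmax(patch), patch.shape)
            # only the maximum position in each region receives gradient
            grad_x[i*pool + r, j*pool + c] = grad_output[i, j]
    return grad_x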

The flatten layer simply reshapes the gradient to match the input shape. This is a straightforward operation but crucial for connecting convolutional layers to fully connected layers.
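
In sketch form the forward and backward passes are just reshapes (illustrative only):

def flatten_forward(x):
    # (batch, channels, height, width) -> (batch, features)
    return x.reshape(x.shape[0], -1)

def flatten_backward(grad_output, input_shape):
    # reshape the incoming gradient back to the original spatial shape
    return grad_output.reshape(input_shape)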

Our implementation uses a layered approach where each layer is responsible for its own gradient computation. The LayeredNeuralNetwork coordinates the flow of gradients between layers, ensuring that spatial information is preserved correctly.
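
In outline, that coordination is just a loop over the layers, run forwards for prediction and in reverse for gradients (an illustrative sketch with a made-up class name, not the mlai source):

class TinyLayeredNetwork:
    """Illustrative coordinator: each layer owns its own forward/backward logic."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        # chain rule: pass the gradient back through the layers in reverse order
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
        return grad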

CNNs have more complex gradient flow than standard neural networks because of the spatial structure. Each spatial location and channel can have independent gradients, creating a rich gradient landscape.

Activation functions in CNNs are applied element-wise across all spatial locations and channels. The ReLU activation creates sparse gradients where only positive activations contribute to the gradient flow.
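
For example, a minimal numpy sketch of element-wise ReLU and its gradient over a (batch, channels, height, width) tensor:

import numpy as np

def relu_forward(x):
    return np.maximum(x, 0.0)

def relu_backward(x, grad_output):
    # gradient flows only where the activation was positive
    return grad_output * (x > 0)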

Our implementation includes comprehensive gradient testing using finite differences. This ensures that the complex spatial gradient computations are mathematically correct.
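
A finite-difference check of this kind can be sketched as follows (illustrative only; the real checks live in the test files mentioned later in this section):

import numpy as np

def finite_difference_gradient(f, x, eps=1e-6):
    """Numerically estimate d f(x) / d x for a scalar-valued function f (sketch only)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Usage idea: compare against an analytic gradient, e.g.
# assert np.allclose(finite_difference_gradient(loss_wrt_filters, filters), analytic_grad, atol=1e-4)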

The layered architecture makes CNN gradients much more manageable. Students can understand each component in isolation before seeing how they compose together. This is a significant advantage over monolithic CNN implementations.

The complete CNN forward pass shows how different layer types compose together. Each layer transforms the data in a specific way, and the gradients must flow back through each transformation correctly.

The CNN chain rule shows how gradients flow back through each layer type. Each layer implements its own gradient computation, and the LayeredNeuralNetwork coordinates the flow between layers.

Parameter gradients in CNNs include the specialized convolutional gradients for filters and biases, plus the standard gradients for fully connected layers. The convolutional gradients require spatial accumulation across all locations where each filter was applied.

Our implementation includes comprehensive testing that verifies the mathematical correctness of all gradient computations. This ensures that the complex spatial gradient flow is implemented correctly.

The layered CNN architecture makes the chain rule much more accessible: each layer type has clear responsibilities, students can see exactly how it contributes to the overall gradient computation, and the modular design makes the complete system easy to understand.

Students can also map the mathematical theory directly onto the implementation. Each layer class implements the specific gradient computations described in the chain rule, the LayeredNeuralNetwork coordinates the flow between layers, and the comprehensive testing ensures that the implementation matches the theory.

Students can verify the chain rule implementation using our comprehensive gradient testing framework. The tests in test_convolutional_layers.py and test_neural_networks.py demonstrate how to use finite differences to verify that our analytical gradients match the mathematical theory for CNNs.


Simple CNN Implementation

[edit]
import numpy as np
from mlai import create_image_data

# Create synthetic image data
X_images, y_images = create_image_data(n_samples=200, image_size=16, n_classes=3)

print(f"Image data: {X_images.shape} -> {y_images.shape}")
print(f"Sample image shape: {X_images[0].shape}")
print(f"Image value range: [{X_images.min():.2f}, {X_images.max():.2f}]")
print(f"Classes: {np.unique(y_images)}")

Explore Different Image Patterns

# Let's explore the different types of synthetic images
print("Synthetic Image Patterns for CNN Learning:")
print("=" * 50)

# Show examples of each class
for class_id in range(3):
    class_indices = np.where(y_images == class_id)[0]
    sample_idx = class_indices[0]
    
    print(f"\nClass {class_id} (Pattern type {class_id}):")
    print(f"  Sample image shape: {X_images[sample_idx].shape}")
    print(f"  Image statistics: mean={X_images[sample_idx].mean():.3f}, std={X_images[sample_idx].std():.3f}")
    print(f"  Non-zero pixels: {np.count_nonzero(X_images[sample_idx])}")

print(f"\nThese different patterns test different CNN capabilities:")
print(f"- Horizontal lines: Tests horizontal edge detection")
print(f"- Vertical lines: Tests vertical edge detection") 
print(f"- Diagonal patterns: Tests diagonal feature detection")

Create and Test Convolutional Layer

from mlai import ConvolutionalLayer
# Create a basic convolutional layer
conv_layer = ConvolutionalLayer(input_channels=1, output_channels=4, kernel_size=3, padding=1)

# Test forward pass
X_test = np.random.randn(2, 1, 8, 8)
conv_output = conv_layer.forward(X_test)

print("Convolutional Layer Test:")
print(f"Input shape: {X_test.shape}")
print(f"Output shape: {conv_output.shape}")
print(f"Number of filters: {conv_layer.output_channels}")
print(f"Filter shape: {conv_layer.filters.shape}")
print(f"Layer parameters: {len(conv_layer.parameters)}")
print("This is now a proper Layer that can be composed with other layers!")

Test Max Pooling Layer

from mlai import MaxPoolingLayer
# Test max pooling layer
pool_layer = MaxPoolingLayer(pool_size=2, stride=2)

# Forward pass
pool_output = pool_layer.forward(conv_output)

print("Max Pooling Layer Test:")
print(f"Input shape: {conv_output.shape}")
print(f"Output shape: {pool_output.shape}")
print(f"Pooling reduces spatial dimensions by factor of 2")
print(f"Layer parameters: {len(pool_layer.parameters)} (no trainable parameters)")
print("Max pooling helps with translation invariance and reduces computation!")

Test Flatten Layer

from mlai import FlattenLayer
# Test flatten layer
flatten_layer = FlattenLayer()

# Forward pass
flatten_output = flatten_layer.forward(pool_output)

print("Flatten Layer Test:")
print(f"Input shape: {pool_output.shape}")
print(f"Output shape: {flatten_output.shape}")
print(f"Flattened size: {flatten_output.size}")
print(f"Layer parameters: {len(flatten_layer.parameters)} (no trainable parameters)")
print("Flatten layer converts spatial features to 1D for fully connected layers!")

Test CNN Gradient Flow

# Test gradient flow through CNN layers (demonstrating chain rule)
from mlai import MeanSquaredError

# Create a simple CNN
conv = ConvolutionalLayer(input_channels=1, output_channels=2, kernel_size=3, padding=1)
pool = MaxPoolingLayer(pool_size=2, stride=2)
flatten = FlattenLayer()

# Test input
X_test = np.random.randn(1, 1, 8, 8)

# Forward pass through CNN
conv_out = conv.forward(X_test)
pool_out = pool.forward(conv_out)
flatten_out = flatten.forward(pool_out)

# Create dummy loss
target = np.random.randn(flatten_out.shape[0], flatten_out.shape[1])
loss_fn = MeanSquaredError()
loss_value = loss_fn.forward(flatten_out, target)

# Backward pass (demonstrates chain rule through CNN)
loss_gradient = loss_fn.gradient(flatten_out, target)
flatten_grad = flatten.backward(loss_gradient)
pool_grad = pool.backward(flatten_grad)
conv_grad = conv.backward(pool_grad)

print("CNN Chain Rule Demonstration:")
print(f"Loss value: {loss_value:.4f}")
print(f"Input gradient shape: {conv_grad[0].shape}")
print(f"Input gradient norm: {np.linalg.norm(conv_grad[0]):.4f}")
print("This shows how gradients flow through convolution, pooling, and flattening")
print("The CNN layers implement the chain rule for spatial feature extraction!")

Build Complete CNN with Layered Architecture

# Create a complete CNN using the new layered architecture
from mlai import LayeredNeuralNetwork, FullyConnectedLayer, LinearLayer, ReLUActivation, ConvolutionalLayer, MaxPoolingLayer, FlattenLayer

# Define CNN architecture
cnn_layers = [
    ConvolutionalLayer(input_channels=1, output_channels=8, kernel_size=3, padding=1, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2, stride=2),
    ConvolutionalLayer(input_channels=8, output_channels=16, kernel_size=3, padding=1, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2, stride=2),
    FlattenLayer(),
    FullyConnectedLayer(16 * 4 * 4, 64, activation=ReLUActivation()),  # 16 channels * 4*4 spatial after pooling
    LinearLayer(64, 3),  # 3 classes (no activation for final layer)
]

# Create CNN using layered architecture
cnn = LayeredNeuralNetwork(cnn_layers)

print("Complete CNN Architecture:")
print(f"Number of layers: {len(cnn_layers)}")
print(f"Total parameters: {len(cnn.parameters)}")
print(f"Layer types: {[type(layer).__name__ for layer in cnn_layers]}")
print("This demonstrates the new modular CNN architecture!")
# Test forward pass with complete CNN
X_test = np.random.randn(2, 1, 16, 16)  # 16x16 images
cnn_output = cnn.forward(X_test)

print("Complete CNN Test:")
print(f"Input shape: {X_test.shape}")
print(f"Output shape: {cnn_output.shape}")
print(f"Model parameters: {len(cnn.parameters)}")
print("This demonstrates the new modular CNN architecture!")
print("Each layer can be composed and tested independently!")

Train CNN for Image Classification

# train_cnn_classification is assumed to be provided by mlai alongside create_image_data
from mlai import create_image_data, train_cnn_classification

X_images, y_images = create_image_data(n_samples=200, image_size=16, n_classes=3)
cnn_model, cnn_losses, cnn_accuracies = train_cnn_classification(X_images, y_images)

Figure: CNN Training Progress for image classification

Visualise CNN Feature Maps

Figure: CNN feature maps showing how the network learns to detect different patterns
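
One way such a figure could be produced (a sketch, assuming X_images has shape (n_samples, 1, 16, 16), that cnn_model is a LayeredNeuralNetwork whose first layer is a ConvolutionalLayer, and that matplotlib is available):

import matplotlib.pyplot as plt

sample = X_images[0:1]                               # one image; shape assumed (1, 1, 16, 16)
feature_maps = cnn_model.layers[0].forward(sample)   # response of the first convolutional layer

fig, axes = plt.subplots(1, feature_maps.shape[1], figsize=(2 * feature_maps.shape[1], 2))
for channel, ax in enumerate(axes):
    # each channel shows the response of one learned filter
    ax.imshow(feature_maps[0, channel], cmap='gray')
    ax.set_title(f'Filter {channel}')
    ax.axis('off')
plt.show()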

Test Different CNN Architectures

# Test different CNN architectures to see how they handle different patterns
print("Testing Different CNN Architectures:")
print("=" * 50)

# Architecture 1: Simple CNN
simple_layers = [
    ConvolutionalLayer(input_channels=1, output_channels=4, kernel_size=3, padding=1, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2, stride=2),
    FlattenLayer(),
    LinearLayer(4 * 8 * 8, 3),
]
simple_cnn = LayeredNeuralNetwork(simple_layers)

# Architecture 2: Deeper CNN
deep_layers = [
    ConvolutionalLayer(input_channels=1, output_channels=8, kernel_size=3, padding=1, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2, stride=2),
    ConvolutionalLayer(input_channels=8, output_channels=16, kernel_size=3, padding=1, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2, stride=2),
    FlattenLayer(),
    FullyConnectedLayer(16 * 4 * 4, 32, activation=ReLUActivation()),
    LinearLayer(32, 3),
]
deep_cnn = LayeredNeuralNetwork(deep_layers)

# Test both architectures
X_test = np.random.randn(1, 1, 16, 16)

print(f"Simple CNN:")
print(f"  Layers: {len(simple_layers)}")
print(f"  Parameters: {len(simple_cnn.parameters)}")
print(f"  Output shape: {simple_cnn.forward(X_test).shape}")

print(f"\nDeeper CNN:")
print(f"  Layers: {len(deep_layers)}")
print(f"  Parameters: {len(deep_cnn.parameters)}")
print(f"  Output shape: {deep_cnn.forward(X_test).shape}")

print(f"\nDeeper networks have more parameters but can learn more complex patterns!")

Benefits of the New CNN Architecture

# Demonstrate the benefits of the new modular CNN architecture
print("Benefits of the New CNN Architecture:")
print("=" * 50)

# 1. Composable layers
from mlai import LayeredNeuralNetwork, ConvolutionalLayer, MaxPoolingLayer, FlattenLayer, LinearLayer, ReLUActivation

# Create different layer combinations
cnn1 = LayeredNeuralNetwork([
    ConvolutionalLayer(input_channels=1, output_channels=8, kernel_size=3, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2),
    FlattenLayer(),
    LinearLayer(8 * 7 * 7, 10)
])

cnn2 = LayeredNeuralNetwork([
    ConvolutionalLayer(input_channels=1, output_channels=16, kernel_size=5, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2),
    ConvolutionalLayer(input_channels=16, output_channels=32, kernel_size=3, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2),
    FlattenLayer(),
    LinearLayer(32 * 2 * 2, 10)  # 32 channels * 2*2 spatial for a 16x16 input with no padding
])

print(f"CNN 1 (Simple): {len(cnn1.parameters)} parameters")
print(f"CNN 2 (Complex): {len(cnn2.parameters)} parameters")
print(f"Layer types in CNN 1: {[type(l).__name__ for l in cnn1.layers]}")
print(f"Layer types in CNN 2: {[type(l).__name__ for l in cnn2.layers]}")

# 2. Independent layer testing
print("\nIndependent Layer Testing:")
conv_layer = ConvolutionalLayer(input_channels=1, output_channels=4, kernel_size=3)
pool_layer = MaxPoolingLayer(pool_size=2)

# Test each layer independently
X_test = np.random.randn(1, 1, 8, 8)
conv_output = conv_layer.forward(X_test)
pool_output = pool_layer.forward(conv_output)

print(f"Convolutional output shape: {conv_output.shape}")
print(f"Pooling output shape: {pool_output.shape}")
print("Each layer can be tested and debugged independently!")

# 3. Gradient testing
print("\nGradient Testing:")
print("All layers have comprehensive gradient testing using finite differences")
print("This ensures mathematical correctness of the CNN implementations")

# 4. Parameter management
print("\nParameter Management:")
print(f"Convolutional layer parameters: {len(conv_layer.parameters)}")
print(f"Pooling layer parameters: {len(pool_layer.parameters)} (should be 0)")
print("Each layer manages its own parameters with proper getter/setter methods")

print("\nThis demonstrates the power of the new modular CNN architecture!")
print("Layers are composable, testable, and mathematically verified!")

The new CNN architecture provides several key benefits:

1. Modularity and Composability:
- Each layer is a self-contained unit with a consistent interface
- CNN layers can be composed in any order to create complex architectures
- Easy to experiment with different layer combinations

2. Independent Testing:
- Each layer can be tested independently using our comprehensive gradient testing
- Forward and backward passes are verified using finite differences
- Mathematical correctness is ensured through numerical verification

3. Clean Separation of Concerns:
- Convolution logic is separate from pooling logic
- Each layer has a single responsibility
- Easy to understand and debug individual components

4. Consistent Interface:
- All layers implement the same forward(), backward(), and parameters interface
- Works seamlessly with LayeredNeuralNetwork
- Follows the same patterns as other neural network components

5. Educational Clarity:
- Students can understand each component in isolation
- Clear demonstration of how complex CNNs are built from simple components
- Shows the power of composition over inheritance

6. CNN-Specific Benefits:
- Spatial feature extraction through convolutional layers
- Translation invariance through max pooling
- Dimensionality reduction through flattening
- Classification through fully connected layers

This modular approach makes CNN architectures much more accessible and maintainable!

Compare CNN with Traditional Neural Networks

# Compare CNN with traditional fully connected networks
print("CNN vs Traditional Neural Networks:")
print("=" * 50)

# Traditional fully connected network
fc_layers = [
    FlattenLayer(),
    FullyConnectedLayer(16 * 16, 128, activation=ReLUActivation()),
    FullyConnectedLayer(128, 64, activation=ReLUActivation()),
    LinearLayer(64, 3),
]
fc_network = LayeredNeuralNetwork(fc_layers)

# CNN network
cnn_layers = [
    ConvolutionalLayer(input_channels=1, output_channels=8, kernel_size=3, padding=1, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2, stride=2),
    ConvolutionalLayer(input_channels=8, output_channels=16, kernel_size=3, padding=1, activation=ReLUActivation()),
    MaxPoolingLayer(pool_size=2, stride=2),
    FlattenLayer(),
    FullyConnectedLayer(16 * 4 * 4, 32, activation=ReLUActivation()),
    LinearLayer(32, 3),
]
cnn_network = LayeredNeuralNetwork(cnn_layers)

# Test input
X_test = np.random.randn(1, 1, 16, 16)

print(f"Fully Connected Network:")
print(f"  Parameters: {len(fc_network.parameters)}")
print(f"  Output shape: {fc_network.forward(X_test).shape}")

print(f"\nCNN Network:")
print(f"  Parameters: {len(cnn_network.parameters)}")
print(f"  Output shape: {cnn_network.forward(X_test).shape}")

print(f"\nKey Differences:")
print(f"- CNN has fewer parameters due to weight sharing")
print(f"- CNN preserves spatial structure through convolution")
print(f"- CNN is translation invariant through pooling")
print(f"- CNN is more efficient for image data")

CNN vs Traditional Neural Networks:

Convolutional Neural Networks (CNNs):
- Weight sharing: the same filters are applied across the entire image
- Spatial structure: preserves 2D relationships in images
- Translation invariance: robust to object position
- Parameter efficiency: fewer parameters than fully connected networks
- Local connectivity: each neuron connects to a local region

Traditional Fully Connected Networks:
- Dense connections: every input connects to every hidden unit
- No spatial awareness: treats pixels as independent features
- Position sensitive: different weights for each pixel position
- Parameter intensive: many more parameters required
- Global connectivity: each neuron sees all inputs

When to use CNNs:
- Image classification and recognition
- Computer vision tasks
- Any data with spatial structure
- When translation invariance is important

When to use fully connected networks:
- Tabular data
- Non-spatial features
- When spatial relationships don’t matter
- Small input dimensions

The choice depends on the nature of your data and the problem you’re solving!

Thanks!

For more information on these subjects and more you might want to check the following resources.

References