Perhaps I need not mention that tremendous progress has been made in artificial neural network research over the last decade. Despite this empirical fact, we are still working out the mathematical principles that underpin how neural nets learn. Given their relationship to the brain—and their burgeoning presence in modern technology—it is an exciting time to work on the foundational properties of neural networks.
In our recent paper, we propose a mathematical formula for the distance between two neural networks. In particular, we focus on two deep networks with the same architecture but different synaptic weights. Whilst this concept might seem esoteric, there are concrete reasons you might care about it: it has implications both for the stability of learning and for how networks generalise.
In our paper, we focused on the implications for stable learning, and that is what we shall cover in this post. We leave the implications for generalisation as a fascinating avenue for further research.
Before we get down to the mathematical nitty-gritty, we shall first build a more hands-on understanding of deep learning.
If we want to study the distance between two neural networks, we first need to choose a neural architecture.
For the purposes of this demonstration, we have restricted ourselves to networks with two inputs and one output. This lets us visualise the response of any neuron as a 2-dimensional heatmap, where the coordinates correspond to the network input, and the colour corresponds to the neuron's response.
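To make this concrete, here is a minimal numpy sketch of the idea. The network below is a hypothetical one-hidden-layer net with random weights (not the one in the demo): we evaluate its output neuron at every point of a 2D grid of inputs, and the resulting array is exactly the heatmap described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small random network: 2 inputs -> 8 hidden units -> 1 output.
W1 = rng.standard_normal((8, 2))
b1 = rng.standard_normal(8)
W2 = rng.standard_normal((1, 8))

def response(x):
    """Forward pass; returns the output neuron's scalar response."""
    h = np.tanh(W1 @ x + b1)
    return float(W2 @ h)

# Evaluate the network over a 2D grid of inputs to build a heatmap:
# rows index the second input coordinate, columns the first.
xs = np.linspace(-1, 1, 50)
heatmap = np.array([[response(np.array([x, y])) for x in xs] for y in xs])
print(heatmap.shape)  # (50, 50): one response value per grid point
```

Plotting this array (for instance with matplotlib's `imshow`) gives the kind of heatmap shown in the demo.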
Now please, build your network!
We want to train your network to classify a dataset of inputs as ±1. The batch size controls how many training examples we will show to your network per training step. Please choose your dataset and batch size!
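The batch-size mechanic can be sketched in a few lines. The dataset here is a hypothetical stand-in (points labelled ±1 by whether they fall inside the unit circle), but the sampling step is the same: each training step sees only `batch_size` examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy dataset: 200 points in the plane, labelled +1 if they lie
# inside the unit circle and -1 otherwise.
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where(np.sum(X**2, axis=1) < 1.0, 1.0, -1.0)

def sample_batch(X, y, batch_size, rng):
    """Draw one minibatch of examples for a single training step."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return X[idx], y[idx]

Xb, yb = sample_batch(X, y, batch_size=16, rng=rng)
print(Xb.shape, yb.shape)  # (16, 2) (16,)
```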
Finally the time has come for your network to learn something. Choose your learning algorithm and step size, then hit play!
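For readers who prefer code to buttons, here is what one choice of learning algorithm looks like under the hood: plain minibatch SGD with a fixed step size, on a hypothetical linearly separable ±1 dataset. For brevity the "network" is a single linear layer rather than a deep net.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data labelled +/-1 (an illustrative
# stand-in for the datasets in the demo).
X = rng.standard_normal((200, 2))
w_true = np.array([1.0, -2.0])
y = np.sign(X @ w_true)

w = np.zeros(2)      # a linear "network", for brevity
step_size = 0.1
batch_size = 16

for step in range(500):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # Gradient of the logistic loss log(1 + exp(-y * <w, x>)),
    # averaged over the minibatch.
    margins = yb * (Xb @ w)
    grad = -(Xb * (yb / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= step_size * grad   # the SGD update: step size times gradient
print(f"training accuracy: {np.mean(np.sign(X @ w) == y):.2f}")
```

Swapping the update rule or the value of `step_size` here corresponds to changing the learning algorithm and step size in the demo.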
Hopefully your network found a good decision boundary. The spiral dataset is a little tricky, so you may need to adjust the learning settings or network architecture a couple of times.
Now, let’s find out how stable your trained network is to perturbing its synapses. For each network layer, let’s add some noise whose size is scaled relative to the size of that layer’s weights. Just hit the big red button! Start by adding a relatively small amount of noise—say 1%—and then steadily increase the relative scale.
What we found—and what we hope you found too—is that the network function and decision boundary are stable for small relative perturbations, and unstable for large relative perturbations. That is the main takeaway from this section.
The work described in this post resulted from a close collaboration between myself, Arash Vahdat, Yisong Yue and Ming-Yu Liu. The visualisations were built using Tensorflow Playground by Daniel Smilkov and Shan Carter.