Perhaps I need not mention that tremendous progress has been made in artificial neural network research over the last decade. Despite this empirical fact, we are still working out the mathematical principles that underpin how neural nets learn. Given their relationship to the brain—and their burgeoning presence in modern technology—it is an exciting time to work on the foundational properties of neural networks.

In our recent paper, we propose a mathematical formula for the *distance between two neural networks*. In particular, we focus on two deep networks with the same architecture but different synaptic weights. Whilst this concept might seem esoteric, here are a few concrete reasons you might care about it:

- you want to know “how far” you can perturb the synapses of a network before destroying its properties. Knowing this is important for stable learning.
- you want to make sure that the network does not “move too far” in response to any one training example. Knowing this is important for the generalisation theory of Bousquet and Elisseeff.
- you want to measure the “width” of the minimum that learning settles into. Some machine learning researchers believe this concept is important for understanding generalisation.

In our paper, we focused on the implications for stable learning, and that is what we shall cover in this post. We leave the implications for generalisation as a fascinating avenue for further research.

Before we get down to the mathematical nitty-gritty, we shall first build a more hands-on understanding of deep learning.

If we want to study the distance between two neural networks, first we will need to choose the neural architecture.

For the purposes of this demonstration, we have restricted ourselves to networks with two inputs and one output. This lets us visualise the response of any neuron as a 2-dimensional heatmap, where the coordinates correspond to the network input, and the colour corresponds to the neuron’s response.
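To make the heatmap idea concrete, here is a minimal sketch in NumPy. It is not the code behind the demo: the layer sizes, the tanh nonlinearity, the random initialisation and the helper names `init_mlp` and `forward` are all assumptions chosen for illustration. The sketch evaluates a small two-input, one-output network on a grid of inputs and reshapes one neuron's response into a 2-dimensional array that could be plotted as a heatmap.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network (illustrative)."""
    return [(rng.standard_normal((n_out, n_in)) / np.sqrt(n_in), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Return the activations of every layer for inputs x of shape (n, 2)."""
    activations = []
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W.T + b
        if i < len(params) - 1:
            h = np.tanh(h)  # hidden nonlinearity: an assumption, not taken from the demo
        activations.append(h)
    return activations

rng = np.random.default_rng(0)
params = init_mlp([2, 4, 4, 1], rng)  # two inputs, one output

# Grid of input coordinates covering the square [-1, 1] x [-1, 1].
side = 50
xs = np.linspace(-1.0, 1.0, side)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)

acts = forward(params, grid)

# One value per grid point: neuron 0 of the first hidden layer, and the output neuron.
heatmap = acts[0][:, 0].reshape(side, side)
output_map = acts[-1][:, 0].reshape(side, side)
```

Either array can be handed to a plotting routine of your choice; the grid coordinates play the role of the network input and the array values play the role of the colour.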

Now please, build your network!

We want to train your network to classify a dataset of inputs as ±1. The batch size controls how many training examples we will show to your network per training step. Please choose your dataset and batch size!
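As a rough illustration of what "dataset" and "batch size" mean here, the sketch below builds a toy dataset of 2-dimensional inputs labelled ±1 and splits it into shuffled batches. The two-blob dataset is a stand-in that we made up for this example (it is not one of the demo's datasets), and `make_two_blobs` and `minibatches` are hypothetical helper names.

```python
import numpy as np

def make_two_blobs(n_per_class, rng):
    """Two Gaussian blobs labelled +1 and -1 (a stand-in toy dataset)."""
    pos = rng.standard_normal((n_per_class, 2)) * 0.5 + np.array([1.0, 1.0])
    neg = rng.standard_normal((n_per_class, 2)) * 0.5 - np.array([1.0, 1.0])
    inputs = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return inputs, labels

def minibatches(inputs, labels, batch_size, rng):
    """Yield shuffled (inputs, labels) batches that cover the dataset once."""
    order = rng.permutation(len(inputs))
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], labels[idx]

rng = np.random.default_rng(0)
inputs, labels = make_two_blobs(100, rng)

# Each training step would see one of these batches.
batches = list(minibatches(inputs, labels, batch_size=32, rng=rng))
```

With 200 examples and a batch size of 32, one pass over the data takes seven training steps, the last of which sees only the 8 leftover examples.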

Finally the time has come for your network to learn something. Choose your learning algorithm and step size, then hit play!

Hopefully your network found a good decision boundary. The spiral dataset is a little tricky, so you may need to adjust the learning settings or network architecture a couple of times.

Now, let’s find out how stable your trained network is to perturbing its synapses. For each network layer, let’s add some noise that is *scaled relative to that layer*. Just hit the big red button! Start by adding a relatively small amount of noise—say 1%—and then steadily increase the relative scale.
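One way to read "noise that is scaled relative to that layer" is sketched below. This is our assumption about a reasonable implementation, not necessarily what the big red button does: for each weight matrix we draw Gaussian noise and rescale it so that its Frobenius norm is a fixed fraction of that layer's own norm. The function name `perturb_layerwise` and the example weight shapes are hypothetical.

```python
import numpy as np

def perturb_layerwise(weights, relative_scale, rng):
    """Add Gaussian noise to each layer, rescaled so that
    ||noise|| / ||W|| equals relative_scale for that layer (Frobenius norm)."""
    perturbed = []
    for W in weights:
        noise = rng.standard_normal(W.shape)
        noise *= relative_scale * np.linalg.norm(W) / np.linalg.norm(noise)
        perturbed.append(W + noise)
    return perturbed

rng = np.random.default_rng(0)

# Two layers with very different scales, to show that the noise adapts per layer.
weights = [rng.standard_normal((4, 2)), 10.0 * rng.standard_normal((1, 4))]
noisy = perturb_layerwise(weights, relative_scale=0.01, rng=rng)

# Relative change of each layer; both should sit at the requested 1%.
relative_changes = [np.linalg.norm(Wn - W) / np.linalg.norm(W)
                    for W, Wn in zip(weights, noisy)]
```

The point of scaling per layer is that a layer with large weights receives proportionally larger noise, so a "1% perturbation" means the same thing for every layer.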

What we found—and what we hope you found too—is that the network function and decision boundary are stable for small relative perturbations, and unstable for large relative perturbations. That is the main takeaway from this section.
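If you would like to probe this outside the demo, the following sketch measures how often a network's ±1 decision flips as the relative noise scale grows. Everything here is an assumption made for illustration: the network is random rather than trained, it has no biases, and `forward`, `perturb` and `flip_fraction` are hypothetical helpers. We deliberately do not quote numbers, since they depend on the network and the random seed.

```python
import numpy as np

def forward(weights, x):
    """Output of a bias-free tanh network for inputs x of shape (n, 2)."""
    h = x
    for i, W in enumerate(weights):
        h = h @ W.T
        if i < len(weights) - 1:
            h = np.tanh(h)
    return h[:, 0]

def perturb(weights, relative_scale, rng):
    """Per-layer Gaussian noise with ||noise|| / ||W|| = relative_scale."""
    out = []
    for W in weights:
        noise = rng.standard_normal(W.shape)
        noise *= relative_scale * np.linalg.norm(W) / np.linalg.norm(noise)
        out.append(W + noise)
    return out

def flip_fraction(weights, relative_scale, points, rng):
    """Fraction of input points whose predicted sign changes after perturbation."""
    before = np.sign(forward(weights, points))
    after = np.sign(forward(perturb(weights, relative_scale, rng), points))
    return float(np.mean(before != after))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 2)),
           rng.standard_normal((8, 8)),
           rng.standard_normal((1, 8))]
points = rng.uniform(-1.0, 1.0, size=(2000, 2))

scales = [0.0, 0.01, 0.1, 1.0]
fractions = [flip_fraction(weights, s, points, rng) for s in scales]
```

A flip fraction near zero means the decision boundary has barely moved, while a value approaching one half means the perturbed network's decisions are close to unrelated to the original's.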

Coming soon!


The work described in this post resulted from a close collaboration between Arash Vahdat, Yisong Yue, Ming-Yu Liu and me. The visualisations were built using TensorFlow Playground by Daniel Smilkov and Shan Carter.