The papers listed below constitute the bulk of my PhD work. Check out my Google Scholar page to see more of my collaborations.

2022

Kernel interpolation as a Bayes point machine
Jeremy Bernstein, Alex Farhang & Yisong Yue
[arXiv] [code] [discuss] [cite] preprint
@article{bpm, title={Kernel interpolation as a {B}ayes point machine}, author={Jeremy Bernstein and Alex Farhang and Yisong Yue}, journal={arXiv:2110.04274}, year={2022}}
Based on the idea that a single learner can approximate the majority decision of an ensemble, we derive PAC-Bayes risk bounds for kernel interpolation and further advance the understanding of generalisation in finite-width neural networks.
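
As a toy illustration of the ensemble-versus-single-learner picture (not code from the paper): for a noiseless Gaussian process, the average of many posterior samples, each of which interpolates the data, recovers the kernel interpolant itself. The kernel, data and sample count below are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def rbf(A, B, bandwidth=1.0):
        # Gaussian (RBF) kernel matrix between the rows of A and B.
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dists / (2 * bandwidth ** 2))

    X = rng.uniform(-3, 3, size=(10, 1))        # training inputs
    y = np.sin(X[:, 0])                         # training targets
    Xs = np.linspace(-3, 3, 50)[:, None]        # test inputs

    K = rbf(X, X) + 1e-8 * np.eye(len(X))       # jitter for numerical stability
    Ks, Kss = rbf(Xs, X), rbf(Xs, Xs)

    # The "single learner": the kernel interpolant of the training data.
    interpolant = Ks @ np.linalg.solve(K, y)

    # The "ensemble": functions drawn from the noiseless GP posterior, each of
    # which also (approximately) interpolates the training data.
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    samples = rng.multivariate_normal(interpolant, cov + 1e-6 * np.eye(len(Xs)), size=5000)

    # The ensemble average is close to the kernel interpolant.
    print(np.abs(samples.mean(axis=0) - interpolant).max())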

2021

Computing the information content of trained neural networks
Jeremy Bernstein & Yisong Yue
[arXiv] [video] [code] [discuss] [cite] TOPML '21 contributed talk
@inproceedings{entropix, title={Computing the Information Content of Trained Neural Networks}, author={Jeremy Bernstein and Yisong Yue}, booktitle={Workshop on the Theory of Overparameterized Machine Learning}, year={2021}}
We derive a consistent estimator and a closed-form upper bound on the information content of an infinitely wide neural network. This yields a non-vacuous generalisation guarantee for networks with infinitely more parameters than data.

Learning by turning: neural architecture aware optimisation
Yang Liu*, Jeremy Bernstein*, Markus Meister & Yisong Yue
[arXiv] [poster] [video] [code] [discuss] [cite] ICML '21
@inproceedings{nero, title = {Learning by turning: neural architecture aware optimisation}, author = {Yang Liu and Jeremy Bernstein and Markus Meister and Yisong Yue}, booktitle = {International Conference on Machine Learning}, year = {2021}}
We propose training neural networks under hyperspherical constraints via per-neuron relative updates. The resulting Nero optimiser has strong out-of-the-box performance, with possible implications for generalisation theory.
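
A minimal sketch of the idea (per-neuron relative updates combined with a projection back to the sphere); this is not the released Nero optimiser, and the shapes, learning rate and epsilon below are assumptions for illustration.

    import numpy as np

    def nero_style_step(W, grad, lr=0.01, eps=1e-8):
        """Illustrative per-neuron relative update under a hyperspherical constraint.

        W and grad have shape (num_neurons, fan_in); each row is one neuron's
        weight vector. A sketch of the idea only, not the released Nero optimiser.
        """
        w_norm = np.linalg.norm(W, axis=1, keepdims=True)
        g_norm = np.linalg.norm(grad, axis=1, keepdims=True)

        # Relative update: each neuron moves by a fixed fraction lr of its own norm.
        W = W - lr * w_norm * grad / (g_norm + eps)

        # Hyperspherical constraint: project each neuron back onto the unit sphere.
        return W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)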

2020

Learning compositional functions via multiplicative weight updates
Jeremy Bernstein, Jiawei Zhao, Markus Meister, Ming-Yu Liu, Anima Anandkumar & Yisong Yue
[arXiv] [poster] [video] [code] [discuss] [cite] NeurIPS '20
@inproceedings{madam, title={Learning compositional functions via multiplicative weight updates}, author={Jeremy Bernstein and Jiawei Zhao and Markus Meister and Ming-Yu Liu and Anima Anandkumar and Yisong Yue}, booktitle = {Neural Information Processing Systems}, year={2020}}
We show that a multiplicative rather than additive weight update respects the relative trust region of neural networks, matches recent findings about biological synapses, and provides the engineering benefit of low-bit-width weights.
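
To make the additive-versus-multiplicative contrast concrete, here is an illustrative multiplicative update in the spirit of the paper; it is not the exact Madam rule, and the learning rate is a placeholder.

    import numpy as np

    def multiplicative_step(w, grad, lr=0.01):
        # Additive SGD computes w - lr * grad: a step whose size is unrelated
        # to the size of w itself. A multiplicative update instead scales each
        # weight by a factor close to 1, so the change is proportional to the
        # weight's own magnitude (a relative update).
        # |w| shrinks when the gradient pushes w towards zero, and grows otherwise.
        return w * np.exp(-lr * np.sign(w) * np.sign(grad))

Because the factor is always positive, a weight never changes sign under a rule like this, and since the update effectively acts on the logarithm of each weight's magnitude, it pairs naturally with a logarithmic, low-bit-width number format.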

On the distance between two neural networks and the stability of learning
Jeremy Bernstein, Arash Vahdat, Yisong Yue & Ming-Yu Liu
[arXiv] [poster] [video] [blog] [code] [discuss] [cite] NeurIPS '20
@inproceedings{fromage, title={On the distance between two neural networks and the stability of learning}, author={Jeremy Bernstein and Arash Vahdat and Yisong Yue and Ming-Yu Liu}, booktitle = {Neural Information Processing Systems}, year={2020}}
We formalise the idea that the neural network trust region is relative rather than absolute. This leads us to derive and analyse a measure of relative distance between compositional functions, from which we obtain the Fromage optimiser.
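
A minimal sketch of a per-layer relative update in the spirit of Fromage (not the released implementation); the learning rate and epsilon are placeholder values.

    import numpy as np

    def fromage_style_step(W, grad, lr=0.01, eps=1e-8):
        """Illustrative per-layer relative update in the spirit of Fromage.

        The gradient is rescaled so that the step size is a fixed fraction lr
        of the layer's own norm: the trust region is relative, not absolute.
        """
        step = lr * np.linalg.norm(W) * grad / (np.linalg.norm(grad) + eps)
        # The paper also damps norm growth; dividing by sqrt(1 + lr^2) follows that idea.
        return (W - step) / np.sqrt(1 + lr ** 2)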

2019

signSGD with majority vote is communication efficient and fault tolerant
Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli & Anima Anandkumar
[arXiv] [poster] [code] [discuss] [cite] ICLR '19
@inproceedings{majority, title = {sign{SGD} with majority vote is communication efficient and fault tolerant}, author = {Bernstein, Jeremy and Zhao, Jiawei and Azizzadenesheli, Kamyar and Anandkumar, Animashree}, booktitle = {International Conference on Learning Representations}, year = {2019}}
We show that when the parameter server aggregates gradient signs by majority vote, the resulting distributed optimisation scheme is both communication efficient and robust to network faults and machine errors.
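
A minimal sketch of the aggregation step, with made-up worker gradients standing in for real distributed training.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stochastic gradients from 7 workers for a 5-parameter model.
    worker_grads = rng.normal(loc=0.5, scale=2.0, size=(7, 5))

    # Each worker transmits one bit per parameter: the sign of its gradient.
    worker_signs = np.sign(worker_grads)

    # The parameter server takes an elementwise majority vote over the signs
    # and broadcasts the voted sign (again one bit per parameter) back.
    majority_vote = np.sign(worker_signs.sum(axis=0))

    # Each worker then applies the voted sign as its descent direction.
    lr, w = 0.01, np.zeros(5)
    w -= lr * majority_vote
    print(majority_vote, w)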

2018

signSGD: compressed optimisation for non-convex problems
Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli & Anima Anandkumar
[arXiv] [poster] [slides] [video] [code] [discuss] [cite] ICML '18 long talk
@inproceedings{signum, title = {sign{SGD}: compressed optimisation for non-convex problems}, author = {Bernstein, Jeremy and Wang, Yu-Xiang and Azizzadenesheli, Kamyar and Anandkumar, Animashree}, booktitle = {International Conference on Machine Learning}, year = {2018}}
We analyse a sign-based scheme for stochastic optimisation, presenting theoretical conditions sufficient for convergence and experiments that demonstrate promising empirical performance for training deep neural networks.
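
The basic update is simple enough to state in a few lines; the toy quadratic objective and learning rate below are arbitrary choices, not taken from the paper.

    import numpy as np

    def signsgd_step(w, grad, lr=0.01):
        # signSGD: step using only the elementwise sign of the (stochastic) gradient.
        return w - lr * np.sign(grad)

    # Toy demo on an assumed quadratic objective f(w) = 0.5 * ||w - target||^2.
    target = np.array([1.0, -2.0, 3.0])
    w = np.zeros(3)
    for _ in range(1000):
        grad = w - target               # exact gradient of the toy objective
        w = signsgd_step(w, grad)
    print(w)                            # ends up within lr of the target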

* denotes equal contribution.