I'm a principal investigator at the Department of Computer Science of ETH Zurich, hosted by Angelika Steger's group. My research interests lie at the intersection of neuroscience and machine learning. My goal is to develop better, neuroscience-inspired machine learning algorithms and, in turn, to use the insights gained from designing them to understand learning in the brain.

My research is supported by an SNSF Ambizione Fellowship and an ETH Zurich Research Grant.

Previously, from 2015 to 2018, I was a postdoc with Walter Senn at the University of Bern. Together with Rui P. Costa and Yoshua Bengio, we developed a model of error backpropagation in the cortex.

I received my PhD in computer science from IST (University of Lisbon, 2014) where I studied neural network models of memory with Andreas Wichert. Still at IST, in 2015, I was awarded a short research fellowship to work with Francisco C. Santos. During this period I studied energy-efficient synaptic plasticity rules with Mark van Rossum.

Current students and collaborators

Simon Schug — PhD student

Nicolas Zucchet — PhD student

Johannes von Oswald — PhD student (co-supervised with Angelika Steger)

Alexander Meulemans — Collaborator at the Computer Science Department of ETH Zürich

Seijin Kobayashi — Collaborator at the Computer Science Department of ETH Zürich

Robert Meier — Collaborator at the Computer Science Department of ETH Zürich

Angelika Steger — Collaborator at the Computer Science Department of ETH Zürich

Maciej Wołczyk — Visiting student

Maximilian Schlegel — Bachelor's student (co-supervised with Johannes von Oswald)

Qianqian Feng — Master's student (co-supervised with Nicolas Zucchet)

Alumni

Anja Šurina — Master's student (co-supervised with Shih-Chii Liu), now PhD student at EPFL and with Yoshua Bengio at Mila

Alexandra Proca — Research assistant, now PhD student at Imperial College London

Dominic Zhao — Bachelor's and exchange student, now at Common Sense Machines

Scheduled and recent talks

IST & Unbabel seminar (November 17): I'll present our recent paper on mesa-optimization in Transformers at the IST & Unbabel seminar series.

Google DeepMind seminar (November 9): I presented our recent results on in-context learning in transformers and recurrent neural networks at Google DeepMind in London, UK.

Gatsby seminar (November 8): I presented our recent results on in-context learning in transformers and recurrent neural networks at the Gatsby Computational Neuroscience Unit in London, UK.

Bernstein Conference workshop talk (September 27): Talk at the Biologically plausible learning in artificial neural networks workshop at the Bernstein Conference in Berlin, Germany.

CoLLAs 2023 keynote (August 24): Keynote talk at the 2nd Conference on Lifelong Learning Agents in Montreal, Canada.

Title: Gradient-based optimization emerges in trained neural networks

Abstract: The algorithms implemented by task-optimized neural networks are usually unknown to their designers. In my talk I will attempt to reverse engineer transformers trained to solve small-scale few-shot learning and sequential prediction tasks. It turns out that these trained neural networks often approach their tasks by constructing appropriate objective functions and then optimizing them using gradient-based methods within their forward dynamics, not unlike what a human machine learning practitioner would do. I will then extend these analyses to recurrent neural networks by establishing an equivalence result between the two model classes. I will discuss how our findings might help understand in-context learning in language models and finish with some speculations about learning in the cortex.
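
To make this concrete, here is a minimal numerical sketch of the kind of equivalence discussed in the talk: a single softmax-free (linear) self-attention operation with hand-constructed projections reproduces one step of gradient descent on an in-context least-squares problem. The toy setup and variable names below are purely illustrative.

```python
import numpy as np

# Toy in-context regression task: context pairs (x_i, y_i) and one query x_q.
rng = np.random.default_rng(0)
N, d_in, d_out, lr = 8, 3, 2, 0.1
X = rng.normal(size=(N, d_in))        # context inputs
W_true = rng.normal(size=(d_out, d_in))
Y = X @ W_true.T                      # context targets
x_q = rng.normal(size=d_in)           # query input

# (1) One gradient-descent step on the in-context least-squares loss
#     L(W) = 1/(2N) * sum_i ||y_i - W x_i||^2, starting from W = 0.
W_0 = np.zeros((d_out, d_in))
grad = -(Y - X @ W_0.T).T @ X / N
W_1 = W_0 - lr * grad
y_gd = W_1 @ x_q                      # prediction after one GD step

# (2) One softmax-free self-attention operation with hand-constructed
#     projections: keys and queries read the x-part of each token, values
#     read the y-part scaled by lr/N. Its output is the query's prediction.
keys, query, values = X, x_q, (lr / N) * Y
y_attn = values.T @ (keys @ query)    # sum_i v_i * (k_i . q)

print(np.allclose(y_gd, y_attn))      # True: the two computations coincide
```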

News

ETH News interview on in-context learning: Johannes was interviewed for ETH News to discuss our work on in-context learning in Transformers.

Colloquium on Cortical Computation 2023: I presented our new work on online recurrent neural network learning at a colloquium organized to celebrate Walter Senn's 60th birthday in Bern, Switzerland.

Cambridge talk by Alexander: Alexander presented our least-control principle for learning to Timothy O'Leary's group at the University of Cambridge.

Coverage in Scientific American: Our study of in-context learning, done in collaboration with Google Research, has been highlighted in a Scientific American article.

National TV: I spoke about the links between neuroscience and artificial intelligence research on the 72nd episode of Sociedade Civil, a Portuguese television show.

BrainGain 2023: I gave a general-audience talk explaining our research to Portuguese students interested in neuroscience and AI.

COSYNE 2023: We presented two posters at the main meeting and gave two talks at the Top-down interactions in the neocortex: Structure, function, plasticity and models workshop.

Swiss Computational Neuroscience Retreat 2023: Nicolas presented our least-control principle for learning at the 2nd edition of the Swiss Computational Neuroscience Retreat in Crans-Montana.

Nicolas's talk for the Brain & AI group at Meta AI: Nicolas gave a talk presenting our least-control principle for learning to Jean-Rémi King's Brain & AI group at Meta AI on November 16.

Mathematics, Physics & Machine Learning seminar talk: I gave an IST Mathematics, Physics & Machine Learning seminar talk on November 10.

Panel discussion on lifelong learning: I participated in a panel discussion on Lifelong Learning Machines at NeurIPS 2022.

Visit to Mila: Simon, Alexander, Nicolas and I visited Blake Richards's lab at Mila.

Doctoral symposium at EPIA: Together with Fernando P. Santos and Henrique Lopes Cardoso, I organized a doctoral symposium at EPIA, the Portuguese conference on artificial intelligence, held in Lisbon.

DeepMind talk by Johannes: Johannes gave a talk at DeepMind, London presenting our models and algorithms for continual learning and meta-learning.

MLSSN 2022 lecture (video on YouTube): I gave a lecture with Alexander, Simon and Nicolas for the MLSSN 2022 summer school in Krakow, Poland, where we discussed bilevel optimization problems involving neural networks. We covered how to solve them with recurrent backpropagation, equilibrium propagation, and some of our own work on learning and meta-learning without error backpropagation. The lecture is now on YouTube. A small sketch of the implicit-differentiation approach follows below.
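
As a toy illustration of the implicit-differentiation route, the sketch below computes the hypergradient of a bilevel problem with a quadratic inner objective via the implicit function theorem and checks it against finite differences. The setup, loss functions and variable names are purely illustrative.

```python
import numpy as np

# Toy bilevel problem with a quadratic inner objective, so the inner solution
# and all second derivatives are available in closed form.
rng = np.random.default_rng(0)
d, lam = 4, 0.5
theta = rng.normal(size=d)             # outer (meta) parameters
target = rng.normal(size=d)            # outer-loss target

# Inner problem: phi*(theta) = argmin_phi 0.5*||phi - theta||^2 + 0.5*lam*||phi||^2
phi_star = theta / (1.0 + lam)

# Outer loss: L_out(phi) = 0.5*||phi - target||^2, evaluated at phi*(theta)
outer_grad_phi = phi_star - target     # dL_out/dphi at the inner solution

# Implicit function theorem: dphi*/dtheta = -H^{-1} B, with
#   H = d2 L_in / dphi2 = (1 + lam) * I  and  B = d2 L_in / dphi dtheta = -I
H = (1.0 + lam) * np.eye(d)
B = -np.eye(d)
dphi_dtheta = -np.linalg.solve(H, B)
hypergrad = dphi_dtheta.T @ outer_grad_phi   # dL_out/dtheta via implicit diff.

# Finite-difference check of the hypergradient
def outer_loss(th):
    return 0.5 * np.sum((th / (1.0 + lam) - target) ** 2)

eps = 1e-6
fd = np.array([(outer_loss(theta + eps * e) - outer_loss(theta - eps * e)) / (2 * eps)
               for e in np.eye(d)])
print(np.allclose(hypergrad, fd, atol=1e-5))  # True
```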

Oxford seminar talk: I gave an Oxford NeuroTheory Forum seminar presenting our work on biologically plausible meta-learning.

NAISys 2022 poster: Alexander presented ongoing work on our new principle for learning at the NAISys 2022 conference in Cold Spring Harbor, NY.

Swiss Computational Neuroscience Retreat 2022: Nicolas presented our work on biologically plausible meta-learning at the Swiss Computational Neuroscience Retreat in Crans-Montana.


Recent papers

Uncovering mesa-optimization algorithms in Transformers Johannes von Oswald*, Eyvind Niklasson*, Maximilian Schlegel*, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento (2023). Uncovering mesa-optimization algorithms in Transformers.
Preprint: arXiv:2309.05858
[ preprint ]
* — equal contributions

Gated recurrent neural networks discover attention Nicolas Zucchet*, Seijin Kobayashi*, Yassir Akram*, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento (2023). Gated recurrent neural networks discover attention.
Preprint: arXiv:2309.01775
[ preprint ]
* — equal contributions

Online learning of long-range dependencies Nicolas Zucchet*, Robert Meier*, Simon Schug*, Asier Mujika, João Sacramento (2023). Online learning of long-range dependencies.
NeurIPS 2023
[ preprint ]
* — equal contributions

 

A neuronal least-action principle for real-time learning in cortical circuits Walter Senn*, Dominik Dold*, Akos Kungl, Benjamin Ellenberger, Jakob Jordan, Yoshua Bengio, João Sacramento, Mihai A. Petrovici* (2023). A neuronal least-action principle for real-time learning in cortical circuits.
eLife
[ preprint ]
* — equal contributions

 

Transformers learn in-context by gradient descent Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov (2022). Transformers learn in-context by gradient descent.
ICML 2023 (Oral)
[ paper ]

 

The least-control principle for local learning at equilibrium Alexander Meulemans*, Nicolas Zucchet*, Seijin Kobayashi*, Johannes von Oswald, João Sacramento (2022). The least-control principle for local learning at equilibrium.
NeurIPS 2022 (Oral)
[ paper ]
* — equal contributions

 

A contrastive rule for meta-learning Nicolas Zucchet*, Simon Schug*, Johannes von Oswald*, Dominic Zhao, João Sacramento (2021). A contrastive rule for meta-learning.
NeurIPS 2022
[ paper ]
* — equal contributions

 

Beyond backpropagation: bilevel optimization through implicit differentiation and equilibrium propagation Nicolas Zucchet, João Sacramento (2022). Beyond backpropagation: bilevel optimization through implicit differentiation and equilibrium propagation.
Neural Computation
[ link to journal | paper pdf ]

 

Minimizing control for credit assignment with strong feedback Alexander Meulemans*, Matilde T. Farinha*, Maria R. Cervera*, João Sacramento, Benjamin F. Grewe (2022). Minimizing control for credit assignment with strong feedback.
ICML 2022 (Spotlight)
[ paper ]
* — equal contributions

 

Learning where to learn: Gradient sparsity in meta and continual learning Johannes von Oswald*, Dominic Zhao*, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento (2021). Learning where to learn: Gradient sparsity in meta and continual learning.
NeurIPS 2021
[ paper ]
* — equal contributions

 

Credit assignment in neural networks through deep feedback control Alexander Meulemans*, Matilde T. Farinha*, Javier G. Ordóñez, Pau V. Aceituno, João Sacramento, Benjamin F. Grewe (2021). Credit assignment in neural networks through deep feedback control.
NeurIPS 2021 (Spotlight)
[ paper ]
* — equal contributions

 

Posterior meta-replay for continual learning Christian Henning*, Maria R. Cervera*, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento (2021). Posterior meta-replay for continual learning.
NeurIPS 2021
[ paper ]
* — equal contributions

 

Learning Bayes-optimal dendritic opinion pooling Jakob Jordan, João Sacramento, Willem A. M. Wybo, Mihai A. Petrovici*, Walter Senn* (2021). Learning Bayes-optimal dendritic opinion pooling.
Preprint: arXiv:2104.13238
[ preprint ]
* — equal contributions

 

Neural networks with late-phase weights Johannes von Oswald*, Seijin Kobayashi*, Alexander Meulemans, Christian Henning, Benjamin F. Grewe, João Sacramento (2020). Neural networks with late-phase weights.
ICLR 2021
[ paper | code ]
* — equal contributions

 

Meta-learning via hypernetworks Dominic Zhao, Seijin Kobayashi, João Sacramento*, Johannes von Oswald* (2020). Meta-learning via hypernetworks.
NeurIPS Workshop on Meta-Learning 2020
[ paper ]
* — equal contributions

 

A theoretical framework for target propagation Alexander Meulemans, Francesco S. Carzaniga, Johan A. K. Suykens, João Sacramento, Benjamin F. Grewe (2020). A theoretical framework for target propagation.
NeurIPS 2020 (Spotlight)
[ paper | code ]

 

Continual learning with hypernetworks Johannes von Oswald*, Christian Henning*, Benjamin F. Grewe, João Sacramento (2019). Continual learning with hypernetworks.
ICLR 2020 (Spotlight)
[ paper | talk video | code ]
* — equal contributions

 

A deep learning framework for neuroscience Blake Richards*, Timothy P. Lillicrap*, ..., João Sacramento, ..., Denis Therien*, Konrad P. Körding* (2019). A deep learning framework for neuroscience.
Nature Neuroscience
[ link to journal ]
* — equal contributions

 

Computational roles of plastic probabilistic synapses Milton Llera, João Sacramento, Rui P. Costa (2019). Computational roles of plastic probabilistic synapses.
Current Opinion in Neurobiology
[ link to journal ]

 

Dendritic cortical microcircuits approximate the backpropagation algorithm João Sacramento, Rui P. Costa, Yoshua Bengio, Walter Senn (2018). Dendritic cortical microcircuits approximate the backpropagation algorithm.
NeurIPS 2018 (Oral)
[ paper | talk video ]

If my articles are behind a paywall you can't get through, please send me an e-mail.

For a full list of publications see my Google Scholar profile.

Teaching

From 2019 to 2021, I was a guest lecturer for the Learning in Deep Artificial and Biological Neuronal Networks course offered at ETH Zürich.

Before that, I served as a teaching assistant at the Department of Computer Science and Engineering of IST, where I taught practical classes on computer programming and basic algorithms.