Scaling language models: Methods, analysis & insights from training Gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 569 | 2021 |
Compressive transformers for long-range sequence modelling JW Rae, A Potapenko, SM Jayakumar, TP Lillicrap arXiv preprint arXiv:1911.05507, 2019 | 377 | 2019 |
Stabilizing transformers for reinforcement learning E Parisotto, F Song, J Rae, R Pascanu, C Gulcehre, S Jayakumar, ... International conference on machine learning, 7487-7498, 2020 | 295 | 2020 |
Adapting auxiliary losses using gradient similarity Y Du, WM Czarnecki, SM Jayakumar, M Farajtabar, R Pascanu, ... arXiv preprint arXiv:1812.02224, 2018 | 135 | 2018 |
Multiplicative interactions and where to find them SM Jayakumar, WM Czarnecki, J Menick, J Schwarz, J Rae, S Osindero, ... International conference on learning representations, 2019 | 112 | 2019 |
Distilling policy distillation WM Czarnecki, R Pascanu, S Osindero, S Jayakumar, G Swirszcz, ... The 22nd international conference on artificial intelligence and statistics …, 2019 | 107 | 2019 |
Memory-based parameter adaptation P Sprechmann, SM Jayakumar, JW Rae, A Pritzel, AP Badia, B Uria, ... arXiv preprint arXiv:1802.10542, 2018 | 96 | 2018 |
Information asymmetry in KL-regularized RL A Galashov, SM Jayakumar, L Hasenclever, D Tirumala, J Schwarz, ... arXiv preprint arXiv:1905.01240, 2019 | 93 | 2019 |
Been there, done that: Meta-learning with episodic recall S Ritter, J Wang, Z Kurth-Nelson, S Jayakumar, C Blundell, R Pascanu, ... International conference on machine learning, 4354-4363, 2018 | 89 | 2018 |
Mix & match agent curricula for reinforcement learning W Czarnecki, S Jayakumar, M Jaderberg, L Hasenclever, YW Teh, ... International Conference on Machine Learning, 1087-1095, 2018 | 79 | 2018 |
Meta-learning of sequential strategies PA Ortega, JX Wang, M Rowland, T Genewein, Z Kurth-Nelson, ... arXiv preprint arXiv:1905.03030, 2019 | 73 | 2019 |
Top-KAST: Top-K always sparse training S Jayakumar, R Pascanu, J Rae, S Osindero, E Elsen Advances in Neural Information Processing Systems 33, 20744-20754, 2020 | 70 | 2020 |
Cyprien de Masson d’Autume JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... | 64 | 2021 |
Powerpropagation: A sparsity inducing weight reparameterisation J Schwarz, S Jayakumar, R Pascanu, PE Latham, Y Teh Advances in neural information processing systems 34, 28889-28903, 2021 | 37 | 2021 |
Cyprien de Masson d’Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew G Johnson, Blake A. Hechtman, Laura Weidinger, Iason Gabriel, William S. Isaac … JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ..., 2021 | 29 | 2021 |
Low-pass recurrent neural networks: A memory architecture for longer-term correlation discovery T Stepleton, R Pascanu, W Dabney, SM Jayakumar, H Soyer, R Munos arXiv preprint arXiv:1805.04955, 2018 | 4 | 2018 |
Perception-prediction-reaction agents for deep reinforcement learning A Stooke, V Dalibard, SM Jayakumar, WM Czarnecki, M Jaderberg arXiv preprint arXiv:2006.15223, 2020 | 2 | 2020 |
Gated attention neural networks E Parisotto, H Song, JW Rae, SM Jayakumar, ME Jaderberg, R Pascanu, ... US Patent App. 17/763,984, 2022 | 1 | 2022 |
Selecting actions by reverting to previous learned action selection policies S Ritter, XJ Wang, S Jayakumar, R Pascanu, C Blundell, M Botvinick US Patent 11,423,300, 2022 | 1 | 2022 |
Machine learning systems with memory based parameter adaptation for learning fast and slower P Sprechmann, S Jayakumar, JW Rae, A Pritzel, AP Badia, O Vinyals, ... US Patent App. 16/759,561, 2020 | 1 | 2020 |