When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment T Ni, M Ma, B Eysenbach, PL Bacon Advances in Neural Information Processing Systems 36, 2024 | 29 | 2024 |
Bridging State and History Representations: Understanding Self-Predictive RL T Ni, B Eysenbach, E Seyedsalehi, M Ma, C Gehring, A Mahajan, ... arXiv preprint arXiv:2401.08898, 2024 | 16 | 2024 |
Long-Term Credit Assignment via Model-based Temporal Shortcuts M Ma, P D'Oro, Y Bengio, PL Bacon Deep RL Workshop NeurIPS 2021, 2021 | 5 | 2021 |
Counterfactual Policy Evaluation and the Conditional Monte Carlo Method M Ma, PL Bacon Offline Reinforcement Learning Workshop, NeurIPS, 2020 | 1 | 2020 |
Do Transformer World Models Give Better Policy Gradients? M Ma, T Ni, C Gehring, P D'Oro, PL Bacon arXiv preprint arXiv:2402.05290, 2024 | | 2024 |
Parsimonious Reasoning in Reinforcement Learning for Better Credit Assignment M Ma | | 2022 |
A Differentiable Sequence Model Perspective on Policy Gradients M Ma, P D'Oro, T Ni, C Gehring, PL Bacon | | |