
TalkRL: The Reinforcement Learning Podcast


TalkRL podcast is All Reinforcement Learning, All the Time. In-depth interviews with brilliant people at the forefront of RL research and practice. Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. Hosted by Robin Ranjit Singh Chauhan.










Jakob Foerster

Jakob Foerster on multi-agent learning, cooperation vs. competition, emergent communication, zero-shot coordination, opponent shaping, agents for Hanabi and the Prisoner's Dilemma, and more. Jakob Foerster is an Associate Professor at the University of Oxford.

Featured References
Learning with Opponent-Learning Awareness. Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch
Model-Free Opponent Shaping. Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster
Off-Belief Learning. Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster
Learning to Communicate with Deep Multi-Agent Reinforcement Learning. Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson
Adversarial Cheap Talk. Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster
Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning. Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

Additional References
Lectures by Jakob on YouTube


Danijar Hafner 2

Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design! Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind. He was previously our guest on episode 11.

Featured References
Mastering Diverse Domains through World Models (DreamerV3). Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
DayDreamer: World Models for Physical Robot Learning. Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel
Deep Hierarchical Planning from Pixels. Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
Action and Perception as Divergence Minimization. Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

Additional References
Mastering Atari with Discrete World Models
Dream to Control: Learning Behaviors by Latent Imagination
Planning to Explore via Self-Supervised World Models


Jeff Clune

AI-generating algorithms (AI-GAs), learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more! Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.

Featured References
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos. Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune
Robots that can adapt like animals. Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret
Illuminating search spaces by mapping elites. Jean-Baptiste Mouret, Jeff Clune
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions. Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions. Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley
First return, then explore. Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune


Natasha Jaques 2

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, industry vs. academia, PsiPhi-Learning, AGI, and more! Dr. Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog. Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control. Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning. Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience. Marwa Abdulhai, Natasha Jaques, Sergey Levine

Additional References
Fine-Tuning Language Models from Human Preferences
Learning to summarize from human feedback
Training language models to follow instructions with human feedback


Jacob Beck and Risto Vuorio

Jacob Beck and Risto Vuorio on their recent survey of meta-reinforcement learning. Jacob and Risto are PhD students at the Whiteson Research Lab at the University of Oxford.

Featured Reference
A Survey of Meta-Reinforcement Learning. Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson

Additional References
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Mastering Diverse Domains through World Models
Unsupervised Meta-Learning for Reinforcement Learning
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
Learning to reinforcement learn


John Schulman

John Schulman is a cofounder of OpenAI and currently works there as a researcher and engineer.

Featured References
WebGPT: Browser-assisted question-answering with human feedback. Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Additional References
Our approach to alignment research
Training Verifiers to Solve Math Word Problems
UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation
Proximal Policy Optimization Algorithms
Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs


Sven Mika

Sven Mika is the Reinforcement Learning Team Lead at Anyscale and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University.

Featured References
RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning
Ray: Documentation
RLlib: Abstractions for Distributed Reinforcement Learning. Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

Episode sponsor: Anyscale
Ray Summit 2022 is coming to San Francisco on August 23-24. Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib. Register at and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.


Karol Hausman and Fei Xia

Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments. Fei Xia is a Research Scientist with Google Research. Fei is mostly interested in robot learning in complex and unstructured environments. Previously, he approached this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). More recently, he has been exploring the use of foundation models for these challenges.

Featured References
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan
Inner Monologue: Embodied Reasoning through Planning with Language Models. Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Additional References
Large-scale simulation for embodied perception and robot learning
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

Episode sponsor: Anyscale
Ray Summit 2022 is coming to San Francisco on August 23-24. Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib. Register at and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.


Sai Krishna Gottipati

Sai Krishna Gottipati is an RL Researcher at AI Redefined, working on RL, multi-agent RL (MARL), and human-in-the-loop learning.

Featured References
Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations. AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot
Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning (currently under review)
Learning to navigate the synthetically accessible chemical space using reinforcement learning. Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

Additional References
Asymmetric self-play for automatic goal discovery in robotic manipulation
Continuous Coordination As a Realistic Scenario for Lifelong Learning

Episode sponsor: Anyscale
Ray Summit 2022 is coming to San Francisco on August 23-24. Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib. Register at and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.


Aravind Srinivas 2

Aravind Srinivas is back! He is now a Research Scientist at OpenAI.

Featured References
Decision Transformer: Reinforcement Learning via Sequence Modeling. Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
VideoGPT: Video Generation using VQ-VAE and Transformers. Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas


Rohin Shah

Dr. Rohin Shah is a Research Scientist at DeepMind and the editor and main contributor of the Alignment Newsletter.

Featured References
The MineRL BASALT Competition on Learning from Human Feedback. Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
Preferences Implicit in the State of the World. Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan
Benefits of Assistance over Reward Learning. Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell
On the Utility of Learning about Humans for Human-AI Coordination. Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan
Evaluating the Robustness of Collaborative Agents. Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

Additional References
AGI Safety Fundamentals


Jordan Terry

Jordan Terry is a PhD candidate at the University of Maryland, the maintainer of Gym, the creator and maintainer of PettingZoo, and the founder of Swarm Labs.

Featured References
PettingZoo: Gym for Multi-Agent Reinforcement Learning. J. K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Rodrigo Perez, Caroline Horsch, Clemens Dieffendahl, Niall L. Williams, Yashas Lokesh, Praveen Ravi
PettingZoo on GitHub
Gym on GitHub

Additional References
Time Limits in Reinforcement Learning
Deep Reinforcement Learning at the Edge of the Statistical Precipice


Robert Lange

Robert Tjarko Lange is a PhD student at the Technical University of Berlin.

Featured References
Learning not to learn: Nature versus nurture in silico. Lange, R. T., & Sprekeler, H. (2020).
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning. Vischer, M. A., Lange, R. T., & Sprekeler, H. (2021).
Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions. Lange, R. T., & Faisal, A. (2019).
MLE-Infrastructure on GitHub

Additional References
RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning
Learning to reinforcement learn
Decision Transformer: Reinforcement Learning via Sequence Modeling


NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop

We hear about the idea behind PERLS and why it's important to talk about.

Featured References
Political Economy of Reinforcement Learning (PERLS) Workshop at NeurIPS 2021
NeurIPS 2021


Amy Zhang

Amy Zhang is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023.

Featured References
Invariant Causal Prediction for Block MDPs. Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup
Multi-Task Reinforcement Learning with Context-based Representations. Shagun Sodhani, Amy Zhang, Joelle Pineau
MBRL-Lib: A Modular Library for Model-based Reinforcement Learning. Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra

Additional References
Amy Zhang - Exploring Context for Better Generalization in Reinforcement Learning @ UCL DARK
ICML 2020 Poster session: Invariant Causal Prediction for Block MDPs
Clare Lyle - Invariant Prediction for Generalization in Reinforcement Learning @ Simons Institute


Xianyuan Zhan

Xianyuan Zhan is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University. He received his PhD from Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and as a data scientist at JD Technology, where he led research using offline RL to optimize real-world industrial systems.

Featured References
DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning. Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng


Eugene Vinitsky

Eugene Vinitsky is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and DeepMind.

Featured References
A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings. Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo
Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL. Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen
Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion. Eugene Vinitsky, Kanaad Parvate, Aboudy Kreidieh, Cathy Wu, Alexandre Bayen (2018)
The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu

Additional References
SUMO: Simulation of Urban MObility


Jess Whittlestone

Dr. Jess Whittlestone is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge.

Featured References
The Societal Implications of Deep Reinforcement Learning. Jess Whittlestone, Kai Arulkumaran, Matthew Crosby
Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI. Carla Zoe Cremer, Jess Whittlestone

Additional References
CogX: Cutting Edge: Understanding AI systems for a better AI policy


Aleksandra Faust

Dr. Aleksandra Faust is a Staff Research Scientist at Google Brain Research and a co-founder of its Reinforcement Learning research team.

Featured References
Reinforcement Learning and Planning for Preference Balancing Tasks. Faust (2014)
Learning Navigation Behaviors End-to-End with AutoRL. Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis
Evolving Rewards to Automate Reinforcement Learning. Aleksandra Faust, Anthony Francis, Dar Mehta
Evolving Reinforcement Learning Algorithms. John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust
Adversarial Environment Generation for Learning to Navigate the Web. Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust

Additional References
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch


Sam Ritter

Sam Ritter is a Research Scientist on the neuroscience team at DeepMind.

Featured References
Unsupervised Predictive Memory in a Goal-Directed Agent (MERLIN). Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap
Meta-RL without forgetting: Been There, Done That: Meta-Learning with Episodic Recall. Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick
Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning. Samuel Ritter (2019)
Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments. Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo
Synthetic Returns for Long-Term Credit Assignment. David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song

Additional References
Sam Ritter: Meta-Learning to Make Smart Inferences from Small Data
The Bitter Lesson