Learning by Stateful Reflection and its application for AI4Sci

Speaker: Jun Wang

Time: 2026-05-12 16:00 - 18:00

Venue: Conference Room 503, School of Artificial Intelligence, Xuhui Campus

Host: Zhang Ya (张娅)

Registration: not yet open

About the Talk

Abstract:

We study continual experiential learning in Large Language Model (LLM) agents that integrate episodic memory with reinforcement learning. The central mechanism is reflection, in which an agent uses past experiences to guide future decisions without modifying model parameters. Building on ideas from case-based reasoning and the Memento framework (Memento-1 and Memento-2), we model learning as a memory-driven process in which agents store trajectories, cases, and reusable skills and retrieve them to improve decision-making in new situations. To formalise this idea, we introduce the Stateful Reflective Decision Process (SRDP), in which an agent maintains an evolving memory and performs two operations: write, which stores the outcomes of interactions (policy evaluation), and read, which retrieves relevant experiences to guide actions (policy improvement). We show how this read–write reflective learning can be integrated with reinforcement learning through retrieval-augmented policy iteration and prove that, as memory grows and increasingly covers the state space, the resulting policy converges to the optimal solution. This framework provides a principled foundation for memory-based LLM agents capable of continual adaptation during deployment. We will also present our recent practical Memento-Skills agent system, which integrates naturally with existing industry-scale LLM applications.
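As a rough illustration of the read and write operations described in the abstract, the following toy sketch keeps an episodic memory and chooses actions purely from retrieved cases, with no parameter updates. It is not the SRDP or Memento implementation: the names `EpisodicMemory`, `act`, `toy_reward`, and `run`, the five-state toy task, and the use of exact-match lookup in place of similarity retrieval are all assumptions made for this example.

```python
from collections import defaultdict


class EpisodicMemory:
    """Hypothetical case store: maps a state to the (action, reward) cases seen there."""

    def __init__(self):
        self.cases = defaultdict(list)

    def write(self, state, action, reward):
        # "Write": record the outcome of an interaction (the policy-evaluation role).
        self.cases[state].append((action, reward))

    def read(self, state):
        # "Read": retrieve stored cases for this state.
        # Exact match is an assumed stand-in for similarity-based retrieval.
        return self.cases[state]


def act(memory, state, actions):
    """Choose an action from retrieved cases (the policy-improvement role)."""
    rewards_by_action = defaultdict(list)
    for action, reward in memory.read(state):
        rewards_by_action[action].append(reward)
    # Try each action once before exploiting what memory already covers.
    untried = [a for a in actions if a not in rewards_by_action]
    if untried:
        return untried[0]
    # Otherwise pick the action with the highest mean stored reward.
    return max(actions, key=lambda a: sum(rewards_by_action[a]) / len(rewards_by_action[a]))


def toy_reward(state, action):
    # Assumed toy task: exactly one action per state is rewarded.
    return 1.0 if action == state % 3 else 0.0


def run(num_rounds=4, states=range(5), actions=(0, 1, 2)):
    memory = EpisodicMemory()
    for _ in range(num_rounds):
        for state in states:
            action = act(memory, state, actions)
            memory.write(state, action, toy_reward(state, action))
    return memory


memory = run()
policy = {s: act(memory, s, (0, 1, 2)) for s in range(5)}
print(policy)
```

In this toy setting, once memory holds a case for every action in every state, the action read back from memory is the rewarded one in each state. That loosely mirrors the coverage argument in the abstract, although the talk's actual convergence result concerns the SRDP formulation rather than this simplified lookup.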

Bio:

Jun Wang is a Professor in the Department of Computer Science, University College London. He is a leading expert in AI, Machine Learning, and Multiagent Systems, with over 200 publications. His research has earned eight Best Paper awards, including SIGIR Test of Time and Honourable Mentions, and has led to widely adopted algorithms used by Ray and by CERN for particle discovery. He won the first global real-time bidding contest (2013) and the NeurIPS 2020 black-box optimisation challenge, with solutions now deployed in industry. His patents with BT enhance personalisation in recommender systems by dynamically adjusting training data. As co-founder and Chief Scientist of the UCL spinout MediaGamma (2013–2020), he led the development of AI-driven audience decision tools, helping the company secure £5.8M in funding before its acquisition in 2020.