Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

11 citations · #1413 of 3,827 papers in ICLR 2025

Abstract

Traditional reinforcement learning and planning require large amounts of data and training to develop effective strategies. Large language models (LLMs), by contrast, generalize well and can perform tasks without prior training, but they struggle with complex planning and decision-making. We introduce STRATEGIST, a new approach that combines the strengths of both methods: it uses LLMs to generate and update high-level strategies in text form, while a Monte Carlo Tree Search (MCTS) algorithm refines and executes them. STRATEGIST is a general framework that optimizes strategies through self-play simulations without requiring any training data. We test STRATEGIST in competitive, multi-turn games with partial information, such as Game of Pure Strategy (GOPS) and The Resistance: Avalon, a multi-agent hidden-identity discussion game. Our results show that STRATEGIST-based agents outperform traditional reinforcement learning models, other LLM-based methods, and existing LLM agents, while achieving performance comparable to human players.
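
The abstract describes a bi-level loop: an outer level where an LLM proposes and revises a high-level strategy in text form, and an inner level where MCTS self-play evaluates and executes it. Below is a minimal illustrative sketch of that structure; `llm_revise_strategy`, the toy number game, and the scoring are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of the bi-level loop described in the abstract:
# outer loop = LLM revising a textual strategy from self-play feedback,
# inner loop = MCTS self-play estimating how well that strategy performs.
# All names here are illustrative stand-ins, not the paper's actual API.

import math
import random

def llm_revise_strategy(strategy_text: str, feedback: float) -> str:
    # Hypothetical: the real system would prompt an LLM to rewrite the
    # strategy text given self-play results. Here we just tag a revision.
    return strategy_text + f" [revised after score {feedback:.2f}]"

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # remaining moves in a toy number game
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb1(parent: Node, child: Node, c: float = 1.4) -> float:
    # Upper Confidence Bound: balance exploiting good moves vs. exploring.
    if child.visits == 0:
        return float("inf")
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def mcts_value(state, strategy_text: str, n_sims: int = 200) -> float:
    # Inner loop: estimate value by self-play rollouts. In the real system
    # the textual strategy would guide action selection during rollouts;
    # this toy version ignores it and uses random rollouts for brevity.
    root = Node(state)
    for _ in range(n_sims):
        node = root
        # Selection: descend the tree greedily by UCB1.
        while node.children:
            node = max(node.children, key=lambda ch: ucb1(node, ch))
        # Expansion: one child per remaining move.
        for move in node.state:
            rest = [m for m in node.state if m != move]
            node.children.append(Node(rest, parent=node))
        # Rollout: toy reward from a random completion of the game.
        n = len(node.state)
        reward = sum(random.sample(node.state, n // 2)) if n else 0.0
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root.value / max(root.visits, 1)

# Outer loop: alternate LLM strategy revision with MCTS self-play evaluation.
strategy = "Play high cards when the prize card is high."
for round_ in range(3):
    score = mcts_value([1, 2, 3, 4], strategy)
    strategy = llm_revise_strategy(strategy, score)
    print(f"round {round_}: self-play score = {score:.2f}")
print("final strategy:", strategy)
```

The structural point this sketch tries to capture is that no gradient training occurs: the only feedback channel from the inner MCTS loop to the outer LLM loop is the self-play score attached to the strategy text.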

Citation History

Jan 25, 2026    0
Jan 26, 2026    0
Jan 28, 2026    0
Feb 13, 2026    11