Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
Abstract
We present a scalable pipeline for automatically generating high-quality training data for web agents. A major challenge in identifying high-quality training instances is trajectory evaluation: quantifying how much progress a trajectory makes towards completing its task. We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of this progress, enabling us to leverage partially successful trajectories and thereby significantly expand the amount of usable training data. We evaluate our method on BookingArena, a new benchmark we propose consisting of complex booking tasks across 20 popular websites, and demonstrate that our distilled student model outperforms open-source approaches and matches or exceeds commercial systems while being significantly smaller. Our work addresses the challenge of efficiently creating diverse, realistic web interaction datasets and provides a systematic evaluation methodology for complex, structured web tasks.