HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

#337 of 418 papers in COLM 2025

Abstract

To address the growing safety risks as AI agents become increasingly autonomous in their interactions with human users and environments, we present HAICOSYSTEM, a framework for examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between users and AI agents. We then develop a comprehensive multi-dimensional evaluation framework whose metrics cover operational, content-related, societal, and legal risks, and use it to examine the safety of AI agents in these interactions. By running over 8K simulations based on 132 scenarios across seven domains (e.g., healthcare, finance, education), we show that state-of-the-art LLMs exhibit safety risks in 62% of cases, particularly during tool use with malicious users, highlighting the importance of evaluating and addressing AI agent safety in dynamic human-AI-environment interactions.
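As a rough illustration of the multi-dimensional evaluation idea, the sketch below aggregates per-dimension risk scores from simulated episodes into an overall unsafe rate. All names and structures here are illustrative assumptions, not HAICOSYSTEM's actual API; the paper's own evaluator and scoring scheme may differ.

```python
from dataclasses import dataclass

# Hypothetical risk dimensions, mirroring the four categories named
# in the abstract (operational, content-related, societal, legal).
RISK_DIMENSIONS = ("operational", "content", "societal", "legal")

@dataclass
class SimulationResult:
    """One multi-turn simulation episode (hypothetical structure)."""
    scenario: str
    domain: str
    # Per-dimension risk scores; by this sketch's convention, a
    # negative score flags a safety risk on that dimension.
    scores: dict

    def is_unsafe(self) -> bool:
        return any(self.scores.get(d, 0) < 0 for d in RISK_DIMENSIONS)

def unsafe_rate(results: list[SimulationResult]) -> float:
    """Fraction of simulations exhibiting a risk on any dimension."""
    if not results:
        return 0.0
    return sum(r.is_unsafe() for r in results) / len(results)

# Two toy episodes: one flagged on the content dimension, one clean.
results = [
    SimulationResult("mis-dosage request", "healthcare",
                     {"operational": 0, "content": -2, "societal": 0, "legal": 0}),
    SimulationResult("benign budgeting help", "finance",
                     {"operational": 0, "content": 0, "societal": 0, "legal": 0}),
]
print(unsafe_rate(results))  # 0.5
```

In this framing, the paper's headline figure (risks in 62% of cases) corresponds to `unsafe_rate` computed over the full set of 8K+ simulations.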
