HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

#337 of 418 papers in COLM 2025

Abstract

To address the growing safety risks as AI agents become increasingly autonomous in their interactions with human users and environments, we present HAICOSYSTEM, a framework for examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between users and AI agents. We then develop a comprehensive multi-dimensional evaluation framework whose metrics cover operational, content-related, societal, and legal risks, and use it to examine the safety of AI agents in these interactions. By running over 8K simulations based on 132 scenarios across seven domains (e.g., healthcare, finance, education), we show that state-of-the-art LLMs exhibit safety risks in 62% of cases, particularly during tool use with malicious users, highlighting the importance of evaluating and addressing AI agent safety in dynamic human-AI-environment interactions.
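As a rough illustration of the multi-dimensional evaluation idea, the sketch below aggregates per-dimension risk scores from simulated episodes into an overall unsafe rate. All names and structures here are illustrative assumptions, not HAICOSYSTEM's actual API; the paper's own evaluator and scoring scheme may differ.

```python
from dataclasses import dataclass

# Hypothetical risk dimensions, mirroring the four categories named
# in the abstract (operational, content-related, societal, legal).
RISK_DIMENSIONS = ("operational", "content", "societal", "legal")

@dataclass
class SimulationResult:
    """One multi-turn simulation episode (hypothetical structure)."""
    scenario: str
    domain: str
    # Per-dimension risk scores; by this sketch's convention, a
    # negative score flags a safety risk on that dimension.
    scores: dict

    def is_unsafe(self) -> bool:
        return any(self.scores.get(d, 0) < 0 for d in RISK_DIMENSIONS)

def unsafe_rate(results: list[SimulationResult]) -> float:
    """Fraction of simulations exhibiting a risk on any dimension."""
    if not results:
        return 0.0
    return sum(r.is_unsafe() for r in results) / len(results)

# Two toy episodes: one flagged on the content dimension, one clean.
results = [
    SimulationResult("mis-dosage request", "healthcare",
                     {"operational": 0, "content": -2, "societal": 0, "legal": 0}),
    SimulationResult("benign budgeting help", "finance",
                     {"operational": 0, "content": 0, "societal": 0, "legal": 0}),
]
print(unsafe_rate(results))  # 0.5
```

In this framing, the paper's headline figure (risks in 62% of cases) corresponds to `unsafe_rate` computed over the full set of 8K+ simulations.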
