"deception detection" Papers
2 papers found
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
NeurIPS 2025 (spotlight), arXiv:2504.04072
8 citations
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
Benjamin Arnav, Pablo Bernabeu-Perez, Nathan Helm-Burger et al.
NeurIPS 2025, arXiv:2505.23575
13 citations