AI Testing Should Account for Sophisticated Strategic Behaviour

2 citations

arXiv:2508.14927

Ranked #1951 of 5858 papers in NeurIPS 2025 by citations

Top Authors

Vojta Kovarik, Eric Chen, Sami Petersen, Alexis Ghersengorin, Vincent Conitzer

Abstract

This position paper argues for two claims regarding AI testing and evaluation. First, to remain informative about deployment behaviour, evaluations need to account for the possibility that AI systems understand their circumstances and reason strategically. Second, game-theoretic analysis can inform evaluation design by formalising and scrutinising the reasoning in evaluation-based safety cases. Drawing on examples from existing AI systems, a review of relevant research, and formal strategic analysis of a stylised evaluation scenario, we present evidence for these claims and motivate several research directions.
