Poster Papers by Alex Chouldechova
2 papers found
Conference
Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming
Alex Chouldechova, A. Feder Cooper, Solon Barocas et al.
NeurIPS 2025, arXiv:2601.18076
1 citation
Validating LLM-as-a-Judge Systems under Rating Indeterminacy
Luke Guerdan, Solon Barocas, Kenneth Holstein et al.
NeurIPS 2025, arXiv:2503.05965
7 citations