Papers by Kristina Nikolić
2 papers found
Conference
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
NeurIPS 2025, arXiv:2505.12575, 12 citations
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
Kristina Nikolić, Luze Sun, Jie Zhang et al.
ICML 2025 (spotlight), arXiv:2504.10694, 16 citations