"benchmark generation" Papers
6 papers found
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Rushang Karia, Daniel Bramblett, Daksh Dobhal et al.
ICLR 2025, arXiv:2410.08437
2 citations
Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time-Series Forecasting Based on Biological ODEs
Christian Klötergens, Vijaya Krishna Yalavarthi, Randolf Scholz et al.
ICLR 2025, arXiv:2502.07489
2 citations
Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity
Qiyao Wei, Edward R Morrell, Lea Goetz et al.
NeurIPS 2025, arXiv:2511.19925
Silencer: From Discovery to Mitigation of Self-Bias in LLM-as-Benchmark-Generator
Peiwen Yuan, Yiwei Li, Shaoxiong Feng et al.
NeurIPS 2025, arXiv:2505.20738
3 citations
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Alex Gu, Baptiste Roziere, Hugh Leather et al.
ICML 2024, arXiv:2401.03065
224 citations
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Ce Zhang, Simon Stepputtis, Joseph Campbell et al.
CVPR 2024, arXiv:2403.12033
26 citations