Papers by Robert Tang
5 papers found
Conference
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Li Hao, He CAO, Bin Feng et al.
NeurIPS 2025, arXiv:2505.21318, 19 citations
DyFlow: Dynamic Workflow Framework for Agentic Reasoning
Yanbo Wang, Zixiang Xu, Yue Huang et al.
NeurIPS 2025, arXiv:2509.26062, 1 citation
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Jiajun Shi, Jian Yang, Jiaheng Liu et al.
NeurIPS 2025 (spotlight), arXiv:2505.14552, 4 citations
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
Yilun Zhao, Kaiyan Zhang, Tiansheng Hu et al.
NeurIPS 2025 (spotlight), arXiv:2507.01001, 10 citations
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.
NeurIPS 2025, arXiv:2505.22648, 98 citations