Papers by Ma Yingwei
2 papers found
Conference
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng, Ruixi Qiao, Ma Yingwei et al.
ICLR 2025, oral, arXiv:2410.00564
8 citations
At Which Training Stage Does Code Data Help LLMs Reasoning?
Ma Yingwei, Yue Liu, Yue Yu et al.
ICLR 2024, spotlight, arXiv:2309.16298
95 citations