"reward optimization" Papers
5 papers found
Alignment of Large Language Models with Constrained Learning
Botong Zhang, Shuo Li, Ignacio Hounie et al.
NeurIPS 2025 · arXiv:2505.19387 · 2 citations

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Chenyu Wang, Masatoshi Uehara, Yichun He et al.
ICLR 2025 · arXiv:2410.13643 · 45 citations

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
Stephen Zhao, Aidan Li, Rob Brekelmans et al.
NeurIPS 2025 · arXiv:2510.21184

Understanding Data Influence in Reinforcement Finetuning
Haoru Tan, Xiuzhe Wu, Sitong Wu et al.
NeurIPS 2025 (oral)

GFlowNet Training by Policy Gradients
Puhua Niu, Shili Wu, Mingzhou Fan et al.
ICML 2024 · arXiv:2408.05885 · 3 citations