Papers by Udari Sehwag
3 papers found
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
Souradip Chakraborty, Sujay Bhatt, Udari Sehwag et al.
ICLR 2025 · arXiv:2503.21720
18 citations
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment
Yuancheng Xu, Udari Sehwag, Alec Koppel et al.
ICLR 2025 · arXiv:2410.08193
37 citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie, Xiangyu Qi, Yi Zeng et al.
ICLR 2025 · arXiv:2406.14598
151 citations