Poster "instruction-tuned models" Papers
5 papers found
Controllable Context Sensitivity and the Knob Behind It
Julian Minder, Kevin Du, Niklas Stoehr et al.
ICLR 2025, arXiv:2411.07404
17 citations
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach, Stephan Alaniz, Yiran Huang et al.
ICLR 2025, arXiv:2410.19314
9 citations
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.
ICLR 2025
29 citations
Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model
Runheng Liu, Heyan Huang, Xingchen Xiao et al.
NeurIPS 2025
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Dyah Adila, Shuai Zhang, Boran Han et al.
ICML 2024, arXiv:2406.03631
14 citations