"instruction-tuned models" Papers
7 papers found
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Brian Zheng, Alisa Liu, Orevaoghene Ahia et al.
NEURIPS 2025 (spotlight), arXiv:2506.19004, 9 citations
Controllable Context Sensitivity and the Knob Behind It
Julian Minder, Kevin Du, Niklas Stoehr et al.
ICLR 2025, arXiv:2411.07404, 17 citations
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach, Stephan Alaniz, Yiran Huang et al.
ICLR 2025, arXiv:2410.19314, 9 citations
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.
ICLR 2025, 29 citations
Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model
Runheng Liu, Heyan Huang, Xingchen Xiao et al.
NEURIPS 2025
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
AAAI 2024, arXiv:2312.15407, 49 citations
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Dyah Adila, Shuai Zhang, Boran Han et al.
ICML 2024, arXiv:2406.03631, 14 citations