"language model control" Papers
3 papers found
Conference
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan et al.
ICML 2024, arXiv:2312.06942
110 citations
A Language Model’s Guide Through Latent Space
Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann et al.
ICML 2024, arXiv:2402.14433
44 citations
Successor Features for Efficient Multi-Subject Controlled Text Generation
Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung et al.
ICML 2024