Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

82 citations

arXiv:2403.06988 PDF

Ranked #179 by citations in ICML 2024, of 2635 papers

Top Authors

Luca Beurer-Kellner, Marc Fischer, Martin Vechev

Topics

constrained decoding, large language models, formal language constraints, sub-word alignment, speculative decoding, decoding algorithms

Abstract

To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding methods propose to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods often incur performance overhead during generation, but many of them also significantly impair task accuracy, if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2$\times$ speedup over unconstrained decoding -- thereby outperforming existing approaches by a wide margin. We release DOMINO as open source at https://github.com/eth-sri/domino.
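To make the idea of constrained decoding concrete, here is a minimal sketch in Python. It is not DOMINO and is not taken from the paper or its repository: the vocabulary, the finite target language, the scoring function, and all function names are hypothetical, chosen only for illustration. At each step it masks out every token that would take the output outside the language, then picks the best-scoring remaining token. The check runs on characters rather than on grammar terminals, so a multi-character token such as {" stays available, which is the kind of sub-word alignment the abstract refers to. The mask is recomputed by brute force at every step, which is the per-step overhead the abstract says DOMINO avoids through pre-computation.

```python
# Hypothetical toy vocabulary; a real LLM vocabulary has tens of
# thousands of sub-word tokens.
VOCAB = ['{"', 'a', '":', '1', '}', '{', '"', ':', 'b', '<eos>']

# Hypothetical constraint: a finite set of accepted outputs, standing in
# for a formal language such as a JSON schema or a grammar.
LANGUAGE = {'{"a":1}', '{"b":1}'}


def is_valid_prefix(text):
    """True if some accepted output starts with `text`."""
    return any(s.startswith(text) for s in LANGUAGE)


def allowed_tokens(prefix):
    """Indices of tokens that keep the output inside the language.

    The check is done on characters, so tokens that span several
    syntactic units (such as '{"') are kept whenever they fit.
    """
    allowed = []
    for i, tok in enumerate(VOCAB):
        if tok == '<eos>':
            # Stopping is only legal once a complete output is reached.
            if prefix in LANGUAGE:
                allowed.append(i)
        elif is_valid_prefix(prefix + tok):
            allowed.append(i)
    return allowed


def toy_logits(prefix):
    """Stand-in for a model: scores each token by its length."""
    return [float(len(tok)) for tok in VOCAB]


def constrained_greedy(logits_fn, max_steps=16):
    """Greedy decoding restricted to tokens allowed by the constraint."""
    prefix = ''
    for _ in range(max_steps):
        logits = logits_fn(prefix)
        mask = allowed_tokens(prefix)
        if not mask:
            break  # no legal continuation; should not happen here
        best = max(mask, key=lambda i: logits[i])
        if VOCAB[best] == '<eos>':
            break
        prefix += VOCAB[best]
    return prefix


if __name__ == '__main__':
    print(constrained_greedy(toy_logits))
```

With the toy scorer above, the decoder picks the two-character tokens {" and ": where they fit instead of being forced to emit one character at a time. A mask that only admitted single-terminal tokens would push the model onto token sequences it rarely sees, which is how misaligned constraints can hurt accuracy according to the abstract.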

Citation History

[Chart: citation count from Jan 28, 2026 to Feb 13, 2026; 82 citations as of Feb 13, 2026]