KernelBench: Can LLMs Write Efficient GPU Kernels?

63 citations
arXiv:2502.10517
#71 in ICML 2025 (of 3340 papers)

Abstract

Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate kernel generation. We introduce **KernelBench**, an open-source framework for evaluating LMs' ability to write fast and correct kernels on a suite of 250 carefully selected PyTorch ML workloads. KernelBench represents a real-world engineering environment, and progress on the benchmark translates directly to faster practical kernels. We introduce a new evaluation metric, $\text{fast}_p$, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold $p$ over the baseline. Our experiments across various state-of-the-art models and test-time methods show that frontier reasoning models perform the best out of the box but still fall short overall, matching the PyTorch baseline in fewer than 20\% of cases. While we show that results can improve by leveraging execution and profiling feedback during iterative refinement, KernelBench remains a challenging benchmark, and its difficulty increases as we raise the speedup threshold $p$.
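
The abstract's description of $\text{fast}_p$ admits a natural formal reading; the following display is our own sketch of it (the indicator notation is ours), assuming $N$ benchmark tasks with baseline runtime $t_{\text{base},i}$ and generated-kernel runtime $t_{\text{gen},i}$ for task $i$:

$$
\text{fast}_p \;=\; \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\!\left[\,\text{correct}_i \;\wedge\; \frac{t_{\text{base},i}}{t_{\text{gen},i}} > p\,\right]
$$

Under this reading, $\text{fast}_0$ reduces to the plain correctness rate (any correct kernel has a positive speedup), while $\text{fast}_1$ counts kernels that are both correct and strictly faster than the PyTorch baseline.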

Citation History

0 citations as of Jan 28, 2026; 63 as of Feb 13, 2026.