Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates

15 citations

arXiv:2402.05980

#772 of 2635 papers in ICML 2024

Top Authors

Ashish Hooda, Mihai Christodorescu, Miltiadis Allamanis, Aaron Wilson, Kassem Fawaz, Somesh Jha

Topics

large language models, code generation, counterfactual analysis, programming concepts, code predicates, data flow, control flow, black-box evaluation

Abstract

Large Language Models' success in text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.
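
To make the idea of black-box counterfactual testing concrete, here is a minimal sketch in Python. It is not the CACP implementation from the paper: the `Model` interface, the two toy models, and the choice of identifier renaming as the counterfactual edit are all assumptions made for illustration. The check edits the prompt in a way that should change the expected completion predictably, then asks whether the model's output changes the same way.

```python
import re
from typing import Callable

# Hypothetical black-box interface: prompt text in, completion text out.
Model = Callable[[str], str]


def rename_identifier(code: str, old: str, new: str) -> str:
    """Consistently rename a whole-word identifier (illustrative counterfactual edit)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def is_consistent(model: Model, prompt: str, old: str, new: str) -> bool:
    """Return True if the model's completion tracks the rename applied to the prompt."""
    original_completion = model(prompt)
    counterfactual_completion = model(rename_identifier(prompt, old, new))
    # If the model follows the program's data flow, renaming the prompt
    # should rename the completion in exactly the same way.
    expected = rename_identifier(original_completion, old, new)
    return counterfactual_completion.strip() == expected.strip()


def toy_dataflow_model(prompt: str) -> str:
    """Toy stand-in for a model: returns the most recently assigned variable."""
    names = re.findall(r"^\s*(\w+)\s*=", prompt, flags=re.MULTILINE)
    return f"return {names[-1]}" if names else "return None"


def toy_memorizing_model(prompt: str) -> str:
    """Toy stand-in for a model that ignores the prompt entirely."""
    return "return total"


PROMPT = "def add(a, b):\n    total = a + b\n"

if __name__ == "__main__":
    print(is_consistent(toy_dataflow_model, PROMPT, "total", "result"))
    print(is_consistent(toy_memorizing_model, PROMPT, "total", "result"))
```

Only the model's inputs and outputs are inspected, which is what black-box access permits. A real evaluation would replace the toy callables with queries to an actual code model and use edits tied to the specific programming concept under test.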

Citation History

[Chart: citation count over time, Jan 28, 2026 to Feb 13, 2026; value shown for Feb 13, 2026: 15 (+15)]