Analyzing Multilingualism in Large Language Models with Sparse Autoencoders

Abstract

Despite the impressive multilingual capabilities of recent large language models (LLMs), the mechanisms underlying their language-specific processing remain largely unclear. In this paper, we investigate how LLMs handle multilingualism through the lens of sparse autoencoders (SAEs), uncovering distinctive patterns that offer new insights into their internal workings. Specifically, we introduce two novel concepts—task instruction–focused (TF) and heading-focused (HF) SAE features—and use them to reveal intrinsic discrepancies between high- and low-performing languages. Our analysis yields several key findings: (1) SAEs provide concrete evidence that LLMs have a precise understanding of prompt structure; (2) heading keywords (e.g., “Question,” “Choices,” and “Answer”) play a distinct role in LLM processing; and (3) low-performing languages exhibit a relative deficiency in TF features compared to high-performing languages. Building on these insights, we propose two practical strategies to improve zero-shot multilingual performance: (1) incorporating English heading keywords and (2) amplifying TF features through steering. Our approach improves zero-shot performance in low-performing languages by up to 3.7% on average on ARC-Challenge and MMLU, while also shedding new light on fundamental differences between high- and low-performing languages in LLMs. Our code is available at https://github.com/ihcho2/SAE-ML.
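To make the steering strategy concrete, here is a minimal PyTorch sketch (not the authors' implementation, which is in the linked repository) of how amplifying a set of SAE features inside a transformer layer could work. The toy SparseAutoencoder, the hooked layer, the feature indices tf_feature_ids=[3, 17], and the scale factor 2.0 are all illustrative assumptions; in the paper, the TF features would be identified from an SAE trained on the model's own activations.

import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Toy SAE: maps a residual-stream activation to sparse features and back."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.encoder(x))

    def decode(self, feats: torch.Tensor) -> torch.Tensor:
        return self.decoder(feats)


def make_tf_steering_hook(sae: SparseAutoencoder, tf_feature_ids: list, scale: float):
    """Return a forward hook that amplifies the chosen SAE features in a layer's output."""

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output  # (batch, seq, d_model)
        feats = sae.encode(hidden)
        recon = sae.decode(feats)
        error = hidden - recon                # part of the activation the SAE misses
        feats[..., tf_feature_ids] *= scale   # amplify only the selected (TF) features
        steered = sae.decode(feats) + error   # edited reconstruction + untouched error term
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered

    return hook


if __name__ == "__main__":
    d_model, d_features = 64, 512
    sae = SparseAutoencoder(d_model, d_features)

    # Stand-in for one transformer block; with a real model this would be
    # something like model.model.layers[k] for a hypothetical layer index k.
    block = nn.Linear(d_model, d_model)
    handle = block.register_forward_hook(
        make_tf_steering_hook(sae, tf_feature_ids=[3, 17], scale=2.0)
    )

    with torch.no_grad():
        x = torch.randn(1, 8, d_model)   # (batch, seq, d_model)
        y = block(x)                     # steered activations flow on to later layers
    handle.remove()
    print(y.shape)                       # torch.Size([1, 8, 64])

Keeping the SAE's reconstruction error term unchanged means the intervention alters only the selected feature directions, leaving the rest of the activation intact; this is a common design choice for SAE-based steering and is assumed here rather than taken from the paper.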
