Maarten Sap

Affiliations

Carnegie Mellon University, Allen Institute for AI

papers

3,006

total citations

papers (14)

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

ICLR 2025 arXiv

2,226

citations

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

ICLR 2024 arXiv

239

citations

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

ICLR 2024 arXiv

166

citations

When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment

NeurIPS 2022 arXiv

118

citations

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

AAAI 2024 arXiv

citations

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

ICLR 2024 arXiv

citations

ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning

COLM 2025 arXiv

citations

Fluid Language Model Benchmarking

COLM 2025 arXiv

citations

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

COLM 2025 arXiv

citations

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior

ICML 2025 arXiv

citations

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents

AutoPresent: Designing Structured Visuals from Scratch

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
