Victor Veitch

papers

677

total citations

papers (12)

The Linear Representation Hypothesis and the Geometry of Large Language Models

ICML 2024, arXiv

363

citations

Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding

NeurIPS 2020, arXiv

citations

Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness

NeurIPS 2023, arXiv

citations

Uncovering Meanings of Embeddings via Partial Orthogonality

NeurIPS 2023, arXiv

citations

Using Embeddings for Causal Estimation of Peer Influence in Social Networks

NeurIPS 2022, arXiv

citations

Does Editing Provide Evidence for Localization?

ICLR 2025, arXiv

citations

RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals

ICML 2025, arXiv

citations

Counterfactual Invariance to Spurious Correlations in Text Classification

NeurIPS 2021

citations

papers (12)

The Linear Representation Hypothesis and the Geometry of Large Language Models

Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding

Concept Algebra for (Score-Based) Text-Controlled Generative Models

On the Origins of Linear Representations in Large Language Models

Invariant and Transportable Representations for Anti-Causal Domain Shifts

Transforming and Combining Rewards for Aligning Large Language Models

Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness

Uncovering Meanings of Embeddings via Partial Orthogonality

Using Embeddings for Causal Estimation of Peer Influence in Social Networks

Does Editing Provide Evidence for Localization?

RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals

Counterfactual Invariance to Spurious Correlations in Text Classification
