Kai-Wei Chang

Affiliations (2): MIT CSAIL, UCLA
28 papers · 6,055 total citations

Papers (28)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering · NeurIPS 2022 · arXiv · 1,949 citations
- Grounded Language-Image Pre-Training · CVPR 2022 · arXiv · 1,431 citations
- MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts · ICLR 2024 · arXiv · 1,235 citations
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models · NeurIPS 2023 · arXiv · 423 citations
- REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory · CVPR 2023 · arXiv · 149 citations
- LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory · ICLR 2025 · arXiv · 128 citations
- Semantic Probabilistic Layers for Neuro-Symbolic Learning · NeurIPS 2022 · arXiv · 107 citations
- On Prompt-Driven Safeguarding for Large Language Models · ICML 2024 · arXiv · 106 citations
- VideoPhy: Evaluating Physical Commonsense for Video Generation · ICLR 2025 · arXiv · 106 citations
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning · ICCV 2023 · arXiv · 66 citations
- X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents · COLM 2025 · arXiv · 49 citations
- Controllable Text Generation with Neurally-Decomposed Oracle · NeurIPS 2022 · arXiv · 42 citations
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension · ICML 2024 · arXiv · 40 citations
- VideoCon: Robust Video-Language Alignment via Contrast Captions · CVPR 2024 · arXiv · 30 citations
- DesCo: Learning Object Recognition with Rich Language Descriptions · NeurIPS 2023 · arXiv · 29 citations
- When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning · COLM 2025 · 24 citations
- STIV: Scalable Text and Image Conditioned Video Generation · ICCV 2025 · arXiv · 21 citations
- ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models · ICML 2024 · arXiv · 20 citations
- GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods · CVPR 2023 · arXiv · 20 citations
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints · NeurIPS 2023 · arXiv · 19 citations
- Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond · NeurIPS 2020 · arXiv · 15 citations
- CoBIT: A Contrastive Bi-directional Image-Text Generation Model · ICLR 2024 · arXiv · 14 citations
- VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning · CVPR 2025 · arXiv · 13 citations
- AVIS: Autonomous Visual Information Seeking with Large Language Model Agent · NeurIPS 2023 · arXiv · 12 citations
- Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models · CVPR 2025 · arXiv · 6 citations
- Verbalized Representation Learning for Interpretable Few-Shot Generalization · ICCV 2025 · arXiv · 1 citation
- On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs · NeurIPS 2022 · 0 citations
- Position: TrustLLM: Trustworthiness in Large Language Models · ICML 2024 · 0 citations