William Yang Wang
26 papers
1,942 total citations
papers (26)
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
CVPR 2020
arXiv
433
citations
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
NEURIPS 2023
arXiv
300
citations
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
NEURIPS 2023
arXiv
222
citations
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
NEURIPS 2023
arXiv
164
citations
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
AAAI 2024
arXiv
108
citations
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
NEURIPS 2023
arXiv
100
citations
Weak-to-Strong Jailbreaking on Large Language Models
ICML 2025
arXiv
95
citations
Learning Concise and Descriptive Attributes for Visual Recognition
ICCV 2023
arXiv
88
citations
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
CVPR 2023
arXiv
83
citations
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation
CVPR 2020
arXiv
79
citations
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
ECCV 2020
arXiv
70
citations
Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation
CVPR 2023
arXiv
40
citations
Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
AAAI 2025
arXiv
33
citations
Reward Guided Latent Consistency Distillation
ICLR 2025
arXiv
27
citations
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
AAAI 2025
arXiv
19
citations
VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs
ICCV 2025
18
citations
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
NEURIPS 2023
arXiv
13
citations
Local Explanation of Dialogue Response Generation
NEURIPS 2021
arXiv
13
citations
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
CVPR 2025
arXiv
11
citations
M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers
CVPR 2022
arXiv
10
citations
Counterfactual Maximum Likelihood Estimation for Training Deep Networks
NEURIPS 2021
arXiv
8
citations
Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning
NEURIPS 2023
arXiv
6
citations
VITED: Video Temporal Evidence Distillation
CVPR 2025
arXiv
2
citations
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler
ECCV 2020
0
citations
Language-Driven Artistic Style Transfer
ECCV 2022
0
citations
ALGO: Synthesizing Algorithmic Programs with Generated Oracle Verifiers
NEURIPS 2023
0
citations