Yu Su
Affiliations (2)
Microsoft
The Ohio State University
23 papers
5,301 total citations
papers (23)
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
arXiv
1,715
citations
Mind2Web: Towards a Generalist Agent for the Web
NEURIPS 2023
arXiv
829
citations
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
ICCV 2023
arXiv
631
citations
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
NEURIPS 2023
arXiv
501
citations
GPT-4V(ision) is a Generalist Web Agent, if Grounded
ICML 2024
arXiv
424
citations
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
ICML 2024
arXiv
319
citations
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
ICLR 2024
arXiv
261
citations
BioCLIP: A Vision Foundation Model for the Tree of Life
CVPR 2024
arXiv
176
citations
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
CVPR 2025
arXiv
90
citations
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
ICML 2024
arXiv
88
citations
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
ICLR 2025
arXiv
70
citations
An Illusion of Progress? Assessing the Current State of Web Agents
COLM 2025
arXiv
56
citations
One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones
CVPR 2022
arXiv
39
citations
A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
ICLR 2024
arXiv
24
citations
Dual-View Visual Contextualization for Web Navigation
CVPR 2024
arXiv
23
citations
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
NEURIPS 2025
arXiv
20
citations
CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework
AAAI 2024
11
citations
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
CVPR 2025
arXiv
9
citations
Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data
NEURIPS 2023
arXiv
7
citations
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis
CVPR 2025
arXiv
6
citations
Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship
AAAI 2025
2
citations
VERSE: Verification-based Self-Play for Code Instructions
AAAI 2025
0
citations
ScholarGEC: Enhancing Controllability of Large Language Model for Chinese Academic Grammatical Error Correction
AAAI 2025
0
citations