Haotian Zhang
19 papers, 2,848 total citations
Papers (19)
Grounded Language-Image Pre-Training
CVPR 2022
arXiv
1,431
citations
Ferret: Refer and Ground Anything Anywhere at Any Granularity
ICLR 2024
arXiv
457
citations
GLIPv2: Unifying Localization and Vision-Language Understanding
NeurIPS 2022
arXiv
357
citations
TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers
CVPR 2022
arXiv
257
citations
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
ECCV 2024
arXiv
157
citations
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
ICLR 2025
arXiv
45
citations
ELSD: Efficient Line Segment Detector and Descriptor
ICCV 2021
arXiv
32
citations
KD-MVS: Knowledge Distillation Based Self-Supervised Learning for Multi-View Stereo
ECCV 2022
arXiv
29
citations
Offline and Online Optical Flow Enhancement for Deep Video Compression
AAAI 2024
arXiv
28
citations
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
ICLR 2025
12
citations
GENMO: A GENeralist Model for Human MOtion
ICCV 2025
arXiv
10
citations
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
ICLR 2025
arXiv
9
citations
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
NeurIPS 2025
arXiv
9
citations
Learned Image Compression with Hierarchical Progressive Context Modeling
ICCV 2025
arXiv
8
citations
Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives
ECCV 2022
arXiv
6
citations
Few-Shot Domain Adaptation for Learned Image Compression
AAAI 2025
arXiv
1
citation
SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation
NeurIPS 2025
arXiv
0
citations
Spotting Temporally Precise, Fine-Grained Events in Video
ECCV 2022
0
citations
GenAL: Generative Agent for Adaptive Learning
AAAI 2025
0
citations