Mingyu Ding
Affiliation: The University of Hong Kong
27 papers, 2,259 total citations
Papers (27)
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought (NeurIPS 2023, arXiv): 358 citations
Learning Depth-Guided Convolutions for Monocular 3D Object Detection (CVPR 2020, arXiv): 355 citations
DaViT: Dual Attention Vision Transformers (ECCV 2022, arXiv): 352 citations
Segmenting Transparent Objects in the Wild (ECCV 2020, arXiv): 206 citations
Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking (ECCV 2020, arXiv): 200 citations
Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation (ECCV 2020, arXiv): 133 citations
VDT: General-purpose Video Diffusion Transformers via Mask Modeling (ICLR 2024, arXiv): 102 citations
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (NeurIPS 2021, arXiv): 81 citations
HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers (CVPR 2021, arXiv): 74 citations
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution (CVPR 2024, arXiv): 67 citations
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (CVPR 2025, arXiv): 60 citations
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling (ICLR 2024, arXiv): 55 citations
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis (ICML 2024, arXiv): 46 citations
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties (NeurIPS 2023, arXiv): 28 citations
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions (ICCV 2023, arXiv): 23 citations
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos (ICCV 2025, arXiv): 22 citations
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention (CVPR 2023, arXiv): 19 citations
LGDN: Language-Guided Denoising Network for Video-Language Modeling (NeurIPS 2022, arXiv): 19 citations
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners (CVPR 2023, arXiv): 18 citations
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation (CVPR 2025, arXiv): 12 citations
Towards Free Data Selection with General-Purpose Models (NeurIPS 2023, arXiv): 12 citations
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians (CVPR 2025, arXiv): 9 citations
X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios (ICLR 2025, arXiv): 8 citations
EC2: Emergent Communication for Embodied Control (CVPR 2023): 0 citations
L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing (CVPR 2021): 0 citations
Doubly-Robust Self-Training (NeurIPS 2023): 0 citations
Compressed Video Contrastive Learning (NeurIPS 2021): 0 citations