Ruimao Zhang
27 papers, 3,032 total citations

Papers (27)

WorldSimBench: Towards Video Generation Models as World Simulators (ICML 2025; arXiv; 842 citations)
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation (NeurIPS 2022; arXiv; 461 citations)
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds (ECCV 2022; arXiv; 291 citations)
Parser-Free Virtual Try-On via Distilling Appearance Flows (CVPR 2021; arXiv; 233 citations)
End-to-End Dense Video Captioning With Parallel Decoding (ICCV 2021; arXiv; 228 citations)
Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset (NeurIPS 2023; arXiv; 216 citations)
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds Through Instance Multi-Level Contextual Referring (ICCV 2021; arXiv; 175 citations)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models (CVPR 2024; arXiv; 143 citations)
HumanTOMATO: Text-aligned Whole-body Motion Generation (ICML 2024; arXiv; 111 citations)
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception (CVPR 2024; arXiv; 77 citations)
Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration (ECCV 2022; arXiv; 45 citations)
Open-World Human-Object Interaction Detection via Multi-modal Prompts (CVPR 2024; arXiv; 35 citations)
SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection (ICCV 2023; arXiv; 28 citations)
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model (CVPR 2025; arXiv; 26 citations)
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions (ECCV 2024; arXiv; 23 citations)
Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label Domains (CVPR 2023; arXiv; 19 citations)
Neural Interactive Keypoint Detection (ICCV 2023; arXiv; 17 citations)
Exemplar Normalization for Learning Deep Representation (CVPR 2020; arXiv; 16 citations)
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints (ICCV 2025; arXiv; 13 citations)
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation (CVPR 2025; arXiv; 12 citations)
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer (AAAI 2024; arXiv; 11 citations)
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions (CVPR 2024; arXiv; 6 citations)
Discovering Intrinsic Spatial-Temporal Logic Rules to Explain Human Actions (NeurIPS 2023; arXiv; 4 citations)
Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation (ECCV 2020; 0 citations)
SEED-Bench: Benchmarking Multimodal Large Language Models (CVPR 2024; 0 citations)
Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content (CVPR 2020; 0 citations)
Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis (NeurIPS 2022; 0 citations)