Gang Yu

Affiliations

Tencent

3,342 total citations

papers (26)

Executing Your Commands via Motion Diffusion in Latent Space

CVPR 2023 (arXiv)

545 citations

MotionGPT: Human Motion as a Foreign Language

NeurIPS 2023 (arXiv)

466 citations

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

CVPR 2020 (arXiv)

444 citations

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

NeurIPS 2023 (arXiv)

174 citations

State-Aware Tracker for Real-Time Video Object Segmentation

CVPR 2020 (arXiv)

120 citations

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

CVPR 2024 (arXiv)

113 citations

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

ICLR 2025 (arXiv)

102 citations

End-to-End 3D Dense Captioning With Vote2Cap-DETR

CVPR 2023 (arXiv)

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

AAAI 2024 (arXiv)

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

CVPR 2023 (arXiv)

Hierarchical Normalization for Robust Monocular Depth Estimation

NeurIPS 2022 (arXiv)

D&D: Learning Human Dynamics from Dynamic Camera

ECCV 2022 (arXiv)

A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

ICCV 2023 (arXiv)

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

NeurIPS 2025 (arXiv)

MotionChain: Conversational Motion Controllers via Multimodal Prompts

ECCV 2024 (arXiv)

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

CVPR 2025 (arXiv)

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations

NeurIPS 2022 (arXiv)

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

ICCV 2023 (arXiv)

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

CVPR 2025 (arXiv)

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

NeurIPS 2023 (arXiv)

M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions

ECCV 2024

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning

CVPR 2024

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation

AAAI 2024

papers (26)

Executing Your Commands via Motion Diffusion in Latent Space

MotionGPT: Human Motion as a Foreign Language

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

Context Prior for Scene Segmentation

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

State-Aware Tracker for Real-Time Video Object Segmentation

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

End-to-End 3D Dense Captioning With Vote2Cap-DETR

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

Hierarchical Normalization for Robust Monocular Depth Estimation

D&D: Learning Human Dynamics from Dynamic Camera

A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

MotionChain: Conversational Motion Controllers via Multimodal Prompts

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation
