Xingang Wang

18 papers, 1,167 total citations

papers (18)

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception (ICCV 2023; 239 citations)

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation (CVPR 2025)

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration (CVPR 2025)

DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception (AAAI 2024)

Relevant Intrinsic Feature Enhancement Network for Few-Shot Semantic Segmentation (AAAI 2024)

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark (CVPR 2023)

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection (CVPR 2025)

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation (ICCV 2025)

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation (NeurIPS 2025)

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation (CVPR 2025)

Multi-Granularity Distillation Scheme towards Lightweight Semi-Supervised Semantic Segmentation (ECCV 2022)

DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup (ICCV 2025)

Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection (CVPR 2025)

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding (ICCV 2025)

All papers (18)

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

Learning Dynamic Routing for Semantic Segmentation

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception

Relevant Intrinsic Feature Enhancement Network for Few-Shot Semantic Segmentation

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Multi-Granularity Distillation Scheme towards Lightweight Semi-Supervised Semantic Segmentation

DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup

Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
