Zongxin Yang

1,613 total citations

papers (22)

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

CVPR 2024 (arXiv)

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

ICML 2024 (arXiv)

Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation

CVPR 2023 (arXiv)

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

NeurIPS 2023 (arXiv)

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-Scale Consistency

CVPR 2021 (arXiv)

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

ICCV 2023 (arXiv)

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

CVPR 2025 (arXiv)

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

ICCV 2023 (arXiv)

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

ICCV 2023 (arXiv)

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

ICCV 2025 (arXiv)

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

ECCV 2022 (arXiv)

3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation

ICLR 2025


Few-Shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation

AAAI 2025


ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification

CVPR 2023


FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation

CVPR 2023


H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection

CVPR 2022


SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

CVPR 2025



Associating Objects with Transformers for Video Object Segmentation

Collaborative Video Object Segmentation by Foreground-Background Integration

Gated Channel Transformation for Visual Recognition

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

