Zhendong Mao

902

total citations

papers (26)

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

CVPR 2021 (arXiv)

377

citations

Graph Structured Network for Image-Text Matching

CVPR 2020 (arXiv)

285

citations

Towards Accurate Image Coding: Improved Autoregressive Image Generation With Dynamic Vector Quantization

CVPR 2023 (arXiv)

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

CVPR 2024

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

CVPR 2024 (arXiv)

Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation

CVPR 2023 (arXiv)

RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models

ICCV 2025 (arXiv)

Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection

AAAI 2025 (arXiv)

Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching

AAAI 2024

CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization

AAAI 2025 (arXiv)

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

CVPR 2025

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

ICCV 2025 (arXiv)

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

ICCV 2025 (arXiv)

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

AAAI 2024 (arXiv)

FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation

CVPR 2025

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

AAAI 2025 (arXiv)

SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation

CVPR 2025 (arXiv)

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

AAAI 2024

Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval

ICCV 2025

A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models

CVPR 2025

Crossing the Gap: Domain Generalization for Image Captioning

CVPR 2023

Lesion-Aware Transformers for Diabetic Retinopathy Grading

CVPR 2021

DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation

AAAI 2024

Negative-Aware Attention Framework for Image-Text Matching

Learning Semantic Relationship Among Instances for Image-Text Matching

Dragin3D: Image Editing by Dragging in 3D Space
