Jun Zhang

Affiliations

Zhejiang University

papers

1,044 total citations

papers (29)

Generalized Relation Modeling for Transformer Tracking

CVPR 2023, arXiv

203 citations

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining

NeurIPS 2022, arXiv

143 citations

Generalized Predictive Model for Autonomous Driving

CVPR 2024, arXiv

128 citations

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

ICCV 2025

GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering

ECCV 2020, arXiv

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

AAAI 2025, arXiv

Training-Free Long-Context Scaling of Large Language Models

ICML 2024, arXiv

Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians

ECCV 2020, arXiv

DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing

NeurIPS 2022, arXiv

Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval With Partial Query

ICCV 2021, arXiv

Multi-dataset Training of Transformers for Robust Action Recognition

NeurIPS 2022, arXiv

Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning

ICML 2024, arXiv

Task-Aware Encoder Control for Deep Video Compression

CVPR 2024, arXiv

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

ICCV 2025, arXiv

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

ICCV 2025, arXiv

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

AAAI 2025, arXiv

FloE: On-the-Fly MoE Inference on Memory-constrained GPU

ICML 2025, arXiv

Semi-Supervised Clustering Framework for Fine-grained Scene Graph Generation

AAAI 2025

Learn How to Query from Unlabeled Data Streams in Federated Learning

AAAI 2025, arXiv

Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification

CVPR 2021

Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification

CVPR 2022

Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution

CVPR 2020

SCL-WC: Cross-Slide Contrastive Learning for Weakly-Supervised Whole-Slide Image Classification

NeurIPS 2022

On the Convergence of an Adaptive Momentum Method for Adversarial Attacks

AAAI 2024

TransLoc4D: Transformer-based 4D Radar Place Recognition

CVPR 2024

Attentional Pyramid Pooling of Salient Visual Residuals for Place Recognition

ICCV 2021

papers (29)

Generalized Relation Modeling for Transformer Tracking

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining

Generalized Predictive Model for Autonomous Driving

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Training-Free Long-Context Scaling of Large Language Models

Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians

DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

Boosting Neural Representations for Videos with a Conditional Decoder

MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes

Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval With Partial Query

Multi-dataset Training of Transformers for Robust Action Recognition

Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning

Task-Aware Encoder Control for Deep Video Compression

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

FloE: On-the-Fly MoE Inference on Memory-constrained GPU

Semi-Supervised Clustering Framework for Fine-grained Scene Graph Generation

Learn How to Query from Unlabeled Data Streams in Federated Learning

Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification

Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification

Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution

SCL-WC: Cross-Slide Contrastive Learning for Weakly-Supervised Whole-Slide Image Classification

On the Convergence of an Adaptive Momentum Method for Adversarial Attacks

TransLoc4D: Transformer-based 4D Radar Place Recognition

Attentional Pyramid Pooling of Salient Visual Residuals for Place Recognition
