Xiaoyi Dong
26 papers
3,700 total citations
Papers (26)
CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows
CVPR 2022
arXiv
1,252
citations
Mobile-Former: Bridging MobileNet and Transformer
CVPR 2022
arXiv
634
citations
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
arXiv
385
citations
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
arXiv
357
citations
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
CVPR 2023
arXiv
231
citations
Protecting Celebrities From DeepFake With Identity Consistency Transformer
CVPR 2022
arXiv
164
citations
LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks
CVPR 2020
arXiv
121
citations
Shape-Invariant 3D Adversarial Point Clouds
CVPR 2022
arXiv
103
citations
Bootstrapped Masked Autoencoders for Vision BERT Pretraining
ECCV 2022
arXiv
89
citations
Diversity-Aware Meta Visual Prompting
CVPR 2023
arXiv
78
citations
GreedyFool: Distortion-Aware Sparse Adversarial Attack
NeurIPS 2020
arXiv
77
citations
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
arXiv
56
citations
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
arXiv
40
citations
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025
arXiv
36
citations
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
arXiv
22
citations
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
ICCV 2025
arXiv
21
citations
Emotional Listener Portrait: Neural Listener Head Generation with Emotion
ICCV 2023
arXiv
18
citations
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting
ICCV 2023
arXiv
12
citations
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
arXiv
2
citations
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICCV 2025
arXiv
2
citations
Robust Superpixel-Guided Attentional Adversarial Attack
CVPR 2020
0
citations
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
0
citations
Adaptive Face Forgery Detection in Cross Domain
ECCV 2022
0
citations
Self-Robust 3D Point Recognition via Gather-Vector Guidance
CVPR 2020
0
citations
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
ICCV 2025
0
citations
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
ICCV 2025
0
citations