Rita Cucchiara
24 papers · 2,157 total citations
Papers (24)
Meshed-Memory Transformer for Image Captioning
CVPR 2020 · arXiv · 1,042 citations
Conditional Channel Gated Networks for Task-Aware Continual Learning
CVPR 2020 · arXiv · 200 citations
Dress Code: High-Resolution Multi-Category Virtual Try-On
ECCV 2022 · arXiv · 189 citations
MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
ICCV 2021 · arXiv · 126 citations
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
ICCV 2023 · arXiv · 93 citations
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
CVPR 2023 · arXiv · 89 citations
Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
CVPR 2020 · arXiv · 86 citations
How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting
CVPR 2022 · arXiv · 72 citations
Handwritten Text Generation From Visual Archetypes
CVPR 2023 · arXiv · 43 citations
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
CVPR 2023 · arXiv · 39 citations
With a Little Help from Your Own Past: Prototypical Memory Networks for Image Captioning
ICCV 2023 · arXiv · 31 citations
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
CVPR 2024 · arXiv · 31 citations
Maximum Class Separation as Inductive Bias in One Matrix
NeurIPS 2022 · arXiv · 26 citations
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
ICCV 2025 · arXiv · 23 citations
TrackFlow: Multi-Object Tracking with Normalizing Flows
ICCV 2023 · arXiv · 23 citations
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
CVPR 2025 · arXiv · 11 citations
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
CVPR 2025 · arXiv · 9 citations
Hyperbolic Safety-Aware Vision-Language Models
CVPR 2025 · arXiv · 6 citations
Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
ECCV 2024 · arXiv · 6 citations
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
CVPR 2025 · arXiv · 6 citations
Diffusion Transformers for Tabular Data Time Series Generation
ICLR 2025 · arXiv · 3 citations
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
ICCV 2025 · arXiv · 2 citations
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
ICCV 2025 · arXiv · 1 citation
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
ICCV 2025 · 0 citations