Hilde Kuehne
Affiliations (2)
Goethe University Frankfurt
MIT-IBM Watson AI Lab
27 papers · 889 total citations
Papers (27)

- Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval · CVPR 2022 · arXiv · 157 citations
- Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos · ICCV 2021 · arXiv · 97 citations
- Grounding Everything: Emerging Localization Properties in Vision-Language Transformers · CVPR 2024 · arXiv · 76 citations
- Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration Without Forgetting · ICCV 2021 · arXiv · 70 citations
- Deep Differentiable Logic Gate Networks · NeurIPS 2022 · arXiv · 65 citations
- Unsupervised Domain Generalization by Learning a Bridge Across Domains · CVPR 2022 · arXiv · 54 citations
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge · ICCV 2023 · arXiv · 49 citations
- Video Test-Time Adaptation for Action Recognition · CVPR 2023 · arXiv · 47 citations
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules · CVPR 2021 · arXiv · 41 citations
- HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · ECCV 2024 · arXiv · 33 citations
- Learning with Algorithmic Supervision via Continuous Relaxations · NeurIPS 2021 · arXiv · 32 citations
- Detector-Free Weakly Supervised Grounding by Separation · ICCV 2021 · arXiv · 31 citations
- What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation · NeurIPS 2023 · arXiv · 28 citations
- LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity · ICCV 2025 · arXiv · 25 citations
- Learning Situation Hyper-Graphs for Video Question Answering · CVPR 2023 · arXiv · 23 citations
- Learning by Sorting: Self-supervised Learning with Group Ordering Constraints · ICCV 2023 · arXiv · 15 citations
- Preserving Modality Structure Improves Multi-Modal Learning · ICCV 2023 · arXiv · 13 citations
- What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions · CVPR 2024 · arXiv · 9 citations
- Learning Human Action Recognition Representations Without Real Humans · NeurIPS 2023 · arXiv · 7 citations
- In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval · ICCV 2023 · arXiv · 6 citations
- Weakly Supervised Grounding for VQA in Vision-Language Transformers · ECCV 2022 · arXiv · 5 citations
- Teaching VLMs to Localize Specific Objects from In-context Examples · ICCV 2025 · arXiv · 3 citations
- CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment · CVPR 2025 · arXiv · 2 citations
- Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks · CVPR 2025 · arXiv · 1 citation
- VideoGEM: Training-free Action Grounding in Videos · CVPR 2025 · arXiv · 0 citations
- CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video · ECCV 2022 · 0 citations
- How Transferable are Video Representations Based on Synthetic Data? · NeurIPS 2022 · 0 citations