Josef Sivic
26 papers, 3,040 total citations
papers (26)
End-to-End Learning of Visual Representations From Uncurated Instructional Videos
CVPR 2020
arXiv
761
citations
CosyPose: Consistent multi-view multi-object 6D pose estimation
ECCV 2020
arXiv
501
citations
Just Ask: Learning To Answer Questions From Millions of Narrated Videos
ICCV 2021
arXiv
338
citations
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
CVPR 2023
arXiv
332
citations
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
NeurIPS 2022
arXiv
277
citations
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
ECCV 2020
arXiv
192
citations
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers
CVPR 2021
arXiv
162
citations
TubeDETR: Spatio-Temporal Video Grounding With Transformers
CVPR 2022
arXiv
123
citations
Single-View Robot Pose and Joint Angle Estimation via Render & Compare
CVPR 2021
arXiv
62
citations
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
NeurIPS 2023
arXiv
51
citations
Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
CVPR 2022
arXiv
44
citations
VidChapters-7M: Video Chapters at Scale
NeurIPS 2023
arXiv
41
citations
Language-Guided Music Recommendation for Video via Prompt Analogies
CVPR 2023
arXiv
32
citations
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation
ECCV 2022
arXiv
29
citations
Learning to design protein-protein interactions with enhanced generalization
ICLR 2024
arXiv
26
citations
Focal Length and Object Pose Estimation via Render and Compare
CVPR 2022
arXiv
22
citations
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
ICCV 2021
arXiv
17
citations
Meta-Personalizing Vision-Language Models To Find Named Instances in Video
CVPR 2023
arXiv
15
citations
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
CVPR 2025
arXiv
6
citations
Learning to engineer protein flexibility
ICLR 2025
arXiv
4
citations
Large-scale Pre-training for Grounded Video Caption Generation
ICCV 2025
arXiv
3
citations
Improving Personalized Search with Regularized Low-Rank Parameter Updates
CVPR 2025
arXiv
1
citation
ResidualViT for Efficient Temporally Dense Video Encoding
ICCV 2025
arXiv
1
citation
Learning Actionness via Long-range Temporal Order Verification
ECCV 2020
0
citations
Discovering Divergent Representations between Text-to-Image Models
ICCV 2025
arXiv
0
citations
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
CVPR 2024
0
citations