"object localization" Papers

16 papers found

CrossOver: 3D Scene Cross-Modal Alignment

Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.

CVPR 2025highlightarXiv:2502.15011
8
citations

FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

Yao Xiao, Tingfa Xu, Yu Xin et al.

AAAI 2025paperarXiv:2504.20670
62
citations

Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360° Firefighting Video

Aditi Tiwari, Farzaneh Masoud, Dac Nguyen et al.

NEURIPS 2025oral

Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels

Yongshuo Zong, Qin ZHANG, DONGSHENG An et al.

CVPR 2025arXiv:2505.13788
3
citations

MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval

Huaying Yuan, Jian Ni, Zheng Liu et al.

NEURIPS 2025arXiv:2502.12558
3
citations

RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations

Savya Khosla, Sethuraman T V, Alexander G. Schwing et al.

CVPR 2025arXiv:2412.01826
5
citations

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025arXiv:2412.04383
43
citations

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning

Guanxing Lu, Ziwei Wang, Changliu Liu et al.

ICLR 2025arXiv:2312.07062
17
citations

Adaptive Multi-task Learning for Few-shot Object Detection

Yan Ren, Yanling Li, Wai-Kin Adams Kong

ECCV 2024
6
citations

FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection

Chanho Lee, Jinsu Son, Hyounguk Shon et al.

AAAI 2024paperarXiv:2401.06159
26
citations

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

Yufei Zhan, Yousong Zhu, Zhiyang Chen et al.

ECCV 2024arXiv:2311.14552
31
citations

Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

Jin-Chuan Shi, Miao Wang, Haobin Duan et al.

CVPR 2024arXiv:2311.18482
177
citations

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek et al.

CVPR 2024arXiv:2402.08657
15
citations

Point Segment and Count: A Generalized Framework for Object Counting

Zhizhong Huang, Mingliang Dai, Yi Zhang et al.

CVPR 2024arXiv:2311.12386
46
citations

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna et al.

ICML 2024arXiv:2402.07865
258
citations

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang Weiyun, yiming ren, Haowen Luo et al.

ECCV 2024arXiv:2402.19474
89
citations