"visual instruction tuning" Papers

15 papers found

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models

Junho Kim, Hyungjin Chung, Byung-Hoon Kim

ICCV 2025, arXiv:2411.06869
2 citations

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Yatai Ji, Shilong Zhang, Jie Wu et al.

ICLR 2025, arXiv:2407.07577
8 citations

Learning to Instruct for Visual Instruction Tuning

Zhihan Zhou, Feng Hong, Jiaan Luo et al.

NeurIPS 2025, arXiv:2503.22215
3 citations

Reconstructive Visual Instruction Tuning

Haochen Wang, Anlin Zheng, Yucheng Zhao et al.

ICLR 2025, arXiv:2410.09575
35 citations

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Haochen Wang, Yucheng Zhao, Tiancai Wang et al.

ICCV 2025, arXiv:2504.01901
33 citations

SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning

Ziqi Wang, Chang Che, Qi Wang et al.

ICCV 2025, arXiv:2411.13949
4 citations

Visual Instruction Bottleneck Tuning

Changdae Oh, Jiatong Li, Shawn Im et al.

NeurIPS 2025, arXiv:2505.13946
3 citations

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024 (paper), arXiv:2308.09936
192 citations

DoRA: Weight-Decomposed Low-Rank Adaptation

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin et al.

ICML 2024, arXiv:2402.09353
706 citations

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu, Jiayi Ji, Oucheng Huang et al.

ICML 2024, arXiv:2406.16449
27 citations

Improved Baselines with Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Yuheng Li et al.

CVPR 2024 (highlight), arXiv:2310.03744
4359 citations

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Bolin Lai, Xiaoliang Dai, Lawrence Chen et al.

ECCV 2024, arXiv:2312.03849
26 citations

Osprey: Pixel Understanding with Visual Instruction Tuning

Yuqian Yuan, Wentong Li, Jian Liu et al.

CVPR 2024, arXiv:2312.10032
149 citations

Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

Jinrui Zhang, Teng Wang, Haigang Zhang et al.

ECCV 2024, arXiv:2407.11422
11 citations

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Guohao Sun, Can Qin, Jiaminan Wang et al.

ECCV 2024, arXiv:2403.11299
24 citations