"visual instruction tuning" Papers
15 papers found
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim
ICCV 2025, arXiv:2411.06869
2 citations
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
Yatai Ji, Shilong Zhang, Jie Wu et al.
ICLR 2025, arXiv:2407.07577
8 citations
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou, Feng Hong, Jiaan Luo et al.
NeurIPS 2025, arXiv:2503.22215
3 citations
Reconstructive Visual Instruction Tuning
Haochen Wang, Anlin Zheng, Yucheng Zhao et al.
ICLR 2025, arXiv:2410.09575
35 citations
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang, Yucheng Zhao, Tiancai Wang et al.
ICCV 2025, arXiv:2504.01901
33 citations
SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning
Ziqi Wang, Chang Che, Qi Wang et al.
ICCV 2025, arXiv:2411.13949
4 citations
Visual Instruction Bottleneck Tuning
Changdae Oh, Jiatong Li, Shawn Im et al.
NeurIPS 2025, arXiv:2505.13946
3 citations
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
AAAI 2024, arXiv:2308.09936
192 citations
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin et al.
ICML 2024, arXiv:2402.09353
706 citations
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu, Jiayi Ji, Oucheng Huang et al.
ICML 2024, arXiv:2406.16449
27 citations
Improved Baselines with Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Yuheng Li et al.
CVPR 2024 (highlight), arXiv:2310.03744
4359 citations
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai, Xiaoliang Dai, Lawrence Chen et al.
ECCV 2024, arXiv:2312.03849
26 citations
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan, Wentong Li, Jian Liu et al.
CVPR 2024, arXiv:2312.10032
149 citations
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang, Teng Wang, Haigang Zhang et al.
ECCV 2024, arXiv:2407.11422
11 citations
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
Guohao Sun, Can Qin, Jiaminan Wang et al.
ECCV 2024, arXiv:2403.11299
24 citations