Kaipeng Zhang
21 papers, 1,293 total citations
papers (21)
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
arXiv
341
citations
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
arXiv
201
citations
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
arXiv
163
citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
arXiv
141
citations
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
arXiv
113
citations
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025
arXiv
76
citations
DiffRate : Differentiable Compression Rate for Efficient Vision Transformers
ICCV 2023
arXiv
75
citations
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025
arXiv
48
citations
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training
AAAI 2024
arXiv
28
citations
Foundation Model is Efficient Multimodal Multitask Model Selector
NEURIPS 2023
arXiv
22
citations
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
arXiv
20
citations
Neighboring Autoregressive Modeling for Efficient Visual Generation
ICCV 2025
arXiv
19
citations
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
ICLR 2025
arXiv
17
citations
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
NEURIPS 2025
8
citations
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
arXiv
7
citations
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
arXiv
6
citations
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
AAAI 2024
arXiv
4
citations
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
ICCV 2025
arXiv
3
citations
Position: Towards Implicit Prompt For Text-To-Image Models
ICML 2024
arXiv
1
citation
Neural Routing by Memory
NEURIPS 2021
0
citations
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
ICCV 2025
0
citations