"model compression" Papers

128 papers found • Page 2 of 3

Quantized Spike-driven Transformer

Xuerui Qiu, Malu Zhang, Jieyuan Zhang et al.

ICLR 2025 • arXiv:2501.13492
14 citations

RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy

Geonho Lee, Janghwan Lee, Sukjin Hong et al.

AAAI 2025 • paper • arXiv:2412.01129
5 citations

RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models

Zukang Xu, Xing Hu, Qiang Wu et al.

NEURIPS 2025 • arXiv:2510.01240

S$^2$NN: Sub-bit Spiking Neural Networks

Wenjie Wei, Malu Zhang, Jieyuan (Eric) Zhang et al.

NEURIPS 2025 • arXiv:2509.24266

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Rasoul Shafipour, David Harrison, Maxwell Horton et al.

ICLR 2025 • arXiv:2410.10714
2 citations

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Xingrun Xing, Boyan Gao, Zheng Liu et al.

ICLR 2025 • arXiv:2407.04752
23 citations

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Peijie Dong, Lujun Li, Yuedong Zhong et al.

ICLR 2025 • arXiv:2408.01803
32 citations

Streamlining Redundant Layers to Compress Large Language Models

Xiaodong Chen, Yuxuan Hu, Jing Zhang et al.

ICLR 2025 • arXiv:2403.19135
19 citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025 • arXiv:2502.06415
19 citations

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Makoto Shing, Kou Misaki, Han Bao et al.

ICLR 2025 • oral • arXiv:2501.16937
13 citations

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci et al.

ICLR 2025 • arXiv:2501.12486
1 citation

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

ICLR 2025 • arXiv:2403.17887
172 citations

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

Jun Liu, Zhenglun Kong, Pu Zhao et al.

AAAI 2025 • paper • arXiv:2403.10799
14 citations

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks

Xiang Meng, Mehdi Makni, Rahul Mazumder

NEURIPS 2025

Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models

Yoojin Jung, Byung Cheol Song

CVPR 2025 • arXiv:2504.04747
1 citation

Variance-Based Pruning for Accelerating and Compressing Trained Networks

Uranik Berisha, Jens Mehnert, Alexandru Condurache

ICCV 2025 • arXiv:2507.12988
1 citation

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025 • arXiv:2511.06256
5 citations

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

AAAI 2025 • paper • arXiv:2408.17131
11 citations

What Makes a Good Dataset for Knowledge Distillation?

Logan Frank, Jim Davis

CVPR 2025 • arXiv:2411.12817
4 citations

An Empirical Study of CLIP for Text-Based Person Search

Min Cao, Yang Bai, Ziyin Zeng et al.

AAAI 2024 • paper • arXiv:2308.10045
98 citations

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho et al.

ICML 2024 • arXiv:2402.10517
43 citations

Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification

Luyang Fang, Yongkai Chen, Wenxuan Zhong et al.

ICML 2024

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin et al.

ICML 2024 • arXiv:2402.04291
142 citations

Binarized Low-light Raw Video Enhancement

Gengchen Zhang, Yulun Zhang, Xin Yuan et al.

CVPR 2024 • arXiv:2403.19944
16 citations

BiPFT: Binary Pre-trained Foundation Transformer with Low-Rank Estimation of Binarization Residual Polynomials

Xingrun Xing, Li Du, Xinyuan Wang et al.

AAAI 2024 • paper • arXiv:2312.08937
5 citations

BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion

Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells et al.

ECCV 2024 • arXiv:2305.15798
9 citations

Building Variable-Sized Models via Learngene Pool

Boyu Shi, Shiyu Xia, Xu Yang et al.

AAAI 2024 • paper • arXiv:2312.05743
5 citations

CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.

ICML 2024 • arXiv:2403.08058
13 citations

CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization

K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.

ECCV 2024 • arXiv:2311.18159
115 citations

Compressing Large Language Models by Joint Sparsification and Quantization

Jinyang Guo, Jianyu Wu, Zining Wang et al.

ICML 2024

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.

ICML 2024 • arXiv:2403.15447
49 citations

DeepCache: Accelerating Diffusion Models for Free

Xinyin Ma, Gongfan Fang, Xinchao Wang

CVPR 2024 • arXiv:2312.00858
279 citations

DFD: Distilling the Feature Disparity Differently for Detectors

Kang Liu, Yingyi Zhang, Jingyun Zhang et al.

ICML 2024

Distilling Knowledge from Large-Scale Image Models for Object Detection

Gang Li, Wenhai Wang, Xiang Li et al.

ECCV 2024
4 citations

DistiLLM: Towards Streamlined Distillation for Large Language Models

Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.

ICML 2024 • arXiv:2402.03898
73 citations

Do Topological Characteristics Help in Knowledge Distillation?

Jungeun Kim, Junwon You, Dongjin Lee et al.

ICML 2024

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

Aditya Annavajjala, Alind Khare, Animesh Agrawal et al.

ECCV 2024 • arXiv:2407.06167
1 citation

Efficient Multitask Dense Predictor via Binarization

Yuzhang Shang, Dan Xu, Gaowen Liu et al.

CVPR 2024 • arXiv:2405.14136
6 citations

Enhanced Sparsification via Stimulative Training

Shengji Tang, Weihao Lin, Hancheng Ye et al.

ECCV 2024 • arXiv:2403.06417
2 citations

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Yixing Xu, Chao Li, Dong Li et al.

ICML 2024

Entropy Induced Pruning Framework for Convolutional Neural Networks

Yiheng Lu, Ziyu Guan, Yaming Yang et al.

AAAI 2024 • paper • arXiv:2208.06660
6 citations

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Dong Chen, Ning Liu, Yichen Zhu et al.

AAAI 2024 • paper • arXiv:2402.00084
9 citations

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Yunshan Zhong, Jiawei Hu, You Huang et al.

ICML 2024 • spotlight

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li, Xinghao Chen, Han Shu et al.

ICML 2024 • arXiv:2406.11257
9 citations

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024 • paper • arXiv:2303.08302
71 citations

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024 • arXiv:2401.06118
160 citations

FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang, Weiming Zhuang, Chen Chen et al.

CVPR 2024 • arXiv:2403.14737
18 citations

Fixed Point Diffusion Models

Luke Melas-Kyriazi, Xingjian Bai

CVPR 2024 • arXiv:2401.08741
6 citations

Flextron: Many-in-One Flexible Large Language Model

Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.

ICML 2024 • arXiv:2406.10260
34 citations