"model compression" Papers
128 papers found • Page 2 of 3
Quantized Spike-driven Transformer
Xuerui Qiu, Malu Zhang, Jieyuan Zhang et al.
RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy
Geonho Lee, Janghwan Lee, Sukjin Hong et al.
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
Zukang Xu, Xing Hu, Qiang Wu et al.
S$^2$NN: Sub-bit Spiking Neural Networks
Wenjie Wei, Malu Zhang, Jieyuan (Eric) Zhang et al.
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Rasoul Shafipour, David Harrison, Maxwell Horton et al.
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing, Boyan Gao, Zheng Liu et al.
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Yuedong Zhong et al.
Streamlining Redundant Layers to Compress Large Language Models
Xiaodong Chen, Yuxuan Hu, Jing Zhang et al.
Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Kou Misaki, Han Bao et al.
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Tian Jin, Ahmed Imtiaz Humayun, Utku Evci et al.
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
Jun Liu, Zhenglun Kong, Pu Zhao et al.
TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks
Xiang Meng, Mehdi Makni, Rahul Mazumder
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models
Yoojin Jung, Byung Cheol Song
Variance-Based Pruning for Accelerating and Compressing Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Condurache
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
What Makes a Good Dataset for Knowledge Distillation?
Logan Frank, Jim Davis
An Empirical Study of CLIP for Text-Based Person Search
Min Cao, Yang Bai, Ziyin Zeng et al.
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho et al.
Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification
Luyang Fang, Yongkai Chen, Wenxuan Zhong et al.
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin et al.
Binarized Low-light Raw Video Enhancement
Gengchen Zhang, Yulun Zhang, Xin Yuan et al.
BiPFT: Binary Pre-trained Foundation Transformer with Low-Rank Estimation of Binarization Residual Polynomials
Xingrun Xing, Li Du, Xinyuan Wang et al.
BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells et al.
Building Variable-Sized Models via Learngene Pool
Boyu Shi, Shiyu Xia, Xu Yang et al.
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.
Compressing Large Language Models by Joint Sparsification and Quantization
Jinyang Guo, Jianyu Wu, Zining Wang et al.
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.
DeepCache: Accelerating Diffusion Models for Free
Xinyin Ma, Gongfan Fang, Xinchao Wang
DFD: Distilling the Feature Disparity Differently for Detectors
Kang Liu, Yingyi Zhang, Jingyun Zhang et al.
Distilling Knowledge from Large-Scale Image Models for Object Detection
Gang Li, Wenhai Wang, Xiang Li et al.
DistiLLM: Towards Streamlined Distillation for Large Language Models
Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.
Do Topological Characteristics Help in Knowledge Distillation?
Jungeun Kim, Junwon You, Dongjin Lee et al.
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala, Alind Khare, Animesh Agrawal et al.
Efficient Multitask Dense Predictor via Binarization
Yuzhang Shang, Dan Xu, Gaowen Liu et al.
Enhanced Sparsification via Stimulative Training
Shengji Tang, Weihao Lin, Hancheng Ye et al.
Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module
Yixing Xu, Chao Li, Dong Li et al.
Entropy Induced Pruning Framework for Convolutional Neural Networks
Yiheng Lu, Ziyu Guan, Yaming Yang et al.
EPSD: Early Pruning with Self-Distillation for Efficient Model Compression
Dong Chen, Ning Liu, Yichen Zhu et al.
ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
Yunshan Zhong, Jiawei Hu, You Huang et al.
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Wenshuo Li, Xinghao Chen, Han Shu et al.
Exploring Intrinsic Dimension for Vision-Language Model Pruning
Hanzhang Wang, Jiawen Zhang, Qingyuan Ma
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang, Weiming Zhuang, Chen Chen et al.
Fixed Point Diffusion Models
Luke Melas-Kyriazi, Xingjian Bai
Flextron: Many-in-One Flexible Large Language Model
Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.