Paper "model compression" Papers
18 papers found
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
Jorge García-Carrasco, Alejandro Maté, Juan Trujillo
Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles
Youssouf Emine, Alexandre Forel, Idriss Malek et al.
MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices
Youpeng Zhao, Ming Lin, Huadong Tang et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy
Geonho Lee, Janghwan Lee, Sukjin Hong et al.
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
Jun Liu, Zhenglun Kong, Pu Zhao et al.
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
An Empirical Study of CLIP for Text-Based Person Search
Min Cao, Yang Bai, Ziyin Zeng et al.
BiPFT: Binary Pre-trained Foundation Transformer with Low-Rank Estimation of Binarization Residual Polynomials
Xingrun Xing, Li Du, Xinyuan Wang et al.
Building Variable-Sized Models via Learngene Pool
Boyu Shi, Shiyu Xia, Xu Yang et al.
Entropy Induced Pruning Framework for Convolutional Neural Networks
Yiheng Lu, Ziyu Guan, Yaming Yang et al.
EPSD: Early Pruning with Self-Distillation for Efficient Model Compression
Dong Chen, Ning Liu, Yichen Zhu et al.
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
Generative Model-Based Feature Knowledge Distillation for Action Recognition
Guiqin Wang, Peng Zhao, Yanjiang Shi et al.
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion
Cunhang Fan, Yujie Chen, Jun Xue et al.