Paper "model compression" Papers

18 papers found

Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference

Jorge García-Carrasco, Alejandro Maté, Juan Trujillo

AAAI 2025 · arXiv:2412.15750 · 3 citations

Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles

Youssouf Emine, Alexandre Forel, Idriss Malek et al.

AAAI 2025 · arXiv:2408.16167 · 2 citations

MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices

Youpeng Zhao, Ming Lin, Huadong Tang et al.

AAAI 2025 · arXiv:2403.07921 · 1 citation

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 · arXiv:2412.12441 · 23 citations

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Weiyu Huang, Yuezhou Hu, Guohao Jian et al.

AAAI 2025 · arXiv:2407.20584 · 21 citations

RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy

Geonho Lee, Janghwan Lee, Sukjin Hong et al.

AAAI 2025 · arXiv:2412.01129 · 5 citations

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

Jun Liu, Zhenglun Kong, Pu Zhao et al.

AAAI 2025 · arXiv:2403.10799 · 14 citations

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

AAAI 2025 · arXiv:2408.17131 · 11 citations

An Empirical Study of CLIP for Text-Based Person Search

Min Cao, Yang Bai, Ziyin Zeng et al.

AAAI 2024 · arXiv:2308.10045 · 98 citations

BiPFT: Binary Pre-trained Foundation Transformer with Low-Rank Estimation of Binarization Residual Polynomials

Xingrun Xing, Li Du, Xinyuan Wang et al.

AAAI 2024 · arXiv:2312.08937 · 5 citations

Building Variable-Sized Models via Learngene Pool

Boyu Shi, Shiyu Xia, Xu Yang et al.

AAAI 2024 · arXiv:2312.05743 · 5 citations

Entropy Induced Pruning Framework for Convolutional Neural Networks

Yiheng Lu, Ziyu Guan, Yaming Yang et al.

AAAI 2024 · arXiv:2208.06660 · 6 citations

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Dong Chen, Ning Liu, Yichen Zhu et al.

AAAI 2024 · arXiv:2402.00084 · 9 citations

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024 · arXiv:2303.08302 · 71 citations

Fluctuation-Based Adaptive Structured Pruning for Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

AAAI 2024 · arXiv:2312.11983 · 106 citations

Generative Model-Based Feature Knowledge Distillation for Action Recognition

Guiqin Wang, Peng Zhao, Yanjiang Shi et al.

AAAI 2024 · arXiv:2312.08644 · 7 citations

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024 · arXiv:2306.02272 · 105 citations

Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

Cunhang Fan, Yujie Chen, Jun Xue et al.

AAAI 2024 · arXiv:2401.12997 · 5 citations