"model compression" Papers
Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
Generative Model-Based Feature Knowledge Distillation for Action Recognition
Guiqin Wang, Peng Zhao, Yanjiang Shi et al.
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao et al.
How Far Can We Compress Instant-NGP-Based NeRF?
Yihang Chen, Qianyi Wu, Mehrtash Harandi et al.
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin, Ajay Jaiswal, Shiwei Liu et al.
KernelWarehouse: Rethinking the Design of Dynamic Convolution
Chao Li, Anbang Yao
Lightweight Image Super-Resolution via Flexible Meta Pruning
Yulun Zhang, Kai Zhang, Luc Van Gool et al.
Localizing Task Information for Improved Model Merging and Compression
Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li, Qiang Nie, Weifu Fu et al.
MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.
Neural Metamorphosis
Xingyi Yang, Xinchao Wang
On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving
Kaituo Feng, Changsheng Li, Dongchun Ren et al.
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion
Cunhang Fan, Yujie Chen, Jun Xue et al.
Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models
Peijie Dong, Lujun Li, Zhenheng Tang et al.
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong, Hui Chen, Tianxiang Hao et al.
Rethinking Optimization and Architecture for Tiny Language Models
Yehui Tang, Kai Han, Fangcheng Liu et al.
Reweighted Solutions for Weighted Low Rank Approximation
David Woodruff, Taisuke Yasuda
SAGS: Structure-Aware 3D Gaussian Splatting
Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song et al.
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song, Kyungseok Oh, Taesu Kim et al.
SNP: Structured Neuron-level Pruning to Preserve Attention Scores
Kyunghwan Shim, Jaewoong Yun, Shinkook Choi
Soft Prompt Recovers Compressed LLMs, Transferably
Zhaozhuo Xu, Zirui Liu, Beidi Chen et al.
Towards efficient deep spiking neural networks construction with spiking activity based pruning
Yaxin Li, Qi Xu, Jiangrong Shen et al.
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Jingxuan Xu, Wuyang Chen, Yao Zhao et al.
Transferring Knowledge From Large Foundation Models to Small Downstream Models
Shikai Qiu, Boran Han, Danielle Robinson et al.