"model compression" Papers
128 papers found • Page 1 of 3
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution
Yun Zhang, Wei Li, Simiao Li et al.
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
Kaiyuan Li, Xiaoyue Chen, Chen Gao et al.
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.
Bi-Directional Communication-Efficient Stochastic FL via Remote Source Generation
Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh et al.
BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis
David Svitov, Pietro Morerio, Lourdes Agapito et al.
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.
Composable Interventions for Language Models
Arinbjörn Kolbeinsson, Kyle O'Brien, Tianjin Huang et al.
Computation and Memory-Efficient Model Compression with Gradient Reweighting
Zhiwei Li, Yuesen Liao, Binrui Wu et al.
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Yongqi Huang, Peng Ye, Chenyu Huang et al.
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.
DLP: Dynamic Layerwise Pruning in Large Language Models
Yuli Chen, Bo Cheng, Jiale Han et al.
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Ruokai Yin, Yuhang Li, Donghyun Lee et al.
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks
Steffen Schotthöfer, Lexie Yang, Stefan Schnake
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang et al.
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
Jorge García-Carrasco, Alejandro Maté, Juan Trujillo
Fast Feedforward 3D Gaussian Splatting Compression
Yihang Chen, Qianyi Wu, Mengyao Li et al.
FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization
Seung-Wook Kim, Seongyeol Kim, Jiah Kim et al.
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.
FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting
Hengyu Liu, Yuehao Wang, Chenxin Li et al.
Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles
Youssouf Emine, Alexandre Forel, Idriss Malek et al.
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
Julia Nakhleh, Robert Nowak
Hankel Singular Value Regularization for Highly Compressible State Space Models
Paul Schwerdtner, Jules Berman, Benjamin Peherstorfer
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
Indirect Gradient Matching for Adversarial Robust Distillation
Hongsin Lee, Seungju Cho, Changick Kim
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
Simiao Li, Yun Zhang, Wei Li et al.
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Fei Wang, Li Shen, Liang Ding et al.
Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning
Jisoo Kim, Sungmin Kang, Sunwoo Lee
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He et al.
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun, Barath Lakshmanan, Maying Shen et al.
MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices
Youpeng Zhao, Ming Lin, Huadong Tang et al.
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu et al.
MODEL SHAPLEY: Find Your Ideal Parameter Player via One Gradient Backpropagation
Chu Xu, Xinke Jiang, Rihong Qiu et al.
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo, Shengkun Tang, Cong Zeng et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
One-Shot Knowledge Transfer for Scalable Person Re-Identification
Longhua Li, Lei Qi, Xin Geng
On LLM Knowledge Distillation - A Comparison between Forward KL and Reverse KL
Yihan Cao, Yanbin Kang
Optimal Brain Apoptosis
Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.
Optimization with Access to Auxiliary Information
El Mahdi Chayti, Sai Karimireddy
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
Yilong Li, Jingyu Liu, Hao Zhang et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Libo Zhu, Jianze Li, Haotong Qin et al.
PLD: A Choice-Theoretic List-Wise Knowledge Distillation
Ejafa Bassam, Dawei Zhu, Kaigui Bian
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
Alex Hanson, Allen Tu, Vasu Singla et al.
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.