Poster "model compression" Papers

104 papers found • Page 2 of 3

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci et al.

ICLR 2025 • arXiv:2501.12486
1 citation

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

ICLR 2025 • arXiv:2403.17887
172 citations

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks

Xiang Meng, Mehdi Makni, Rahul Mazumder

NeurIPS 2025

Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models

Yoojin Jung, Byung Cheol Song

CVPR 2025 • arXiv:2504.04747
1 citation

Variance-Based Pruning for Accelerating and Compressing Trained Networks

Uranik Berisha, Jens Mehnert, Alexandru Condurache

ICCV 2025 • arXiv:2507.12988
1 citation

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025 • arXiv:2511.06256
5 citations

What Makes a Good Dataset for Knowledge Distillation?

Logan Frank, Jim Davis

CVPR 2025 • arXiv:2411.12817
4 citations

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho et al.

ICML 2024 • arXiv:2402.10517
43 citations

Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification

Luyang Fang, Yongkai Chen, Wenxuan Zhong et al.

ICML 2024

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin et al.

ICML 2024 • arXiv:2402.04291
142 citations

Binarized Low-light Raw Video Enhancement

Gengchen Zhang, Yulun Zhang, Xin Yuan et al.

CVPR 2024 • arXiv:2403.19944
16 citations

BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion

Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells et al.

ECCV 2024 • arXiv:2305.15798
9 citations

CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.

ICML 2024 • arXiv:2403.08058
13 citations

CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization

K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.

ECCV 2024 • arXiv:2311.18159
115 citations

Compressing Large Language Models by Joint Sparsification and Quantization

Jinyang Guo, Jianyu Wu, Zining Wang et al.

ICML 2024

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.

ICML 2024 • arXiv:2403.15447
49 citations

DeepCache: Accelerating Diffusion Models for Free

Xinyin Ma, Gongfan Fang, Xinchao Wang

CVPR 2024 • arXiv:2312.00858
279 citations

DFD: Distilling the Feature Disparity Differently for Detectors

Kang Liu, Yingyi Zhang, Jingyun Zhang et al.

ICML 2024

Distilling Knowledge from Large-Scale Image Models for Object Detection

Gang Li, Wenhai Wang, Xiang Li et al.

ECCV 2024
4 citations

DistiLLM: Towards Streamlined Distillation for Large Language Models

Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.

ICML 2024 • arXiv:2402.03898
73 citations

Do Topological Characteristics Help in Knowledge Distillation?

Jungeun Kim, Junwon You, Dongjin Lee et al.

ICML 2024

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

Aditya Annavajjala, Alind Khare, Animesh Agrawal et al.

ECCV 2024 • arXiv:2407.06167
1 citation

Efficient Multitask Dense Predictor via Binarization

Yuzhang Shang, Dan Xu, Gaowen Liu et al.

CVPR 2024 • arXiv:2405.14136
6 citations

Enhanced Sparsification via Stimulative Training

Shengji Tang, Weihao Lin, Hancheng Ye et al.

ECCV 2024 • arXiv:2403.06417
2 citations

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Yixing Xu, Chao Li, Dong Li et al.

ICML 2024

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li, Xinghao Chen, Han Shu et al.

ICML 2024 • arXiv:2406.11257
9 citations

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024 • arXiv:2401.06118
160 citations

FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang, Weiming Zhuang, Chen Chen et al.

CVPR 2024 • arXiv:2403.14737
18 citations

Fixed Point Diffusion Models

Luke Melas-Kyriazi, Xingjian Bai

CVPR 2024 • arXiv:2401.08741
6 citations

Flextron: Many-in-One Flexible Large Language Model

Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.

ICML 2024 • arXiv:2406.10260
34 citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024 • arXiv:2403.06082
16 citations

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao et al.

ECCV 2024 • arXiv:2402.03119
19 citations

How Far Can We Compress Instant-NGP-Based NeRF?

Yihang Chen, Qianyi Wu, Mehrtash Harandi et al.

CVPR 2024 • arXiv:2406.04101
34 citations

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024 • arXiv:2404.00928
15 citations

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

Lu Yin, Ajay Jaiswal, Shiwei Liu et al.

ICML 2024

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024 • arXiv:2406.07879
9 citations

Lightweight Image Super-Resolution via Flexible Meta Pruning

Yulun Zhang, Kai Zhang, Luc Van Gool et al.

ICML 2024

Localizing Task Information for Improved Model Merging and Compression

Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.

ICML 2024 • arXiv:2405.07813
92 citations

MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection

Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.

ECCV 2024
14 citations

Neural Metamorphosis

Xingyi Yang, Xinchao Wang

ECCV 2024 • arXiv:2410.11878
5 citations

On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving

Kaituo Feng, Changsheng Li, Dongchun Ren et al.

CVPR 2024 • arXiv:2403.01238
15 citations

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation

Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli

CVPR 2024 • arXiv:2310.10700
10 citations

Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models

Peijie Dong, Lujun Li, Zhenheng Tang et al.

ICML 2024 • arXiv:2406.02924
54 citations

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

Yizhe Xiong, Hui Chen, Tianxiang Hao et al.

ECCV 2024 • arXiv:2403.09192
26 citations

Rethinking Optimization and Architecture for Tiny Language Models

Yehui Tang, Kai Han, Fangcheng Liu et al.

ICML 2024

Reweighted Solutions for Weighted Low Rank Approximation

David Woodruff, Taisuke Yasuda

ICML 2024 • arXiv:2406.02431
2 citations

SAGS: Structure-Aware 3D Gaussian Splatting

Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song et al.

ECCV 2024 • arXiv:2404.19149
21 citations

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song, Kyungseok Oh, Taesu Kim et al.

ICML 2024 • arXiv:2402.09025
73 citations

SNP: Structured Neuron-level Pruning to Preserve Attention Scores

Kyunghwan Shim, Jaewoong Yun, Shinkook Choi

ECCV 2024 • arXiv:2404.11630
3 citations