"parameter efficiency" Papers

37 papers found

As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters

Margret Keuper, Julia Grabinski, Janis Keuper

ICLR 2025
8 citations

Bayesian Low-Rank Learning (Bella): A Practical Approach to Bayesian Neural Networks

Bao Gia Doan, Afshar Shamsi, Xiao-Yu Guo et al.

AAAI 2025 · arXiv:2407.20891
4 citations

Composing Linear Layers from Irreducibles

Travis Pence, Daisuke Yamada, Vikas Singh

NeurIPS 2025 · arXiv:2507.11688

DEPT: Decoupled Embeddings for Pre-training Language Models

Alex Iacob, Lorenzo Sani, Meghdad Kurmanji et al.

ICLR 2025 · arXiv:2410.05021
2 citations

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

Yongqi Huang, Peng Ye, Chenyu Huang et al.

CVPR 2025 · arXiv:2503.01359
6 citations

Discovering Important Experts for Mixture-of-Experts Models Pruning Through a Theoretical Perspective

Weizhong Huang, Yuxin Zhang, Xiawu Zheng et al.

NeurIPS 2025

Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient

Wenlong Wang, Ivana Dusparic, Yucheng Shi et al.

ICLR 2025 · arXiv:2410.08893
3 citations

Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement

Gaurav Patel, Christopher M. Sandino, Behrooz Mahasseni et al.

ICLR 2025 · arXiv:2410.02147
6 citations

eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels

Alexander DeRieux, Walid Saad

ICLR 2025 · arXiv:2405.17486
5 citations

Hankel Singular Value Regularization for Highly Compressible State Space Models

Paul Schwerdtner, Jules Berman, Benjamin Peherstorfer

NeurIPS 2025 · arXiv:2510.22951
2 citations

Kolmogorov-Arnold Transformer

Xingyi Yang, Xinchao Wang

ICLR 2025 · arXiv:2409.10594
92 citations

Layerwise Recurrent Router for Mixture-of-Experts

Zihan Qiu, Zeyu Huang, Shuang Cheng et al.

ICLR 2025 · arXiv:2408.06793
8 citations

Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables

Wontae Kim, Keuntek Lee, Nam Ik Cho

ICCV 2025 · arXiv:2508.16121

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025 · arXiv:2408.15881
38 citations

(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning

Margaret Li, Sneha Kudugunta, Luke Zettlemoyer

ICLR 2025
9 citations

No Need to Talk: Asynchronous Mixture of Language Models

Anastasiia Filippova, Angelos Katharopoulos, David Grangier et al.

ICLR 2025 · arXiv:2410.03529
3 citations

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NeurIPS 2025 · arXiv:2505.13499
3 citations

PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning

Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy et al.

ICCV 2025 · arXiv:2507.12305

Sample- and Parameter-Efficient Auto-Regressive Image Models

Elad Amrani, Leonid Karlinsky, Alex M. Bronstein

CVPR 2025 · arXiv:2411.15648
2 citations

SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning

Yichen Wu, Hongming Piao, Long-Kai Huang et al.

ICLR 2025 · arXiv:2501.13198
34 citations

SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

Wujiang Xu, Qitian Wu, Zujie Liang et al.

ICLR 2025 (oral) · arXiv:2405.17890
18 citations

Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation

Nairouz Mrabah, Nicolas Richet, Ismail Ben Ayed et al.

ICCV 2025 · arXiv:2504.12436

Surprising Effectiveness of Pretraining Ternary Language Model at Scale

Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.

ICLR 2025 · arXiv:2407.12327
13 citations

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

Zonglin Lyu, Chen Chen

ICCV 2025 · arXiv:2507.04984
3 citations

A Tensor Decomposition Perspective on Second-order RNNs

Maude Lizaire, Michael Rizvi-Martel, Marawan Gamal et al.

ICML 2024 (spotlight) · arXiv:2406.05045
2 citations

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

Can Yaras, Peng Wang, Laura Balzano et al.

ICML 2024 · arXiv:2406.04112
25 citations

Data-free Neural Representation Compression with Riemannian Neural Dynamics

Zhengqi Pei, Anran Zhang, Shuhui Wang et al.

ICML 2024

Efficient Pareto Manifold Learning with Low-Rank Structure

Weiyu Chen, James Kwok

ICML 2024 (spotlight) · arXiv:2407.20734
10 citations

Flora: Low-Rank Adapters Are Secretly Gradient Compressors

Yongchang Hao, Yanshuai Cao, Lili Mou

ICML 2024 · arXiv:2402.03293
96 citations

Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids

Wontae Kim, Nam Ik Cho

ECCV 2024
7 citations

In value-based deep reinforcement learning, a pruned network is a good network

Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

ICML 2024 · arXiv:2402.12479
33 citations

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024 · arXiv:2406.07879
9 citations

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu et al.

ICML 2024 · arXiv:2403.19651
88 citations

OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning

Xinyu Geng, Jiaming Wang, Jiawei Gong et al.

CVPR 2024 · arXiv:2403.13351
10 citations

Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

Cunhang Fan, Yujie Chen, Jun Xue et al.

AAAI 2024 · arXiv:2401.12997
5 citations

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song, Kyungseok Oh, Taesu Kim et al.

ICML 2024 · arXiv:2402.09025
73 citations

Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification

Andreas Grivas, Antonio Vergari, Adam Lopez

AAAI 2024 · arXiv:2310.10443
8 citations