"parameter efficiency" Papers
37 papers found
As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters
Margret Keuper, Julia Grabinski, Janis Keuper
Bayesian Low-Rank Learning (Bella): A Practical Approach to Bayesian Neural Networks
Bao Gia Doan, Afshar Shamsi, Xiao-Yu Guo et al.
Composing Linear Layers from Irreducibles
Travis Pence, Daisuke Yamada, Vikas Singh
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob, Lorenzo Sani, Meghdad Kurmanji et al.
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Yongqi Huang, Peng Ye, Chenyu Huang et al.
Discovering Important Experts for Mixture-of-Experts Models Pruning Through a Theoretical Perspective
Weizhong Huang, Yuxin Zhang, Xiawu Zheng et al.
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
Wenlong Wang, Ivana Dusparic, Yucheng Shi et al.
Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement
Gaurav Patel, Christopher M. Sandino, Behrooz Mahasseni et al.
eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels
Alexander DeRieux, Walid Saad
Hankel Singular Value Regularization for Highly Compressible State Space Models
Paul Schwerdtner, Jules Berman, Benjamin Peherstorfer
Kolmogorov-Arnold Transformer
Xingyi Yang, Xinchao Wang
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng et al.
Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables
Wontae Kim, Keuntek Lee, Nam Ik Cho
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
Margaret Li, Sneha Kudugunta, Luke Zettlemoyer
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova, Angelos Katharopoulos, David Grangier et al.
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Kelvin Kan, Xingjian Li, Benjamin Zhang et al.
PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning
Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy et al.
Sample- and Parameter-Efficient Auto-Regressive Image Models
Elad Amrani, Leonid Karlinsky, Alex M. Bronstein
SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning
Yichen Wu, Hongming Piao, Long-Kai Huang et al.
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Wujiang Xu, Qitian Wu, Zujie Liang et al.
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Nairouz Mrabah, Nicolas Richet, Ismail Ben Ayed et al.
Surprising Effectiveness of pretraining Ternary Language Model at Scale
Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
Zonglin Lyu, Chen Chen
A Tensor Decomposition Perspective on Second-order RNNs
Maude Lizaire, Michael Rizvi-Martel, Marawan Gamal et al.
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
Can Yaras, Peng Wang, Laura Balzano et al.
Data-free Neural Representation Compression with Riemannian Neural Dynamics
Zhengqi Pei, Anran Zhang, Shuhui Wang et al.
Efficient Pareto Manifold Learning with Low-Rank Structure
Weiyu Chen, James Kwok
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Yongchang Hao, Yanshuai Cao, Lili Mou
Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids
Wontae Kim, Nam Ik Cho
In value-based deep reinforcement learning, a pruned network is a good network
Johan Obando Ceron, Aaron Courville, Pablo Samuel Castro
KernelWarehouse: Rethinking the Design of Dynamic Convolution
Chao Li, Anbang Yao
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang, Yi Luan, Hexiang Hu et al.
OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Xinyu Geng, Jiaming Wang, Jiawei Gong et al.
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion
Cunhang Fan, Yujie Chen, Jun Xue et al.
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song, Kyungseok Oh, Taesu Kim et al.
Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification
Andreas Grivas, Antonio Vergari, Adam Lopez