"memory efficiency" Papers

19 papers found

Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference

Zijie Geng, Jie Wang, Ziqi Liu et al.

NEURIPS 2025

Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering

Imad Eddine MAROUF, Enzo Tartaglione, Stéphane Lathuilière et al.

ICCV 2025arXiv:2502.04469
1
citations

Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences

Shuchen Wu, Mirko Thalmann, Peter Dayan et al.

ICLR 2025arXiv:2410.21332
1
citations

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.

ICLR 2025arXiv:2410.10819
179
citations

GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

Yangming Zhang, Wenqi Jia, Wei Niu et al.

CVPR 2025arXiv:2411.06019
12
citations

NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs

Haeun Lee, Omin Kwon, Yeonhong Park et al.

NEURIPS 2025arXiv:2506.02024
1
citations

PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration

Songhao Wu, Ang Lv, xiao feng et al.

NEURIPS 2025

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

Muyang Li, Yujun Lin, Zhekai Zhang et al.

ICLR 2025arXiv:2411.05007
98
citations

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025spotlightarXiv:2501.06425
34
citations

Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels

Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.

NEURIPS 2025arXiv:2503.14376
8
citations

Variational Bayesian Pseudo-Coreset

Hyungi Lee, Seungyoo Lee, Juho Lee

ICLR 2025arXiv:2502.21143

CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.

ICML 2024arXiv:2403.08058
13
citations

CoTracker: It is Better to Track Together

Nikita Karaev, Ignacio Rocco, Ben Graham et al.

ECCV 2024arXiv:2307.07635
466
citations

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs

Donghyun Kim, Byeongho Heo, Dongyoon Han

ECCV 2024arXiv:2403.19588
44
citations

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.

ICML 2024arXiv:2403.09636
94
citations

FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang, Weiming Zhuang, Chen Chen et al.

CVPR 2024arXiv:2403.14737
18
citations

Memory Efficient Neural Processes via Constant Memory Attention Block

Leo Feng, Frederick Tung, Hossein Hajimirsadeghi et al.

ICML 2024arXiv:2305.14567
8
citations

REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates

Arshia Afzal, Grigorios Chrysos, Volkan Cevher et al.

ICML 2024oralarXiv:2406.16906
12
citations

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.

CVPR 2024arXiv:2401.06312
34
citations