Poster "transformer architectures" Papers

25 papers found

Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters

Roberto Garcia, Jerry Liu, Daniel Sorvisto et al.

ICLR 2025 · arXiv:2503.18216 · 1 citation

Architectural and Inferential Inductive Biases for Exchangeable Sequence Modeling

Daksh Mittal, Leon Li, Thomson Yen et al.

NEURIPS 2025 · arXiv:2503.01215 · 1 citation

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Guangda Ji, Silvan Weder, Francis Engelmann et al.

CVPR 2025 · arXiv:2410.13924 · 6 citations

Attention on the Sphere

Boris Bonev, Max Rietmann, Andrea Paris et al.

NEURIPS 2025 · arXiv:2505.11157 · 1 citation

DiC: Rethinking Conv3x3 Designs in Diffusion Models

Yuchuan Tian, Jing Han, Chengcheng Wang et al.

CVPR 2025 · arXiv:2501.00603 · 8 citations

Disentangling Representations through Multi-task Learning

Pantelis Vafidis, Aman Bhargava, Antonio Rangel

ICLR 2025 · arXiv:2407.11249 · 5 citations

Do ImageNet-trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization

Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio

CVPR 2025 · arXiv:2503.03519 · 2 citations

EUGens: Efficient, Unified and General Dense Layers

Sang Min Kim, Byeongchan Kim, Arijit Sehanobish et al.

NEURIPS 2025 · arXiv:2601.22563 · 1 citation

Learning in Compact Spaces with Approximately Normalized Transformer

Jörg Franke, Urs Spiegelhalter, Marianna Nezhurina et al.

NEURIPS 2025 · arXiv:2505.22014 · 1 citation

Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model

Zhiwei Xu, Zhiyu Ni, Yixin Wang et al.

ICLR 2025 · arXiv:2504.13292 · 6 citations

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubic, Federico Soldà, Aurelio Sulser et al.

ICLR 2025 · arXiv:2405.16674 · 18 citations

L-SWAG: Layer-Sample Wise Activation with Gradients Information for Zero-Shot NAS on Vision Transformers

Sofia Casarin, Sergio Escalera, Oswald Lanz

CVPR 2025 · arXiv:2505.07300 · 2 citations

Optimal Brain Apoptosis

Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.

ICLR 2025 · arXiv:2502.17941 · 3 citations

Q3R: Quadratic Reweighted Rank Regularizer for Effective Low-Rank Training

Ipsita Ghosh, Ethan Nguyen, Christian Kümmerle

NEURIPS 2025 · arXiv:2511.04485

Streamlining Prediction in Bayesian Deep Learning

Rui Li, Marcus Klasson, Arno Solin et al.

ICLR 2025 · arXiv:2411.18425 · 5 citations

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Yury Gorishniy, Akim Kotelnikov, Artem Babenko

ICLR 2025 · arXiv:2410.24210 · 56 citations

Unsupervised Meta-Learning via In-Context Learning

Anna Vettoruzzo, Lorenzo Braccaioli, Joaquin Vanschoren et al.

ICLR 2025 · arXiv:2405.16124 · 3 citations

All-in-one simulation-based inference

Manuel Gloeckler, Michael Deistler, Christian Weilbach et al.

ICML 2024 · arXiv:2404.09636 · 62 citations

Controllable Prompt Tuning For Balancing Group Distributional Robustness

Hoang Phan, Andrew Wilson, Qi Lei

ICML 2024 · arXiv:2403.02695 · 12 citations

Improving Token-Based World Models with Parallel Observation Prediction

Lior Cohen, Kaixin Wang, Bingyi Kang et al.

ICML 2024 · arXiv:2402.05643 · 12 citations

Loss Shaping Constraints for Long-Term Time Series Forecasting

Ignacio Hounie, Javier Porras-Valenzuela, Alejandro Ribeiro

ICML 2024 · arXiv:2402.09373 · 3 citations

Operational Open-Set Recognition and PostMax Refinement

Steve Cruz, Ryan Rabinowitz, Manuel Günther et al.

ECCV 2024 · 6 citations

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICML 2024

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang et al.

CVPR 2024 · arXiv:2312.02145 · 332 citations

Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation

Yibo Yang, Xiaojie Li, Motasem Alfarra et al.

ICML 2024 · arXiv:2406.05222 · 7 citations