Razvan Pascanu
Affiliations: Google DeepMind
20 papers · 966 total citations
Papers (20)
- Understanding the Role of Training Regimes in Continual Learning (NEURIPS 2020, arXiv) · 265 citations
- Continual World: A Robotic Benchmark For Continual Reinforcement Learning (NEURIPS 2021, arXiv) · 118 citations
- Top-KAST: Top-K Always Sparse Training (NEURIPS 2020, arXiv) · 108 citations
- Pointer Graph Networks (NEURIPS 2020, arXiv) · 69 citations
- Deep Reinforcement Learning with Plasticity Injection (NEURIPS 2023, arXiv) · 64 citations
- Why do LLMs attend to the first token? (COLM 2025, arXiv) · 63 citations
- Powerpropagation: A sparsity inducing weight reparameterisation (NEURIPS 2021, arXiv) · 58 citations
- Improving fine-grained understanding in image-text pre-training (ICML 2024, arXiv) · 46 citations
- Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues (ICML 2024, arXiv) · 35 citations
- The Tunnel Effect: Building Data Representations in Deep Neural Networks (NEURIPS 2023, arXiv) · 33 citations
- Learning to Modulate pre-trained Models in RL (NEURIPS 2023, arXiv) · 26 citations
- Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem (ICML 2024, arXiv) · 26 citations
- How do language models learn facts? Dynamics, curricula and hallucinations (COLM 2025) · 21 citations
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks (ICML 2025, arXiv) · 11 citations
- On the Role of Optimization in Double Descent: A Least Squares Study (NEURIPS 2021, arXiv) · 11 citations
- Attention as a Hypernetwork (ICLR 2025, arXiv) · 10 citations
- MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling (COLM 2025, arXiv) · 2 citations
- Disentangling Transfer in Continual Reinforcement Learning (NEURIPS 2022) · 0 citations
- RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling (NEURIPS 2025, arXiv) · 0 citations
- Meta-learning how to Share Credit among Macro-Actions (NEURIPS 2025, arXiv) · 0 citations