Weizhu Chen

papers

1,083

total citations

papers (12)

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

ICLR 2025, arXiv

122

citations

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

NeurIPS 2023, arXiv

120

citations

In-Context Learning Unlocked for Diffusion Models

NeurIPS 2023, arXiv

100

citations

Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

AAAI 2025, arXiv

citations

Meet in the Middle: A New Pre-training Paradigm

NeurIPS 2023, arXiv

citations

MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

AAAI 2025, arXiv

citations

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

NeurIPS 2025, arXiv

citations

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

ICLR 2024, arXiv

citations

Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models

ECCV 2022

citations


Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

