Hao Tan

OpenReview

papers

2,050

total citations

papers (29)

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

ICLR 2024, arXiv

155

citations

Scaling Data Generation in Vision-and-Language Navigation

ICCV 2023, arXiv

113

citations

EnvEdit: Environment Editing for Vision-and-Language Navigation

CVPR 2022, arXiv

108

citations

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

ICLR 2025, arXiv

citations

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

CVPR 2025, arXiv

citations

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

ICCV 2025, arXiv

citations

Learning Navigational Visual Representations with Semantic Map Supervision

ICCV 2023, arXiv

citations

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

AAAI 2025, arXiv

citations

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

NeurIPS 2021, arXiv

citations

Numerical Pruning for Efficient Autoregressive Models

AAAI 2025, arXiv

citations

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

CVPR 2024, arXiv

citations

RayZer: A Self-supervised Large View Synthesis Model

ICCV 2025, arXiv

citations

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

ICCV 2025, arXiv

citations

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

AAAI 2024, arXiv

citations

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

ICLR 2025, arXiv

citations

MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data

CVPR 2025, arXiv

citations

Gaussian Mixture Flow Matching Models

ICML 2025, arXiv

citations

Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models

AAAI 2025, arXiv

citations

Turbo3D: Ultra-fast Text-to-3D Generation

CVPR 2025, arXiv

citations

Generating 3D-Consistent Videos from Unposed Internet Photos

CVPR 2025, arXiv

citations

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

CVPR 2025, arXiv

citations

Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport

CVPR 2025, arXiv

citations

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

ICCV 2025

citations

Efficient Federated Incomplete Multi-View Clustering

ICML 2025

citations

Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels

CVPR 2025

citations

Building Vision-Language Models on Solid Foundations with Masked Distillation

CVPR 2024

citations


LRM: Large Reconstruction Model for Single Image to 3D

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

