Zihao Wang

Affiliations

Peking University; Hong Kong University of Science and Technology

715 total citations

papers (26)

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

CVPR 2023 · arXiv

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

ICLR 2024 · arXiv

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

ICML 2024 · arXiv

Transforming and Combining Rewards for Aligning Large Language Models

ICML 2024 · arXiv

Posterior Collapse of a Linear Latent Variable Model

NeurIPS 2022 · arXiv

MCU: An Evaluation Framework for Open-Ended Game Agents

ICML 2025 · arXiv

Where am I? Cross-View Geo-localization with Natural Language Descriptions

ICCV 2025 · arXiv

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

CVPR 2025 · arXiv

Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap

ECCV 2022 · arXiv

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

CVPR 2025 · arXiv

Learning Hierarchical Polynomials with Three-Layer Neural Networks

ICLR 2024 · arXiv

NestE: Modeling Nested Relational Structures for Knowledge Graph Reasoning

AAAI 2024 · arXiv

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image

AAAI 2024 · arXiv

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

ICCV 2025 · arXiv

Learning Hierarchical Polynomials of Multiple Nonlinear Features

ICLR 2025 · arXiv

Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception

AAAI 2025 · arXiv

Open-World Skill Discovery from Unsegmented Demonstration Videos

ICCV 2025

Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks

NeurIPS 2023

Learning Transformation-Predictive Representations for Detection and Description of Local Features

CVPR 2023

Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents

NeurIPS 2023

ESEG: Event-Based Segmentation Boosted by Explicit Edge-Semantic Guidance

AAAI 2025

MSV-PCT: Multi-Sparse-View Enhanced Transformer Framework for Salient Object Detection in Point Clouds

AAAI 2025

OnePose: One-Shot Object Pose Estimation Without CAD Models

ProAgent: Building Proactive Cooperative Agents with Large Language Models

Concept Algebra for (Score-Based) Text-Controlled Generative Models

Weakly-supervised 3D Shape Completion in the Wild
