Wenqi Shao

1,053 total citations

papers (26)

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

ICLR 2024 (arXiv)

341 citations

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

CVPR 2024 (arXiv)

144 citations

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

ICCV 2025 (arXiv)

113 citations

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICML 2025 (arXiv)

DiffRate : Differentiable Compression Rate for Efficient Vision Transformers

ICCV 2023 (arXiv)

Beyond One-to-One: Rethinking the Referring Image Segmentation

ICCV 2023 (arXiv)

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

ICLR 2024 (arXiv)

Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space

ECCV 2022 (arXiv)

Real-Time Controllable Denoising for Image and Video

CVPR 2023 (arXiv)

Foundation Model is Efficient Multimodal Multitask Model Selector

NeurIPS 2023 (arXiv)

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

CVPR 2025 (arXiv)

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

CVPR 2025 (arXiv)

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

CVPR 2025 (arXiv)

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

ICCV 2025 (arXiv)

Distilling Monocular Foundation Model for Fine-grained Depth Completion

CVPR 2025 (arXiv)

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

ICLR 2025

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

CVPR 2024 (arXiv)

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

NeurIPS 2025 (arXiv)

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

CVPR 2025 (arXiv)

Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space

ICCV 2025 (arXiv)

ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity

ICCV 2025

Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation

ICCV 2025 (arXiv)

Rethinking the Pruning Criteria for Convolutional Neural Network

NeurIPS 2021

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

Cached Transformers: Improving Transformers with Differentiable Memory Cached

Cross-Subject Mind Decoding from Inaccurate Representations

