Jie Tang

OpenReview

5,813 total citations

papers (29)

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

ICLR 2025 | arXiv | 1,409 citations

CogView: Mastering Text-to-Image Generation via Transformers

NeurIPS 2021 | arXiv | 934 citations

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

NeurIPS 2023 | arXiv | 803 citations

CogAgent: A Visual Language Model for GUI Agents

CVPR 2024 | arXiv | 629 citations

Graph Random Neural Networks for Semi-Supervised Learning on Graphs

NeurIPS 2020 | arXiv | 470 citations

CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

NeurIPS 2022 | arXiv | 402 citations

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

CVPR 2025 | arXiv

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

CVPR 2025 | arXiv

Sketch and Refine: Towards Fast and Accurate Lane Detection

AAAI 2024 | arXiv

TriSampler: A Better Negative Sampling Principle for Dense Retrieval

AAAI 2024 | arXiv

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

ICCV 2025 | arXiv

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

ICLR 2025 | arXiv

AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning

CVPR 2025 | arXiv

A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices

NeurIPS 2020 | arXiv

Small Language Model Makes an Effective Long Text Extractor

AAAI 2025 | arXiv

A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems

NeurIPS 2021

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis

NeurIPS 2021

Residual Feature Aggregation Network for Image Super-Resolution

CVPR 2020

CogLTX: Applying BERT to Long Texts

NeurIPS 2020

BodyGAN: General-Purpose Controllable Neural Human Body Generation

CVPR 2022

Adaptive Diffusion in Graph Neural Networks

NeurIPS 2021

papers (29)

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

CogView: Mastering Text-to-Image Generation via Transformers

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

CogAgent: A Visual Language Model for GUI Agents

Graph Random Neural Networks for Semi-Supervised Learning on Graphs

CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

LVBench: An Extreme Long Video Understanding Benchmark

Robust Object Modeling for Visual Tracking

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Bilateral Propagation Network for Depth Completion

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

Towards Efficient Exact Optimization of Language Model Alignment

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Sketch and Refine: Towards Fast and Accurate Lane Detection

TriSampler: A Better Negative Sampling Principle for Dense Retrieval

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning

A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices

Small Language Model Makes an Effective Long Text Extractor

A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis

Residual Feature Aggregation Network for Image Super-Resolution

CogLTX: Applying BERT to Long Texts

BodyGAN: General-Purpose Controllable Neural Human Body Generation

Adaptive Diffusion in Graph Neural Networks
