Peng Jin
Affiliations
Leshan Normal University
19
papers
1,323
total citations
Papers (19)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
CVPR 2024
arXiv
364
citations
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
ICCV 2025
arXiv
360
citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
arXiv
141
citations
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
NeurIPS 2022
arXiv
87
citations
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
ICCV 2023
arXiv
84
citations
Video-Text As Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
CVPR 2023
arXiv
81
citations
Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs
NeurIPS 2023
arXiv
45
citations
MoH: Multi-Head Attention as Mixture-of-Head Attention
ICML 2025
arXiv
40
citations
Parallel Vertex Diffusion for Unified Visual Grounding
AAAI 2024
arXiv
37
citations
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
ICLR 2025
arXiv
36
citations
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
ECCV 2024
arXiv
13
citations
Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation
ICCV 2023
arXiv
10
citations
MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval
AAAI 2025
arXiv
9
citations
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
ECCV 2024
8
citations
Auto-Linear Phenomenon in Subsurface Imaging
ICML 2024
arXiv
7
citations
VSNet: Focusing on the Linguistic Characteristics of Sign Language
CVPR 2025
1
citations
OpenFWI: Large-scale Multi-structural Benchmark Datasets for Full Waveform Inversion
NeurIPS 2022
0
citations
Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation
AAAI 2025
0
citations
$\mathbf{\mathbb{E}^{FWI}}$: Multiparameter Benchmark Datasets for Elastic Full Waveform Inversion of Geophysical Properties
NeurIPS 2023
0
citations