"robotic manipulation" Papers

66 papers found • Page 2 of 2

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

Jun Wang, Yuzhe Qin, Kaiming Kuang et al.

CVPR 2024 • arXiv:2402.14795
28 citations

Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Tao Huang, Guangqi Jiang, Yanjie Ze et al.

ECCV 2024 • arXiv:2312.14134
44 citations

Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun et al.

ICML 2024 • arXiv:2402.01057

Foundation Policies with Hilbert Representations

Seohong Park, Tobias Kreiman, Sergey Levine

ICML 2024 (oral) • arXiv:2402.15567
59 citations

KISA: A Unified Keyframe Identifier and Skill Annotator for Long-Horizon Robotics Demonstrations

Longxin Kou, Fei Ni, Yan Zheng et al.

ICML 2024 (oral)

Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

Tien Toan Nguyen, Minh Nhat Vu, Baoru Huang et al.

ECCV 2024 • arXiv:2407.13842
17 citations

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.

CVPR 2024 • arXiv:2312.16217
182 citations

PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation

Runze Liu, Yali Du, Fengshuo Bai et al.

ICML 2024 • arXiv:2306.03615
9 citations

PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

Ruijie Zheng, Ching-An Cheng, Hal Daumé et al.

ICML 2024 (oral) • arXiv:2402.10450
16 citations

Retrieval-Augmented Embodied Agents

Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.

CVPR 2024 • arXiv:2404.11699
28 citations

RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Qi Lv, Hao Li, Xiang Deng et al.

ICML 2024 • arXiv:2404.04929
4 citations

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother, Jordi Orbay, Quan Vuong et al.

ICML 2024 • arXiv:2403.03950
107 citations

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Jiasen Lu, Christopher Clark, Sangho Lee et al.

CVPR 2024 (highlight) • arXiv:2312.17172
280 citations

VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

Zhaoliang Wan, Yonggen Ling, Senlin Yi et al.

ICML 2024 • arXiv:2501.00510
9 citations

Visual Representation Learning with Stochastic Frame Prediction

Huiwon Jang, Dongyoung Kim, Junsu Kim et al.

ICML 2024 (oral) • arXiv:2406.07398
9 citations

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024 • arXiv:2403.13043
71 citations