"robotic manipulation" Papers
66 papers found • Page 2 of 2
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang, Yuzhe Qin, Kaiming Kuang et al.
CVPR 2024 · arXiv:2402.14795
28 citations
Diffusion Reward: Learning Rewards via Conditional Video Diffusion
Tao Huang, Guangqi Jiang, Yanjie Ze et al.
ECCV 2024 · arXiv:2312.14134
44 citations
Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning
Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun et al.
ICML 2024 · arXiv:2402.01057
Foundation Policies with Hilbert Representations
Seohong Park, Tobias Kreiman, Sergey Levine
ICML 2024 (oral) · arXiv:2402.15567
59 citations
KISA: A Unified Keyframe Identifier and Skill Annotator for Long-Horizon Robotics Demonstrations
Longxin Kou, Fei Ni, Yan Zheng et al.
ICML 2024 (oral)
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
Tien Toan Nguyen, Minh Nhat Vu, Baoru Huang et al.
ECCV 2024 · arXiv:2407.13842
17 citations
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
CVPR 2024 · arXiv:2312.16217
182 citations
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
Runze Liu, Yali Du, Fengshuo Bai et al.
ICML 2024 · arXiv:2306.03615
9 citations
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
Ruijie Zheng, Ching-An Cheng, Hal Daumé et al.
ICML 2024 (oral) · arXiv:2402.10450
16 citations
Retrieval-Augmented Embodied Agents
Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.
CVPR 2024 · arXiv:2404.11699
28 citations
RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
Qi Lv, Hao Li, Xiang Deng et al.
ICML 2024 · arXiv:2404.04929
4 citations
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother, Jordi Orbay, Quan Vuong et al.
ICML 2024 · arXiv:2403.03950
107 citations
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
Jiasen Lu, Christopher Clark, Sangho Lee et al.
CVPR 2024 (highlight) · arXiv:2312.17172
280 citations
VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
Zhaoliang Wan, Yonggen Ling, Senlin Yi et al.
ICML 2024 · arXiv:2501.00510
9 citations
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang, Dongyoung Kim, Junsu Kim et al.
ICML 2024 (oral) · arXiv:2406.07398
9 citations
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao et al.
ECCV 2024 · arXiv:2403.13043
71 citations