Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

arXiv:2401.18084 · 112 citations · #225 of 2716 papers in CVPR 2024

Abstract

The ability to associate touch with other modalities has huge implications for humans and computational systems. However, multimodal learning with touch remains challenging due to the expensive data collection process and non-standardized sensor outputs. We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound. We achieve this by aligning our UniTouch embeddings to pretrained image embeddings already associated with a variety of other modalities. We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors, all at the same time. UniTouch is capable of conducting various touch sensing tasks in the zero-shot setting, from robot grasping prediction to touch image question answering. To the best of our knowledge, UniTouch is the first to demonstrate such capabilities. Project page: https://cfeng16.github.io/UniTouch/
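The abstract describes two ingredients: aligning tactile embeddings to a frozen, pretrained image embedding space, and learnable sensor-specific tokens so one model can ingest heterogeneous touch sensors. The following is a minimal sketch of that idea, not the authors' implementation; the encoder architecture, embedding dimension, sensor count, and the InfoNCE-style alignment objective are all assumptions made for illustration.

# Minimal sketch (assumptions, not the UniTouch code): align touch embeddings
# from heterogeneous sensors to frozen image embeddings, with one learnable
# token per sensor type.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactileEncoder(nn.Module):
    def __init__(self, num_sensors: int, embed_dim: int = 512):
        super().__init__()
        # Small CNN standing in for a vision-based touch-sensor backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # One learnable token per tactile sensor type (e.g. different gel sensors).
        self.sensor_tokens = nn.Embedding(num_sensors, embed_dim)

    def forward(self, touch_img: torch.Tensor, sensor_id: torch.Tensor) -> torch.Tensor:
        # Add the sensor-specific token to the backbone feature, then normalize.
        feat = self.backbone(touch_img) + self.sensor_tokens(sensor_id)
        return F.normalize(feat, dim=-1)

def alignment_loss(touch_emb: torch.Tensor, image_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    # InfoNCE-style contrastive loss pulling each touch embedding toward the
    # frozen image embedding of the same scene (a stand-in for the alignment
    # objective implied by the abstract).
    logits = touch_emb @ image_emb.t() / temperature
    targets = torch.arange(touch_emb.size(0), device=touch_emb.device)
    return F.cross_entropy(logits, targets)

# Toy usage: random tensors stand in for touch images and for precomputed,
# frozen image embeddings from a pretrained vision-language model.
encoder = TactileEncoder(num_sensors=4)
touch = torch.randn(8, 3, 224, 224)
sensor_id = torch.randint(0, 4, (8,))
image_emb = F.normalize(torch.randn(8, 512), dim=-1)
loss = alignment_loss(encoder(touch, sensor_id), image_emb)

Because the image embedding space is already tied to language, audio, and other modalities by the pretrained model, aligning touch to it is what would enable the zero-shot transfer to tasks such as grasp prediction or touch-image question answering mentioned above.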

Citation History

Jan 27, 2026: 109
Feb 13, 2026: 111 (+2)
Feb 13, 2026: 112 (+1)
Feb 13, 2026: 112