Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation

NeurIPS 2024

1 Carnegie Mellon University 2 University of Illinois Urbana-Champaign

3D content creation with touch: Our method, Tactile DreamFusion, integrates high-resolution tactile sensing with diffusion-based image priors to enhance fine geometric details for text- or image-to-3D generation. The following results are rendered using Blender, with full-color renderings on top and normal renderings at the bottom.

Text to 3D

Our Results

"An avocado"

"An avocado"

"A mug"

"A phone case"

"... with avocado texture" avocado texture sample

"A beanie"

"A beanie"

"A toy flower"

"A miffy bunny"

"... with woven texture" avocado texture sample


Abstract

3D generation methods have shown visually compelling results powered by diffusion image priors. However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked into albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets.

We design a lightweight 3D texture field to synthesize visual and tactile textures, guided by diffusion-based distribution matching losses on both the visual and tactile domains. Our method ensures consistency between the visual and tactile textures while preserving photorealism. We further present a multi-part editing pipeline that enables us to synthesize different textures across various regions. To our knowledge, we are the first to leverage high-resolution tactile sensing to enhance geometric details for 3D generation tasks. We evaluate our method in both text-to-3D and image-to-3D settings. Our experiments demonstrate that our method produces customized and realistic fine geometric textures while maintaining accurate alignment between the two modalities of vision and touch.
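To make the texture-field idea above concrete, here is a minimal, hypothetical sketch (not the authors' code): a coordinate-MLP texture field that predicts an albedo color and a tactile normal for each 3D surface point. The TextureField class and the visual_guidance_loss / tactile_guidance_loss functions are illustrative placeholders standing in for the diffusion-based distribution matching losses described in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextureField(nn.Module):
    # Lightweight MLP mapping 3D surface points to albedo (RGB) and a tactile normal.
    # A real implementation would typically add a positional encoding of the input.
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.albedo_head = nn.Linear(hidden, 3)   # RGB color in [0, 1]
        self.normal_head = nn.Linear(hidden, 3)   # unit normal encoding fine geometry

    def forward(self, xyz):
        feat = self.backbone(xyz)
        albedo = torch.sigmoid(self.albedo_head(feat))
        normal = F.normalize(self.normal_head(feat), dim=-1)
        return albedo, normal

# Hypothetical placeholders: in the paper these roles are played by diffusion-based
# distribution matching losses on rendered visual and tactile images.
def visual_guidance_loss(albedo):
    return 0.0 * albedo.mean()

def tactile_guidance_loss(normal):
    return 0.0 * normal.mean()

field = TextureField()
optimizer = torch.optim.Adam(field.parameters(), lr=1e-3)

points = torch.rand(1024, 3)   # stand-in for points sampled on the mesh surface
albedo, normal = field(points)
loss = visual_guidance_loss(albedo) + tactile_guidance_loss(normal)
loss.backward()
optimizer.step()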


Summary Video


Method


Application: Same Object with Diverse Textures

We show diverse textures synthesized on the same object, which facilitates custom design of 3D assets.

"canvas bag"

"heat resistant rubber with heart shape"

"cantaloupe"

"stripe sculpture steel"

"strawberry"


"A coffee cup with ... texture"


Single Texture Generation

We show 3D generation with a single texture. Our method generates realistic and coherent visual textures and geometric details.

"A chopping board"

"A cork table mat"

"A corn"

"A heat-resistant glove"


"An NFL football"

"A potato"

"A strawberry"

"An orange"


Multi-Part Texture Generation

This grid demonstrates different render types for each object: predicted label map, albedo, normal map, zoomed-in normal patch, and full-color rendering.

Grid columns: Input, Predicted Label, Generated Albedo, Generated Normal, Zoomed Patch, Rendered Full Color.

Acknowledgment

We thank Sheng-Yu Wang, Nupur Kumari, Gaurav Parmar, Hung-Jui Huang, and Maxwell Jones for their helpful comments and discussions. We are also grateful to Arpit Agrawal and Sean Liu for proofreading the draft. Kangle Deng is supported by the Microsoft Research Ph.D. Fellowship. Ruihan Gao is supported by the A*STAR National Science Scholarship (Ph.D.).

BibTeX

@inproceedings{gao2024exploiting,
  title     = {Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation},
  author    = {Gao, Ruihan and Deng, Kangle and Yang, Gengshan and Yuan, Wenzhen and Zhu, Jun-Yan},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2024},
}