LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Wang, Zhengyi, et al. "LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models." arXiv preprint arXiv:2411.09595 (2024).

참고:

1. Background

Active research domain:

Try to harness LLM’s strong capabilities for other modalities

(e.g., understand images)

\(\rightarrow\) This paper: Generating 3D mesh with LLM!

Transform a LLM into a 3D mesh expert!

\(\rightarrow\) By teaching it to understand and generate “3D mesh” objects

How can LLM, trained on text, can understand & generate 3D object??

\(\rightarrow\) Format: OBJ ( = text-based standard for 3D objects )

\(\rightarrow\) Enable the LLM to read&generate this format!

(Original) Vertices’ coordinates are typically provided as decimal values

\(\rightarrow\) Convert these coordinates to integers. ( = Quantization process )

\(\rightarrow\) Trading off some precision for efficiency

Observation: Some spatial knowledge is already embedded in pretrained LLMs

\(\rightarrow\) Nonetheless, unsatisfactory results!

\(\rightarrow\) Constructed a new dataset of text-3D instructions ( + supervised fine-tuning )