Text-to-3D with Classifier Score Distillation

¹The University of Hong Kong, ²Tsinghua University, ³VAST

Left and middle: 3D models generated using our method. Right: texture synthesis for an existing mesh.

Abstract

Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS), which leverages pre-trained 2D diffusion models. While classifier-free guidance is widely acknowledged to be crucial for successful optimization, it is usually regarded as an auxiliary trick rather than the essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and make a surprising finding: the guidance alone is enough for effective text-to-3D generation. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This perspective offers new insights into existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks, including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods.
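To make the claim that "the guidance alone is enough" concrete, the following is a minimal NumPy sketch of the update directions involved, not the authors' implementation. It assumes the guidance convention in which the guided prediction is the conditional prediction plus a scaled difference between conditional and unconditional predictions. The random arrays stand in for a diffusion model's noise predictions, and all function and variable names are hypothetical.

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance (assumed convention): push the conditional
    prediction further along the conditional-minus-unconditional direction."""
    return eps_cond + guidance_scale * (eps_cond - eps_uncond)

def sds_direction(eps_cond, eps_uncond, noise, guidance_scale):
    """SDS-style update direction with guidance: guided prediction minus injected noise."""
    return cfg_noise(eps_cond, eps_uncond, guidance_scale) - noise

def csd_direction(eps_cond, eps_uncond):
    """CSD-style update direction: only the guidance (classifier) term is kept."""
    return eps_cond - eps_uncond

# Stand-ins for noise predictions on a noised rendering (hypothetical shapes and values).
rng = np.random.default_rng(0)
shape = (4, 8, 8)
eps_cond = rng.standard_normal(shape)    # text-conditioned prediction
eps_uncond = rng.standard_normal(shape)  # unconditional prediction
noise = rng.standard_normal(shape)       # noise added to the rendering
scale = 7.5                              # illustrative guidance scale

sds = sds_direction(eps_cond, eps_uncond, noise, scale)
csd = csd_direction(eps_cond, eps_uncond)

# The guided SDS direction splits into a prior term plus the scaled guidance term.
prior_term = eps_cond - noise
assert np.allclose(sds, prior_term + scale * csd)
```

Under this convention the guided SDS direction is the sum of a prior term (conditional prediction minus injected noise) and the scaled guidance term; the sketch keeps only the latter for CSD. In an actual pipeline this direction would still be weighted per timestep and backpropagated through the renderer to the 3D parameters, which is omitted here.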

Example generated objects

Comparison on shape generation

Methods compared: DreamFusion, Magic3D, Ours.

Comparison on texture synthesis

Columns: Geometry, Fantasia3D, Magic3D, TEXTure, ProlificDreamer, Ours.