ArXiv

Pixal3D: Pixel-Aligned 3D Generation from Images

Authors
Dong-Yang Li, Wang Zhao, Yuxin Chen...
Categories
cs.CV
arXiv
https://arxiv.org/abs/2605.10922v1
PDF
https://arxiv.org/pdf/2605.10922v1

Brief

Pixal3D (Li et al., SIGGRAPH 2026) proposes a pixel-aligned 3D generation paradigm: it back-projects multi-scale image features into a 3D feature volume, which establishes explicit pixel-to-3D correspondences and yields assets aligned with the input view. The authors report a substantial fidelity improvement, 'approaching the fidelity level of reconstruction', and extend the method to multi-view generation and scene synthesis.

Why it matters

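Most 3D-native generators synthesize shape in a canonical space and inject image cues through attention, which leaves pixel-to-3D associations ambiguous. Pixal3D instead introduces pixel back-projection conditioning: multi-scale image features are lifted into a 3D feature volume, so each pixel corresponds explicitly to locations in 3D and the asset is generated in the input view rather than in a canonical pose.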
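
To make the back-projection idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera, voxel centers already expressed in the camera frame, nearest-neighbor sampling, and channel-wise concatenation across scales; the paper summary does not specify any of these choices, and the function names (`scale_intrinsics`, `back_project`, `back_project_multiscale`) are hypothetical.

```python
import numpy as np

def scale_intrinsics(K, s):
    """Rescale pinhole intrinsics for a feature map s times the image size.

    Assumes the convention that pixel i covers the interval [i, i + 1).
    """
    return np.diag([s, s, 1.0]) @ K

def back_project(feat, K, pts_cam):
    """Lift a 2D feature map into 3D by sampling it at each voxel's projection.

    feat:    (H, W, C) feature map
    K:       (3, 3) intrinsics in feature-map pixel units
    pts_cam: (N, 3) voxel centers in the camera frame (z > 0 is in front)
    returns: (N, C) per-voxel features (zero where invalid) and an (N,) mask
    """
    H, W, C = feat.shape
    uvw = pts_cam @ K.T                        # homogeneous pixel coordinates
    z = uvw[:, 2]
    in_front = z > 1e-8
    safe_z = np.where(in_front, z, 1.0)        # avoid dividing by z <= 0
    u = np.floor(uvw[:, 0] / safe_z).astype(int)
    v = np.floor(uvw[:, 1] / safe_z).astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((len(pts_cam), C), dtype=feat.dtype)
    out[valid] = feat[v[valid], u[valid]]      # nearest-neighbor lookup
    return out, valid

def back_project_multiscale(feats, K, image_hw, pts_cam):
    """Back-project each scale and concatenate along the channel axis.

    Concatenation is an assumption made for illustration only.
    """
    cols = []
    for f in feats:
        s = f.shape[1] / image_hw[1]           # assumes the same scale in x and y
        vol, _ = back_project(f, scale_intrinsics(K, s), pts_cam)
        cols.append(vol)
    return np.concatenate(cols, axis=1)
```

In this sketch every voxel along a camera ray receives the same pixel's feature, which is what makes the pixel-to-3D correspondence explicit; deciding which of those voxels are actually occupied is left to the generator.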

Key details

  • Reports substantial fidelity gains, 'approaching the fidelity level of reconstruction'.
  • Extends to multi-view generation by aggregating back-projected feature volumes across views.
  • Presents a modular pipeline that produces high-fidelity, object-separated 3D scenes from images.
  • Reference: Dong-Yang Li et al., SIGGRAPH 2026; project page: https://ldyang694.github.io/projects/pixal3d/
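
The multi-view extension can be illustrated with a short sketch. The abstract says only that back-projected feature volumes are aggregated across views; the masked mean below is an assumed aggregation rule chosen for simplicity, and `aggregate_views` is a hypothetical name, not part of any released code.

```python
import numpy as np

def aggregate_views(volumes, masks):
    """Average per-view feature volumes over the views that see each voxel.

    volumes: (V, N, C) back-projected features, one volume per view
    masks:   (V, N) boolean, True where the voxel projects inside that view
    returns: (N, C) fused features; voxels seen by no view stay zero
    """
    volumes = np.asarray(volumes, dtype=float)
    w = np.asarray(masks, dtype=float)[..., None]   # (V, N, 1) view weights
    total = (volumes * w).sum(axis=0)               # sum over visible views
    count = w.sum(axis=0)                           # number of visible views
    return total / np.maximum(count, 1.0)           # guard against 0 / 0
```

A masked mean keeps the fused volume at the same channel width regardless of the number of input views, so the same downstream generator could in principle consume one view or several; other rules such as max pooling or learned attention would fit the same interface.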

Source evidence

Abstract

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shape in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images. Instead of generating in a canonical pose, Pixal3D directly generates 3D in a pixel-aligned way, consistent with the input view. To enable this, we introduce a pixel back-projection conditioning scheme that explicitly lifts multi-scale image features into a 3D feature volume, establishing direct pixel-to-3D correspondence without ambiguity. We show that Pixal3D is not only scalable and capable of producing high-quality 3D assets, but also substantially improves fidelity, approaching the fidelity level of reconstruction. Furthermore, Pixal3D naturally extends to multi-view generation by aggregating back-projected feature volumes across views. Finally, we show pixel-aligned generation benefits scene synthesis, and present a modular pipeline that produces high-fidelity, object-separated 3D scenes from images. Pixal3D for the first time demonstrates 3D-native pixel-aligned generation at scale, and provides a new inspiring way towards high-fidelity 3D generation of object or scene from single or multi-view images. Project page: https://ldyang694.github.io/projects/pixal3d/

Comment: SIGGRAPH 2026. Project page: https://ldyang694.github.io/projects/pixal3d/