Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement

@ ICLR 2025 Spotlight (Top 4.79%)
1Harvard University, 2Harbin Institute of Technology (Shenzhen)
*Equal contribution, Corresponding author
TL;DR: We propose a novel Predict-and-unmixing-driven Subspace Diffusion Refinement framework (PSR-SCI) for Spectral Compressive Imaging (SCI) reconstruction. Our method addresses two challenges, the limited MSI training data and the high computational cost of diffusion models on high-dimensional data, enabling efficient and high-quality MSI recovery.
PSR-SCI Framework Overview

Abstract

Spectral Compressive Imaging (SCI) reconstruction is inherently ill-posed, offering multiple plausible solutions from a single observation. Traditional deterministic methods typically struggle to effectively recover high-frequency details. Although diffusion models offer promising solutions to this challenge, their application is constrained by the limited training data and high computational demands associated with multispectral images (MSIs), complicating direct training. To address these issues, we propose a novel Predict-and-unmixing-driven-Subspace-Refine framework (PSR-SCI). This framework begins with a cost-effective predictor that produces an initial, rough estimate of the MSI. Subsequently, we introduce an unmixing-driven reversible spectral embedding module that decomposes the MSI into subspace images and spectral coefficients. This decomposition facilitates the adaptation of pre-trained RGB diffusion models and focuses refinement processes on high-frequency details, thereby enabling efficient diffusion generation with minimal MSI data. Additionally, we design a high-dimensional guidance mechanism with imaging consistency to enhance the model's efficacy. The refined subspace image is then reconstructed back into an MSI using the reversible embedding, yielding the final MSI with full spectral resolution. Experimental results on the standard KAIST dataset and the zero-shot datasets NTIRE, ICVL, and Harvard show that PSR-SCI enhances visual quality and delivers PSNR and SSIM metrics comparable to existing diffusion, transformer, and deep unfolding techniques. This framework provides a robust alternative to traditional deterministic SCI reconstruction methods.

Key Contributions

Related Works

Existing SCI reconstruction methods fall predominantly into four families: model-based, Plug-and-Play, end-to-end (E2E), and deep unfolding methods.

Method: PSR-SCI Framework

PSR-SCI Pipeline
Figure: PSR-SCI comprises (a) fast predictor with frequency splitting, (b) unmixing-based reversible embedding, and (c) subspace refinement via pretrained diffusion.

1. Snapshot Compressive Imaging and Problem Setup

In a CASSI system, an MSI \( \mathcal{X} \in \mathbb{R}^{H \times W \times B} \) is projected into a 2D measurement \( \mathcal{Y} \in \mathbb{R}^{H \times (W + d(B-1))} \) via coded spectral modulation. The imaging model can be formulated as: \[ \mathbf{y} = \mathbf{\Phi} \mathbf{x} + \mathbf{n}, \] where \( \mathbf{\Phi} \) encodes spectral-shifted masks, and \( \mathbf{x} \) is the vectorized MSI. Reconstructing \( \mathbf{x} \) from \( \mathbf{y} \) is ill-posed and benefits from strong generative priors.
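The forward model above can be illustrated with a minimal NumPy sketch. It assumes a single-disperser CASSI layout in which band \( b \) is masked and then shifted by \( d \cdot b \) columns before summation; the function name `cassi_forward`, the random binary mask, and the toy sizes are hypothetical choices for illustration, and noise \( \mathbf{n} \) is omitted.

```python
import numpy as np

def cassi_forward(x, mask, d=2):
    """Toy CASSI measurement: mask each band, shift it, and sum.

    x:    MSI cube of shape (H, W, B)
    mask: coded aperture of shape (H, W)
    d:    dispersion shift in pixels between adjacent bands (assumed value)
    """
    H, W, B = x.shape
    y = np.zeros((H, W + d * (B - 1)), dtype=x.dtype)
    for b in range(B):
        # Band b is modulated by the mask and lands d*b columns to the right.
        y[:, d * b : d * b + W] += mask * x[:, :, b]
    return y

rng = np.random.default_rng(0)
H, W, B, d = 8, 8, 4, 2
x = rng.random((H, W, B))
mask = (rng.random((H, W)) > 0.5).astype(x.dtype)
y = cassi_forward(x, mask, d)
print(y.shape)  # (8, 14), i.e. H x (W + d(B-1))
```

Because every band is collapsed onto the same 2D detector, many different cubes map to the same measurement, which is why the inverse problem is ill-posed.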

2. Predict-and-Subspace-Refine Framework

We first estimate a coarse MSI \( \mathcal{X}_{\textit{init}} = \phi(\mathcal{Y}) \) via a trained predictor. Then, a learnable frequency separator \( \tau \) splits it into high- and low-frequency parts: \( (\mathcal{X}^h, \mathcal{X}^l) = \tau(\mathcal{X}_{\textit{init}}) \). The high-frequency part is embedded into subspace form: \[ (\mathcal{A}^h, E) = \psi(\mathcal{X}^h), \] where \( \mathcal{A}^h \) is a low-rank abundance map and \( E \) encodes spectral coefficients. This enables refinement via diffusion on \( \mathcal{A}^h \), with final MSI recovered as: \[ \hat{\mathcal{X}} = \psi^{-1}(\mathcal{A}^h_{\textit{diff}}, E) + \mathcal{X}^l. \]
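The data flow of this section can be sketched with simple linear stand-ins. The sketch below is not the learned modules: it assumes a box blur in place of the learnable separator \( \tau \), a rank-3 SVD projection in place of the embedding \( \psi \), and an identity function in place of the diffusion refinement. All function names are hypothetical. The toy cube is built to have spectral rank 3, so the stand-in embedding is exactly invertible.

```python
import numpy as np

def split_frequency(x, k=3):
    """Stand-in for tau: a k x k box blur gives the low-frequency part."""
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    H, W, _ = x.shape
    low = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            low += xp[i:i + H, j:j + W, :]
    low /= k * k
    return x - low, low  # (X^h, X^l)

def embed(x_h, rank=3):
    """Stand-in for psi: linear unmixing onto the top-`rank` spectral basis."""
    H, W, B = x_h.shape
    flat = x_h.reshape(-1, B)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    E = vt[:rank]                          # (rank, B) spectral coefficients
    A = (flat @ E.T).reshape(H, W, rank)   # (H, W, rank) abundance maps
    return A, E

def unembed(A, E):
    """Stand-in for psi^{-1}: mix the abundance maps back into B bands."""
    H, W, r = A.shape
    return (A.reshape(-1, r) @ E).reshape(H, W, E.shape[1])

def refine(A):
    """Placeholder for diffusion refinement (identity in this sketch)."""
    return A

rng = np.random.default_rng(0)
H, W, B = 16, 16, 8
# Toy coarse estimate with spectral rank 3: abundances times endmembers.
x_init = rng.random((H, W, 3)) @ rng.random((3, B))

x_h, x_l = split_frequency(x_init)
A_h, E = embed(x_h, rank=3)
x_hat = unembed(refine(A_h), E) + x_l
print(np.allclose(x_hat, x_init))
```

With three abundance channels, \( \mathcal{A}^h \) has the same shape as an RGB image, which is what makes a pretrained RGB diffusion model applicable to it.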

3. Reversible Spectral Embedding (URSe)

The URSe module implements invertible unmixing via hierarchical convolutions and spectral attention, ensuring minimal information loss. Unlike direct compression, URSe guarantees structural recovery of the MSI from the latent subspace.

4. Diffusion Refinement with High-Dimensional Guidance

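We adapt a pretrained Stable Diffusion model (2.1-base) for subspace refinement. A parallel encoder allows tuning on small MSI datasets. To enforce measurement consistency, we introduce a high-dimensional guidance loss: \[ \mathcal{L} = \|\mathcal{Y} - \mathbf{\Phi}(\psi^{-1}(\mathcal{D}(\hat{\mathbf{z}}_0), E) + \mathcal{X}^l) \|^2, \] where \( \mathcal{D} \) denotes the latent decoder and \( \hat{\mathbf{z}}_0 \) the predicted clean latent, so that \( \mathcal{D}(\hat{\mathbf{z}}_0) \) plays the role of the refined subspace image \( \mathcal{A}^h_{\textit{diff}} \). This forms a guided reverse SDE that jointly optimizes perceptual quality and physical realism.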
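To make the measurement-consistency term concrete, here is a minimal NumPy sketch of the loss and one gradient step. It is a simplification under stated assumptions: the gradient is taken directly with respect to the MSI cube rather than through the decoder and the inverse embedding, the forward operator is a toy shift-and-sum CASSI model, and the function names and the step size 0.1 are hypothetical.

```python
import numpy as np

def cassi_forward(x, mask, d=2):
    """Toy forward operator Phi: mask each band, shift by d*b, and sum."""
    H, W, B = x.shape
    y = np.zeros((H, W + d * (B - 1)), dtype=x.dtype)
    for b in range(B):
        y[:, d * b : d * b + W] += mask * x[:, :, b]
    return y

def cassi_adjoint(y, mask, B, d=2):
    """Adjoint Phi^T: undo the shift per band and re-apply the mask."""
    H, W = mask.shape
    x = np.empty((H, W, B), dtype=y.dtype)
    for b in range(B):
        x[:, :, b] = mask * y[:, d * b : d * b + W]
    return x

def guidance_loss_and_grad(x, y, mask, d=2):
    """L = ||y - Phi(x)||^2 and its gradient -2 Phi^T (y - Phi(x))."""
    residual = y - cassi_forward(x, mask, d)
    loss = float(np.sum(residual ** 2))
    grad = -2.0 * cassi_adjoint(residual, mask, x.shape[2], d)
    return loss, grad

rng = np.random.default_rng(0)
H, W, B, d = 8, 8, 4, 2
mask = (rng.random((H, W)) > 0.5).astype(np.float64)
x_true = rng.random((H, W, B))
y = cassi_forward(x_true, mask, d)

x = np.zeros_like(x_true)               # stand-in for the current estimate
loss_before, grad = guidance_loss_and_grad(x, y, mask, d)
x = x - 0.1 * grad                      # one guidance step (assumed step size)
loss_after, _ = guidance_loss_and_grad(x, y, mask, d)
print(loss_after < loss_before)
```

In a guided sampler, a step of this kind would be interleaved with the denoising updates so that each intermediate estimate is pulled toward agreement with the measurement.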

Experimental Results

Quantitative Evaluation

Table 1: Numerical evaluations between our PSR-SCI and state-of-the-art methods across 10 simulated scenes. The best results are in bold and second-best are underlined.
Algorithm | Category | Reference | Average PSNR (dB) / SSIM
DeSCI | Model | TPAMI 2019 | 25.27 / 0.748
\(\lambda\)-Net | CNN | ICCV 2019 | 28.53 / 0.841
TSA-Net | CNN | ECCV 2020 | 31.35 / 0.895
HDNet | Transformer | CVPR 2022 | 34.66 / 0.946
MST-L | Transformer | CVPR 2022 | 34.81 / 0.949
MST++ | Transformer | CVPR 2022 | 35.72 / 0.955
DAUHST | Deep Unfolding | NeurIPS 2022 | 37.21 / 0.959
DAUHST-3stg | Deep Unfolding | NeurIPS 2022 | 37.21 / 0.959
DAUHST-SP2 | Subspace prior | Information Fusion 2024 | 37.61 / 0.966
DiffSCI | Diffusion | CVPR 2024 | 35.28 / 0.916
PSR-SCI-T | Diffusion | ICLR 2025 (Ours) | 36.68 / 0.961
PSR-SCI-D | Diffusion | ICLR 2025 (Ours) | 38.14 / 0.967

Our PSR-SCI-D model achieves state-of-the-art performance on both metrics on the KAIST dataset, with an average PSNR of 38.14 dB and SSIM of 0.967, an improvement of 2.86 dB over DiffSCI (35.28 dB), the diffusion-based baseline in Table 1. The PSR-SCI-T variant also demonstrates competitive performance at 36.68 dB, 1.40 dB above DiffSCI, highlighting the effectiveness of our approach.
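For readers unfamiliar with the metric, PSNR is defined as \( 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE}) \). The sketch below computes it over a whole cube with data scaled to \([0, 1]\); both choices are assumptions, since the exact evaluation protocol behind Table 1 (for example, per-band or per-scene averaging) is not stated here.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB over all pixels and bands."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(data_range ** 2 / mse))

ref = np.zeros((4, 4, 3))
est = ref + 0.1              # uniform error of 0.1, so MSE = 0.01
print(round(psnr(ref, est), 2))  # 20.0
```

Since the scale is logarithmic, a 3 dB gain corresponds to roughly halving the mean squared error.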

Qualitative Results

Simulation Results
Figure 2: Visual comparison on the KAIST dataset. Our method yields superior visual effects with cleaner textures and fewer artifacts compared to other state-of-the-art methods.
Real Dataset Results
Figure 3: Visual comparison on a real dataset. PSR-SCI recovers more complete and detailed shapes with fewer artifacts compared to other methods.

Zero-Shot Generalization

Zero-Shot Results
Figure 4: Generalization performance on zero-shot datasets (ICVL, NTIRE, Harvard). Our model consistently outperforms competing methods in both PSNR and perceptual metrics.

Computational Efficiency

Our PSR-SCI model significantly reduces the computational burden, requiring only 8.9 seconds for 50 sampling steps compared to 85 seconds for state-of-the-art diffusion-based methods (about 9.6× faster), while achieving superior performance.

Conclusion

We introduced a new framework for spectral compressive imaging reconstruction that focuses on reconstructing high-frequency details by fine-tuning a diffusion model pre-trained on large-scale RGB images. Our empirical results demonstrate significant improvements in detail quality and superior metrics compared to current state-of-the-art methods. We believe that our work introduces a novel direction in spectral compressive imaging reconstruction and establishes a robust benchmark for future research.

Citation

@inproceedings{zeng2025spectral,
title={Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement},
author={Zeng, Haijin and Sun, Benteng and Chen, Yongyong and Su, Jingyong and Xu, Yong},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}