Abstract
Pretrained diffusion models are strong priors for Bayesian inverse problems, but many zero-shot posterior-sampling methods require vector-Jacobian products through the denoiser at each denoising step. Guess & Guide (G&G) replaces this expensive guidance with a lightweight likelihood surrogate and sparse pixel-space optimization. This removes backpropagation through the denoiser, and through the encoder-decoder in latent diffusion settings, while maintaining strong reconstruction quality across diverse inverse problems.
Introduction
Inverse problems arise when a signal must be recovered from incomplete, corrupted, or indirect observations. In a Bayesian inverse problem, a measurement $\mathbf{y} = \mathcal{A}(\mathbf{x}) + \mathbf{n}$ is combined with a prior over plausible clean signals $\mathbf{x}$. Here $\mathcal{A}$ is the forward (degradation) operator and $\mathbf{n}$ is measurement noise. Diffusion models provide powerful priors for this setting because they can generate realistic samples without retraining for every degradation operator.
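The sketch below makes the measurement model concrete by simulating box inpainting on a toy 1-D signal. It assumes a linear $\mathcal{A}$ that keeps every entry outside a hole, plus Gaussian noise $\mathbf{n} \sim \mathcal{N}(0, \sigma^2 I)$. The helper names and sizes are hypothetical and illustrative only.

```python
import numpy as np


def make_box_inpainting_operator(d: int, hole: slice) -> np.ndarray:
    """Hypothetical helper: matrix A that keeps every entry of x outside `hole`."""
    keep = np.ones(d, dtype=bool)
    keep[hole] = False
    return np.eye(d)[keep]  # shape (number of kept entries, d)


def measure(A: np.ndarray, x: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Simulate y = A(x) + n with n ~ N(0, sigma^2 I) (assumed Gaussian noise)."""
    return A @ x + sigma * rng.standard_normal(A.shape[0])


rng = np.random.default_rng(0)
x = rng.standard_normal(16)                          # stand-in for a clean signal
A = make_box_inpainting_operator(16, slice(4, 10))   # entries 4..9 are missing
y = measure(A, x, sigma=0.05, rng=rng)
print(A.shape, y.shape)                              # (10, 16) (10,)
```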
The main bottleneck is guidance. Existing zero-shot diffusion solvers often approximate the posterior score with likelihood terms that require vector-Jacobian products (VJPs) through the denoiser, and sometimes also through the latent encoder-decoder. Guess & Guide (G&G) avoids this cost by separating learned prior refinement from data-consistency enforcement.
Main Contributions
- Backpropagation-free generative guidance: G&G avoids VJPs through the denoiser and encoder-decoder, using gradients only through the forward operator $\mathcal{A}$ during explicit data-consistency optimization.
- Two-phase inference: A fixed-$t_*$ warm-start phase produces an initial guess through denoise-optimize-renoise iterations, then a sparse guided-denoising phase refines it at selected timesteps.
- Broad inverse-problem coverage: The method is evaluated on FFHQ, ImageNet, and FFHQ latent-diffusion experiments across linear and nonlinear tasks including deblurring, super-resolution, inpainting, JPEG restoration, phase retrieval, and HDR reconstruction.
Method
G&G decomposes posterior sampling into two phases. The key design choice is to keep the diffusion prior forward-only, while enforcing measurement consistency with lightweight optimization outside the denoiser.
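Below is a minimal sketch of forward-only data consistency for a linear operator. It assumes a least-squares objective $f(\mathbf{x}) = \tfrac12\|\mathbf{y} - A\mathbf{x}\|^2$, minimized by a few gradient steps starting from the denoiser's estimate. The objective, step size, and step count are assumptions, not details taken from the paper. The gradient $A^\top(A\mathbf{x} - \mathbf{y})$ involves only the forward operator, so no Jacobian of the denoiser or of an encoder-decoder is ever formed.

```python
import numpy as np


def data_consistency(x_init: np.ndarray, A: np.ndarray, y: np.ndarray, num_steps: int = 20) -> np.ndarray:
    """Gradient descent on f(x) = 0.5 * ||y - A x||^2 starting from x_init.

    The gradient A^T (A x - y) uses only the forward operator A.
    """
    # 1/L step, where L = ||A||_2^2 is the Lipschitz constant of grad f.
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x_init.copy()
    for _ in range(num_steps):
        x -= lr * (A.T @ (A @ x - y))
    return x


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16)) / 4.0   # toy compressive operator
x_true = rng.standard_normal(16)
y = A @ x_true
x_hat = np.zeros(16)                     # stand-in for a denoiser output
x_dc = data_consistency(x_hat, A, y)
print(np.linalg.norm(A @ x_hat - y), np.linalg.norm(A @ x_dc - y))
```

With step size $1/\|A\|_2^2$, no step can increase $f$ for this quadratic objective. A small, fixed number of steps keeps the result anchored near the denoiser's estimate.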
Phase 1: Initial Guess (Warm Start)
G&G first seeks an approximate sample at a fixed intermediate timestep $t_*$. Each warm-start iteration:
- denoises the current state and decodes it to image space,
- optimizes the image to better match the observation under $\mathcal{A}$,
- encodes and re-noises the optimized image back to the same timestep.
This creates a useful posterior-informed starting point and avoids running a full guided reverse process from pure noise.
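The warm-start loop can be sketched as follows. The sketch uses toy stand-ins that are assumptions, not the paper's models:

- a Gaussian prior $\mathcal{N}(\mu, s^2 I)$, whose MMSE denoiser has a closed form;
- a variance-preserving noising step with a hypothetical cosine schedule;
- identity `encode`/`decode` functions, as for a pixel-space model.

The initialization, $t_*$, and iteration count are also illustrative. Because this toy prior does not couple entries, the hole is filled with prior draws. The sketch therefore shows the denoise-optimize-renoise control flow, not reconstruction quality.

```python
import numpy as np

# Toy assumptions (not the paper's models):
#   * prior x0 ~ N(mu, s^2 I), so the MMSE denoiser E[x0 | x_t] has a closed form;
#   * variance-preserving noising x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps;
#   * pixel-space model, so encode/decode are identities.


def alpha_bar(t):
    """Hypothetical cosine schedule for t in [0, 1)."""
    return np.cos(0.5 * np.pi * t) ** 2


def denoise(x_t, t, mu, s):
    """Exact posterior mean E[x0 | x_t] under the Gaussian toy prior."""
    ab = alpha_bar(t)
    gain = s**2 * np.sqrt(ab) / (ab * s**2 + 1.0 - ab)
    return mu + gain * (x_t - np.sqrt(ab) * mu)


def renoise(x0, t, rng):
    """Sample x_t ~ q(x_t | x0) at timestep t."""
    ab = alpha_bar(t)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * rng.standard_normal(x0.shape)


def data_consistency(x, A, y, num_steps=5):
    """A few gradient steps on 0.5 * ||y - A x||^2 (gradients only through A)."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x.copy()
    for _ in range(num_steps):
        x -= lr * (A.T @ (A @ x - y))
    return x


def encode(x):
    return x  # identity: pixel-space stand-in for a latent encoder


def decode(z):
    return z  # identity: pixel-space stand-in for a latent decoder


def warm_start(y, A, mu, s, t_star=0.5, num_iters=10, seed=0):
    """Phase 1 sketch: denoise-optimize-renoise iterations at a fixed t_star."""
    rng = np.random.default_rng(seed)
    x_t = rng.standard_normal(mu.shape)   # hypothetical initialization
    x0_dc = mu.copy()
    for _ in range(num_iters):
        x0_hat = decode(denoise(x_t, t_star, mu, s))  # forward-only prior step
        x0_dc = data_consistency(x0_hat, A, y)        # match y under A
        x_t = renoise(encode(x0_dc), t_star, rng)     # return to the same timestep
    return x_t, x0_dc


d = 16
keep = np.ones(d, dtype=bool)
keep[4:10] = False
A = np.eye(d)[keep]                       # box-inpainting operator
rng = np.random.default_rng(1)
x_true = rng.standard_normal(d)
y = A @ x_true
x_t, x0 = warm_start(y, A, mu=np.zeros(d), s=1.0)
print(np.linalg.norm(A @ x0 - y))         # numerically zero: observed entries match y
```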
Phase 2: Guided Denoising
Starting from the warm-start sample, G&G follows a sparse grid of timesteps. At each selected timestep, it alternates between:
- Prior refinement: forward-only denoising and diffusion transitions preserve realism.
- Data consistency: pixel-space optimization pulls the reconstruction toward the observation.
- Re-noising: the optimized estimate is returned to the noisy trajectory before continuing.
The guidance schedule concentrates optimization around intermediate noise levels, where the denoiser can still correct optimization artifacts while the sample already carries useful structure.
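A compact sketch of the guided-denoising phase follows, under the same kind of toy assumptions: a Gaussian prior with an exact denoiser, variance-preserving noising, and no encoder-decoder. Two simplifications stand in for details the text does not give:

- A plain "denoise, optionally optimize, re-noise to the next grid point" transition replaces whatever sampler the paper uses.
- The guidance schedule is approximated by a hypothetical `window` of timesteps where data consistency is applied.

```python
import numpy as np

# Toy assumptions (not the paper's models): Gaussian prior N(mu, s^2 I) with its exact
# MMSE denoiser, variance-preserving noising, and a pixel-space model (no encoder/decoder).


def alpha_bar(t):
    """Hypothetical cosine schedule; alpha_bar(0) = 1 (clean)."""
    return np.cos(0.5 * np.pi * t) ** 2


def denoise(x_t, t, mu, s):
    """Exact posterior mean E[x0 | x_t] under the Gaussian toy prior."""
    ab = alpha_bar(t)
    gain = s**2 * np.sqrt(ab) / (ab * s**2 + 1.0 - ab)
    return mu + gain * (x_t - np.sqrt(ab) * mu)


def renoise(x0, t, rng):
    """Sample x_t ~ q(x_t | x0) at timestep t."""
    ab = alpha_bar(t)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * rng.standard_normal(x0.shape)


def data_consistency(x, A, y, num_steps=5):
    """A few gradient steps on 0.5 * ||y - A x||^2 (gradients only through A)."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x.copy()
    for _ in range(num_steps):
        x -= lr * (A.T @ (A @ x - y))
    return x


def guided_denoising(x_t, grid, A, y, mu, s, window=(0.2, 0.6), seed=0):
    """Phase 2 sketch over a sparse, decreasing timestep grid ending at 0.

    Data consistency runs only when t lies in `window`, a hypothetical stand-in
    for a schedule that concentrates guidance at intermediate noise levels.
    """
    rng = np.random.default_rng(seed)
    t_lo, t_hi = window
    for t, t_next in zip(grid[:-1], grid[1:]):
        x0 = denoise(x_t, t, mu, s)              # prior refinement (forward-only)
        if t_lo <= t <= t_hi:
            x0 = data_consistency(x0, A, y)      # pull toward the observation
        x_t = renoise(x0, t_next, rng)           # back onto the noisy trajectory
    return denoise(x_t, grid[-1], mu, s)


d = 16
keep = np.ones(d, dtype=bool)
keep[4:10] = False
A = np.eye(d)[keep]                              # box-inpainting operator
rng = np.random.default_rng(1)
y = A @ rng.standard_normal(d)
mu = np.zeros(d)
grid = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]            # sparse grid starting at t_star = 0.5
x_start = rng.standard_normal(d)                 # stand-in for the warm-start sample
x_guided = guided_denoising(x_start, grid, A, y, mu, 1.0)
x_prior = guided_denoising(x_start, grid, A, y, mu, 1.0, window=(2.0, 3.0))  # never guides
```

Both calls share the same noise seed, so any difference in how well `x_guided` and `x_prior` fit `y` comes only from the guided steps.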
Results
G&G is evaluated on both pixel-space and latent-space diffusion models. The main paper reports LPIPS as the primary metric, with PSNR and SSIM in the supplementary material.
FFHQ Pixel-Space Results
Lower LPIPS is better. The table below summarizes representative FFHQ results from the main paper.
| Task | G&G | DAPS | DPS |
|---|---|---|---|
| Gaussian Deblur | 0.17 ± 0.06 | 0.19 ± 0.06 | 0.16 ± 0.05 |
| Motion Deblur | 0.13 ± 0.04 | 0.19 ± 0.06 | 0.21 ± 0.06 |
| Super-Resolution ×4 | 0.20 ± 0.07 | 0.19 ± 0.06 | 0.22 ± 0.07 |
| Super-Resolution ×16 | 0.35 ± 0.08 | 0.45 ± 0.10 | 0.36 ± 0.08 |
| Box Inpainting | 0.14 ± 0.04 | 0.12 ± 0.04 | 0.20 ± 0.08 |
| JPEG (QF=2) | 0.16 ± 0.06 | 0.22 ± 0.07 | 0.28 ± 0.07 |
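The best-or-second-best reading of this table can be checked mechanically. The sketch below transcribes the mean LPIPS values from the table (dropping the ± terms) and ranks the methods per task. The helper `rank_of` is hypothetical.

```python
# Mean LPIPS values transcribed from the FFHQ table above; lower is better.
lpips = {
    "Gaussian Deblur":      {"G&G": 0.17, "DAPS": 0.19, "DPS": 0.16},
    "Motion Deblur":        {"G&G": 0.13, "DAPS": 0.19, "DPS": 0.21},
    "Super-Resolution ×4":  {"G&G": 0.20, "DAPS": 0.19, "DPS": 0.22},
    "Super-Resolution ×16": {"G&G": 0.35, "DAPS": 0.45, "DPS": 0.36},
    "Box Inpainting":       {"G&G": 0.14, "DAPS": 0.12, "DPS": 0.20},
    "JPEG (QF=2)":          {"G&G": 0.16, "DAPS": 0.22, "DPS": 0.28},
}


def rank_of(method: str, scores: dict) -> int:
    """1-based rank of `method` among `scores`, where a lower LPIPS ranks higher."""
    ordered = sorted(scores, key=scores.get)
    return ordered.index(method) + 1


ranks = {task: rank_of("G&G", scores) for task, scores in lpips.items()}
for task, r in ranks.items():
    print(f"{task:22s} rank {r}")
```

On these six rows, G&G ranks first on motion deblurring, ×16 super-resolution, and JPEG restoration, and second on the other three.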
Key Takeaways
- Efficiency: On FFHQ pixel-space experiments, G&G uses 1983 MB of memory and takes 25 seconds, compared with 3309 MB and 105 seconds for DPS. In FFHQ latent-diffusion experiments, G&G takes 24 seconds, compared with 509 seconds for ReSample and 1254 seconds for DAPS.
- Quality-speed tradeoff: G&G is best or second-best on most FFHQ pixel-space tasks while being the fastest method in the reported tables.
- Generality: The same inference framework is used across linear and nonlinear inverse problems, including latent diffusion settings where avoiding encoder-decoder VJPs is especially important.
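The reported efficiency figures can be turned into relative factors with a few lines of arithmetic. The numbers below are copied from the efficiency takeaway above, and the variable names are illustrative.

```python
# Figures copied from the efficiency takeaway above (FFHQ experiments).
pixel_mem_mb = {"G&G": 1983, "DPS": 3309}
pixel_time_s = {"G&G": 25, "DPS": 105}
latent_time_s = {"G&G": 24, "ReSample": 509, "DAPS": 1254}

pixel_speedup = pixel_time_s["DPS"] / pixel_time_s["G&G"]
memory_saving = 1 - pixel_mem_mb["G&G"] / pixel_mem_mb["DPS"]
latent_speedup = {
    name: t / latent_time_s["G&G"] for name, t in latent_time_s.items() if name != "G&G"
}

print(f"Pixel space: {pixel_speedup:.1f}x faster, {memory_saving:.0%} less memory than DPS")
for name, factor in latent_speedup.items():
    print(f"Latent space: {factor:.1f}x faster than {name}")
```

In pixel space this is a 4.2× speedup and roughly 40% less memory than DPS. In the latent-diffusion setting it is roughly a 21× speedup over ReSample and 52× over DAPS.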
Limitations
The choice of the warm-start timestep $t_*$ and the spacing of the guided timesteps strongly affect results. Starting too early in the reverse process (a large $t_*$, at a high noise level) increases runtime without clear quality gains. Starting too late (a small $t_*$) can produce artifacts, because the denoiser has less opportunity to correct the optimized estimate.
Conclusion
Guess & Guide is a computationally efficient framework for zero-shot diffusion-based inverse problem solving. It replaces dense denoiser-gradient guidance with a two-phase procedure: a fixed-timestep warm start and sparse scheduled pixel-space guidance. This yields faster and more memory-efficient inference while maintaining competitive reconstruction quality across FFHQ, ImageNet, and latent-diffusion experiments.
Citation
@article{shtanchaev2026guess,
title={Guess \& Guide: Gradient-Free Zero-Shot Diffusion Guidance},
author={Shtanchaev, Abduragim and Ilina, Albina and Janati, Yazid and Asadulaev, Arip and Takáč, Martin and Moulines, Eric},
journal={arXiv preprint arXiv:2603.07860},
year={2026}
}