TY  - GEN
T1  - Cascaded Diffusion Model and Segment Anything Model for Medical Image Synthesis via Uncertainty-Guided Prompt Generation
AU  - Pang, Haowen
AU  - Hong, Xiaoming
AU  - Zhang, Peng
AU  - Ye, Chuyang
N1  - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY  - 2026
Y1  - 2026
AB  - Multi-modal medical images of the same anatomical structure enhance the diversity of diagnostic information. However, limitations such as scanning time, economic costs, and radiation dose constraints can hinder the acquisition of certain modalities. In such cases, synthesizing missing images from available images offers a promising solution. In recent years, methods based on deep learning, particularly Diffusion Models (DMs), have shown significant success in medical image synthesis. However, these methods can still struggle when images contain notable anomalies caused by pathologies. In this work, to improve synthesis robustness to anomalies, we propose a model that cascades a DM and the Segment Anything Model (SAM) with uncertainty-guided prompt generation for medical image synthesis. SAM was originally developed as a foundation model for image segmentation. We hypothesize that SAM (in its medical variant) is beneficial to synthesis tasks because 1) the SAM encoder, trained on large and diverse datasets, gives the model a deep understanding of the complex anomaly patterns of pathologies, and 2) its ability to take prompt inputs naturally allows the synthesis to pay special attention to abnormal regions that are hard to synthesize. To effectively integrate the DM and SAM, we propose an uncertainty-guided prompt generation framework, where regions of the DM synthesis output with higher uncertainty are considered to potentially have worse synthesis quality, and prompts are generated for SAM accordingly to improve the result. First, as the DM produces outputs based on randomly sampled noise, we estimate the uncertainty of its synthesis output by repeated noise sampling and represent it by the standard deviation of the predictions. Then, prompts are generated from the standard deviation and given to SAM, together with the DM input image and output. For effective interaction between the prompt, DM input, and DM output, we propose an Uncertainty-Guided Cross-Attention (UGCA) module, where the prompt serves as a query that guides the model to focus on relevant regions of the DM input and output. Finally, a synthesis decoder replaces the SAM decoder and is trained together with UGCA. Experimental results on two public datasets demonstrate that the proposed method outperforms existing state-of-the-art methods when images manifest notable anomalies.
KW  - Diffusion model
KW  - Medical image synthesis
KW  - Segment anything model
UR  - http://www.scopus.com/pages/publications/105014492333
U2  - 10.1007/978-3-031-96628-6_14
DO  - 10.1007/978-3-031-96628-6_14
M3  - Conference contribution
AN  - SCOPUS:105014492333
SN  - 9783031966279
T3  - Lecture Notes in Computer Science
SP  - 203
EP  - 217
BT  - Information Processing in Medical Imaging - 29th International Conference, IPMI 2025, Proceedings
A2  - Oguz, Ipek
A2  - Zhang, Shaoting
A2  - Metaxas, Dimitris N.
PB  - Springer Science and Business Media Deutschland GmbH
T2  - 29th International Conference on Information Processing in Medical Imaging, IPMI 2025
Y2  - 25 May 2025 through 30 May 2025
ER  -