Cascaded Diffusion Model and Segment Anything Model for Medical Image Synthesis via Uncertainty-Guided Prompt Generation

Haowen Pang, Xiaoming Hong, Peng Zhang, Chuyang Ye*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-modal medical images of the same anatomical structure enrich the diversity of diagnostic information. However, limitations such as scanning time, economic cost, and radiation dose constraints can hinder the acquisition of certain modalities. In such cases, synthesizing the missing images from available ones offers a promising solution. In recent years, deep learning methods, particularly Diffusion Models (DMs), have shown significant success in medical image synthesis. However, these methods can still struggle when images contain notable anomalies caused by pathologies. In this work, to improve the robustness of synthesis to anomalies, we propose a model that cascades the DM and the Segment Anything Model (SAM) with uncertainty-guided prompt generation for medical image synthesis. SAM is originally a foundation model for image segmentation. We hypothesize that SAM (its medical variant) benefits synthesis tasks because 1) the SAM encoder, trained on large and diverse datasets, gives the model a deep understanding of the complex anomaly patterns of pathologies, and 2) its ability to take prompt inputs naturally allows the synthesis to pay special attention to abnormal regions that are hard to synthesize. To effectively integrate the DM and SAM, we propose an uncertainty-guided prompt generation framework, in which DM synthesis results with higher uncertainty are treated as regions of potentially worse synthesis quality, and prompts are generated for SAM accordingly to improve the result. First, as the DM produces outputs based on randomly sampled noise, we estimate the uncertainty of its synthesis output by repeated noise sampling and represent the output uncertainty by the standard deviation of the predictions. Then, prompts are generated from the standard deviation map and given to SAM, together with the DM input image and output.
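The repeated-sampling uncertainty estimate described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_fn` stands in for a trained diffusion sampler, and the point-prompt selection rule (top-k most uncertain voxels above a quantile threshold) is a hypothetical choice.

```python
import numpy as np

def estimate_uncertainty(sample_fn, source_image, n_samples=10):
    """Run the diffusion sampler several times with fresh noise and
    return the mean prediction and per-voxel standard deviation."""
    outputs = np.stack([sample_fn(source_image) for _ in range(n_samples)])
    return outputs.mean(axis=0), outputs.std(axis=0)

def uncertainty_to_prompts(std_map, quantile=0.95, max_points=5):
    """Turn the std map into point prompts at the most uncertain voxels
    (a hypothetical prompt-generation rule for illustration)."""
    threshold = np.quantile(std_map, quantile)
    coords = np.argwhere(std_map >= threshold)
    # keep the top-k most uncertain locations as prompt points
    order = np.argsort(std_map[tuple(coords.T)])[::-1][:max_points]
    return coords[order]
```

The std map itself can also be passed to SAM as a dense prompt; the point form above is just the simplest interface SAM-style models accept.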
For effective interaction between the prompt, DM input, and DM output, we propose an Uncertainty Guided Cross Attention (UGCA) module, where the prompt serves as a query to guide the model to focus on relevant regions of the DM input and output. Finally, a synthesis decoder replaces the SAM decoder, and it is trained together with UGCA. Experimental results on two public datasets demonstrate that the proposed method outperforms existing state-of-the-art methods when images manifest notable anomalies.

Original language: English
Title of host publication: Information Processing in Medical Imaging - 29th International Conference, IPMI 2025, Proceedings
Editors: Ipek Oguz, Shaoting Zhang, Dimitris N. Metaxas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 203-217
Number of pages: 15
ISBN (Print): 9783031966279
DOI
Publication status: Published - 2026
Externally published: Yes
Event: 29th International Conference on Information Processing in Medical Imaging, IPMI 2025 - Kos, Greece
Duration: 25 May 2025 - 30 May 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15829 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 29th International Conference on Information Processing in Medical Imaging, IPMI 2025
Country/Territory: Greece
City: Kos
Period: 25/05/25 - 30/05/25
