TY  - GEN
T1  - Cascaded Diffusion Model and Segment Anything Model for Medical Image Synthesis via Uncertainty-Guided Prompt Generation
AU  - Pang, Haowen
AU  - Hong, Xiaoming
AU  - Zhang, Peng
AU  - Ye, Chuyang
N1  - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY  - 2026
Y1  - 2026
AB  - Multi-modal medical images of the same anatomical structure enhance the diversity of diagnostic information. However, limitations such as scanning time, economic costs, and radiation dose constraints can hinder the acquisition of certain modalities. In such cases, synthesizing missing images from available images offers a promising solution. In recent years, methods based on deep learning, particularly Diffusion Models (DMs), have shown significant success in medical image synthesis. However, these methods can still struggle when images contain notable anomalies caused by pathologies. In this work, to improve synthesis robustness to anomalies, we propose a model that cascades a DM and the Segment Anything Model (SAM) with uncertainty-guided prompt generation for medical image synthesis. SAM was originally developed as a foundation model for image segmentation. We hypothesize that SAM (in its medical variant) is beneficial to synthesis tasks because 1) the SAM encoder, trained on large and diverse datasets, gives the model a deep understanding of the complex anomaly patterns of pathologies, and 2) its ability to take prompt inputs naturally allows the synthesis to pay special attention to abnormal regions that are hard to synthesize. To effectively integrate the DM and SAM, we propose an uncertainty-guided prompt generation framework, where regions of the DM synthesis output with higher uncertainty are considered to potentially have worse synthesis quality, and prompts are generated for SAM accordingly to improve the result. First, as the DM produces outputs based on randomly sampled noise, we estimate the uncertainty of its synthesis output by repeated noise sampling and represent it by the standard deviation of the predictions. Then, prompts are generated from the standard deviation and given to SAM, together with the DM input image and output. For effective interaction between the prompt, DM input, and DM output, we propose an Uncertainty-Guided Cross-Attention (UGCA) module, where the prompt serves as a query that guides the model to focus on relevant regions of the DM input and output. Finally, a synthesis decoder replaces the SAM decoder and is trained together with UGCA. Experimental results on two public datasets demonstrate that the proposed method outperforms existing state-of-the-art methods when images manifest notable anomalies.
KW  - Diffusion model
KW  - Medical image synthesis
KW  - Segment anything model
UR  - http://www.scopus.com/pages/publications/105014492333
U2  - 10.1007/978-3-031-96628-6_14
DO  - 10.1007/978-3-031-96628-6_14
M3  - Conference contribution
AN  - SCOPUS:105014492333
SN  - 9783031966279
T3  - Lecture Notes in Computer Science
SP  - 203
EP  - 217
BT  - Information Processing in Medical Imaging - 29th International Conference, IPMI 2025, Proceedings
A2  - Oguz, Ipek
A2  - Zhang, Shaoting
A2  - Metaxas, Dimitris N.
PB  - Springer Science and Business Media Deutschland GmbH
T2  - 29th International Conference on Information Processing in Medical Imaging, IPMI 2025
Y2  - 25 May 2025 through 30 May 2025
ER  -