Collaborative surgical instrument segmentation for monocular depth estimation in minimally invasive surgery

Xue Li, Wenxin Chen, Xingguang Duan, Xiaoyi Gu*, Changsheng Li

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

Abstract

Depth estimation is essential for image-guided surgical procedures, particularly in minimally invasive environments where accurate 3D perception is critical. This paper proposes a two-stage self-supervised monocular depth estimation framework that incorporates instrument segmentation as a task-level prior to enhance spatial understanding. In the first stage, segmentation and depth estimation models are trained separately on the RIS and SCARED datasets to capture task-specific representations. In the second stage, segmentation masks predicted on the dVPN dataset are fused with RGB inputs to guide the refinement of depth prediction. The framework employs a shared encoder and multiple decoders to enable efficient feature sharing across tasks. Comprehensive experiments on the RIS, SCARED, dVPN, and SERV-CT datasets validate the effectiveness and generalizability of the proposed approach. The results demonstrate that segmentation-aware depth estimation improves geometric reasoning in challenging surgical scenes, including those with occlusions and specular regions.
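The shared-encoder, multi-decoder design with mask-guided depth refinement can be illustrated with a minimal PyTorch sketch. This is a hypothetical toy model, not the authors' implementation: the layer sizes, the sigmoid output heads, and fusing the predicted instrument mask into the depth decoder by channel concatenation are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SegGuidedDepth(nn.Module):
    """Toy sketch (assumed architecture): one shared encoder feeds a
    segmentation decoder and a depth decoder; the predicted instrument
    mask is concatenated with the shared features as a task-level prior
    for depth refinement."""

    def __init__(self):
        super().__init__()
        # Shared encoder across both tasks (sizes are illustrative only).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # One decoder head per task.
        self.seg_decoder = nn.Conv2d(32, 1, 1)
        self.depth_decoder = nn.Conv2d(32 + 1, 1, 1)  # features + mask prior

    def forward(self, rgb):
        feats = self.encoder(rgb)
        mask = torch.sigmoid(self.seg_decoder(feats))          # instrument mask
        fused = torch.cat([feats, mask], dim=1)                # mask-guided fusion
        depth = torch.sigmoid(self.depth_decoder(fused))       # normalized depth
        return mask, depth

model = SegGuidedDepth()
rgb = torch.rand(2, 3, 64, 64)
mask, depth = model(rgb)
print(mask.shape, depth.shape)
```

In the paper's two-stage scheme, such a model would first be trained per task (stage one) before the mask-fused depth branch is refined (stage two); the sketch above only shows the forward data flow.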

Original language: English
Article number: 103765
Journal: Medical Image Analysis
Volume: 106
DOI
Publication status: Published - Dec 2025
Externally published
