TY - JOUR
T1 - Collaborative surgical instrument segmentation for monocular depth estimation in minimally invasive surgery
AU - Li, Xue
AU - Chen, Wenxin
AU - Duan, Xingguang
AU - Gu, Xiaoyi
AU - Li, Changsheng
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/12
Y1 - 2025/12
N2 - Depth estimation is essential for image-guided surgical procedures, particularly in minimally invasive environments where accurate 3D perception is critical. This paper proposes a two-stage self-supervised monocular depth estimation framework that incorporates instrument segmentation as a task-level prior to enhance spatial understanding. In the first stage, segmentation and depth estimation models are trained separately on the RIS and SCARED datasets to capture task-specific representations. In the second stage, segmentation masks predicted on the dVPN dataset are fused with RGB inputs to guide the refinement of depth prediction. The framework employs a shared encoder and multiple decoders to enable efficient feature sharing across tasks. Comprehensive experiments on the RIS, SCARED, dVPN, and SERV-CT datasets validate the effectiveness and generalizability of the proposed approach. The results demonstrate that segmentation-aware depth estimation improves geometric reasoning in challenging surgical scenes, including those with occlusions and specular regions.
AB - Depth estimation is essential for image-guided surgical procedures, particularly in minimally invasive environments where accurate 3D perception is critical. This paper proposes a two-stage self-supervised monocular depth estimation framework that incorporates instrument segmentation as a task-level prior to enhance spatial understanding. In the first stage, segmentation and depth estimation models are trained separately on the RIS and SCARED datasets to capture task-specific representations. In the second stage, segmentation masks predicted on the dVPN dataset are fused with RGB inputs to guide the refinement of depth prediction. The framework employs a shared encoder and multiple decoders to enable efficient feature sharing across tasks. Comprehensive experiments on the RIS, SCARED, dVPN, and SERV-CT datasets validate the effectiveness and generalizability of the proposed approach. The results demonstrate that segmentation-aware depth estimation improves geometric reasoning in challenging surgical scenes, including those with occlusions and specular regions.
KW - Depth estimation
KW - Instrument segmentation
KW - Multi-task framework
KW - Surgical vision
UR - http://www.scopus.com/pages/publications/105013992757
U2 - 10.1016/j.media.2025.103765
DO - 10.1016/j.media.2025.103765
M3 - Article
C2 - 40848507
AN - SCOPUS:105013992757
SN - 1361-8415
VL - 106
JO - Medical Image Analysis
JF - Medical Image Analysis
M1 - 103765
ER -