Collaborative surgical instrument segmentation for monocular depth estimation in minimally invasive surgery

Xue Li, Wenxin Chen, Xingguang Duan, Xiaoyi Gu*, Changsheng Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Depth estimation is essential for image-guided surgical procedures, particularly in minimally invasive environments where accurate 3D perception is critical. This paper proposes a two-stage self-supervised monocular depth estimation framework that incorporates instrument segmentation as a task-level prior to enhance spatial understanding. In the first stage, segmentation and depth estimation models are trained separately on the RIS and SCARED datasets to capture task-specific representations. In the second stage, segmentation masks predicted on the dVPN dataset are fused with RGB inputs to guide the refinement of depth prediction. The framework employs a shared encoder and multiple decoders to enable efficient feature sharing across tasks. Comprehensive experiments on the RIS, SCARED, dVPN, and SERV-CT datasets validate the effectiveness and generalizability of the proposed approach. The results demonstrate that segmentation-aware depth estimation improves geometric reasoning in challenging surgical scenes, including those with occlusions and specular regions.
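The abstract outlines a shared encoder feeding task-specific decoders, with the stage-one instrument mask fused back into the input at stage two. Below is a minimal PyTorch sketch of that layout; the module name, layer sizes, and channel-concatenation fusion are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class SharedEncoderMultiDecoder(nn.Module):
    # Hypothetical sketch: one shared encoder feeds separate segmentation
    # and depth decoders, mirroring the multi-task layout described in the
    # abstract. Layer sizes and depths are placeholders.
    def __init__(self, base=32):
        super().__init__()
        # Shared encoder over RGB plus one mask channel (4 channels total).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, base, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific decoders consuming the shared features.
        self.seg_decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, 1, 4, stride=2, padding=1),  # instrument mask logits
        )
        self.depth_decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # normalized disparity
        )

    def forward(self, rgb, mask=None):
        # Stage 1: no prior available, so the mask channel is zeros.
        # Stage 2: a predicted instrument mask is fused with the RGB input
        # by channel concatenation (a stand-in for the paper's fusion step).
        if mask is None:
            mask = torch.zeros_like(rgb[:, :1])
        feats = self.encoder(torch.cat([rgb, mask], dim=1))
        return self.seg_decoder(feats), self.depth_decoder(feats)

# Two-stage usage: predict a mask without a prior, then refine depth
# with that mask fused into the input.
model = SharedEncoderMultiDecoder()
rgb = torch.randn(1, 3, 64, 64)
seg_logits, _ = model(rgb)                         # stage 1
_, depth = model(rgb, torch.sigmoid(seg_logits))   # stage 2

Concatenating the mask as an extra input channel is only one plausible fusion; the same shared-encoder structure would accommodate feature-level fusion equally well.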

Original language: English
Article number: 103765
Journal: Medical Image Analysis
Volume: 106
DOIs
Publication status: Published - Dec 2025
Externally published: Yes

Keywords

  • Depth estimation
  • Instrument segmentation
  • Multi-task framework
  • Surgical vision
