TY - JOUR
T1 - Collaborative surgical instrument segmentation for monocular depth estimation in minimally invasive surgery
AU - Li, Xue
AU - Chen, Wenxin
AU - Duan, Xingguang
AU - Gu, Xiaoyi
AU - Li, Changsheng
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/12
Y1 - 2025/12
N2 - Depth estimation is essential for image-guided surgical procedures, particularly in minimally invasive environments where accurate 3D perception is critical. This paper proposes a two-stage self-supervised monocular depth estimation framework that incorporates instrument segmentation as a task-level prior to enhance spatial understanding. In the first stage, segmentation and depth estimation models are trained separately on the RIS and SCARED datasets to capture task-specific representations. In the second stage, segmentation masks predicted on the dVPN dataset are fused with RGB inputs to guide the refinement of depth prediction. The framework employs a shared encoder and multiple decoders to enable efficient feature sharing across tasks. Comprehensive experiments on the RIS, SCARED, dVPN, and SERV-CT datasets validate the effectiveness and generalizability of the proposed approach. The results demonstrate that segmentation-aware depth estimation improves geometric reasoning in challenging surgical scenes, including those with occlusions and specular regions.
AB - Depth estimation is essential for image-guided surgical procedures, particularly in minimally invasive environments where accurate 3D perception is critical. This paper proposes a two-stage self-supervised monocular depth estimation framework that incorporates instrument segmentation as a task-level prior to enhance spatial understanding. In the first stage, segmentation and depth estimation models are trained separately on the RIS and SCARED datasets to capture task-specific representations. In the second stage, segmentation masks predicted on the dVPN dataset are fused with RGB inputs to guide the refinement of depth prediction. The framework employs a shared encoder and multiple decoders to enable efficient feature sharing across tasks. Comprehensive experiments on the RIS, SCARED, dVPN, and SERV-CT datasets validate the effectiveness and generalizability of the proposed approach. The results demonstrate that segmentation-aware depth estimation improves geometric reasoning in challenging surgical scenes, including those with occlusions and specular regions.
KW - Depth estimation
KW - Instrument segmentation
KW - Multi-task framework
KW - Surgical vision
UR - http://www.scopus.com/pages/publications/105013992757
U2 - 10.1016/j.media.2025.103765
DO - 10.1016/j.media.2025.103765
M3 - Article
C2 - 40848507
AN - SCOPUS:105013992757
SN - 1361-8415
VL - 106
JO - Medical Image Analysis
JF - Medical Image Analysis
M1 - 103765
ER -