Open-Vocabulary Surgical Tool Detection and Tracking

A pipeline for real endoscopy video that turns text prompts into surgical tool boxes, masks, tracking overlays, and evaluation metrics using Grounding DINO and SAM2.

Tags: Surgical Robotics, Tool Tracking, Open-Vocabulary AI, Segmentation

Project status: active prototype

SurgiPrompt takes text prompts such as "forceps", "grasper", "catheter", and "guidewire" and localizes the named tools directly in real endoscopic and laparoscopic video frames.
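
Grounding DINO is usually queried with a single lowercase caption in which candidate phrases are separated by periods. The sketch below shows one way a list of tool names could be turned into such a caption. The helper name build_caption is hypothetical and is not taken from the SurgiPrompt source.

```python
def build_caption(prompts):
    """Join tool names into one lowercase, period-separated caption.

    Hypothetical helper for illustration; the project may format prompts differently.
    """
    cleaned = []
    for prompt in prompts:
        # Lowercase, drop trailing periods, and collapse repeated whitespace.
        phrase = " ".join(prompt.lower().strip().rstrip(".").split())
        if phrase and phrase not in cleaned:  # skip empty and duplicate phrases
            cleaned.append(phrase)
    return (" . ".join(cleaned) + " .") if cleaned else ""


caption = build_caption(["Forceps", "grasper", "catheter", "guidewire"])
print(caption)
```

Keeping each tool as its own period-delimited phrase makes it straightforward to map a detected phrase back to the prompt that produced it.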

The pipeline combines Grounding DINO for open-vocabulary box localization with SAM2 for mask refinement and video propagation. It supports inference on real data, dataset evaluation, and fine-tuning against endoscopy datasets exported in COCO-style formats.
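
COCO-style exports store each box as [x, y, width, height], while box prompts for a segmenter are commonly given as corner coordinates. As a rough sketch of the data handling this implies, the function below groups annotations per image and converts them to (x1, y1, x2, y2). The field names follow the standard COCO layout; the function name load_coco_boxes and the sample record are illustrative assumptions, not part of the project.

```python
from collections import defaultdict


def load_coco_boxes(coco):
    """Return {file_name: [(category_name, (x1, y1, x2, y2)), ...]} from a COCO-style dict."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO convention: top-left corner plus size
        corners = (x, y, x + w, y + h)
        boxes[files[ann["image_id"]]].append((categories[ann["category_id"]], corners))
    return dict(boxes)


# Made-up sample record, used only to show the expected shape of the input.
sample = {
    "images": [{"id": 1, "file_name": "frame_0001.jpg"}],
    "categories": [{"id": 7, "name": "grasper"}],
    "annotations": [{"image_id": 1, "category_id": 7, "bbox": [10, 20, 30, 40]}],
}
print(load_coco_boxes(sample))
```

In practice the dictionary would come from json.load on the exported annotation file.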

Core pieces:

  • Prompt-driven surgical tool localization without fixed closed-set labels
  • SAM2 mask refinement and video tracking overlays
  • Real dataset support for Endoscapes2023, Kvasir-Instrument, and generic COCO exports
  • Tracked metrics (mAP, mask IoU, FPS) plus failure-case inspection
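
Both mAP and mask IoU rest on intersection over union: mAP matches predicted boxes to ground truth by box IoU, and mask IoU applies the same ratio to pixels. The following is a minimal, dependency-free sketch of the two ratios, not the project's evaluation code. The function names are hypothetical, and scoring two empty masks as 1.0 is a convention chosen here for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred, truth):
    """IoU of two equally sized binary masks given as rows of 0/1 values."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

A real evaluation would typically run on array masks and a COCO evaluation toolkit; the nested-list version only makes the definition explicit.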