Skin Lesion Segmentation on ISIC 2018
A reproducible dermoscopy segmentation benchmark with DeepLabV3 baselines, planned U-Net and transformer comparisons, and promptable SAM-style extensions.
---
project status: completed baseline + active extensions
---
Overview
DermaSeg is a medical image segmentation project focused on automated skin lesion boundary detection from dermoscopic images. The project is built around the ISIC 2018 Task 1 lesion segmentation benchmark and is designed as a reproducible comparison framework across classical CNN segmentation models, attention-based architectures, lightweight transformer models, and promptable foundation-model baselines.
Rather than training a single model in isolation, DermaSeg is structured as a complete segmentation workflow: dataset loading, preprocessing, supervised training, validation-based checkpoint selection, test-set evaluation, qualitative visualization, and metric summarization.
Current Status: Strong ISIC Baseline
The current completed experiment uses a DeepLabV3 model with a pretrained ResNet-50 backbone for binary lesion-vs-background segmentation.
| Model | Best Val Dice | Best Val IoU | Best Val TJ | Test Dice | Test IoU | Test TJ |
|---|---|---|---|---|---|---|
| DeepLabV3 | 0.8900 | 0.8134 | 0.7598 | 0.8782 | 0.7991 | 0.7320 |
These results come from a local run on the official ISIC 2018 split, not from copied leaderboard values. The best checkpoint is selected by validation threshold Jaccard (the TJ columns in the table) and then evaluated on the held-out test set.
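The three reported metrics can be sketched as follows. This is a minimal NumPy illustration, not the repository's evaluation code: the function names are hypothetical, and the 0.65 cutoff assumes the ISIC 2018 challenge definition of threshold Jaccard, in which any image whose Jaccard index falls below the cutoff scores zero before averaging.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    # eps keeps the score defined (and equal to 1) when both masks are empty
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def jaccard_index(pred, target, eps=1e-7):
    """Jaccard index (IoU) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

def threshold_jaccard(preds, targets, threshold=0.65):
    """Mean per-image Jaccard, with scores below `threshold` counted as 0."""
    scores = [jaccard_index(p, t) for p, t in zip(preds, targets)]
    return float(np.mean([s if s >= threshold else 0.0 for s in scores]))
```

Because every image below the cutoff contributes zero, threshold Jaccard is always at most the plain mean IoU, which matches the ordering of the IoU and TJ columns in the table.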
Technical Approach
The project uses a supervised 2-class segmentation setup:
- Dataset: ISIC 2018 Task 1 lesion boundary segmentation.
- Input: RGB dermoscopic images resized to 320 × 320.
- Model: DeepLabV3 with a pretrained ResNet-50 backbone.
- Loss: Combined cross-entropy and Dice loss.
- Selection Metric: Validation threshold Jaccard.
- Outputs: Saved checkpoints, metrics JSON files, Markdown experiment summaries, and qualitative prediction panels.
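To make the combined loss concrete, here is a forward-only NumPy sketch of cross-entropy plus soft Dice for a 2-class output. It is an illustration under assumptions rather than the project's training code: the equal weighting (`dice_weight=1.0`), the choice to compute Dice on the foreground channel only, and the function names are all hypothetical, and a real training loop would compute the same quantities in an autograd framework.

```python
import numpy as np

def softmax(logits, axis=1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def ce_dice_loss(logits, target, dice_weight=1.0, eps=1e-6):
    """Cross-entropy + soft Dice for binary segmentation.

    logits: float array of shape (N, 2, H, W)
    target: integer array of shape (N, H, W) with values in {0, 1}
    """
    probs = softmax(logits, axis=1)
    fg = probs[:, 1]                      # foreground (lesion) probability

    # Cross-entropy: mean negative log-probability of the true class.
    true_prob = np.where(target == 1, fg, probs[:, 0])
    ce = -np.log(np.clip(true_prob, eps, 1.0)).mean()

    # Soft Dice on the foreground channel, computed per image then averaged.
    t = target.astype(float)
    inter = (fg * t).sum(axis=(1, 2))
    denom = fg.sum(axis=(1, 2)) + t.sum(axis=(1, 2))
    dice = (2.0 * inter + eps) / (denom + eps)

    return float(ce + dice_weight * (1.0 - dice.mean()))
```

The cross-entropy term supplies a per-pixel signal, while the Dice term scores region overlap directly, which helps when lesions occupy a small fraction of the image.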
DeepLabV3 was chosen as the first baseline because it provides a strong comparison point outside the U-Net family. Its atrous (dilated) convolutions and Atrous Spatial Pyramid Pooling (ASPP) module let the model capture lesion context at multiple scales, which is useful for irregular lesion boundaries and variable lesion sizes.
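The multi-scale effect of atrous convolution can be seen in a small pure-Python sketch. A kernel of size k with dilation rate r spans k + (k - 1)(r - 1) input positions while keeping only k weights. The rates below (6, 12, 18) are the ones commonly associated with ASPP and are used purely for illustration; the rates in this project's model are not stated here, and both function names are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D correlation with gaps of `dilation` between taps."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

# A 3-tap kernel sees a wider window as the rate grows, at no extra weight cost.
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))  # spans of 3, 13, 25, 37

# Difference filter with dilation 3 compares samples six positions apart.
print(dilated_conv1d(list(range(10)), [1, 0, -1], 3))  # [-6, -6, -6, -6]
```

ASPP applies several such rates in parallel to the same feature map and fuses the results, so small and large lesions are both seen at a suitable scale.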
Model Coverage
DermaSeg is designed to compare multiple segmentation model families under a shared training and evaluation pipeline.
Supervised Segmentation Models
- U-Net
- Attention U-Net
- SegNet
- U-Net++
- DeepLabV3
- SwinUNetLite
- SegFormerLite
Promptable Foundation-Model Extensions
- SAM
- MedSAM
- Box-prompted evaluation workflow
- Separate promptable evaluation path, so that prompted foundation models are not compared unfairly against fully supervised models
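A box prompt for this kind of workflow can be derived from a binary mask as its tight bounding box. The sketch below is an assumption about how that step might look, not the repository's code: `mask_to_box` and its `pad` parameter are hypothetical, and the (x_min, y_min, x_max, y_max) ordering is chosen because SAM-style predictors commonly expect it.

```python
import numpy as np

def mask_to_box(mask, pad=0):
    """Tight bounding box of a binary mask as (x_min, y_min, x_max, y_max).

    Coordinates are inclusive pixel indices. `pad` loosens the box by a fixed
    number of pixels, clipped to the image bounds. Returns None for an empty mask.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    height, width = mask.shape
    x_min = max(int(xs.min()) - pad, 0)
    y_min = max(int(ys.min()) - pad, 0)
    x_max = min(int(xs.max()) + pad, width - 1)
    y_max = min(int(ys.max()) + pad, height - 1)
    return (x_min, y_min, x_max, y_max)
```

A box derived from the ground-truth mask already tells the model where the lesion is, which is one reason to report prompted results separately from the supervised models.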
Qualitative Evaluation
The repository also exports qualitative prediction panels from the saved DeepLabV3 checkpoint. Each panel compares:
- the original dermoscopic input image,
- the ground-truth lesion mask overlay, and
- the predicted lesion mask overlay.
This makes model behavior easier to interpret than metrics alone and allows direct inspection of boundary quality, under-segmentation, over-segmentation, and lesion-shape errors.
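A three-part panel of this kind can be assembled with plain array operations. The following is a sketch only: the function names, overlay colors, and blending factor are assumptions, and the repository's actual plotting code may differ.

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.4):
    """Alpha-blend `color` onto the masked pixels of an RGB uint8 image."""
    image = np.asarray(image, dtype=np.float32)
    mask = np.asarray(mask, dtype=bool)
    out = image.copy()
    tint = np.asarray(color, dtype=np.float32)
    out[mask] = (1.0 - alpha) * image[mask] + alpha * tint
    return out.round().astype(np.uint8)

def make_panel(image, gt_mask, pred_mask):
    """Side-by-side panel: input | ground-truth overlay | prediction overlay."""
    original = np.asarray(image, dtype=np.uint8)
    gt_view = overlay_mask(image, gt_mask, color=(0, 255, 0))      # green = ground truth
    pred_view = overlay_mask(image, pred_mask, color=(255, 0, 0))  # red = prediction
    return np.concatenate([original, gt_view, pred_view], axis=1)
```

The returned array has shape (H, 3W, 3) and can be written to disk with any image library.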
Why This Project Matters
DermaSeg demonstrates practical medical image segmentation beyond a one-off notebook. It shows that the same dataset and evaluation protocol can be used to compare classical encoder-decoder CNNs, atrous-convolution models, transformer-style architectures, and promptable segmentation models.
For my broader work in medical imaging and computer vision, this project strengthens my experience with:
- reproducible segmentation pipelines,
- medical benchmark datasets,
- lesion boundary detection,
- Dice, IoU, and threshold-Jaccard evaluation,
- qualitative model inspection,
- CNN and transformer segmentation architectures,
- and foundation-model extensions for medical image analysis.
Next Steps
The next phase of DermaSeg will focus on making the benchmark more complete and clinically meaningful:
- Loss Ablations: Compare cross-entropy + Dice, Tversky, and focal Tversky losses for imbalanced lesion segmentation.
- Architecture Benchmarking: Run full evaluations across U-Net, Attention U-Net, U-Net++, DeepLabV3, SwinUNetLite, and SegFormerLite.
- Promptable Segmentation: Evaluate SAM and MedSAM using box prompts derived from lesion masks.
- Failure-Mode Analysis: Categorize segmentation errors by lesion size, contrast, boundary ambiguity, and image artifacts.
- Portfolio Visualization: Add qualitative result panels directly to the project page for visual comparison across models.
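As a starting point for the loss ablations, the Tversky index generalizes Dice by weighting false positives and false negatives separately, and the focal variant raises the resulting loss to a power. This forward-only NumPy sketch uses illustrative values (`alpha=0.3`, `beta=0.7`, `gamma=0.75`) that are assumptions, not settings from this project; note that the literature is not uniform on whether the focal exponent is written as gamma or 1/gamma.

```python
import numpy as np

def tversky_index(probs, target, alpha=0.5, beta=0.5, eps=1e-6):
    """Soft Tversky index; alpha weights false positives, beta false negatives.

    With alpha = beta = 0.5 this reduces to the soft Dice coefficient.
    """
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)
    tp = (probs * target).sum()
    fp = (probs * (1.0 - target)).sum()
    fn = ((1.0 - probs) * target).sum()
    return float((tp + eps) / (tp + alpha * fp + beta * fn + eps))

def tversky_loss(probs, target, alpha=0.3, beta=0.7):
    """1 - Tversky index; beta > alpha penalizes missed lesion pixels more."""
    return 1.0 - tversky_index(probs, target, alpha, beta)

def focal_tversky_loss(probs, target, alpha=0.3, beta=0.7, gamma=0.75):
    """Tversky loss raised to `gamma`; gamma < 1 boosts small residual errors."""
    return tversky_loss(probs, target, alpha, beta) ** gamma
```

Setting beta above alpha makes under-segmentation cost more than over-segmentation, which is the usual motivation for trying these losses on small or low-contrast lesions.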