Model Card for Digepath
Digepath is a self-supervised foundation model for the intelligent analysis of gastrointestinal (GI) pathology images. arXiv preprint: https://arxiv.org/abs/2505.21928
The model is a ViT-Large/16 (Vision Transformer, Large, 16 × 16 patches) pretrained with DINOv2 [1] self-supervised learning on 353 million multi-scale images from 210,043 H&E-stained GI-related slides.
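As a quick sanity check on what "ViT-Large/16" implies for input handling, the sketch below computes the patch grid and token count for a given tile size. It assumes standard ViT patchification (non-overlapping 16 × 16 patches plus one [CLS] token) and the ViT-Large embedding width of 1024. It does not account for any register tokens the released checkpoint might use. The helper name `token_grid` is hypothetical.

```python
def token_grid(height: int, width: int, patch_size: int = 16) -> tuple[int, int, int]:
    """Return (rows, cols, num_tokens) for a ViT with non-overlapping patches.

    num_tokens counts the patch tokens plus one [CLS] token; register tokens,
    if the checkpoint has any, are not included (assumption).
    """
    if height % patch_size or width % patch_size:
        # This sketch rejects such sizes rather than guessing how they are padded or cropped.
        raise ValueError("image sides must be multiples of the patch size")
    rows, cols = height // patch_size, width // patch_size
    return rows, cols, rows * cols + 1


EMBED_DIM = 1024  # ViT-Large hidden width, matching the [1, 1024] feature vector

for side in (224, 256, 512):
    rows, cols, n = token_grid(side, side)
    print(f"{side}x{side}: {rows}x{cols} patches -> {n} tokens of dim {EMBED_DIM}")
```

Because the model is created with `dynamic_img_size=True`, larger tiles simply produce a larger patch grid. Compute and memory therefore grow roughly with the number of tokens.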
Introduction to Digepath
Gastrointestinal (GI) diseases impose a substantial clinical burden, and precise diagnosis is essential for optimizing patient outcomes. Conventional histopathological diagnosis, however, suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we developed Digepath, a foundation model specialized for GI pathology. Our framework introduces a dual-phase iterative optimization strategy that combines pretraining with fine-screening. This strategy is specifically designed to detect sparsely distributed lesion areas in whole-slide images. Digepath was first pretrained on a large-scale dataset of more than 353 million multi-scale images derived from 210,043 H&E-stained slides of GI diseases. In the second stage, it was fine-tuned on 471,443 carefully selected regions of interest (ROIs). It achieves state-of-the-art performance on 32 of 33 GI pathology tasks, spanning pathological diagnosis, protein expression status prediction, gene mutation prediction, and prognosis evaluation. This broad applicability across diverse clinical tasks highlights its potential for reliable deployment in real-world pathology workflows.
Using Digepath to extract features from gastrointestinal pathology images
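import timm
import torch
import torchvision.transforms as transforms
# Load the ViT-L/16 backbone with Digepath weights from the Hugging Face Hub
model = timm.create_model('hf_hub:xtxx/Digepath', pretrained=True, init_values=1e-5, dynamic_img_size=True)
# Preprocessing for PIL images: resize the shorter side to 224 px, convert to a tensor, apply ImageNet mean/std
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
model.eval()
# Random tensor as a smoke test. For a real H&E tile (hypothetical path 'tile.png'):
# x = preprocess(Image.open('tile.png').convert('RGB')).unsqueeze(0).to(device)  # needs: from PIL import Image
x = torch.randn(1, 3, 224, 224, device=device)
with torch.no_grad():
    features = model(x)  # [1, 1024]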
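To embed many tiles, such as all tiles cut from one slide, it is convenient to process them in batches. The helper below is a minimal sketch and is not part of the Digepath release. `extract_features` is a hypothetical name. It accepts any `torch.nn.Module` that maps `[B, 3, H, W]` to `[B, D]`, such as the timm model created above. It assumes the tiles are already preprocessed tensors of equal size and that the model is already on `device`.

```python
import torch
from torch import nn


@torch.no_grad()
def extract_features(model: nn.Module, tiles: torch.Tensor, batch_size: int = 64,
                     device: str = "cpu") -> torch.Tensor:
    """Embed preprocessed tiles [N, 3, H, W] in batches and return [N, D] on the CPU.

    The model is assumed to already live on `device`.
    """
    model.eval()
    outputs = []
    for start in range(0, tiles.shape[0], batch_size):
        batch = tiles[start:start + batch_size].to(device)
        outputs.append(model(batch).float().cpu())
    return torch.cat(outputs, dim=0)


# Usage with a tiny stand-in network (hypothetical) so the sketch runs offline.
# With Digepath, pass the timm model and device="cuda" instead.
stand_in = nn.Sequential(nn.Conv2d(3, 8, kernel_size=16, stride=16),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
tiles = torch.randn(10, 3, 224, 224)
feats = extract_features(stand_in, tiles, batch_size=4)
print(feats.shape)  # torch.Size([10, 8])
```

Moving each batch's output back to the CPU keeps GPU memory bounded by `batch_size`, regardless of how many tiles a slide contains.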
Training Pipeline
- Self-supervised learning (DINOv2): https://github.com/facebookresearch/dinov2
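Digepath's pretraining follows the DINOv2 recipe [1]. At its core is a DINO-style self-distillation objective. A student network is trained to match the softmax output of an exponential-moving-average teacher on different crops of the same image, and the teacher output is centered and sharpened to avoid collapse. The sketch below shows only that cross-entropy term and the original DINO mean-centering update. DINOv2 itself replaces this centering with Sinkhorn-Knopp normalization and adds an iBOT masked-token loss and a KoLeo regularizer, none of which are shown. The function names, prototype count, and temperatures are illustrative assumptions, not Digepath's training configuration.

```python
import torch
import torch.nn.functional as F


def dino_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
              center: torch.Tensor, student_temp: float = 0.1,
              teacher_temp: float = 0.04) -> torch.Tensor:
    """Cross-entropy between the centered, sharpened teacher distribution and the student.

    Both logit tensors are [B, K] (K = number of prototypes); center is [K].
    """
    # Teacher: subtract the running center, sharpen with a low temperature, stop gradients.
    teacher_probs = F.softmax((teacher_logits - center) / teacher_temp, dim=-1).detach()
    student_log_probs = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()


@torch.no_grad()
def update_center(center: torch.Tensor, teacher_logits: torch.Tensor,
                  momentum: float = 0.9) -> torch.Tensor:
    """EMA of the batch-mean teacher logits, as in the original DINO."""
    return momentum * center + (1 - momentum) * teacher_logits.mean(dim=0)


torch.manual_seed(0)
K = 16  # illustrative prototype count
student = torch.randn(8, K, requires_grad=True)
teacher = torch.randn(8, K)
center = torch.zeros(K)

loss = dino_loss(student, teacher, center)
loss.backward()  # gradients flow only into the student logits
center = update_center(center, teacher)
print(loss.item(), student.grad.shape)
```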
Evaluation Pipeline
- WSI Classification: https://github.com/lingxitong/MIL_BASELINE
- ROI Classification: https://github.com/lingxitong/HistoROIBench
- ROI Segmentation: https://github.com/lingxitong/PFM_Segmentation
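For slide-level tasks, the tile embeddings of one whole-slide image are treated as a bag and aggregated by a multiple-instance learning (MIL) head. As one concrete example, the sketch below implements gated attention-based MIL pooling (Ilse et al., 2018) on top of 1024-dimensional tile features. It is a generic illustration, not necessarily the configuration used in MIL_BASELINE or in the Digepath paper. The class name `GatedAttentionMIL` and the layer sizes are assumptions.

```python
import torch
from torch import nn


class GatedAttentionMIL(nn.Module):
    """Gated attention pooling over a bag of tile features [N, in_dim] -> class logits."""

    def __init__(self, in_dim: int = 1024, hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(in_dim, num_classes)

    def forward(self, bag: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        scores = self.attn_w(self.attn_v(bag) * self.attn_u(bag))  # [N, 1]
        weights = torch.softmax(scores.squeeze(-1), dim=0)         # [N], sums to 1
        slide_embedding = weights @ bag                            # [in_dim]
        return self.classifier(slide_embedding), weights


# Usage on random stand-in features for one slide with 500 tiles.
torch.manual_seed(0)
bag = torch.randn(500, 1024)
head = GatedAttentionMIL()
logits, weights = head(bag)
print(logits.shape, weights.shape)  # torch.Size([2]) torch.Size([500])
```

The attention weights also serve as a rough per-tile saliency map, which is one reason attention pooling is a common baseline for sparsely distributed lesions.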
Citation
If you find Digepath useful, please cite our work:
@article{zhu2025subspecialty,
title={Subspecialty-specific foundation model for intelligent gastrointestinal pathology},
author={Zhu, Lianghui and Ling, Xitong and Ouyang, Minxi and Liu, Xiaoping and Guan, Tian and Fu, Mingxi and Cheng, Zhiqiang and Fu, Fanglei and Zeng, Maomao and Liu, Liming and others},
journal={arXiv preprint arXiv:2505.21928},
year={2025}
}
References
[1] Oquab, Maxime, et al. "DINOv2: Learning robust visual features without supervision." arXiv preprint arXiv:2304.07193 (2023).
