# Mask2Former for Satellite Image Segmentation
This model is a fine-tuned version of Mask2Former for semantic segmentation of satellite/aerial imagery.
## Model Description
- Architecture: Mask2Former with Swin Transformer backbone
- Task: Semantic Segmentation (Land Cover Classification)
- Dataset: OpenEarthMap
- Fine-tuned from: facebook/mask2former-swin-base-ade-semantic
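
The base checkpoint listed above can be adapted to this model's 9-class label set, for example as a starting point for further fine-tuning. The sketch below is illustrative only and is not the actual training configuration used for this checkpoint:

```python
from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor

# The 9 land cover classes used by this model
id2label = {
    0: "Background", 1: "Bareland", 2: "Grass", 3: "Pavement",
    4: "Road", 5: "Tree", 6: "Water", 7: "Cropland", 8: "Building",
}
label2id = {v: k for k, v in id2label.items()}

# Load the base checkpoint with a re-initialized 9-class head
model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-ade-semantic",
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # the ADE20K class head does not match 9 classes
)
processor = Mask2FormerImageProcessor.from_pretrained(
    "facebook/mask2former-swin-base-ade-semantic"
)
```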
## Training Results
| Metric | Value |
|---|---|
| Best mIoU | 0.5202 |
| Validation Loss | 44.21 |
| Training Epochs | 17 |
## Classes
This model classifies pixels into 9 land cover categories:
| ID | Class |
|---|---|
| 0 | Background |
| 1 | Bareland |
| 2 | Grass |
| 3 | Pavement |
| 4 | Road |
| 5 | Tree |
| 6 | Water |
| 7 | Cropland |
| 8 | Building |
## Usage

```python
from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor
from PIL import Image
import torch

# Load model and processor
model = Mask2FormerForUniversalSegmentation.from_pretrained("mfaytin/mask2former-satellite")
processor = Mask2FormerImageProcessor.from_pretrained("mfaytin/mask2former-satellite")

# Load and preprocess image
image = Image.open("satellite_image.tif").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process to get segmentation map
segmentation = processor.post_process_semantic_segmentation(
    outputs,
    target_sizes=[image.size[::-1]]  # PIL size is (width, height); reversed gives (height, width)
)[0]

# segmentation is a tensor of shape (H, W) with class IDs
print(f"Segmentation shape: {segmentation.shape}")
print(f"Unique classes: {torch.unique(segmentation).tolist()}")
```
## Class Labels Mapping

```python
CLASS_LABELS = {
    0: "Background",
    1: "Bareland",
    2: "Grass",
    3: "Pavement",
    4: "Road",
    5: "Tree",
    6: "Water",
    7: "Cropland",
    8: "Building",
}
```
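
For example, this mapping can be used to summarize how much of a scene each class covers, using the `segmentation` tensor produced in the usage snippet above:

```python
# Per-class pixel coverage of the predicted segmentation map
total = segmentation.numel()
for class_id in torch.unique(segmentation).tolist():
    share = (segmentation == class_id).sum().item() / total
    print(f"{CLASS_LABELS[class_id]:<10} {share:6.1%}")
```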
## Intended Use
This model is intended for:
- Land cover classification from satellite/aerial imagery
- Urban planning and environmental monitoring
- Geographic information system (GIS) applications
- Remote sensing research
## Limitations
- Geographic Bias: Trained primarily on imagery from specific regions in the OpenEarthMap dataset
- Resolution Sensitivity: Best performance on imagery similar to training data resolution
- Imagery Source: May require fine-tuning for different satellite sensors or aerial platforms
- Seasonal Variation: Performance may vary across different seasons or weather conditions
## Citation
If you use this model, please cite the OpenEarthMap dataset:
```bibtex
@inproceedings{xia2023openearthmap,
  title={OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping},
  author={Xia, Junshi and others},
  booktitle={WACV},
  year={2023}
}
```