Mask2Former for Satellite Image Segmentation

This model is a fine-tuned version of Mask2Former for semantic segmentation of satellite/aerial imagery.

Model Description

  • Architecture: Mask2Former with Swin Transformer backbone
  • Task: Semantic Segmentation (Land Cover Classification)
  • Dataset: OpenEarthMap
  • Fine-tuned from: facebook/mask2former-swin-base-ade-semantic
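
The sketch below shows one common way such a fine-tune is initialized from the base checkpoint with the 9-class label set listed further down. This is a hypothetical illustration of the Transformers API, not the exact training configuration used for this model.

from transformers import Mask2FormerForUniversalSegmentation

# Hypothetical initialization sketch: adapt the ADE20K-pretrained checkpoint
# to the 9 OpenEarthMap classes. The actual training setup is not documented here.
id2label = {
    0: "Background", 1: "Bareland", 2: "Grass", 3: "Pavement",
    4: "Road", 5: "Tree", 6: "Water", 7: "Cropland", 8: "Building",
}
model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-ade-semantic",
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    ignore_mismatched_sizes=True,  # replaces the ADE20K class head with a 9-class head
)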

Training Results

Metric            Value
Best mIoU         0.5202
Validation Loss   44.21
Training Epochs   17

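The reported mIoU was presumably computed on an OpenEarthMap validation split; the exact protocol is not documented here. As a minimal sketch, mean IoU can be computed with the Hugging Face evaluate library. The random maps below are placeholders that only illustrate the call signature; in practice, predictions are the (H, W) class-ID maps produced by the model and references are the dataset label masks.

import evaluate
import numpy as np

mean_iou = evaluate.load("mean_iou")

# Toy inputs just to show the call signature; replace with real prediction and
# reference maps containing class IDs 0-8.
rng = np.random.default_rng(0)
predictions = [rng.integers(0, 9, size=(512, 512))]
references = [rng.integers(0, 9, size=(512, 512))]

results = mean_iou.compute(
    predictions=predictions,
    references=references,
    num_labels=9,
    ignore_index=255,  # assumption: 255 (if present) marks pixels to ignore
)
print(f"mIoU: {results['mean_iou']:.4f}")
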
Classes

This model classifies pixels into 9 land cover categories:

ID  Class
 0  Background
 1  Bareland
 2  Grass
 3  Pavement
 4  Road
 5  Tree
 6  Water
 7  Cropland
 8  Building

Usage

from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor
from PIL import Image
import torch

# Load model and processor
model = Mask2FormerForUniversalSegmentation.from_pretrained("mfaytin/mask2former-satellite")
processor = Mask2FormerImageProcessor.from_pretrained("mfaytin/mask2former-satellite")

# Load and preprocess image
image = Image.open("satellite_image.tif").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process to get segmentation map
segmentation = processor.post_process_semantic_segmentation(
    outputs,
    target_sizes=[image.size[::-1]]  # (height, width)
)[0]

# segmentation is a tensor of shape (H, W) with class IDs
print(f"Segmentation shape: {segmentation.shape}")
print(f"Unique classes: {torch.unique(segmentation).tolist()}")

Class Labels Mapping

CLASS_LABELS = {
    0: "Background",
    1: "Bareland",
    2: "Grass",
    3: "Pavement",
    4: "Road",
    5: "Tree",
    6: "Water",
    7: "Cropland",
    8: "Building",
}
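
One way to use this mapping, continuing from the Usage snippet, is to summarize the predicted class composition of an image as per-pixel fractions:

import torch

# Share of pixels assigned to each predicted class.
ids, counts = torch.unique(segmentation, return_counts=True)
total = segmentation.numel()
for class_id, count in zip(ids.tolist(), counts.tolist()):
    print(f"{CLASS_LABELS[class_id]:>10}: {100.0 * count / total:.1f}%")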

Intended Use

This model is intended for:

  • Land cover classification from satellite/aerial imagery
  • Urban planning and environmental monitoring
  • Geographic information system (GIS) applications
  • Remote sensing research

Limitations

  • Geographic Bias: Trained primarily on imagery from specific regions in the OpenEarthMap dataset
  • Resolution Sensitivity: Best performance on imagery similar to training data resolution
  • Imagery Source: May require fine-tuning for different satellite sensors or aerial platforms
  • Seasonal Variation: Performance may vary across different seasons or weather conditions

Citation

If you use this model, please cite the OpenEarthMap dataset:

@inproceedings{xia2023openearthmap,
  title={OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping},
  author={Xia, Junshi and others},
  booktitle={WACV},
  year={2023}
}