Mask2Former for Satellite Image Segmentation

This model is a fine-tuned version of Mask2Former for semantic segmentation of satellite/aerial imagery.

Model Description

  • Architecture: Mask2Former with Swin Transformer backbone
  • Task: Semantic Segmentation (Land Cover Classification)
  • Dataset: OpenEarthMap
  • Fine-tuned from: facebook/mask2former-swin-base-ade-semantic
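
The sketch below shows one common way such a fine-tune is initialized from the base checkpoint with the 9-class label set listed further down. This is a hypothetical illustration of the Transformers API, not the exact training configuration used for this model.

from transformers import Mask2FormerForUniversalSegmentation

# Hypothetical initialization sketch: adapt the ADE20K-pretrained checkpoint
# to the 9 OpenEarthMap classes. The actual training setup is not documented here.
id2label = {
    0: "Background", 1: "Bareland", 2: "Grass", 3: "Pavement",
    4: "Road", 5: "Tree", 6: "Water", 7: "Cropland", 8: "Building",
}
model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-ade-semantic",
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    ignore_mismatched_sizes=True,  # replaces the ADE20K class head with a 9-class head
)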

Training Results

Metric            Value
Best mIoU         0.5202
Validation Loss   44.21
Training Epochs   17

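The reported mIoU was presumably computed on an OpenEarthMap validation split; the exact protocol is not documented here. As a minimal sketch, mean IoU can be computed with the Hugging Face evaluate library. The random maps below are placeholders that only illustrate the call signature; in practice, predictions are the (H, W) class-ID maps produced by the model and references are the dataset label masks.

import evaluate
import numpy as np

mean_iou = evaluate.load("mean_iou")

# Toy inputs just to show the call signature; replace with real prediction and
# reference maps containing class IDs 0-8.
rng = np.random.default_rng(0)
predictions = [rng.integers(0, 9, size=(512, 512))]
references = [rng.integers(0, 9, size=(512, 512))]

results = mean_iou.compute(
    predictions=predictions,
    references=references,
    num_labels=9,
    ignore_index=255,  # assumption: 255 (if present) marks pixels to ignore
)
print(f"mIoU: {results['mean_iou']:.4f}")
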
Classes

This model classifies pixels into 9 land cover categories:

ID  Class
 0  Background
 1  Bareland
 2  Grass
 3  Pavement
 4  Road
 5  Tree
 6  Water
 7  Cropland
 8  Building

Usage

from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor
from PIL import Image
import torch

# Load model and processor
model = Mask2FormerForUniversalSegmentation.from_pretrained("mfaytin/mask2former-satellite")
processor = Mask2FormerImageProcessor.from_pretrained("mfaytin/mask2former-satellite")

# Load and preprocess image
image = Image.open("satellite_image.tif").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process to get segmentation map
segmentation = processor.post_process_semantic_segmentation(
    outputs,
    target_sizes=[image.size[::-1]]  # (height, width)
)[0]

# segmentation is a tensor of shape (H, W) with class IDs
print(f"Segmentation shape: {segmentation.shape}")
print(f"Unique classes: {torch.unique(segmentation).tolist()}")

Class Labels Mapping

CLASS_LABELS = {
    0: "Background",
    1: "Bareland",
    2: "Grass",
    3: "Pavement",
    4: "Road",
    5: "Tree",
    6: "Water",
    7: "Cropland",
    8: "Building",
}
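
One way to use this mapping, continuing from the Usage snippet, is to summarize the predicted class composition of an image as per-pixel fractions:

import torch

# Share of pixels assigned to each predicted class.
ids, counts = torch.unique(segmentation, return_counts=True)
total = segmentation.numel()
for class_id, count in zip(ids.tolist(), counts.tolist()):
    print(f"{CLASS_LABELS[class_id]:>10}: {100.0 * count / total:.1f}%")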

Intended Use

This model is intended for:

  • Land cover classification from satellite/aerial imagery
  • Urban planning and environmental monitoring
  • Geographic information system (GIS) applications
  • Remote sensing research

Limitations

  • Geographic Bias: Trained primarily on imagery from specific regions in the OpenEarthMap dataset
  • Resolution Sensitivity: Best performance on imagery similar to training data resolution
  • Imagery Source: May require fine-tuning for different satellite sensors or aerial platforms
  • Seasonal Variation: Performance may vary across different seasons or weather conditions

Citation

If you use this model, please cite the OpenEarthMap dataset:

@inproceedings{xia2023openearthmap,
  title={OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping},
  author={Xia, Junshi and others},
  booktitle={WACV},
  year={2023}
}