arxiv:2510.16624

Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs

Published on Oct 18, 2025

Abstract

A vision-only autonomous flight system for small UAVs combines semantic segmentation and monocular depth estimation with an adaptive scale-factor algorithm and knowledge distillation to enable GPS-free navigation in indoor environments.
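The knowledge-distillation setup mentioned above can be sketched in a few lines: a color-based SVM teacher pseudo-labels every pixel, and a small convolutional student is trained on those pseudo-labels. Everything concrete in this sketch is a hypothetical stand-in: the toy two-class color model, the random frames, and the tiny student network (the paper's actual student is a 1.6M-parameter U-Net, which is not reproduced here).

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

# --- Teacher: a color-based SVM that labels individual pixels. ---
# Toy training set (hypothetical): green-ish pixels = ground (0), grey-ish = obstacle (1).
rng = np.random.default_rng(0)
ground = rng.normal([0.2, 0.6, 0.2], 0.05, (200, 3))
obstacle = rng.normal([0.5, 0.5, 0.5], 0.05, (200, 3))
teacher = SVC(kernel="rbf")
teacher.fit(np.vstack([ground, obstacle]), np.array([0] * 200 + [1] * 200))

def pseudo_label(image):
    """Run the teacher over every pixel of an (H, W, 3) float image in [0, 1]."""
    h, w, _ = image.shape
    return teacher.predict(image.reshape(-1, 3)).reshape(h, w)

# --- Student: a small conv net standing in for the paper's 1.6M-parameter U-Net. ---
student = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),                                 # per-pixel logits, 2 classes
)
optim = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# --- Distillation loop: the student learns to imitate the teacher's labels. ---
for _ in range(10):
    image = rng.random((64, 64, 3)).astype(np.float32)   # stand-in for a flight frame
    target = torch.from_numpy(pseudo_label(image)).long().unsqueeze(0)
    x = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)
    optim.zero_grad()
    loss = loss_fn(student(x), target)
    loss.backward()
    optim.step()
```

The appeal of this setup, as the summary below notes, is that the teacher can later be swapped for a stronger segmentation model without changing the student's training loop.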

AI-generated summary

This paper presents a vision-only autonomous flight system for small UAVs operating in controlled indoor environments. The system combines semantic segmentation with monocular depth estimation to enable obstacle avoidance, scene exploration, and autonomous safe landing without requiring GPS or expensive sensors such as LiDAR. A key innovation is an adaptive scale-factor algorithm that converts non-metric monocular depth predictions into accurate metric distance measurements by leveraging semantic ground-plane detection and the camera's intrinsic parameters, achieving a mean distance error of 14.4 cm. The approach uses a knowledge-distillation framework in which a color-based Support Vector Machine (SVM) teacher generates training data for a lightweight U-Net student network (1.6M parameters) capable of real-time semantic segmentation; for more complex environments, the SVM teacher can be replaced with a state-of-the-art segmentation model. Testing was conducted in a controlled 5 × 4 meter laboratory environment with eight cardboard obstacles simulating urban structures. Extensive validation across 30 flight tests in the real-world environment and 100 flight tests in a digital-twin environment demonstrates that the combined segmentation-and-depth approach increases the distance traveled during surveillance and reduces mission time while maintaining a 100% success rate. The system is further optimized through end-to-end learning, in which a compact student neural network learns complete flight policies from demonstration data generated by the best-performing method, achieving an 87.5% autonomous mission success rate. This work advances practical vision-based drone navigation in structured environments, demonstrating solutions to the metric depth estimation and computational-efficiency challenges, enabling deployment on resource-constrained platforms.
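As a rough illustration of how a ground-plane constraint can turn relative depth into metric depth, the sketch below intersects camera rays through segmented ground pixels with the ground plane, using the camera height, pitch, and intrinsics, then takes the median ratio of geometric depth to predicted relative depth as the scale factor. The function name, the gravity-aligned frame convention (x right, y down, z forward), and the median estimator are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def metric_scale_from_ground(depth_rel, ground_mask, K, cam_height_m, pitch_rad):
    """Estimate s such that s * depth_rel approximates metric depth.

    depth_rel:   (H, W) relative depth from a monocular network
    ground_mask: (H, W) boolean mask of pixels labelled as ground
    K:           3x3 camera intrinsic matrix
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    vs, us = np.nonzero(ground_mask)                  # pixel coords of ground pixels
    rays = np.stack([(us - cx) / fx,                  # camera-frame ray directions
                     (vs - cy) / fy,
                     np.ones_like(us, dtype=float)], axis=1)
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)

    # Pitch the rays into a gravity-aligned frame; y_w is the downward component.
    y_w = rays[:, 1] * np.cos(pitch_rad) + rays[:, 2] * np.sin(pitch_rad)

    valid = y_w > 1e-6                                # keep rays that hit the ground
    t = cam_height_m / y_w[valid]                     # range to ground-plane hit
    z_metric = t * rays[valid, 2]                     # metric z-depth, camera frame

    z_rel = depth_rel[vs[valid], us[valid]]
    ok = z_rel > 1e-6
    return float(np.median(z_metric[ok] / z_rel[ok]))
```

Multiplying the relative depth map by the returned scale (`depth_metric = s * depth_rel`) then gives metric distances for obstacle avoidance and landing decisions.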
