Machine Learning Classification of AI‐Generated and Human‐Mapped Buildings in OpenStreetMap Using Morphometric Analysis

Abdulkadir Memduhoğlu

Published online on March 08, 2026

Abstract

Transactions in GIS, Volume 30, Issue 2, April 2026.

The proliferation of AI‐generated building footprints in OpenStreetMap (OSM) has transformed crowdsourced mapping, yet the geometric characteristics associated with different digitization methods remain poorly understood. This study presents a comprehensive morphometric analysis of more than 9 million building footprints across 15 geographically diverse cities spanning six continents. To investigate whether buildings labeled with AI source tags exhibit distinct geometric patterns, machine learning classifiers were trained on a regionally balanced dataset of approximately 1 million buildings. Thirty‐two shape‐based features encompassing size metrics, shape regularity, complexity measures, and anomaly detection indicators were extracted, and three gradient boosting classifiers were evaluated on their ability to distinguish AI‐tagged buildings from all others. The best‐performing model (LightGBM with class weighting) achieved 74.5% balanced accuracy, 82.9% recall, and 0.819 AUC, providing evidence that buildings labeled with AI source tags exhibit geometric patterns that differ systematically from those of other buildings in the dataset. Feature importance analysis revealed that orthogonality measurements, vertex density patterns, and shape regularity indices were the strongest discriminators, with orthogonality z‐scores alone accounting for 18.2% of model importance. AI‐generated buildings showed significantly higher orthogonality (angles within 5° of 90°), greater rectangularity, and more consistent vertex spacing than human‐digitized footprints. Geographic analysis revealed substantial variation in both AI adoption (0.15% in Berlin to 15.7% in Cairo) and model performance across regions, with balanced accuracy ranging from 67.5% (Asia) to 79.9% (Oceania). However, 34% of human‐mapped buildings exhibited AI‐like geometric patterns, an overlap that reflects multiple competing factors, including methodological convergence, label uncertainty from hybrid workflows, OSM's version‐history limitations, and natural variation in human digitization practices, none of which can be definitively separated without independent ground truth validation. These findings provide exploratory evidence that geometric analysis might complement other approaches to understanding data provenance in crowdsourced platforms, though substantial overlap between categories limits the utility of morphometric features for definitive source attribution.
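To make the three feature families named above concrete, the following is a minimal Python sketch of how orthogonality, rectangularity, and vertex density might be computed for a single footprint. It is not the study's implementation: the function names, the exact feature definitions (share of corners within 5° of 90°, footprint area divided by minimum bounding rectangle area, vertices per unit of perimeter), and the input format are all illustrative assumptions. Coordinates are assumed to be in a projected, metric coordinate system, with the ring given as tuples and without a repeated closing vertex.

```python
import math


def corner_angles(ring):
    """Angle in degrees (0 to 180) between the two edges meeting at each vertex."""
    n = len(ring)
    angles = []
    for i in range(n):
        px, py = ring[i - 1]
        cx, cy = ring[i]
        nx, ny = ring[(i + 1) % n]
        ax, ay = px - cx, py - cy
        bx, by = nx - cx, ny - cy
        dot = ax * bx + ay * by
        cross = ax * by - ay * bx
        angles.append(abs(math.degrees(math.atan2(cross, dot))))
    return angles


def convex_hull(points):
    """Andrew's monotone chain convex hull; returns hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_bounding_rect_area(points):
    """Area of the smallest rotated rectangle enclosing the points.

    Relies on the fact that a minimum-area rectangle has one side
    collinear with an edge of the convex hull.
    """
    hull = convex_hull(points)
    best = math.inf
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [x * c - y * s for x, y in hull]
        ys = [x * s + y * c for x, y in hull]
        best = min(best, (max(xs) - min(xs)) * (max(ys) - min(ys)))
    return best


def morphometrics(ring, tol_deg=5.0):
    """Illustrative (assumed) definitions of three shape features."""
    n = len(ring)
    # Shoelace formula for polygon area.
    area = abs(sum(ring[i - 1][0] * ring[i][1] - ring[i][0] * ring[i - 1][1]
                   for i in range(n))) / 2
    perimeter = sum(math.dist(ring[i - 1], ring[i]) for i in range(n))
    angles = corner_angles(ring)
    return {
        # Share of corners within tol_deg of a right angle.
        "orthogonality": sum(abs(a - 90.0) <= tol_deg for a in angles) / n,
        # Footprint area relative to its minimum bounding rectangle.
        "rectangularity": area / min_bounding_rect_area(ring),
        # Vertices per unit of perimeter length.
        "vertex_density": n / perimeter,
    }


# Hypothetical L-shaped footprint, coordinates in metres.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(morphometrics(l_shape))
```

In this sketch the L-shaped footprint has only right-angled corners, so its orthogonality is 1.0, while its rectangularity is about 0.75 because the footprint covers 12 of the 16 square units of its bounding rectangle. Degenerate rings (for example, collinear vertices with zero area) would need to be filtered out before calling `morphometrics`, since the bounding rectangle area would be zero.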