A GIS‐Integrated Active Learning Framework for Crash Severity Classification in Imbalanced Traffic Data
Published online on March 02, 2026
Abstract
Transactions in GIS, Volume 30, Issue 2, April 2026.
Classifying urban traffic crash severity remains challenging because severe incidents are underrepresented in highly imbalanced datasets. This challenge is further intensified by spatiotemporal shifts in data distributions, which can degrade model performance over time. To address these challenges, this study proposes an integrated framework that combines Geographic Information Systems (GIS), resampling techniques, and active learning. GIS‐based random undersampling and the Synthetic Minority Oversampling Technique (SMOTE) were applied to generate a balanced dataset while preserving spatial and temporal structures. Five classification algorithms were evaluated, with Random Forest achieving the strongest baseline performance. To adapt to evolving data, an active learning framework was implemented to iteratively select uncertain samples for expert annotation, guided by Shapley Additive Explanations (SHAP). After annotating 20 samples, the proposed model achieved an F1‐score of 0.9064 and an AUC–PR of 0.9087, outperforming the baseline models. The results demonstrate that integrating GIS‐informed resampling with explainability‐guided active learning improves robustness and predictive accuracy under dynamic urban conditions. The proposed approach supports more reliable crash severity classification and offers practical insights for data‐driven traffic safety analysis and intervention planning.
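The query step of an uncertainty-based active learning loop, as described in the abstract, can be illustrated with a minimal sketch. The abstract does not state which uncertainty measure the study used or how SHAP guided the selection, so everything here is an assumption for illustration: the function name `select_uncertain`, the binary "distance from 0.5" score, and the example probabilities are hypothetical, and the SHAP-guided part is not shown.

```python
from typing import List, Sequence


def select_uncertain(probabilities: Sequence[float], budget: int) -> List[int]:
    """Return indices of the `budget` samples whose predicted probability
    of the severe class lies closest to 0.5 (least-confident predictions).

    Assumption: a binary severe / non-severe setting with one probability
    per unlabeled sample, e.g. from a classifier's probability output.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Uncertainty score: 0.0 when p is 0 or 1, 1.0 when p is exactly 0.5.
    scores = [1.0 - abs(2.0 * p - 1.0) for p in probabilities]
    # Rank indices by descending uncertainty; sorted() is stable, so ties
    # keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:budget]


# Hypothetical predicted probabilities for six unlabeled crash records.
pool_probabilities = [0.02, 0.48, 0.91, 0.55, 0.30, 0.75]
queried = select_uncertain(pool_probabilities, budget=2)
print(queried)  # indices of the two records to send for expert annotation
```

In a full loop, the queried records would be labeled by an expert, added to the training set, and the classifier retrained before the next query round; the abstract reports results after 20 such annotations.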