Comparative Study on Multiple Strategies for Landslide Susceptibility Modeling via Geological Text Feature Fusion
Published online on March 08, 2026
Abstract
Transactions in GIS, Volume 30, Issue 2, April 2026.

Traditional landslide susceptibility assessment methods rely predominantly on numerical features and therefore struggle to characterize the underlying geological mechanisms. To overcome this limitation, this study proposes a novel assessment method that integrates geological text features. Combinations of seven text encoding strategies (including BERT, TF‐IDF, and Word2Vec) with six typical machine learning models were systematically compared, and model generalization capability was rigorously evaluated using spatial block cross‐validation. With Chongqing, China, as the study area, the experimental results indicate the following: (1) Introducing geological text features substantially improved model performance. The optimal combination, LightGBM + Word2Vec, achieved an Area Under the Curve (AUC) of 0.932, outperforming the purely numerical baseline model by over 10%. (2) Among the text encoding mechanisms, the statistics‐based TF‐IDF and the distributed embedding‐based Word2Vec performed best, with no statistically significant difference between them (p > 0.05). The former excels at capturing explicit hazard‐causative keywords, whereas the latter is better at modeling implicit semantic continuity. (3) Spatial block validation revealed that models integrating semantic features remained highly robust even in unseen areas (AUC > 0.85). Overall, this study confirms that geological semantic features are critical variables for enhancing the accuracy and interpretability of landslide susceptibility mapping. The proposed approach provides a new paradigm for disaster assessment in regions that lack high‐precision geological maps but have abundant geological survey data.
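To make the spatial block cross-validation and AUC evaluation mentioned above more concrete, here is a minimal, dependency-free Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the block shape, block size, or how blocks are assigned to folds. The sketch therefore assumes square grid blocks over projected coordinates and a deterministic round-robin assignment of blocks to folds. The function names `spatial_block_folds`, `train_test_indices`, and `auc` are hypothetical. In practice, a classifier such as LightGBM would be trained on the `train` indices and scored on the held-out `test` indices.

```python
import math


def spatial_block_folds(coords, block_size, n_folds):
    """Hypothetical helper: assign each sample to a CV fold by its square block.

    coords: list of (x, y) projected map coordinates.
    block_size: side length of a square block (assumed grid blocking),
    in the same units as coords.
    All samples in the same block get the same fold index, so a held-out
    fold never shares a block with the training folds.
    """
    # Grid cell (column, row) that each sample falls into.
    block_ids = [(math.floor(x / block_size), math.floor(y / block_size))
                 for x, y in coords]
    # Assumed scheme: deal distinct blocks to folds round-robin in sorted
    # order so the result is deterministic (random shuffling is also common).
    blocks = sorted(set(block_ids))
    block_to_fold = {b: i % n_folds for i, b in enumerate(blocks)}
    return [block_to_fold[b] for b in block_ids]


def train_test_indices(folds, k):
    """Hypothetical helper: return (train, test) indices with fold k held out."""
    train = [i for i, f in enumerate(folds) if f != k]
    test = [i for i, f in enumerate(folds) if f == k]
    return train, test


def auc(labels, scores):
    """ROC AUC computed as the probability that a random positive (landslide)
    sample scores higher than a random negative one; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage: six samples in three 100 x 100 blocks, split into three folds.
coords = [(10, 10), (20, 15), (150, 30), (160, 40), (310, 300), (320, 310)]
folds = spatial_block_folds(coords, block_size=100, n_folds=3)
train, test = train_test_indices(folds, k=1)
print(folds)        # [0, 0, 1, 1, 2, 2]
print(train, test)  # [0, 1, 4, 5] [2, 3]
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75
```

The key design choice is that folds are assigned per block rather than per sample. With random per-sample folds, spatially autocorrelated neighbours would land on both sides of the split, which tends to inflate AUC relative to performance in genuinely unseen areas.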