Evaluating the efficiency of U-Net and XGBoost models in extracting building footprint information

Haghighi Gashti, Ehsan; Niroomand, Mohsen; Valadan Zoej, Mohammad Javad

doi:10.48308/gisj.2025.239416.1259

Articles in Press

Document Type : Original Article

Authors

¹ School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran

² Faculty of Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran

Abstract

Introduction: Building footprints are among the most important elements of spatial data and play a key role in many urban applications, including urban planning, infrastructure management, environmental studies, and sustainable development (Haghighi Gashti et al., 2024; Zhao & Wang, 2014). Accurate, up-to-date footprint information provides a sound basis for managerial decision-making. Extracting it from high-resolution aerial and satellite images, however, remains one of the main challenges in remote sensing and spatial data analysis (Bittner et al., 2018). In recent years, machine learning and deep learning algorithms have gained attention as tools for addressing this problem. The main objective of this research is to compare two common approaches, deep learning and classical machine learning, for extracting building footprints from high spatial resolution aerial images. To this end, the U-Net and XGBoost models were evaluated in terms of accuracy, the ability to delineate precise building boundaries, and other quantitative metrics, with the aim of selecting the most appropriate method for practical applications in geographic information systems.

Materials and Methods: The dataset consisted of aerial images of four cities: Chicago, Paris, Zurich, and Berlin. The images offer suitable spatial and structural diversity, and the corresponding building footprints were obtained from open-source data. The original images were divided into patches of 512×512 pixels, and a building mask was generated for each patch. The data were then split into training (70%), validation (20%), and test (10%) sets. The U-Net model was trained with the binary cross-entropy loss function and optimized with the Adam algorithm. The XGBoost model, an ensemble of gradient-boosted decision trees, was trained on numerical features extracted from the images; its hyperparameters, including tree depth, learning rate, and number of trees, were selected through grid search.
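
The data preparation steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say whether patches overlap, how image borders were handled, or whether the split was random, so the sketch assumes non-overlapping patches, dropped partial border patches, and a seeded random split. The function names `patch_origins`, `split_indices`, and `binary_cross_entropy` are hypothetical.

```python
import math
import random


def patch_origins(height, width, size=512):
    """Top-left (row, col) corners of non-overlapping size x size patches.

    Assumption: partial patches at the right and bottom edges are dropped.
    """
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]


def split_indices(n, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Assumption: a simple random split; the test set takes the remainder.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over flat sequences of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    origins = patch_origins(1024, 1536)   # a hypothetical 1024 x 1536 image
    train, val, test = split_indices(100)
    print(len(origins), len(train), len(val), len(test))
```

In practice a split by city or by source image may be preferable to a purely random one, since neighbouring patches from the same image are strongly correlated; the abstract does not say which was used.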

Results and Discussion: Five metrics were used to evaluate both models: precision, recall, F1-score, Intersection over Union (IoU), and accuracy. The U-Net model outperformed the XGBoost model on all of them. Specifically, the U-Net model achieved an IoU of 67.74% and an accuracy of 87.95%, compared with 55.07% and 75.67% for the XGBoost model. The U-Net model also detected building boundaries more completely while preserving the spatial and structural information of the buildings. Because its architecture includes direct (skip) connections between the encoder and decoder, the U-Net model learns image features directly, without manual feature engineering. However, high computational cost and the need for large training datasets remain challenges for deep learning models. The XGBoost model, although relatively simple and faster, performed worse at detecting precise building boundaries, especially in dense urban areas with irregular boundaries, because it depends on extracted numerical features and cannot process images directly. In some cases, it failed to distinguish buildings from other similar objects.
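
For reference, the five metrics named above can be computed from the pixel-wise confusion counts of a predicted mask against a reference mask. The following is a minimal sketch using their standard definitions; the function name `segmentation_metrics` is hypothetical, and the abstract does not state whether the reported values were computed over all test pixels at once or averaged per patch.

```python
def segmentation_metrics(y_true, y_pred):
    """Pixel-wise metrics for binary masks given as flat sequences of 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # building, predicted building
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # background, predicted building
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # building, predicted background
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # background, predicted background

    def ratio(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    reference = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(segmentation_metrics(reference, predicted))
```

When both are computed from the same confusion counts, F1 and IoU are related by F1 = 2·IoU / (1 + IoU), so the two always rank models in the same order; this identity does not carry over to values that were averaged per patch.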

Conclusion: The results of this study indicate that for precise extraction of building footprints from aerial images, especially in areas with complex and dense urban structures, deep learning models such as U-Net perform considerably better than classical machine learning models such as XGBoost. However, where training data or computational resources are limited, lighter models such as XGBoost can still be useful. Future research is recommended to explore hybrid approaches that combine the advantages of both models to improve the accuracy of spatial information extraction.

Iranian Journal of Remote Sensing and GIS

Articles in Press, Accepted Manuscript
Available Online from 13 May 2025
