Document Type : Research Article
Authors
1. Department of Geomatics Engineering, Faculty of Civil Engineering, University of Tabriz, Tabriz, Iran
2. Associate Professor, Department of Geomatics Engineering, Faculty of Civil Engineering, University of Tabriz, Tabriz, Iran
3. Department of Geomatics Engineering, Faculty of Civil Engineering, University of Tabriz, Tabriz, Iran
Abstract
The extraction of 3D geospatial information from the Earth's surface using remote sensing and photogrammetric data has become a pivotal and widely used topic in the geosciences, attracting increasing attention from researchers in recent years. One of the most significant products of such data is the Digital Surface Model (DSM), which, in addition to the bare-earth terrain represented by a Digital Elevation Model (DEM), includes natural and man-made features such as vegetation, trees, buildings, and other structures. DSM extraction plays a crucial role in a wide range of applications, including urban planning, building detection, disaster management, 3D modeling, and change monitoring. In recent years, remarkable advances in deep learning have significantly influenced the process of extracting 3D information from remote sensing data. Traditional 3D reconstruction methods often struggle with managing large datasets, the complexity of feature extraction, and the difficulty of recovering accurate detail. In this context, the use of deep neural networks to extract complex features from multi-view images has introduced a transformative approach in this domain. Sat-MVSF, a recently developed deep learning-based algorithm, is designed to extract DSMs from multi-view satellite images and performs all steps, from image preprocessing to final DSM generation, within a deep learning framework. Given the limited availability of training data and the authors' claims regarding the generalizability of the trained model weights, the objective of this study is to evaluate the performance of the Sat-MVSF algorithm in generating DSMs from high-resolution satellite images. The main innovations of this research are:
1) Preparation of three WorldView-3 datasets and two ZY3-2 datasets, including block bundle adjustment for RPC refinement and generation of reference DSMs from LiDAR point clouds (a minimal rasterization sketch follows this list).
2) DSM extraction using the Sat-MVSF algorithm for multi-view images from both WorldView-3 and ZY3-2 sensors, followed by performance comparison against existing algorithms such as S2P and SS-DSM, as well as commercial software including CATALYST and ERDAS IMAGINE.
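The reference DSMs in item (1) are derived from LiDAR point clouds. As a rough illustration of that rasterization step only, the sketch below grids a point cloud by keeping the highest return in each cell; it assumes the points are already in the projected coordinate system of the imagery, and the function name, cell-size handling, and nodata convention are illustrative choices rather than the exact procedure used in this study.

import numpy as np

def lidar_to_reference_dsm(points, cell_size, nodata=-9999.0):
    """Rasterize a LiDAR point cloud (N x 3 array of x, y, z) into a DSM grid
    by keeping the highest return per cell (a simple surface-model rule)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    x_min, y_max = x.min(), y.max()
    cols = int(np.ceil((x.max() - x_min) / cell_size)) + 1
    rows = int(np.ceil((y_max - y.min()) / cell_size)) + 1

    # Column/row index of each point; row 0 corresponds to the northernmost cells.
    ci = ((x - x_min) / cell_size).astype(int)
    ri = ((y_max - y) / cell_size).astype(int)

    dsm = np.full((rows, cols), nodata, dtype=np.float64)
    np.maximum.at(dsm, (ri, ci), z)  # highest return wins in each cell
    return dsm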
To ensure a comprehensive evaluation, the performance of all algorithms is analyzed across three types of areas: (1) non-built areas, (2) building areas with moderate elevation changes, and (3) building areas with significant elevation changes. The dataset consists of five sets of satellite images, three from WorldView-3 and two from ZY3-2, with each set containing three overlapping images. The results show that Sat-MVSF outperforms many existing algorithms and commercial software packages in DSM extraction. For WorldView-3 imagery, Sat-MVSF achieves an average vertical accuracy of 1.1 meters and a completeness of 87%, surpassing SS-DSM and the commercial tools. S2P provides slightly better height accuracy (1.0 meters), so Sat-MVSF is somewhat less precise in terms of elevation RMSE but remains competitive; however, the performance of S2P on the third WorldView-3 dataset (WV3-3) is highly dependent on the study area, as its elevation completeness there is low. On the ZY3-2 datasets, Sat-MVSF achieves elevation accuracies of 2.43 and 3.27 meters, indicating acceptable performance. More specifically, on the first two WorldView-3 datasets, S2P attains the best performance, with completeness of 90.76% and 90.16% and elevation accuracies of 0.94 and 1.1 meters, respectively; on the third dataset, Sat-MVSF leads with a completeness of 83% and an elevation accuracy of 1.04 meters.

Across area types, S2P performs best in building zones with significant elevation changes, with accuracies of 1.03, 1.14, and 0.88 meters for the first, second, and third datasets, respectively, while CATALYST achieves the highest accuracy in non-built areas, with 0.71, 1.12, and 0.68 meters across the same datasets. Overall, commercial software such as CATALYST and ERDAS IMAGINE exhibits higher height errors in built-up areas with significant elevation differences, because these packages fill gaps by interpolation, which reduces accuracy where buildings create large height discontinuities. It should also be noted that both evaluation criteria depend on the chosen height threshold: with a large threshold, pixels with large height errors are still counted as correct, so the reported completeness is optimistically high and the height-accuracy value (the error of the accepted pixels) also increases, whereas with a small threshold both criteria take lower values.
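For context, the height completeness and height accuracy figures reported above are typically computed pixel by pixel against the reference DSM, with the height threshold deciding which pixels count as correctly reconstructed. The sketch below is a minimal illustration of that common convention and is not taken from the paper: it assumes the reconstructed and reference DSMs are already co-registered on the same grid, and the 2.5 m default threshold, the nodata convention, and the choice of computing RMSE over the accepted pixels are assumptions made for illustration.

import numpy as np

def dsm_metrics(dsm_pred, dsm_ref, threshold=2.5, nodata=-9999.0):
    """Height completeness and RMSE of a reconstructed DSM against a reference DSM.

    completeness : fraction of jointly valid pixels whose absolute height error
                   is below `threshold` (meters).
    rmse         : root-mean-square error over those accepted pixels only.
    """
    valid = (dsm_pred != nodata) & (dsm_ref != nodata)
    err = np.abs(dsm_pred[valid] - dsm_ref[valid])

    accepted = err < threshold
    completeness = float(accepted.mean()) if err.size else 0.0
    rmse = float(np.sqrt(np.mean(err[accepted] ** 2))) if accepted.any() else float("nan")
    return completeness, rmse

Raising the threshold admits pixels with larger errors, so completeness rises while the RMSE of the accepted pixels also grows; lowering it has the opposite effect, which is the threshold sensitivity noted at the end of the abstract.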
Keywords