Cloud and snow detection of remote sensing images based on improved Unet3+



Construction of the cloud and snow detection data set

In this paper, remote sensing images taken by the Gaofen-2 (GF-2) and Huanjing-1 (HJ1A) satellites are used to construct a dataset. GF-2 images the ground by push-broom scanning and can capture panchromatic and multispectral images with high resolution, high positioning accuracy and high radiation quality. The HJ1A satellite is mainly used for environment and disaster monitoring and prediction, and carries a CCD camera and a hyperspectral imager. To test the effectiveness and accuracy of the cloud and snow detection method proposed in this paper, remote sensing images of various landforms are selected, as shown in Fig. 8a, including ocean, plain, town, frozen soil, snow and desert. Because the shapes of cloud and snow in the images are diverse and their spatial distribution is irregular, this paper first applies the simple linear iterative clustering (SLIC) method32,33 to perform superpixel segmentation on the remote sensing images, as shown in Fig. 8b, and determines the contours of cloud and snow at the sub-pixel level, in order to improve annotation accuracy. Then, cloud labels and snow labels are annotated manually according to the outlined cloud and snow contours, as shown in Fig. 8c, which effectively reduces labeling errors and makes the produced dataset more accurate. The label image contains three classes: red for cloud, green for snow, and black for background. Finally, the completed cloud and snow detection dataset is processed with data augmentation, yielding a total of 5000 images. Of these, 4500 images are used as the training set and 500 as the validation set, a ratio of 9:1.
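The label rendering described above (red for cloud, green for snow, black for background) can be sketched as a simple mapping from a class-index mask to an RGB label image. The class indices used here (0/1/2) are an assumption for illustration; the paper does not specify its internal encoding.

```python
# Render a class-index mask as an RGB label image, following the color
# scheme of the dataset: black = background, red = cloud, green = snow.
# The 0/1/2 index assignment is an assumption, not from the paper.

PALETTE = {
    0: (0, 0, 0),      # background -> black
    1: (255, 0, 0),    # cloud -> red
    2: (0, 255, 0),    # snow -> green
}

def mask_to_label_image(mask):
    """Convert a 2-D list of class indices into a 2-D list of RGB triples."""
    return [[PALETTE[c] for c in row] for row in mask]

# Tiny illustrative mask: one cloud region, one snow region.
mask = [
    [0, 1, 1],
    [2, 2, 0],
]
label = mask_to_label_image(mask)
```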

Figure 8

Labels of remote sensing images of different landforms obtained after superpixel segmentation. (a) Original remote sensing image; (b) superpixel segmentation result; (c) label image.

Comparison of feature extraction network experiments

Resnet50 can extract deep image features with high precision and moderate computation. In this paper, Resnet50 is used as the feature extraction network of Unet3+, which can accurately extract the features of cloud and snow from remote sensing images. On the constructed cloud and snow detection dataset, VGG16, Resnet34, Resnet50 and Resnet101 are compared experimentally. Mean intersection over union (mIoU), mean pixel accuracy (mPA), mean precision (mPrecision) and Estimated Total Size are used as evaluation indexes, and the results are shown in Table 1. IoU, PA and Precision are common evaluation indexes in semantic segmentation, used to measure the similarity between segmentation results and ground truth. Their formulas are given in (9), (10) and (11).

$$IoU = \frac{TP}{TP + FN + FP}$$

(9)

$$PA = \frac{TP + TN}{TP + FP + FN + TN}$$

(10)

$$Precision = \frac{TP}{TP + FP}$$

(11)

Table 1 Comparison of experimental results of the feature extraction networks.

In the formulas, TP represents the number of correctly classified positive-class pixels, FP represents the number of negative-class pixels incorrectly classified as positive, FN represents the number of positive-class pixels incorrectly classified as negative, and TN represents the number of correctly classified negative-class pixels.
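Eqs. (9)-(11) can be computed directly from the four confusion counts; a minimal sketch, with illustrative counts:

```python
# IoU, PA and Precision from confusion counts, per Eqs. (9)-(11).

def iou(tp, fp, fn):
    """Intersection over union, Eq. (9)."""
    return tp / (tp + fn + fp)

def pixel_accuracy(tp, tn, fp, fn):
    """Pixel accuracy, Eq. (10)."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Precision, Eq. (11)."""
    return tp / (tp + fp)

# Illustrative counts for one class (e.g. cloud): 80 pixels correctly
# detected, 10 false alarms, 20 missed, 890 background pixels rejected.
tp, fp, fn, tn = 80, 10, 20, 890
```

The mean variants (mIoU, mPA, mPrecision) reported in Table 1 average these per-class values over the cloud, snow and background classes.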

As shown in the table, compared with the other backbone networks, Resnet50 achieves the highest evaluation indexes with a moderate Estimated Total Size.

In this paper, the PyTorch34 deep learning framework is used to train and test the network model. The environment is conda 4.12.0, with an i7-10700 CPU, 16 GB RAM, an NVIDIA GeForce RTX 3070 (8 GB) and CUDA 11.2. In the experiments, the batch size is set to 8, the initial learning rate is set to 0.01, and the Adam optimizer is used to optimize the network. Adam is a first-order optimization algorithm that can replace traditional stochastic gradient descent: it iteratively updates the network weights based on the training data and adapts the gradient descent step according to the learning rate. Its advantages are simple implementation, high computational efficiency, low memory requirements, and parameter updates that are unaffected by rescaling of the gradient. Moreover, its hyperparameters are well interpretable and usually need no adjustment or only fine tuning. To maintain good learning efficiency, the learning rate is divided by 10 every 50 epochs, for a total of 100 training epochs. The training process is shown in Fig. 9; it can be seen from the figure that by the 70th epoch the network model had converged.
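The step schedule described above, a learning rate starting at 0.01 and divided by 10 every 50 epochs over a 100-epoch run, corresponds to PyTorch's `StepLR(optimizer, step_size=50, gamma=0.1)`. The standalone function below just reproduces the arithmetic:

```python
# Step learning-rate schedule: base_lr divided by 10 every `step` epochs.

def step_lr(epoch, base_lr=0.01, step=50, gamma=0.1):
    """Learning rate at a given (0-indexed) epoch under a step schedule."""
    return base_lr * gamma ** (epoch // step)

# Over the paper's 100-epoch run: 0.01 for epochs 0-49, 0.001 for 50-99.
schedule = [step_lr(e) for e in range(100)]
```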

Figure 9

Schematic diagram of the training process of the network model used in this paper.

Ablation experiment

To test the improvement in cloud and snow detection performance contributed by each method described in the "Methodology" section, ablation experiments are conducted, and the results are shown in Table 2.

Table 2 Comparison of ablation results.

Multidimensional image input: in the feature extraction stage, the input of the original network is the RGB image alone, and the feature extraction of cloud and snow is insufficient. After the HIS image is added, deeper color, texture and other feature information can be extracted, which improves the final cloud and snow detection results. As shown in Table 2, the mPA, mIoU and mPrecision values are improved by 0.54%, 0.17% and 0.68% respectively.
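The extra HIS (hue, intensity, saturation) channels can be derived per pixel from RGB. The exact conversion the paper uses is not given; a minimal sketch assuming the common geometric HSI formulation (hue in degrees, S and I in [0, 1]):

```python
# Per-pixel RGB -> HSI conversion (geometric formulation). This is an
# assumed formulation; the paper does not specify its conversion.
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H, S, I)."""
    i = (r + g + b) / 3.0                          # intensity: channel mean
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i  # saturation
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = 0.0 if den == 0 else math.degrees(
        math.acos(max(-1.0, min(1.0, num / den))))
    if b > g:                                      # hue in lower half-circle
        h = 360.0 - h
    return h, s, i
```

In the multidimensional input, these three channels would be stacked with the RGB channels to give the network a six-channel input.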

Full-scale feature fusion and CBAM: because remote sensing images are rich in color and cover wide areas, a great deal of feature information of ground objects other than cloud and snow is also extracted. In the process of cloud and snow detection, too much extraneous feature information dilutes the cloud and snow features and affects detection accuracy. Full-scale feature fusion and CBAM can effectively reduce this dilution and strengthen the network's attention to cloud and snow. As can be seen from Table 2, the mPA, mIoU and mPrecision values increase by 1.38%, 0.84% and 0.41% on the basis of multidimensional image input.

Weighted cross entropy loss: the remote sensing cloud and snow detection dataset contains a large amount of background, and the proportion of cloud and snow pixels is relatively small, so during training the model is inevitably biased toward the background. The weighted cross entropy loss designed in this paper can effectively avoid this problem: the fewer pixels a class has, the larger its weight coefficient, so the network learns that class better; this also alleviates the sample imbalance problem. As can be seen from Table 2, the mPA, mIoU and mPrecision values are further improved, finally reaching 92.76%, 81.74% and 86.49% respectively.
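The weighting rule described above, a larger coefficient for classes with fewer pixels, can be sketched as follows. The paper does not give its exact formula; a common choice, assumed here, is inverse frequency normalized so the weights average to 1, which could then be passed to PyTorch's `CrossEntropyLoss(weight=...)`. The pixel counts are illustrative.

```python
# Inverse-frequency class weights: rarer classes get larger weights.
# The normalization (mean 1) is an assumption, not from the paper.

def class_weights(pixel_counts):
    """Per-class weights inversely proportional to pixel frequency."""
    total = sum(pixel_counts)
    raw = [total / c for c in pixel_counts]  # rarer class -> larger weight
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# Illustrative counts for (background, cloud, snow) pixels.
w_bg, w_cloud, w_snow = class_weights([8000, 1500, 500])
```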

Comparison and analysis of algorithm

On the constructed cloud and snow detection dataset, the proposed method is compared with other cloud detection methods based on Unet35, Deeplabv3+24 and CDUnet23. Intersection over union (IoU), mean intersection over union (mIoU), mean pixel accuracy (mPA), mean precision (mPrecision) and recall are selected as measurement indexes of the experimental results, and the results are shown in Fig. 10. As can be seen from the figure, the IoU, mIoU, mPA, mPrecision and recall values of the proposed method are 0.74, 0.82, 0.93, 0.87 and 0.93 respectively, an improvement over the other cloud detection methods. This is because the other methods focus only on clouds in remote sensing images; for snow, whose features are very similar to those of clouds, they easily confuse cloud and snow because the extracted feature information is insufficient.

Figure 10

Comparison of evaluation results of IoU and the other quantitative indicators.

In this paper, 20 remote sensing images of different landforms, including plain, snow field, ocean and desert, are selected from the validation set, and the mIoU and mPA values of the different methods are output respectively. The results are shown in Figs. 11 and 12. As can be seen from the figures, the mIoU and mPA values of the proposed method are generally higher than those of the other methods on remote sensing images of the various landforms, with the mIoU value generally above 0.75 and the mPA value generally above 0.85. Images 14 and 18 contain a large amount of broken cloud with complex boundary information; the CDUnet network adds a high-frequency feature extractor and multi-scale convolution to refine cloud boundaries, so its detection of broken cloud is slightly better than that of the method proposed in this paper.

Figure 11

Comparison of mIoU values of different methods.

Figure 12

Comparison of mPA values of different methods.

As shown in Fig. 13, the proposed method is compared with the threshold segmentation method (OTSU) and the above neural network-based methods, with detection results output for eight landforms, including desert, town, ocean, plain, snow field and river. Figure 13c shows the detection results of OTSU. Since only a single threshold can be set, this method can detect clouds fairly accurately in a remote sensing image containing only clouds; however, for images containing snow or white ground objects, misdetection is serious. As shown in c3, c5 and c7, it misdetects white islands and snow as clouds. Figure 13d shows the output of the Unet-based cloud detection method. Because its feature extraction is simple, the features of cloud and snow cannot be accurately distinguished, resulting in low cloud detection accuracy and imprecise localization of cloud boundaries, as shown in d2. When there is interference from snow or white ground objects, it is also prone to false detection: in d3, white islands are mistakenly detected as clouds, and in d5 snow is mistakenly detected as cloud. Figure 13e shows the output of the cloud detection method based on Deeplabv3+. Thanks to its simple and effective decoder module, this method delineates object boundaries more finely, a great improvement over Unet; however, false detections occur inside clouds, and the distinction between cloud and snow is not high, as shown in e5 and e8. Figure 13f shows the output of the cloud detection method based on CDUnet. This method introduces a high-frequency feature extractor and multi-scale convolution, detects cloud and cloud shadow in the image well, and distinguishes cloud from snow with high accuracy; however, misdetection may occur at cloud junctions: as shown in f4 and f5, false detections appear at cloud-snow junctions and multi-cloud junctions.
Figure 13g shows the output of the method proposed in this paper, which not only accurately detects the clouds in remote sensing images but also distinguishes cloud and snow well, with few cases of missed or false detection.

Figure 13

Prediction results. (a) Original images; (b) label images; (c) OTSU; (d) Unet; (e) Deeplabv3+; (f) CDUnet; (g) proposed.