Cloud-Type Classification for Southeast China Based on Geostationary Orbit EO Datasets and the LightGBM Model

2024-12-07

1. Introduction

Clouds play a significant role in the evolution of weather, climate, the global water cycle, and the global radiative budget, with an average annual coverage of more than 60% of the Earth. Different cloud types provide crucial cues for weather analysis and forecasting because they reflect different atmospheric states and circulation scenarios. For example, stratocumulus and nimbostratus are low clouds that clearly indicate stratiform precipitation; cumulus and cumulonimbus are effective indicators of convective precipitation; and cirrus clouds are generally not accompanied by precipitation. In addition, cloud classification products help improve the accuracy of retrieved meteorological parameters, such as cloud base height and precipitation intensity.

The southeastern region of China lies in a pronounced monsoon zone. Abundant water vapor, together with the dynamic and thermal effects of the Tibetan Plateau, provides favorable conditions for the formation of clouds and precipitation. The strength of the East Asian summer monsoon circulation significantly affects the weather and climate of the region. The seasonal rain belts, the onset and end of the rainy season, and the structure of rain patterns are closely related to the East Asian summer monsoon. Droughts and floods triggered by changes in regional precipitation seriously affect the production, daily life, and economic activities of local people. Furthermore, strong convective clouds may trigger severe convective weather such as thunderstorms and gales, short-term heavy precipitation, and hail, resulting in serious economic losses and even casualties. Therefore, accurate cloud-type classification in the southeastern region of China is of great significance.

In operational meteorology, clouds are classified into three families and ten genera based on their macro-structural features, including cloud base height, morphology, and structure. This method is considered the most effective way to distinguish cloud types manually. However, manual identification is subjective to a degree, which affects identification accuracy, and it cannot provide continuous observation over a wide area. Geostationary satellites can observe the same target area uniformly and continuously over long periods, making them an effective means of cloud observation. However, the traditional classification standard for ground-based cloud observations is not suitable for satellite data because of the different observation angles. The International Satellite Cloud Climatology Project (ISCCP) proposed a set of cloud classification standards, which classifies clouds into nine categories according to cloud top pressure and optical thickness. The cloud classification product of the Himawari-8 geostationary satellite is based on the ISCCP standard.
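To make the nine-category scheme concrete, the following minimal sketch maps a cloud top pressure and an optical thickness to an ISCCP-style cloud type. The boundaries used here (440 and 680 hPa; 3.6 and 23) are the commonly cited ISCCP values and are not stated in this paper; the function name and the handling of values that fall exactly on a boundary are assumptions made for illustration.

```python
# Commonly cited ISCCP boundaries (assumed here, not taken from this paper).
PRESSURE_EDGES_HPA = (440.0, 680.0)  # high / middle / low cloud tops
TAU_EDGES = (3.6, 23.0)              # thin / medium / thick optical thickness

ISCCP_TYPES = {
    ("high", "thin"): "cirrus",
    ("high", "medium"): "cirrostratus",
    ("high", "thick"): "deep convection",
    ("middle", "thin"): "altocumulus",
    ("middle", "medium"): "altostratus",
    ("middle", "thick"): "nimbostratus",
    ("low", "thin"): "cumulus",
    ("low", "medium"): "stratocumulus",
    ("low", "thick"): "stratus",
}


def isccp_cloud_type(cloud_top_pressure_hpa, optical_thickness):
    """Return the ISCCP-style cloud type for one cloudy pixel (hypothetical helper)."""
    # Lower cloud top pressure means a higher cloud top.
    if cloud_top_pressure_hpa < PRESSURE_EDGES_HPA[0]:
        level = "high"
    elif cloud_top_pressure_hpa < PRESSURE_EDGES_HPA[1]:
        level = "middle"
    else:
        level = "low"

    if optical_thickness < TAU_EDGES[0]:
        thickness = "thin"
    elif optical_thickness < TAU_EDGES[1]:
        thickness = "medium"
    else:
        thickness = "thick"

    return ISCCP_TYPES[(level, thickness)]


print(isccp_cloud_type(300.0, 1.0))   # high, optically thin pixel
print(isccp_cloud_type(900.0, 30.0))  # low, optically thick pixel
```

A lookup table keyed on the two bins keeps the three-by-three structure of the scheme visible and makes it easy to swap in different boundary values.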

Currently used cloud detection and classification algorithms can be classified into three categories: simple methods, statistical methods, and artificial intelligence methods. Threshold-based classification is the most commonly used simple cloud classification method; it requires few computational resources but has low accuracy. Statistical methods, such as K-means and maximum-likelihood estimation, are more accurate than simple methods but also require more computational power. With recent technological developments, more and more research is focusing on the use of artificial intelligence in mapping clouds and their properties. Artificial intelligence uses computers and machines to mimic the way the human brain thinks in order to solve problems and make decisions, and machine learning is a subfield of artificial intelligence. Classical machine learning-based methods train models to classify cloud types using manually selected feature parameters, such as brightness temperature and texture. Such methods, for example, support vector machines and random forests, have demonstrated that machine learning can be successfully used for cloud detection and classification tasks.

The FY-4A geostationary satellite has no product based on the ISCCP cloud classification standard, but some researchers have already conducted research on FY-4A based on the cloud-type classification product of Himawari-8. Nevertheless, they mostly mapped the Himawari-8 satellite data directly to FY-4A data without considering the inter-satellite parallax effect, which may lead to mismatched cloud-type labels. For supervised learning, mislabeled samples have a large impact on model accuracy. Therefore, it is imperative to perform quality control on the labeled data before constructing the dataset, to minimize matching errors and the impact of data quality on the model.

Among machine learning algorithms, LightGBM has become a research hotspot in recent years due to its fast training speed, low resource consumption, strong generalization ability, and resistance to overfitting. LightGBM is a decision tree-based gradient-boosting framework proposed by Microsoft in 2017. It is widely used in fields such as geomatics, Earth Observation (EO), and meteorology.

In view of the above, the present study aimed to propose a cloud-type classification retrieval method based on the LightGBM model. Several studies have developed AGRI cloud classification products, but most of them do not consider the effect of parallax; addressing this gap is the contribution of this study. A sliding detection method was used for quality control of the cloud-type data, to reduce the effect of parallax on the model. Bayesian optimization was used to optimize the model hyperparameters. Furthermore, comparisons were made with other cloud classification methods to assess the accuracy of the retrieval model in this paper.
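The sliding detection procedure is not spelled out in this section, so the following is only one plausible form of a window-based label check, not the authors' implementation. It assumes that a label is trustworthy when every pixel in a small window around it carries the same cloud type, since a parallax shift of a pixel or so would then leave the label unchanged. The function name, the window size, and the choice to discard border pixels are all hypothetical.

```python
def homogeneous_label_mask(labels, half_width=1):
    """Flag pixels whose whole (2*half_width+1)^2 window shares one label.

    `labels` is a 2D list of cloud-type codes. Border pixels, whose window
    would fall outside the grid, are left as False (assumed to be discarded).
    """
    rows, cols = len(labels), len(labels[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(half_width, rows - half_width):
        for c in range(half_width, cols - half_width):
            centre = labels[r][c]
            mask[r][c] = all(
                labels[rr][cc] == centre
                for rr in range(r - half_width, r + half_width + 1)
                for cc in range(c - half_width, c + half_width + 1)
            )
    return mask


# Toy label grid (illustrative values only): a uniform block of type 1
# bordered by type 2 on the right and type 3 at the bottom.
toy_labels = [
    [1, 1, 1, 2],
    [1, 1, 1, 2],
    [1, 1, 1, 2],
    [3, 3, 3, 3],
]
toy_mask = homogeneous_label_mask(toy_labels, half_width=1)
print(toy_mask[1][1], toy_mask[1][2])
```

In this toy grid, only the pixel at row 1, column 1 has a fully uniform 3x3 neighbourhood, so it is the only sample that would be kept for training under this assumed rule.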

5. Conclusion

In this paper, we proposed a cloud classification model for FY-4A satellite observation images that can be used in the southeast region of China. To address the problem of parallax between geostationary satellites, we used the sliding detection method for quality control of the cloud-type label data, reducing the impact of label-matching errors on model accuracy. Hyperparameter settings strongly affect the classification performance of a machine learning model; by introducing Bayesian optimization, we obtained the optimal hyperparameter combination for the model and found that it tuned the hyperparameters well. Compared with the model using only the brightness temperature of the infrared channels, introducing albedo and brightness temperature differences improved, to some extent, the model's sensitivity in identifying cloud thickness. Finally, we achieved a quantitative assessment of regional cloud-type classification. For validation, we compared the results with the quality-controlled Himawari-8 CLP product: the accuracy (Acc) of the optimal model reached 97.54%, higher than the 51.06% of TT, 96.47% of SVM, and 97.49% of RF. The feature importance analysis further revealed the degree of influence of different input channels on the cloud classification task in the region. The results of this study can provide useful information for meteorological operations in the region.

However, some cloud types were not well recognized due to factors such as the model feature variables and the sample imbalance of each type of cloud. We plan to solve these problems by further improving the dataset and optimizing the model in our future work.
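The interaction between sample imbalance and overall accuracy can be illustrated with a small sketch. The labels below are toy values invented for illustration, not results from this study, and the helper names are hypothetical. The point is only that a high overall accuracy can coexist with poor recall for a rare cloud type.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each true class: correctly predicted / total in that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy, imbalanced example: 95 "clear" samples and 5 "cirrus" samples.
y_true = ["clear"] * 95 + ["cirrus"] * 5
# The hypothetical classifier gets every "clear" right but only 1 of 5 "cirrus".
y_pred = ["clear"] * 95 + ["clear"] * 4 + ["cirrus"]

acc = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
print(acc, recall)
```

Reporting per-class recall alongside overall accuracy is one common way to expose classes that a model recognizes poorly when the dataset is dominated by a few types.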