Climate Prediction Using RNN LSTM to Estimate Agricultural Products Based on Koppen Classification

The yield of an agricultural process is very important, since the harvest supports human life both as food and as a source of income. Many factors influence the success of agriculture, such as human resources, seed quality and the climate of the surrounding area. Climate is one of the most important of these factors: how accurately it is determined affects the results obtained, and a wrong prediction of the future climate can cause crop failure when the climate does not suit the type of plant. Many technologies are now able to predict climate, one of which is machine learning, a family of methods and techniques that has been widely used for prediction tasks. This study aims to predict the climate of an area in order to estimate crop yields based on the Koppen classification, using temperature, humidity, duration of sun exposure and rainfall as parameters. The resulting model has a loss of 0.006 and a MAPE (mean absolute percentage error, used as the indicator of prediction accuracy) of 3.29%, which falls into the very accurate category for predicting climate to estimate agricultural yields.


I. INTRODUCTION
Food is one of the essentials of human life and is produced through agriculture [1]. Agricultural output is therefore closely tied to economic progress and can support it: the better the harvest, the better the level of the economy [2]. Producing good-quality crops requires attention to many aspects, including climate [3]. Climate is the weather condition of an area over a certain period and is influenced by factors such as temperature and humidity [4].
Based on these aspects, climate can be divided into types such as the rainy season and the dry season, each of which is shaped by factors such as temperature, sunlight and humidity [5]. In addition, the climate can change as rainfall and ambient temperature change over a certain period of time [6].
Climate forecasting is difficult because climate change is uncertain, which makes the yield for a given period hard to anticipate. Incorrect climate forecasts for an area can cause considerable agricultural losses, such as lower crop quality, crop failure, smaller harvests and damaged crops [7].
Many technologies can now predict climate for agriculture, one of which is machine learning [8]. Machine learning aims to give a machine the ability to recognize patterns in data and to use them for analysis and prediction [9]. It includes many methods, such as K-means, Recurrent Neural Networks, Decision Trees and Artificial Neural Networks [10].
Several previous studies have applied machine learning to climate-related prediction. One study used Convolutional Neural Network and Recurrent Neural Network methods to predict maize and soybean yields, with weather, performance, and agricultural and soil management as parameters [11].
Another study used a Decision Tree to predict climate in order to estimate crop yields, with temperature, rainfall, cloud cover and humidity as parameters [12]. Other studies have used the Recurrent Neural Network method to predict bad weather [13], local temperature [14], wind velocity [15], daily temperature [16] and rain [17].
Many studies have thus examined machine learning for predicting climate to determine agricultural yields. However, no study has yet combined an RNN with Long Short-Term Memory (LSTM), parameters such as humidity, temperature, rainfall and solar radiation, and the Koppen classification of climate types. An RNN learns by taking previous information into account, and the LSTM is a type of RNN that overcomes weaknesses of the basic RNN, improving performance and accuracy [18]. This method was chosen because current climatic conditions are closely related to past climatic conditions [19].
The purpose of this research is to predict the climate in order to estimate agricultural yields according to the Koppen classification, using the RNN LSTM method with temperature, humidity, rainfall and solar radiation as parameters.

II. RESEARCH METHODOLOGY
This research focuses on the climate of the city of Bandung, West Java, and its relation to rice farming yields, using the Recurrent Neural Network method for the prediction process. The steps carried out to predict the climate and determine agricultural yields can be seen in Figure 1.

A. Data Collection
The data were selected according to Koppen theory, which is used to categorize climate types. The data used are rainfall, average temperature, duration of sun exposure and humidity over the 10 years from 2010 to 2020. They are daily climate records from the BMKG Bandung Geophysical Station, Bandung City, West Java Province, downloaded from the official BMKG website in CSV (.csv) format. Because the records are daily, the amount of data obtained is 3650 days × 4 climate parameters = 14,600 data points.

B. Data Training
Training is carried out with the RNN LSTM method. Before training, the data go through data preprocessing, which makes them suitable for training.

Preprocessing Training Data
In this study, preprocessing was carried out in four stages: interpolation, feature extraction, segmentation and normalization.

− Data Interpolation
Interpolation is the process of estimating values between known data points. Its purpose here is to fill in values that were not measured or not recorded by BMKG, by taking the middle value between the two neighbouring recorded values.
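The interpolation step can be sketched with NumPy's `np.interp`, which estimates each missing day linearly from its nearest recorded neighbours, so a single missing day receives their midpoint. This is a minimal sketch, not the authors' code: it assumes missing readings have already been converted to `NaN`, and the helper name `fill_gaps_linear` and the sample rainfall values are hypothetical. The paper does not say how gaps at the start or end of the series were handled; this sketch holds the nearest recorded value there, which is an assumption.

```python
import numpy as np

def fill_gaps_linear(values):
    """Fill NaN gaps in a daily series by linear interpolation between the
    nearest recorded neighbours (a single missing day gets their midpoint).
    Assumption: gaps at the very start or end take the nearest recorded value."""
    y = np.array(values, dtype=float)  # copy so the caller's data is untouched
    missing = np.isnan(y)
    if missing.all():
        raise ValueError("series has no recorded values to interpolate from")
    idx = np.arange(y.size)
    y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])
    return y

# Hypothetical daily rainfall with unrecorded days marked as NaN
rainfall = [12.0, np.nan, 20.0, np.nan, np.nan, 5.0]
print(fill_gaps_linear(rainfall))
```

For a gap of several days, the missing values are spread evenly along the straight line between the two recorded days.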

− Feature Extraction
Feature extraction is the process of finding the largest value of each variable within each month, so that the daily data become monthly data holding the maximum value of each variable.

− Normalization
Normalization converts data into a common scale. It is needed when values are very large, very small, or expressed in different units. In this study, Min-Max normalization is used to scale each variable into the range zero to one [20]:

$$X' = \frac{X - \min}{\max - \min}$$

where X is the data to be normalized, X' is the normalized value, max is the highest value in the column, and min is the lowest value in the column.
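Feature extraction and Min-Max normalization can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: the daily data are an array of shape (days, 4 variables), each day carries a month label in a hypothetical `month_ids` array (labels such as "2010-01" sort correctly), and the helper names are hypothetical. The column minimum and maximum are returned so that predictions can be mapped back to original units; the paper does not say whether this was done before computing errors.

```python
import numpy as np

def monthly_max(daily, month_ids):
    """Collapse daily rows (n_days, n_vars) to one row per month holding the
    per-variable maximum. month_ids gives the month label of each day."""
    daily = np.asarray(daily, dtype=float)
    month_ids = np.asarray(month_ids)
    months = np.unique(month_ids)  # sorted month labels
    return np.stack([daily[month_ids == m].max(axis=0) for m in months])

def min_max_normalize(x):
    """Scale each column to [0, 1] with X' = (X - min) / (max - min).
    Returns the column min and max so predictions can be mapped back."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid /0 on constant columns
    return (x - col_min) / span, col_min, col_max

def min_max_denormalize(x_scaled, col_min, col_max):
    """Invert min_max_normalize (constant columns map back to their single value)."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return np.asarray(x_scaled, dtype=float) * span + col_min

# Hypothetical example: four days, two variables, two months
monthly = monthly_max([[1, 10], [3, 5], [2, 7], [8, 1]], [1, 1, 2, 2])
scaled, lo, hi = min_max_normalize(monthly)
print(monthly)
print(scaled)
```

Keeping `lo` and `hi` from the training data lets the same scaling be applied to new inputs and undone on the model's outputs.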

− Segmentation
Segmentation is the process of separating and grouping raw data into the form the system needs. In this study, the monthly data are grouped into windows of one year (12 months) with overlap, so that consecutive training samples are shifted by one month. For example, with 120 months of climate parameter data, the first training sample covers months 1 to 12, the second covers months 2 to 13, the third covers months 3 to 14, and so on up to the 109th training sample, which covers months 109 to 120.
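The overlapping segmentation can be sketched as a sliding window with a step of one month: 120 months and a 12-month window give 120 − 12 + 1 = 109 samples. In this minimal sketch, the helper name `make_windows` is hypothetical and random numbers stand in for the real monthly climate data. Flattening each window gives 12 × 4 = 48 values per sample.

```python
import numpy as np

def make_windows(monthly, window=12, step=1):
    """Split (n_months, n_vars) data into overlapping windows of `window`
    months, each shifted by `step` months from the previous one."""
    monthly = np.asarray(monthly, dtype=float)
    n = monthly.shape[0]
    if n < window:
        raise ValueError("need at least one full window of data")
    starts = range(0, n - window + 1, step)
    return np.stack([monthly[s:s + window] for s in starts])

# Placeholder for 120 months x 4 climate variables (not real BMKG data)
monthly = np.random.default_rng(0).random((120, 4))
windows = make_windows(monthly)
print(windows.shape)                       # (109, 12, 4)
flat = windows.reshape(len(windows), -1)   # 12 months x 4 variables = 48 values per sample
print(flat.shape)                          # (109, 48)
```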

RNN LSTM Training
After the preprocessing stages, 109 data sets were produced as input for training the Recurrent Neural Network. Each data set contains 48 values (12 months × 4 variables), so the input layer has 48 units. These inputs are connected to the hidden layer, which contains LSTM memory cells; these cells are what distinguish an LSTM from an ordinary RNN. Within each cell, several steps are carried out to produce the output.
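The steps inside an LSTM cell can be sketched with the standard gate equations: forget, input and output gates plus a candidate cell state. This NumPy sketch is illustrative and is not the authors' implementation, which probably used a deep learning framework. Several choices here are assumptions: the 48 input values are fed as 12 monthly time steps of 4 variables, the hidden size of 13 is taken from the hidden-layer size stated for the prediction architecture, the gate order inside `W` is arbitrary, the weights are random, and `lstm_step` and `lstm_encode` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.
    W: (4*H, D+H) weights acting on [x_t, h_prev]; b: (4*H,) biases.
    Gate order inside W and b (an assumption): input, forget, candidate, output."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g       # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)         # updated hidden state passed to the next step
    return h_t, c_t

def lstm_encode(window, W, b, hidden_size):
    """Run a (12, 4) window month by month and return the final hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in window:
        h, c = lstm_step(x_t, h, c, W, b)
    return h

rng = np.random.default_rng(0)
D, H = 4, 13                       # 4 climate variables per month, 13 hidden units (assumed)
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h_final = lstm_encode(rng.random((12, D)), W, b, H)
y_hat = rng.normal(scale=0.1, size=H) @ h_final   # a single linear output neuron
```

The forget gate is what lets the cell keep or discard information from earlier months, which is the property that motivates using an LSTM instead of a plain RNN.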

C. Climate Prediction
At this prediction stage, the MLP architecture within the Recurrent Neural Network method is used to predict climate, with rainfall, average temperature, duration of sun exposure and humidity as the inputs of the input layer. The MLP architecture used in this study to predict climate is shown in Figure 2. The input layer has 48 neurons, obtained from 12 months × 4 climate parameters; the hidden layer has 13 neurons, calculated using equation 2.9; and the output layer has one neuron that represents the prediction output. Testing uses the weights obtained from training, which are stored in a file, and produces a climate prediction value for each variable. The predicted values are then mapped onto the ranges of values defined by the Koppen classification.
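The final mapping onto Koppen ranges can be sketched with the main-group rules: E when the warmest month is below 10 °C, B when annual precipitation falls below an aridity threshold, A when the coldest month is at least 18 °C, C when the coldest month is above 0 °C, and D otherwise. This is a hedged sketch, not the paper's procedure. Classic Koppen rules use only monthly mean temperature and precipitation, and the paper does not state how its four predicted variables are mapped, so the inputs here are assumptions. The dryness threshold (20 × mean annual temperature + 140 mm) is a simplified form that ignores whether precipitation is concentrated in summer or winter. Some variants place the C/D boundary at −3 °C instead of 0 °C. The function and dictionary names and the example values are hypothetical.

```python
from typing import Sequence

KOPPEN_GROUPS = {
    "A": "tropical",
    "B": "dry",
    "C": "temperate",
    "D": "continental",
    "E": "polar (arctic)",
}

def koppen_main_group(monthly_temp_c: Sequence[float], monthly_precip_mm: Sequence[float]) -> str:
    """Return the Koppen main group letter for 12 monthly mean temperatures (deg C)
    and 12 monthly precipitation totals (mm)."""
    if len(monthly_temp_c) != 12 or len(monthly_precip_mm) != 12:
        raise ValueError("expected 12 monthly values for each variable")
    t_hot = max(monthly_temp_c)            # warmest month
    t_cold = min(monthly_temp_c)           # coldest month
    mat = sum(monthly_temp_c) / 12.0       # mean annual temperature
    annual_precip = sum(monthly_precip_mm)
    # Simplified aridity threshold in mm (assumption: no seasonal precipitation adjustment)
    dry_threshold = 20.0 * mat + 140.0
    if t_hot < 10.0:
        return "E"
    if annual_precip < dry_threshold:
        return "B"
    if t_cold >= 18.0:
        return "A"
    if t_cold > 0.0:                       # some variants use -3 deg C here
        return "C"
    return "D"

# Hypothetical warm, wet year: every month at 23 deg C with 200 mm of rain
print(KOPPEN_GROUPS[koppen_main_group([23.0] * 12, [200.0] * 12)])
```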

D. Prediction Result
This study produces a climate prediction for the next month based on four parameters (rainfall, average temperature, duration of sun exposure and humidity), and the prediction results are then categorized into climate types according to Koppen's theory. The prediction result can be seen in Figure 3.

III. RESULTS AND DISCUSSION
Two tests were carried out: a test of the effect of the optimization model and a test of the effect of the number of epochs, and each tested parameter was analyzed. Before testing, the model was trained with the RNN LSTM architecture on the preprocessed data in Table 2. Training used two optimizers, Adam and SGD, with 50, 100 and 200 epochs and a learning rate of 0.001.
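The two optimizers differ in how they turn a gradient into a weight update. The sketch below shows one update step of each in NumPy. It is illustrative only: apart from the learning rate of 0.001, the Adam hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used defaults and are assumed here, since the paper does not state them. The function names are hypothetical.

```python
import numpy as np

def sgd_step(theta, grad, lr=0.001):
    """Plain stochastic gradient descent: move against the gradient."""
    return theta - lr * grad

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (t counts steps from 1). Returns new theta, m, v."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([0.0])
grad = np.array([-6.0])
theta_sgd = sgd_step(theta, grad)                 # step size scales with the gradient
theta_adam, m, v = adam_step(theta, grad, np.zeros(1), np.zeros(1), t=1)  # step of about lr
print(theta_sgd, theta_adam)
```

SGD's step grows with the gradient magnitude, while Adam rescales each parameter's step by its recent gradient history, so its early steps are close to the learning rate regardless of gradient scale.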

A. Optimization Model Influence Test
The optimization model test determines which optimizer is more suitable for predicting climate. The optimizer updates the weights during training; two were compared, the Adam optimizer and the SGD optimizer. The accuracy results of both are shown in Table 3, and the graphs of the SGD and Adam test results are shown in Figure 3 and Figure 4. Adam achieved a lower loss than SGD, 0.006, with a MAPE (used as the indicator of the percentage error of the prediction results) of 3.29%, which falls into the very accurate category. SGD had a loss of 0.011 and a MAPE of 13.1%, which falls into the good category.
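MAPE averages the absolute error relative to the actual value, MAPE = (100/n) Σ |(A_t − F_t) / A_t|, where A_t is the actual value and F_t the forecast. The sketch below computes it and maps the result onto a commonly used interpretation scale (below 10% very accurate, 10% to 20% good, 20% to 50% reasonable, above 50% inaccurate). That scale is assumed here because it agrees with the categories reported for 3.29% and 13.1%. The function names and sample numbers are hypothetical. MAPE is undefined when an actual value is zero, which can happen with rainfall in dry months, and the paper does not say whether it was computed on normalized or original-unit values.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error in percent. Actual values must be non-zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def mape_category(value):
    """Map a MAPE value (percent) onto a commonly used interpretation scale (assumed)."""
    if value < 10:
        return "very accurate"
    if value < 20:
        return "good"
    if value <= 50:
        return "reasonable"
    return "inaccurate"

# Hypothetical actual and predicted values: each prediction is off by 10%
score = mape([100.0, 200.0], [110.0, 180.0])
print(score, mape_category(score))
```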

B. Test of the Influence of the Number of Epochs
Training for 200 epochs gave a low loss. The results of the epoch test can be seen in Table 4.

IV. CONCLUSION
This research produces climate predictions using an RNN LSTM. The predictions provide climate forecasts for the next month, expressed as climate types: tropical, dry, temperate, continental and arctic. Users can use these forecasts of the future climate as a benchmark to determine which type of farming will produce good yields under that climate. The lowest loss was obtained with the Adam optimizer, a learning rate of 0.001 and 200 epochs, giving a loss of 0.006 and a MAPE (the indicator of the percentage error and of prediction accuracy) of 3.29%, which falls into the very accurate category.