Prediction of Electrical Energy Consumption Using LSTM Algorithm with Teacher Forcing Technique

Electrical energy is an important foundation of world economic growth, so accurate prediction of future energy consumption is required. Previous research has often used time series and machine learning methods, but deep learning methods, which can process data quickly for training and testing, have recently been applied to energy consumption prediction. This research proposes a deep learning approach, a Multivariate Time Series model with the LSTM algorithm and the Teacher Forcing technique, for predicting future electrical energy consumption. The Multivariate Time Series model and the LSTM algorithm can accept inputs covering various conditions or seasons of electrical energy consumption, and the Teacher Forcing technique lightens the computation so that training and testing are fast. The study compares Teacher Forcing (TF) LSTM with Non-Teacher Forcing (Non-TF) LSTM in the Multivariate Time Series model using several activation functions, which produce significant differences. The best values for both models are obtained with the Sigmoid activation: TF has RMSE 0.006 and MAE 0.070, and Non-TF has RMSE 0.117 and MAE 0.246. The worst values for both models are obtained with the Softmax activation function: TF has RMSE 0.423 and MAE 0.485, and Non-TF has RMSE 0.520 and MAE 0.519.


INTRODUCTION
Energy is an important foundation for economic development in a country [1], and electricity is one of the main energy sources [2]. Energy policy is therefore very important for a country, because it not only supports national development but also affects the environment, both in industrial operations and in residential areas from low-income to elite. Electrical energy projects require large capital investment and a long time to expand capacity, so a good estimate or prediction is one of the requirements for developing a more effective energy policy, because it reduces the possibility of errors in electrical system planning. Producing an accurate forecast of electricity consumption is therefore very important [3].
In recent years, many studies have applied machine learning [4] or deep learning [5] techniques to predicting electrical energy consumption. Karimbatar et al. studied data mining for energy consumption by comparing several machine learning algorithms, namely Regression models [6], Neural Network [7], and SVM [8]; the best model was the Regression model, with a relative error of 0.9%. The output of this prediction shows that the average electricity consumption rate increases by about 3.2% per year and will reach 7,076,796 MW in 2020, given a population growth of 22.28% [9]. In another machine learning study, Choi analyzed home energy consumption using the K-means clustering algorithm, obtaining K = 7 with a silhouette score of 0.799 [10]. Nallathambin et al. predicted electrical energy consumption in the USA using the Decision Tree (DT) and Random Forest (RF) algorithms; the experiments show that the RF model provides better accuracy, 95.78%, than the DT model, 91.6%, with error rates of 0.197 and 0.906, respectively [11].
More recently, deep learning methods, which can train quickly, have been adopted, and many researchers now use them for prediction. Kim et al. predicted household electricity consumption using a CNN-LSTM hybrid network [12]; the proposed method can quickly and accurately predict irregular energy consumption trends in the household power consumption dataset. However, because the data were first processed with a sliding window algorithm [13], the predictions lag behind the actual data [14]. In another study, Young-Jun compared deep learning models for electrical energy forecasting [15], including LSTM, GRU, and Seq2Seq; LSTM gave the best result, with an RMSE of 0.96 [16], but this value is not good enough for actual data, and seasonal and long-term data features are needed to forecast electrical energy more accurately.
Therefore, the researcher proposes a multivariate time series model [17] using the LSTM algorithm for electrical energy prediction, together with the Teacher Forcing technique [18] to help with long-term predictions, using a public consumption dataset taken from smart meters in London under several conditions or seasons. The multivariate time series LSTM model can combine several inputs for training and testing and produce one output, so the various conditions or seasons of electrical energy consumption in the dataset are used as inputs to produce an output, namely an accurate and precise prediction of future consumption [19]. The algorithm is, however, weak at long-term prediction because of its high computational cost. Teacher forcing helps here because it trains recurrent networks quickly and efficiently: instead of feeding the LSTM's own output back as the subsequent input, it uses the ground truth from the previous time step as input, which lowers the computational cost [20]. With this dataset, the multivariate time series model and the algorithm can reduce RMSE [21] and predict electricity consumption in the long term.
From the results of the training data, a comparison will be made across several activation functions: ReLU, Softmax [22], Sigmoid [23], and Swish [24], the newest activation function used in RNNs.

II. RESEARCH METHODOLOGY
This research proceeds in several steps: a literature study, data analysis using data visualization techniques [25], and data preprocessing [26], after which the collected and processed data are fed into the model architecture and the model is trained to obtain predictions of electrical energy consumption, as shown in Figure 1.

A. Data Collecting
This stage covers the steps in collecting the data used as the dataset, which consists of several seasons that are correlated with the electrical energy consumption data for each housing block. The flow of the data collection stages is shown in Figure 2.

B. Training Model
The model is built as an LSTM Multivariate Time Series model with the Teacher Forcing technique. Before that, this study makes several comparisons. First, the LSTM algorithm is compared with and without Teacher Forcing [27] to find which has the better loss. The comparison is then continued with the Multivariate Time Series model and several activation functions [28] from the deep learning method. The flow of the training model used in this study is shown in Figure 3.

C. Architecture Model
The architecture compares LSTM Non-Teacher Forcing and LSTM Teacher Forcing to find which of the two models has the better loss. The model is then used in the Multivariate Time Series model and compared across several activation functions to find the better RMSE, as shown in Figure 4 and Figure 5.
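The difference between the two input schemes can be illustrated with a minimal Python sketch. It is not the LSTM architecture of Figure 4 and Figure 5: `toy_step` is a hypothetical one-step predictor standing in for the trained network, and `rollout`, `mean_abs_error`, and the synthetic series are assumptions made only for illustration. With teacher forcing, each step receives the true previous value; without it, each step receives the model's own previous prediction, so errors can accumulate.

```python
from typing import Callable, List


def rollout(step: Callable[[float], float], series: List[float],
            teacher_forcing: bool) -> List[float]:
    """Predict series[1:] one step at a time.

    teacher_forcing=True:  the input at each step is the true previous value.
    teacher_forcing=False: the input at each step (after the first) is the
                           model's own previous prediction.
    """
    predictions: List[float] = []
    prev = series[0]  # both modes start from the first observed value
    for t in range(1, len(series)):
        pred = step(prev)
        predictions.append(pred)
        prev = series[t] if teacher_forcing else pred
    return predictions


def mean_abs_error(actual: List[float], predicted: List[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def toy_step(prev: float) -> float:
    # Hypothetical imperfect model: predicts 5% growth per step,
    # while the synthetic series below grows by 10% per step.
    return 1.05 * prev


series = [1.10 ** t for t in range(6)]  # synthetic data, not the London dataset
targets = series[1:]

tf_preds = rollout(toy_step, series, teacher_forcing=True)
free_preds = rollout(toy_step, series, teacher_forcing=False)

tf_error = mean_abs_error(targets, tf_preds)
free_error = mean_abs_error(targets, free_preds)
print(f"TF MAE: {tf_error:.4f}  Non-TF MAE: {free_error:.4f}")
```

In this toy setting the non-teacher-forced rollout drifts further from the true series at every step, because each prediction is built on an earlier, already inaccurate prediction.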

D. Activation Function
Using the model architecture above, several activations will be compared, including Sigmoid, ReLU, and a custom activation. The Sigmoid activation makes the neural network non-linear [29]. Sigmoid accepts a single number x and converts it into a value in the range from 0 to 1, with the following formula.
$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (1)$$
Next is the ReLU, or Rectified Linear Unit, activation, which is a good activation function because it greatly accelerates the convergence of stochastic gradient descent compared to sigmoid/tanh, with the following formula.
$$f(x) = \max(0, x) \quad (2)$$
ReLU simply places a threshold at zero: the output is 0 if x ≤ 0 and x if x > 0. Experiments reported in [30] show that Swish tends to perform better than ReLU on deeper models across a number of challenging datasets, with the following formula.
$$f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}} \quad (3)$$
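The activation functions compared in this study can be sketched in plain Python as follows. This is a minimal illustration using the standard textbook definitions, not the implementation used in the experiments; the Swish form shown is the common variant x · sigmoid(x), which is assumed here, and Softmax is included because it appears in the comparison of results.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Map any real x into (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    """Return 0 for x <= 0 and x otherwise."""
    return max(0.0, x)


def swish(x: float) -> float:
    """Swish in its common form x * sigmoid(x) (assumed variant)."""
    return x * sigmoid(x)


def softmax(values: List[float]) -> List[float]:
    """Normalize a vector into probabilities that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), relu(x), swish(x), math.tanh(x))
print(softmax([1.0, 2.0, 3.0]))
```

Unlike the other functions, Softmax operates on a whole vector and forces the outputs to sum to 1, which is designed for classification outputs rather than for a single regression value.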

E. Measurement
The experiment measures which model performs better, using RMSE and MAE. Root Mean Square Error (RMSE) is the square root of the mean of the squared errors, that is, of the squared differences between the true values and the predicted values. RMSE is defined as follows.

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - Y'_i\right)^2} \quad (4)$$
Mean Absolute Error (MAE) is the mean of the absolute errors between the true values and the predicted values. MAE is generally used to measure prediction error in time series analysis. MAE is defined as follows.
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - Y'_i\right| \quad (5)$$
In formulas (4) and (5), Y' is the predicted value, Y is the actual value, and n is the total number of data points.
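The two measurements can be computed directly from their definitions, as in the short Python sketch below. The function names and the sample values are assumptions chosen for illustration and are not taken from the experiments.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error: sqrt of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: mean of the absolute differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / n


# Illustrative values only: the errors are 3 and 4.
y_true = [3.0, 5.0]
y_pred = [0.0, 1.0]
example_rmse = rmse(y_true, y_pred)  # sqrt((9 + 16) / 2)
example_mae = mae(y_true, y_pred)    # (3 + 4) / 2
print(example_rmse, example_mae)
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does.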

A. Preprocessing
At this stage the data are merged to form multivariate time series data, so that data from several files, such as weather and energy, are combined into one data file. The following is an example of a dataset that has been merged. The attributes resulting from the merging process are Time, Visibility, Temperature, Dew Point, Pressure, Wind Speed, Humidity, ID, and Energy. The merge produces 2904 rows of raw data, or 2904 hours of data. The process then continues with the indexing, cleaning, and scaling stages. Table 2 is an example of the final preprocessing stage.
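The merging and scaling steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `merge_on_time` and `min_max_scale`, the inner join on the timestamp, the choice of min-max scaling, and all sample values are hypothetical, since the text does not specify how the merge or the scaling was carried out.

```python
from typing import Dict, List


def merge_on_time(energy: Dict[str, float],
                  weather: Dict[str, Dict[str, float]]) -> List[dict]:
    """Join energy and weather records that share a timestamp (inner join)."""
    rows = []
    for ts in sorted(energy.keys() & weather.keys()):
        rows.append({"time": ts, **weather[ts], "energy": energy[ts]})
    return rows


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical hourly records, not values from the London dataset.
energy = {
    "2013-01-01 00:00": 0.5,
    "2013-01-01 01:00": 0.7,
    "2013-01-01 02:00": 0.6,  # no matching weather row, so it is dropped
}
weather = {
    "2013-01-01 00:00": {"temperature": 5.0, "humidity": 0.9},
    "2013-01-01 01:00": {"temperature": 4.0, "humidity": 0.8},
}

merged = merge_on_time(energy, weather)
scaled_energy = min_max_scale([row["energy"] for row in merged])
print(merged)
print(scaled_energy)
```

Scaling every attribute to a common range keeps attributes with large numeric values, such as Pressure, from dominating attributes with small values, such as Humidity, during training.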
B. Teacher Forcing and Non-Teacher Forcing with Univariate Time Series Model
Figure 4 and Figure 5 show the model architectures for teacher forcing and non-teacher forcing. The univariate time series data are used first, where the attributes to be learned are the data frame index and the energy, which has been changed from per hour to per minute. At this stage the model is trained with the Adam optimizer for 15 epochs with a batch size of 100. The results of the trained model are as follows.

IV. CONCLUSION
The training model in this study uses several activation functions: Tanh, Softmax, ReLU, Sigmoid, and Swish. The results show that Teacher Forcing LSTM is better than Non-Teacher Forcing LSTM in terms of loss and prediction, with quite a significant difference. The best values for both models are obtained with the Sigmoid activation: TF has RMSE 0.006 and MAE 0.070, and Non-TF has RMSE 0.117 and MAE 0.246. The worst values for both models are obtained with the Softmax activation function: TF has RMSE 0.423 and MAE 0.485, and Non-TF has RMSE 0.520 and MAE 0.519.

Figure 2. Data Collecting
As in Figure 2, the electricity consumption data are taken from public data, namely www.kaggle.com, on smart meter data in London; 2 GB of data are downloaded and

Figure 7. Non-Teacher Forcing Univariate Model
From the results of training and testing, the LSTM model using the Teacher Forcing technique is better in loss and prediction on the univariate model. Next is training on the electrical energy consumption data with the Multivariate Time Series model using the Teacher Forcing and Non-Teacher Forcing techniques.
A. Teacher Forcing and Non-Teacher Forcing with Multivariate Time Series Model
At this stage, training is conducted on each attribute used as input to the model that has been built. The following is the multivariate time series variable model that will be trained on the LSTM model using TF and Non-TF.
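One common way to present multivariate time series data to a recurrent model is to cut it into fixed-length windows, where each sample holds several consecutive time steps of all attributes and the target is the energy value at the following step. The Python sketch below illustrates that framing. It is an assumption made for illustration: the text does not state how the inputs were shaped, and `make_windows`, the window length, and the sample rows are hypothetical.

```python
from typing import List, Tuple


def make_windows(rows: List[List[float]], target_index: int,
                 n_steps: int) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (samples, targets) from rows ordered by time.

    Each sample is n_steps consecutive rows (all attributes); its target is
    the value at target_index in the row that immediately follows the window.
    """
    samples, targets = [], []
    for start in range(len(rows) - n_steps):
        samples.append(rows[start:start + n_steps])
        targets.append(rows[start + n_steps][target_index])
    return samples, targets


# Hypothetical scaled rows: [temperature, humidity, energy]
rows = [
    [0.10, 0.90, 0.20],
    [0.15, 0.85, 0.25],
    [0.20, 0.80, 0.30],
    [0.25, 0.75, 0.35],
    [0.30, 0.70, 0.40],
]

X, y = make_windows(rows, target_index=2, n_steps=2)
print(len(X), len(X[0]), len(X[0][0]))  # samples, time steps, attributes
print(y)
```

The resulting layout, samples by time steps by attributes, is the three-dimensional input shape that LSTM layers in common deep learning libraries expect.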

Figure 8. Multivariate Time Series Electricity Consumption
In the previous training, the univariate time series data were trained without an activation function; now the multivariate data are trained using the Tanh, Softmax, ReLU, Sigmoid, and Swish activations. The image below shows the result of the training.

Figure 9. Prediction Result LSTM Teacher Forcing using Activation
Next, the prediction results for the Non-Teacher Forcing LSTM model are measured as follows.