The Prediction of Gold Price Movement by Comparing Naive Bayes, Support Vector Machine, and K-NN

Gold is a yellow precious metal that is malleable and therefore easy to shape into jewelry such as pendants, earrings, rings, and bracelets, and it has a high value. Gold served as a medium of exchange in ancient times, before money existed in its present form. Gold can also be used as an investment that is profitable for the investor and carries relatively low risk. Investment is a form of fund management that aims to generate returns by placing funds in an allocation that is predicted to yield additional benefits. To predict gold price movements in gold stock investment, this research implements and compares three algorithms with the aim of increasing accuracy: Naïve Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor (K-NN). The dataset was obtained from www.finance.yahoo.com and tested using the RapidMiner tool. On average, the Support Vector Machine algorithm achieved an accuracy of 57.59%, precision of 58.73%, and recall of 51.78%. The Naïve Bayes algorithm achieved an accuracy of 55.59%, precision of 54.55%, and recall of 51.70%. Comparing the three algorithms, K-NN has the best accuracy, precision, and recall, at 61.90%, 60.98%, and 60.35% respectively, while Naïve Bayes has the lowest accuracy. These three algorithms were chosen because they are well established in research, including time series gold price research, and are well suited to classification.

I. INTRODUCTION
Gold is one of the most malleable yellow precious metals. Gold was used as a medium of exchange before money existed in its present form, and for that reason the risk of investing in gold is considered very small. Gold is held in two forms: gold for investment only, such as gold stocks or gold futures, and gold for jewelry, such as necklaces, rings, bracelets, and earrings. Every investment process contains risk and uncertainty. Gold is an investment that anyone can make, so it is in great demand and is a favorite among the upper, upper-middle, and lower-middle classes. However, the gold price fluctuates every day, every month, and every year. This fluctuation risk is called time series risk, where the price is always going up and down. To help investors avoid this risk, this study builds a prediction of the gold price from time series data, so that gold investors can know when to invest and when to resell, and can profit according to the plans they have made. Gold investment can be profitable, but investors must know that several factors influence the gold exchange rate. Gold is one of the most valuable commodities and is traded in various parts of the world; one of the leading countries is Saudi Arabia, with the largest share of investors, reaching 70%. Gold is synonymous with luxury and glory, but it is vulnerable to and heavily influenced by various economic indicators such as the interest rate, inflation, and Gross Domestic Product (Produk Domestik Bruto). This information can be analyzed and turned into knowledge.
Several studies have been carried out to predict gold prices, one of which is the Prediction of Gold Price Movements in Gold Stock Investments By Comparing the Naïve Bayes Algorithm, KNN and Support Vector Machine [1]. The Naïve Bayes algorithm was used to predict gold prices by Mohammad Guntur, Yulius Santony and Yuhandri in 2018 in the study titled Gold Price Prediction Using The Naïve Bayes Method In Investing To Minimize Risk [2]. Their research predicts the price of gold for the next 14 days to help decide whether to sell or buy gold. The test used 16 data and obtained an accuracy of 75% [3]. However, Naïve Bayes has many deficiencies in classification problems, so further research is needed to improve accuracy. In this study, several tests were carried out on the price of gold with a larger dataset in order to obtain high accuracy. Based on the problems discussed above, this research aims to increase the accuracy of gold price prediction. It uses gold time series data obtained from www.finance.yahoo.com and processes the data to produce gold price predictions. To that end, this research compares the Naïve Bayes [4], Support Vector Machine [5], and K-Nearest Neighbor [6], [8], [9], [10] algorithms.

II. RESEARCH METHODOLOGY
In this study, the analysis and methodology were applied to gold time series data. To facilitate the research and allow it to run systematically and fulfill its purpose, the research stages were defined as follows. At the data collection stage, the research uses 2 (two) types of data:

Primary Data
The data taken in this study come from the website www.finance.yahoo.com, namely the gold price time series for 5 (five) years, from 2014 to 2019. These data are used as study material for learning and training in this research.
The second type of data consists of existing references, namely books, journals, and papers that are similar to the research being conducted.

Data Processing
At this stage, the data are preprocessed before modelling. The data obtained in this study are time series gold price data taken from the website www.finance.yahoo.com. Processing is carried out in 3 stages. The first stage eliminates noisy data. The second stage divides the data into training data and testing data, 90% for training and 10% for testing. The third stage uses RapidMiner to compute accuracy, precision, and recall. Data cleaning aims to remove inconsistent data, empty values (commonly called empty tuples), and duplicate data, and to correct data errors. The repair process is carried out manually with the help of spreadsheet tools.

Data Selection
Data selection is the process of selecting data from existing operational data before entering the data mining stage. At this stage, the following steps are carried out:
1. The data sample is taken randomly with the attribute parameters of the www.finance.yahoo.com data. The document with the largest amount of data is used as the dataset, ensuring that the selected data are suitable for the modelling process.
2. After viewing this dataset, the amount of testing data is determined.
3. The attributes to be used and analyzed are chosen, because the initial data contain some unneeded attributes.
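The cleaning and 90%/10% split described in the Data Processing stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study performed cleaning manually in a spreadsheet and used RapidMiner for the rest, so the function names `clean_rows` and `split_train_test`, the fixed random seed, and the toy rows below are all hypothetical.

```python
import random


def clean_rows(rows):
    """Drop rows with empty values and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None or value == "" for value in row):
            continue  # "empty tuple": at least one missing value
        key = tuple(row)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(row)
    return cleaned


def split_train_test(rows, train_fraction=0.9, seed=0):
    """Shuffle a copy of the rows, then split into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy rows (gold price, USD value, label); the numbers are illustrative only.
raw = [
    (1250.0, 95.1, "up"),
    (1250.0, 95.1, "up"),    # duplicate of the first row
    (1243.5, None, "down"),  # missing value
    (1260.2, 94.8, "up"),
]
cleaned = clean_rows(raw)

# With 2000 records, a 90/10 split gives 1800 training and 200 testing records.
train, test = split_train_test([(i,) for i in range(2000)])
```

Shuffling before the split mirrors the random division described in the text; a chronological split would be a different design choice that the paper does not describe.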

Data Transformation
The data transformation stage changes the initial data format into a standard format that can be read by the algorithms in the programs or tools used.

Modelling
The modelling in this study was carried out using data mining classification techniques, namely the Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor algorithms [11], [12], [13], [14]. Classification was chosen because it is a commonly used method in data mining research for classifying or recognizing new, previously unseen data, and here it is applied to predicting gold price movements in gold stock investments. These 3 (three) algorithms are well established and widely implemented in classification. In addition, they offer good accuracy when handling a processed dataset.
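To make the classification idea concrete, the following is a minimal sketch of K-Nearest Neighbor in plain Python. It is not the RapidMiner configuration used in the study: the value of k, the Euclidean distance, the function name `knn_predict`, and the toy points are assumptions, and Naïve Bayes and SVM are omitted for brevity.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-attribute data; "up" and "down" stand for the price movement classes.
train_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
train_labels = ["down", "down", "down", "up", "up", "up"]

prediction = knn_predict(train_points, train_labels, (3.1, 3.0), k=3)
```

In practice the attributes would usually be scaled to a common range first, since distance-based methods are sensitive to attribute magnitudes; the paper does not state whether this was done.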

Testing and Validation of Research
Method testing is carried out to obtain the results of the analyzed calculations and to determine whether the methods and algorithms used function properly. The testing process uses the RapidMiner tool and checks whether the data agree with the results obtained through the tool. The Naïve Bayes [15], [16], [17], Support Vector Machine [18], [19], [20], and K-Nearest Neighbor [21], [22] algorithms are validated by measuring accuracy, precision, and recall, which are calculated from the confusion matrix.
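As a hedged illustration of how accuracy, precision, and recall follow from a confusion matrix, the sketch below counts true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) for a two-class problem. Treating "up" as the positive class, the function names, and the sample labels are assumptions made for the example; they are not taken from the study's data.

```python
def confusion_counts(actual, predicted, positive="up"):
    """Return (TP, FP, FN, TN) for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative labels only.
actual    = ["up", "up",   "down", "down", "up", "down", "up", "down"]
predicted = ["up", "down", "down", "up",   "up", "down", "up", "down"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

For these sample labels the counts are TP = 3, FP = 1, FN = 1, TN = 3, so all three measures equal 0.75.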

Proposed Method
This research uses 3 (three) algorithms, namely K-NN, Naïve Bayes, and Support Vector Machine, to train models and measure their accuracy in time series gold price prediction, assisted by the RapidMiner software.

K-fold Cross Validation
K-Fold Cross-Validation is a statistical technique for assessing how the results of an analysis will generalize to independent data sets (Suyanto, 2019). It is mainly used with predictive models to estimate how accurate a model will be when run in practice. The purpose of the validation phase is to test the model, to limit problems such as overfitting, and to provide insight into how the model will generalize to an independent dataset (i.e., an unknown dataset, for example from a real problem).

The precision value is calculated by dividing the true positives (True Positive) by the sum of the true positives and the false positives (False Positive). The recall value is calculated by dividing the true positives (True Positive) by the sum of the true positives and the false negatives (False Negative). The accuracy value is calculated by adding the true positives (True Positive) and true negatives (True Negative) and dividing by the total of true positives, true negatives, false positives (False Positive), and false negatives (False Negative).

From a sample of 14 data, the results state the level of accuracy, recall, and precision of the K-NN, SVM, and Naïve Bayes algorithms.
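The K-Fold procedure can be sketched as follows, assuming k = 10 and 2000 records so that each fold holds 200 records for testing and the remaining 1800 for training. The generator name `k_fold_indices` and the fixed seed are hypothetical; RapidMiner's own cross-validation operator may partition the data differently.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of records to folds
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(2000, k=10))
fold_sizes = [(len(train_idx), len(test_idx)) for train_idx, test_idx in splits]
```

Each record appears in exactly one test fold, so averaging a metric over the 10 folds uses every record for testing once.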

III. RESULTS AND DISCUSSION
This study tests the SVM, Naïve Bayes, and K-NN algorithms to obtain accuracy, precision, and recall values, as well as predictions that can support decisions when investing in gold. The data source for this study is gold price data over a period of years. The data consist of the attributes Oil, USD, Euro, IHSG, S&P500, and Gold. The test uses 2000 data, divided into 90% training data and 10% testing data, to produce a model and the accuracy, precision, and recall values of each of the three algorithms. The data are divided randomly into subsets t1, t2, t3, ..., t10 of equal size.
The data are saved in Excel workbook format and then converted into a data frame with the Read Excel command. The training and testing data are processed using the SVM, Naïve Bayes, and K-NN algorithms and tested with RapidMiner. This study evaluates the classification results on the real data that have been tested, so that accuracy, recall, and precision can be read from RapidMiner. Across the 10 random tests, the K-NN algorithm produced the highest accuracy, the highest precision, and the highest recall.
The tested data yield the level of accuracy, recall, and precision of each algorithm.
The graphs below show the overall data tested using RapidMiner.

Accuracy
Graph of the accuracy values obtained from the training and testing data.

Precision
Graph of the precision values obtained from the training and testing data.

Figure 6. Recall graph (tests T1 to T10)

The tests on the gold data produce the accuracy, precision, and recall of each algorithm, as follows. The SVM algorithm has an accuracy of 57.59%, precision of 58.73%, and recall of 51.78%, while the Naïve Bayes algorithm has an accuracy of 55.59%, precision of 54.55%, and recall of 51.70%. The best accuracy, precision, and recall among the 3 algorithms on this gold data belong to the K-NN algorithm, with an accuracy of 61.90%, precision of 60.98%, and recall of 60.35%, as can be seen in Table 4.7, which lists the results of the 3 algorithms. Across the 10 random tests, K-NN has the highest accuracy, precision, and recall.

Analysis
Based on the gold data analysis and testing, the tests produced good results with the composition of the 6 data attributes. The SVM, K-NN, and Naïve Bayes algorithms all produced results above 50%, and of the 3 algorithms, K-NN produced the best accuracy, precision, and recall. Averaged over the random tests, the SVM algorithm has 57.59% accuracy, 58.73% precision, and 51.78% recall, while the Naïve Bayes algorithm has 55.59% accuracy, 54.55% precision, and 51.70% recall. The K-NN algorithm has the best values of the 3 algorithms on this gold data, with 61.90% accuracy, 60.98% precision, and 60.35% recall.
The Naïve Bayes algorithm was also tested randomly and obtained an accuracy of 55.59%, precision of 54.55%, and recall of 51.70%. The prediction results show that Naïve Bayes has a fairly low classification value on the gold data: during prediction, data that go up are often read as down, and data that go down are read as up. Random testing was carried out on the same data using 200 testing data with 6 attributes. The results can still be categorized as good in general because the 6 attributes carry information about gold investment.

IV. CONCLUSION
Based on the test results using the SVM, K-NN, and Naïve Bayes algorithms, the following conclusions can be drawn. Testing the gold data with the composition of 6 attributes produced good results, with the SVM, K-NN, and Naïve Bayes algorithms all scoring above 50%. Averaged over the random tests, the SVM algorithm has an accuracy of 57.59%, precision of 58.73%, and recall of 51.78%, while the Naïve Bayes algorithm has an accuracy of 55.59%, precision of 54.55%, and recall of 51.70%. The best accuracy, precision, and recall among the 3 algorithms on this gold data belong to the K-NN algorithm, with 61.90% accuracy, 60.98% precision, and 60.35% recall. The prediction results show that Naïve Bayes has a fairly low classification value on the gold data: during prediction, data that go up are often read as down, and data that go down are read as up.

Based on the research conducted, the following suggestions can be made. More, and more specific, attributes should be added to the Naïve Bayes, SVM, and K-NN classifications. Further research is needed with other algorithms, such as C4.5 and C5.0, to obtain a comparison against Naïve Bayes, SVM, and K-NN and identify the highest classification accuracy. Further research should also try to improve accuracy, precision, and recall by conducting experiments on each parameter.