Naive Bayes and Support Vector Machine Algorithm for Sentiment Analysis Opensea Mobile Application Users in Indonesia

OpenSea is an application-based platform for buying and selling NFTs that is growing rapidly in popularity. One way to find out the public's perception of the OpenSea application is sentiment analysis, as done in this study. The data used are user reviews of the OpenSea application in the Indonesian Play Store. The classification techniques used are the Naïve Bayes classifier and the Support Vector Machine (SVM). The two are compared on review data labeled as positive, negative, and neutral. The Naïve Bayes algorithm achieves a class precision of 87.31%, a class recall of 71.02%, and an accuracy of 89.81%, while the SVM algorithm achieves a class precision of 94.23%, a class recall of 71.96%, and an accuracy of 90.78%. It is concluded that the SVM algorithm performs better than the Naïve Bayes algorithm.

I. INTRODUCTION
The OpenSea application began to boom in Indonesia in early 2021 after a student from Semarang City, Sultan Gustaf Al Ghozali, earned 1.5 billion from the sale of selfie photos in the form of an NFT collection entitled Ghozali Everyday on the OpenSea platform [1]. The OpenSea marketplace is also a place where collectibles sold for Decentraland can be found, and the Decentraland collection is linked to NFTs. An NFT is a digital token that functions as a digital certificate of ownership for a digital asset, such as an artist's digital collection or image [2].
OpenSea has recorded $20.37 billion in sales and has over 1.2 million active traders in its network [3]. As the largest active NFT marketplace, it has also served as a data source in prior work, in which a data set of sales was built by querying the Events endpoint of the OpenSea API [4].
Google Play Store is a digital content service owned by Google that offers applications, games, movies, music, and books in various categories. It can be accessed through the website, the Android application, and Google TV. One of its features is user ratings and reviews of the available applications and services. A review is a text that contains an assessment of or comment on a piece of work. These reviews are often used as a benchmark for whether an application is recommended to new users [5].
Sentiment analysis, or opinion mining, is the process of automatically understanding, extracting, and processing textual data to obtain the sentiment information contained in an opinion sentence. It is carried out to see whether a person's opinion on a problem or object tends to be negative or positive [6].
In this study, sentiment analysis was carried out to see reviews from users of the OpenSea application. These reviews could be put into three categories, namely positive, neutral and negative.
Many studies have used machine learning algorithms, with the Support Vector Machine (SVM) and Naïve Bayes (NB) being the most commonly used. Naïve Bayes is a technique based on Bayes' theorem. It assumes that the presence of a particular feature in a class is independent of the presence of any other feature. The model is easy to build and well suited to very large data sets. Despite its simplicity, Naïve Bayes has been reported to perform competitively with, and sometimes outperform, more complex classification methods [7]. The Support Vector Machine is a classification and regression method commonly used for linear and nonlinear problems. Its advantage is that it can apply a linear separation to non-linear input data mapped into a high-dimensional space, which is achieved through kernel functions. The effectiveness of the Support Vector Machine is strongly influenced by the type of kernel function selected and applied based on the characteristics of the data. Many studies have reported that the Support Vector Machine is the most accurate method for text classification [8].
Previous research analyzed sentiment in reviews of the OVO e-wallet, using 500 positive and 500 negative reviews as training data. That study reported an accuracy of 93.10% for the Naïve Bayes algorithm and 91.30% for the SVM algorithm, and concluded that SVM was the best algorithm for classification [9]. Another study analyzed sentiment toward the Indonesian Police Mobile Brigade Corps based on Twitter posts using the SVM and NB methods, and obtained an accuracy of 86.96%, a precision of 86.96%, and a recall of 86.96% with the SVM approach [10].
The purpose of this study is to predict sentiment labels on reviews from users of the OpenSea application on the Google Play Store using the Naïve Bayes method and Support Vector Machine as a classification model.

II. RESEARCH METHODOLOGY
The object of this research is Indonesian-language user reviews of the OpenSea application on the Google Play Store. Several steps are taken to analyze the sentiment expressed in these reviews. The steps taken in this research can be seen in figure 1.

Data Extraction
Data in this study were collected from user reviews of the OpenSea application on the Google Play Store using web scraping, a technique for extracting large quantities of data from a website, with the extracted data saved in CSV (Comma Separated Values) format [11]. Scraping of the Google Play Store uses google-play-scraper, a Node.js module for scraping application data from the Google Play Store [12]. The data are then processed with Python in the next stage, preprocessing.
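The hand-off from scraping to Python processing can be illustrated with a minimal sketch that writes review records to CSV and reads them back. The records and the field names `userName`, `score`, and `content` are assumptions made for this illustration; the study does not state which columns were stored, and the scraping call itself is not shown.

```python
import csv
import io

# Hypothetical review records; the field names are assumptions for this sketch.
reviews = [
    {"userName": "user_a", "score": 5, "content": "Aplikasinya bagus dan mudah dipakai"},
    {"userName": "user_b", "score": 1, "content": "Sering error, tidak bisa login"},
]


def reviews_to_csv(records, fieldnames=("userName", "score", "content")):
    """Serialize review dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_reviews(text):
    """Parse CSV text back into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = reviews_to_csv(reviews)
loaded = csv_to_reviews(csv_text)
```

Because the `csv` module quotes fields that contain commas, review text such as the second record survives the round trip unchanged.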

Preprocessing Data
Pre-processing is a stage carried out after the data have been obtained, to clean and improve the data so that they can be analyzed [13]. The pre-processing steps are as follows:
1) Labeling is carried out by two people. The first person manually classifies each review as positive, negative, or neutral, while the second person re-examines the correctness of the classification made by the first person.
2) Case folding is the stage that changes uppercase (capital) letters in sentences into lowercase. This is done to obtain structured data that are consistent in the use of capital letters.
3) Tokenizing is the stage that separates sentences into individual words called tokens. Words are separated at whitespace and punctuation boundaries. The following is an example of tokenizing in the table.
4) Stop word removal is the step that removes words that carry no useful information from a sentence, with the help of the Sastrawi library. Sastrawi is a library that can be used to remove unimportant words (stop words) in Indonesian [14].
5) Stemming is the step that removes the prefixes and suffixes of each token with the help of the Sastrawi stemmer. The Sastrawi stemmer is a stemmer library used to reduce inflected words to their basic words in the Indonesian language [15].
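Steps 2) to 5) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the stop word set and the affix lists are tiny hypothetical stand-ins for the Sastrawi stop word remover and stemmer, and the removal of digits during case folding follows the description given later in the results.

```python
import re

# Tiny illustrative stop word list (assumption); Sastrawi ships a much larger one.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "untuk", "saya"}

# Naive affix lists for illustration only; this is NOT the Sastrawi stemmer.
PREFIXES = ("meng", "mem", "men", "ber", "di", "ter")
SUFFIXES = ("kan", "nya", "an")


def case_fold(text):
    """Lowercase the text and drop digits."""
    return re.sub(r"\d+", "", text.lower())


def tokenize(text):
    """Split on any run of non-letters, which drops punctuation and whitespace."""
    return [tok for tok in re.split(r"[^a-z]+", text) if tok]


def remove_stopwords(tokens):
    """Keep only tokens that are not in the stop word set."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def naive_stem(token):
    """Strip at most one prefix and one suffix, keeping at least 3 characters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


def preprocess(text):
    """Run case folding, tokenizing, stop word removal, and stemming in order."""
    tokens = remove_stopwords(tokenize(case_fold(text)))
    return [naive_stem(tok) for tok in tokens]


print(preprocess("Aplikasi ini SANGAT membantu, saya suka fiturnya!"))
# ['aplikasi', 'sangat', 'bantu', 'suka', 'fitur']
```

A rule-based stripper like `naive_stem` will mis-stem many Indonesian words, which is why a dictionary-backed stemmer such as Sastrawi is the more appropriate choice in practice.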

Modeling
Modeling is the process of building a model that represents the relationship between one set of data and another [16]. The first step in data modeling is to partition the data into training data and testing data. The modeling process for sentiment analysis is carried out using supervised classification methods, namely the Support Vector Machine (SVM) and Naïve Bayes. These algorithms were chosen because they are commonly used in sentiment analysis.
The Naïve Bayes classifier is a probabilistic method used to determine the class of a text document, and it can process large amounts of data with highly accurate results [17]. SVM (Support Vector Machine) is a machine learning algorithm that separates the data classes by finding the most optimal hyperplane [18]. The SVM algorithm tries to find the hyperplane that maximizes the distance between classes. In this way, SVM aims for high generalization ability on the data to be predicted [19].
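To make the probabilistic idea behind Naïve Bayes concrete, the following is a minimal multinomial Naïve Bayes sketch over token lists. It is an assumption-laden illustration rather than the implementation used in the study: the class name `NaiveBayesText`, the add-one (Laplace) smoothing, and the toy training reviews are all introduced here. The SVM is not sketched, since a faithful implementation does not fit in a short example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_score(self, tokens, label):
        """log P(label) + sum of log P(token | label), treating tokens as independent."""
        score = math.log(self.class_counts[label] / self.total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary)
        for tok in tokens:
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def predict(self, tokens):
        """Return the label with the highest log score."""
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))


# Toy training data, invented for illustration only.
train_docs = [
    ["bagus", "mudah"],
    ["bagus", "keren"],
    ["error", "lambat"],
    ["error", "gagal"],
    ["biasa"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]

model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict(["bagus"]))            # positive
print(model.predict(["error", "lambat"]))  # negative
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.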

Evaluation
The evaluation process in this research uses a confusion matrix and a classification report. The confusion matrix is a table that describes the performance of a classification algorithm by comparing the actual labels with the labels predicted by the model [20]. The classification report measures the predictive quality of a classification algorithm by showing its precision, recall, and accuracy [21]. The aim is to see and compare the accuracy, precision, and recall of the SVM and Naïve Bayes models in analyzing sentiment.
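As an illustration of how these two evaluation tools relate, the sketch below builds a three-class confusion matrix from actual and predicted labels and derives per-class precision and recall plus overall accuracy from it. The function names and the toy labels are assumptions made for this example and are not taken from the study.

```python
from collections import Counter

LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Return counts indexed as matrix[actual_label][predicted_label]."""
    pairs = Counter(zip(actual, predicted))
    return {a: {p: pairs[(a, p)] for p in labels} for a in labels}


def classification_report(matrix, labels=LABELS):
    """Per-class precision and recall, plus overall accuracy."""
    report = {}
    for label in labels:
        true_pos = matrix[label][label]
        predicted_as = sum(matrix[a][label] for a in labels)  # column total
        actually = sum(matrix[label].values())                # row total
        report[label] = {
            "precision": true_pos / predicted_as if predicted_as else 0.0,
            "recall": true_pos / actually if actually else 0.0,
        }
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in labels)
    report["accuracy"] = correct / total if total else 0.0
    return report


# Toy labels, invented for illustration only.
actual = ["positive", "positive", "negative", "neutral", "negative", "positive"]
predicted = ["positive", "positive", "negative", "positive", "positive", "positive"]

matrix = confusion_matrix(actual, predicted)
report = classification_report(matrix)
```

Precision is computed down a column of the matrix (everything predicted as a class), while recall is computed along a row (everything that actually belongs to the class).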

III. RESULTS AND DISCUSSION
Data Extraction
The dataset used in this research is taken from user reviews of the OpenSea application on the Play Store, collected by web scraping with google-play-scraper and Python. The dataset contains 1028 reviews in Indonesian.

Preprocessing
Labelling
The dataset is then labeled manually by two people: the first person assigns the sentiment label, and the second person checks the correctness of the labeling. The labeling produced 731 positive, 231 negative, and 86 neutral reviews. The results of the labeling stage can be seen in Table 1.

Case Folding
After labeling, every uppercase letter in the comments column is changed to lowercase, and numbers are removed. The results of the case folding process can be seen in Table 2.

Tokenizing
At this stage, the sentence is broken down into words with punctuation and whitespace boundaries. The results of the tokenizing process can be seen in Table 3.

Stopword Removal
After the dataset goes through the tokenizing process, the next step is to delete words that are not important and interfere with the sentiment analysis process through the Stopword Removal stage. The results of this stage can be seen in table 4.

Stemming
At this stage, prefixes and suffixes are removed from each token, leaving only the basic words. The results of this stage can be seen in Table 5.

Data Modelling
The data are split into training and testing sets at a ratio of 80%:20%. Of the 1028 records, 822 are used for training and 206 for testing. On the test data of OpenSea user comments, which carry three labels (positive, negative, and neutral), the Naïve Bayes classifier obtained an accuracy of 89.81%, while the Support Vector Machine algorithm obtained an accuracy of 90.78%. This means that the SVM model is more accurate than the Naïve Bayes model in this study.
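The 80%:20% partition and the resulting record counts can be reproduced with a short sketch. The random shuffle, the seed value, and the helper name `train_test_split` are assumptions for illustration; the study does not state whether the split was random or stratified.

```python
import random


def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for the 1028 labeled reviews.
records = list(range(1028))
train, test = train_test_split(records)
print(len(train), len(test))  # 822 206
```

With imbalanced classes such as those in this dataset, a stratified split that preserves the label proportions in both parts may be worth considering.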
A bar chart of the number of positive, negative, and neutral sentiments from the Support Vector Machine can be seen in table 8, and the distribution of the most dominant words in the positive, neutral, and negative labels is presented in the form of a word cloud. The word cloud for the positive class is shown in figure 9, while the word cloud for the negative class is shown in figure 10.

Evaluation
After the model is created, it is evaluated using a confusion matrix, which gives the exact counts of true positives, true negatives, true neutrals, false positives, false negatives, and false neutrals. A true positive is a positive review classified as positive, a true negative is a negative review classified as negative, and a true neutral is a neutral review classified as neutral. A false positive is a negative or neutral review classified as positive, a false negative is a positive or neutral review classified as negative, and a false neutral is a negative or positive review classified as neutral. The classification report is used to determine the class recall and class precision of the model.
In the evaluation of the Naïve Bayes model with the confusion matrix, the results are true positive = 143, false positive = 15, true negative = 37, false negative = 5, true neutral = 5, and false neutral = 1. The confusion matrix can be seen in Figure 6. Table 6 combines the results of the confusion matrix with the classification report for the Naïve Bayes evaluation in tabular form, so that the relationship between them is easy to see.
In the evaluation of the SVM model with the confusion matrix, the results are true positive = 144, false positive = 16, true negative = 38, false negative = 3, true neutral = 5, and false neutral = 0. The confusion matrix can be seen in figure 8. The combination of the results of the confusion matrix with the classification report for the Support Vector Machine evaluation is shown in tabular form in table 7.
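The reported accuracies can be checked directly from these counts, since accuracy is the number of correctly classified reviews divided by the total of 206 test reviews. The following is a small worked check; the helper name `accuracy_from_counts` is introduced here for illustration.

```python
def accuracy_from_counts(true_counts, false_counts):
    """Accuracy = correctly classified reviews / all classified reviews."""
    correct = sum(true_counts)
    total = correct + sum(false_counts)
    return correct / total


# Counts in the order positive, negative, neutral, as reported in the text.
nb_accuracy = accuracy_from_counts([143, 37, 5], [15, 5, 1])
svm_accuracy = accuracy_from_counts([144, 38, 5], [16, 3, 0])

print(f"Naive Bayes: {nb_accuracy:.2%}")  # Naive Bayes: 89.81%
print(f"SVM: {svm_accuracy:.2%}")         # SVM: 90.78%
```

Both sets of counts sum to 206, and the resulting values, 185/206 and 187/206, match the accuracies of 89.81% and 90.78% stated for the two models.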

IV. CONCLUSION
Based on the results of the sentiment analysis in this study, both the Naïve Bayes algorithm and the SVM algorithm classified the OpenSea application review dataset with an accuracy of about 90%.
The Naïve Bayes algorithm gives a class precision of 87.31%, a class recall of 71.02%, and an accuracy of 89.81%, while the SVM algorithm gives a class precision of 94.23%, a class recall of 71.96%, and an accuracy of 90.78%. It is concluded that the SVM algorithm performs better than the Naïve Bayes algorithm. This research has not compared the performance with machine learning algorithms other than Naïve Bayes and SVM, so a comparison with other classification approaches, such as lexicon-based methods, linear regression, and random forest, is needed to improve the accuracy of sentiment classification in similar research.