The Classification of Anxiety, Depression, and Stress on Facebook Users Using the Support Vector Machine

Social media remains an essential platform for connecting people with friends, family, and the world around them. However, when the content circulating on social media is predominantly negative, depression, anxiety, and stress among users tend to increase. This study aims to classify depression, anxiety, and stress using the Support Vector Machine. The data were obtained from active Facebook users using the Depression Anxiety Stress Scale (DASS 21) questionnaire. The study follows the Knowledge Discovery in Databases (KDD) process. The result is an evaluation of the performance of the Support Vector Machine in classifying depression, anxiety, and stress. The accuracy of the Support Vector Machine in this study is 98.96%.

I. INTRODUCTION
Social media is a computer-based technology that facilitates the sharing of ideas, thoughts, and information over the internet [1]. According to Internet World Stats, Indonesia ranks third in the world in Facebook use, with 176.5 million users in June 2021, equivalent to 63.9% of the country's total population [2]. The 2014 Indonesia Family Life Survey (IFLS) covered 22,423 individuals in Indonesia and found that a one standard deviation increase in social media use was associated with a 9% increase in CES-D (Center for Epidemiological Studies Depression Scale) scores [3]. This suggests that social media use can have a negative impact on mental health [3]. Social media is often seen as a source of social support among users. Still, it can harm mental health, particularly for those who already have a significant degree of depression, anxiety, and stress [4].
Furthermore, Tang, Wang, and Norman (2013) found that social media activities such as sharing, liking, and messaging increased stress. Moreover, excessive use of Facebook has become a serious source of stress because people share all kinds of feeds, stories, and comments, on topics ranging from economics, politics, and social issues to personal problems [5]. In addition, the desire to upload the best photos of oneself to earn compliments or likes, and the pressure to present the best version of oneself, can make Facebook users feel anxious. Beyond anxiety, seeing friends' achievements on Facebook is another factor that affects a person's mental health [6]. Given these problems, a method is needed to classify active Facebook users affected by depression, anxiety, or stress, as a step toward helping them achieve a good life balance. Positive mental health helps individuals work productively and reach their full potential.
The initial stage of this research is data collection, which was carried out by distributing questionnaires. A questionnaire is a research instrument consisting of a series of questions or other prompts that aim to collect information from a respondent. Previous studies have used several questionnaires to assess levels of depression, anxiety, and stress, such as the Perceived Stress Scale (PSS-10) [7], the Subjective Units of Distress Scale (SUDS) [8], the Hamilton Rating Scale for Depression (HAM-D) [9], the Hamilton Anxiety Rating Scale (HAM-A), and the Depression, Anxiety, and Stress Scale (DASS 21) [10]. DASS 21 is used in this study because it has been used in several studies and has high consistency [11].
Several previous studies have used the Support Vector Machine to classify depression, anxiety, and stress. Zhang et al. predicted Social Anxiety Disorder using the Support Vector Machine and reported an accuracy of 76.25%, which indicates that the Support Vector Machine can support the diagnosis of potential Social Anxiety Disorder [12]. Frick et al. used the Support Vector Machine to classify Social Anxiety Disorder and obtained an accuracy of 72.6% [13]. Pantazatos et al. also used the Support Vector Machine and obtained a higher accuracy of 89% [14]. Therefore, this study classifies depression, anxiety, and stress among Facebook users using a Support Vector Machine. The model determines its decision boundary from the support vectors, which keeps computation fast and can yield high classification accuracy.

II. RESEARCH METHODOLOGY
The object of this research is the social media platform Facebook, and the subjects are active Facebook users. Respondents were recruited by distributing a Google Form via social media such as Facebook and Twitter. The questionnaire contains the questions from the Depression Anxiety Stress Scale 21 (DASS 21). Figure 1 shows the research stages: literature study, data collection, processing of the collected data through Knowledge Discovery in Databases (KDD), and evaluation of the performance of the Support Vector Machine.

A. Literature Study
Literature studies are carried out by reading scientific sources such as books and journals related to the research topic or research question. This stage aims to establish how this research relates to existing knowledge.

B. Data Collection
Data collection in this study was carried out by distributing a Google Form containing the Depression Anxiety Stress Scale (DASS 21) questionnaire to active Facebook users, using convenience sampling. The questionnaire was shared on several social media platforms, such as Facebook and Twitter.

C. KDD
Knowledge Discovery in Databases (KDD) in this research transforms data into valuable knowledge. The context of this research is the classification of depression, anxiety, and stress in Facebook users. The stages of the KDD process are explained below.

(a). Data Selection
At this stage, the researcher selects the data for the classification process. The data comes from the Depression Anxiety Stress Scale 21 questionnaire. However, not all of this data is relevant to the classification process, so the researcher needs to select the appropriate data.

(b). Data Preprocessing
In this data preprocessing stage, noise and irrelevant data are removed from the collected data. This stage is necessary to eliminate duplicate and inconsistent data and to correct errors in the data. The results of the DASS 21 questionnaire are then labeled using the formula below:

\[ \text{Total}_c = 2 \times \sum_{i \in c} x_i \]

where c is one of the three subscales (depression, anxiety, or stress) and x_i is the respondent's score on item i of that subscale. The total for each subscale is calculated by summing all of its items and then multiplying by two. After the three totals are obtained, they are compared, and the subscale with the highest total is chosen as the label.
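The labeling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the item-to-subscale mapping of the standard DASS-21 scoring key and answers coded 0 to 3, neither of which is stated in this paper, and it breaks ties in the order depression, anxiety, stress, which the paper also does not specify. The function names are hypothetical.

```python
# Item-to-subscale mapping taken from the standard DASS-21 scoring key
# (1-based item numbers). The paper does not list it, so this is an assumption.
SUBSCALE_ITEMS = {
    "depression": (3, 5, 10, 13, 16, 17, 21),
    "anxiety": (2, 4, 7, 9, 15, 19, 20),
    "stress": (1, 6, 8, 11, 12, 14, 18),
}


def dass21_totals(answers):
    """Return {subscale: 2 * sum of its seven item scores} for one respondent.

    `answers` is a sequence of 21 integers (0-3), ordered by item number.
    """
    if len(answers) != 21:
        raise ValueError("DASS 21 requires exactly 21 answers")
    if any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("each answer must be 0, 1, 2, or 3")
    return {
        name: 2 * sum(answers[i - 1] for i in items)
        for name, items in SUBSCALE_ITEMS.items()
    }


def dass21_label(answers):
    """Label a respondent with the subscale that has the highest total.

    Ties go to the first subscale in the order depression, anxiety, stress.
    The paper does not say how ties are handled, so this is an assumption.
    """
    totals = dass21_totals(answers)
    return max(SUBSCALE_ITEMS, key=lambda name: totals[name])


# Hypothetical respondent: 3 on every depression item, 1 on all other items.
example = [3 if (i + 1) in SUBSCALE_ITEMS["depression"] else 1 for i in range(21)]
example_totals = dass21_totals(example)
example_label = dass21_label(example)
```

For the hypothetical respondent, the depression total is 2 × 21 = 42 and the other two totals are 2 × 7 = 14 each, so the label is depression.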

(c). Data Transformation
After the data has been cleaned, the process continues with the Data Transformation stage. In this stage, the format, structure, or values of the data are changed into the form required by the data mining process.

(d). Data Mining
After data transformation, the process continues with data mining, which extracts potentially valuable patterns. At this stage, the Support Vector Machine is applied.

(e). Data Evaluation
At this stage, the researcher evaluates the performance of the classification results. The evaluation in this research uses a confusion matrix. The outputs of this stage are accuracy, precision, recall, and F1.
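The data mining and evaluation stages can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions. The paper does not name its software, kernel, hyperparameters, or train/test protocol, so the linear kernel, C=1.0, and the 70/30 split are assumed. The respondents are synthetic placeholders generated in the code, not the study's data.

```python
import random

from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

LABELS = ["depression", "anxiety", "stress"]


def make_synthetic_respondents(n, seed=0):
    """Generate placeholder data: 21 item scores (0-3) and a label per row.

    This is NOT the study's data. Each synthetic respondent scores high on
    one block of seven items and low elsewhere, so the classes are learnable.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        label_index = rng.randrange(3)
        row = [rng.randint(0, 1) for _ in range(21)]
        for j in range(7 * label_index, 7 * label_index + 7):
            row[j] = rng.randint(2, 3)
        features.append(row)
        labels.append(LABELS[label_index])
    return features, labels


X, y = make_synthetic_respondents(150)

# Hold out 30% of the rows for testing (an assumed protocol).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Data mining: fit the SVM. The linear kernel and C=1.0 are assumptions.
model = SVC(kernel="linear", C=1.0)
model.fit(X_train, y_train)

# Data evaluation: confusion matrix (rows = actual, columns = predicted).
y_pred = model.predict(X_test)
matrix = confusion_matrix(y_test, y_pred, labels=LABELS)
accuracy = accuracy_score(y_test, y_pred)
```

Evaluating on held-out rows rather than on the training data gives a less optimistic estimate of how the model would perform on new respondents.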

III. RESULTS AND DISCUSSION
A total of 261 people responded to the DASS 21 questionnaire, but only 193 could be categorized as users experiencing depression, anxiety, or stress. These 193 respondents comprise 67 males and 126 females, as shown in the image below. The respondents' domiciles are diverse, spanning the islands of Java, Sumatra, Kalimantan, and Sulawesi. The instrument used is the DASS 21, which has 21 questions: 7 on depression, 7 on anxiety, and 7 on stress.
The next step is labeling, which is applied to the respondent data obtained from the previous process using the DASS 21 score. The items for depression, anxiety, and stress are summed separately, each subscale total is multiplied by two, and the three results are compared. Respondent 1 was labeled depression because the calculated DASS 21 score for depression was greater than the calculated scores for anxiety and stress. Respondent 2 was labeled anxiety, and respondent 3 was labeled depression. The calculation was carried out for all 193 respondents.
After the data of the 193 respondents were labeled as depression, anxiety, or stress, the data to be used in data mining were selected. The name, email address, gender, age, occupation, and domicile fields in the questionnaire results were removed.
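A minimal sketch of this selection step is shown below. The field names and the sample record are hypothetical, since the actual column names of the Google Form export are not given in the paper.

```python
# Field names are hypothetical; the real form's column names are not given.
IDENTIFYING_FIELDS = {"name", "email", "gender", "age", "occupation", "domicile"}


def select_features(record):
    """Return a copy of one questionnaire record without identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


# Made-up record for illustration: q1..q21 hold the DASS 21 answers.
record = {
    "name": "Respondent A",
    "email": "a@example.com",
    "gender": "F",
    "age": 21,
    "occupation": "student",
    "domicile": "Java",
    "label": "stress",
}
record.update({f"q{i}": 1 for i in range(1, 22)})

selected = select_features(record)
```

After this step only the 21 item scores and the label remain, which is the input the classifier needs.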
The label data is then transformed from numeric to categorical. The data is then processed using the Support Vector Machine to classify depression, anxiety, and stress. Based on the data of the 193 respondents tested, the F1, precision, recall, and accuracy values were calculated. The Support Vector Machine model produces an accuracy of 98.96%, an F1 of 95.75%, a precision of 99.15%, and a recall of 97.26%. These metrics are derived from the confusion matrix counts: TP (True Positive) is positive data correctly identified as positive, TN (True Negative) is negative data correctly identified as negative, FP (False Positive) is negative data incorrectly identified as positive, and FN (False Negative) is positive data incorrectly identified as negative.
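For reference, the standard definitions of the four metrics in terms of these counts are given below. The paper's own equations are not present in this text, so these are the conventional formulas rather than a quotation.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```

For a three-class problem, the counts are taken one class at a time, treating that class as positive and the other two as negative.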
Based on the Support Vector Machine classification, the accuracy for depression is 98.96%, with 118 instances correctly labeled as depression. The accuracy for anxiety is 98.44%, with 17 instances correctly labeled as anxiety. The accuracy for stress is 99.48%, with 55 instances correctly labeled as stress.
The second metric derived from the confusion matrix is precision. One instance was incorrectly labeled as depression, so the precision for depression is 99.15%. Two instances were incorrectly labeled as anxiety, so the precision for anxiety is 89.47%. All instances labeled as stress were correct, so the precision for stress is 100%.
The recall for depression is 99.15% because one depression instance was not classified as depression. The recall for anxiety is 94.44% because one anxiety instance was not classified as anxiety. Finally, the recall for stress is 98.21% because one stress instance was not classified as stress.
The last metric derived from the confusion matrix is F1: 99.15% for depression, 93.28% for anxiety, and 94.82% for stress.
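The per-class figures discussed above follow from a confusion matrix by treating each class in turn as the positive class. A minimal sketch of that calculation is shown below. The function name is hypothetical, and the 3×3 matrix is a toy example chosen for illustration, not the confusion matrix of this study.

```python
def per_class_metrics(matrix, labels):
    """Compute one-vs-rest accuracy, precision, recall, and F1 per class.

    `matrix[i][j]` counts instances whose actual class is labels[i] and
    whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # actual k, predicted other
        fp = sum(row[k] for row in matrix) - tp   # predicted k, actual other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        results[label] = {
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Toy confusion matrix for illustration only (rows = actual, cols = predicted).
toy_labels = ["depression", "anxiety", "stress"]
toy_matrix = [
    [8, 1, 1],
    [0, 4, 1],
    [1, 0, 9],
]
toy_metrics = per_class_metrics(toy_matrix, toy_labels)
```

In the toy matrix, depression has 8 true positives, 1 false positive, and 2 false negatives, giving a precision of 8/9 and a recall of 8/10.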

IV. CONCLUSION
The test results show that the Support Vector Machine model achieves high accuracy. One of its advantages is that the decision boundary is determined by the support vectors, which makes computation faster [15]. The Support Vector Machine creates a decision function, or hyperplane, that separates the categories. The resulting hyperplane is then used to predict the predetermined classes, so the classification accuracy is high [16]. In this study, the Support Vector Machine proved effective for classifying depression, anxiety, and stress in Facebook users, with an accuracy of 98.96%. This research can be extended by adding other data mining methods, such as naïve Bayes, random forest, and decision tree, to compare model performance. In addition, users of other social media platforms, such as Twitter, Instagram, and YouTube, can be the subject of future research.