A Grouping of Song-Lyric Themes Using K-Means Clustering
Abstract
K-Means clustering is one method that can be used to group themes automatically. In this research, the theme of a song is derived from the text of its lyrics. The aim of the study is to develop a system that automatically groups song lyrics by theme and to measure the accuracy of that grouping. Processing begins with text mining. Its first stage, the text operation, consists of tokenizing, stopword removal, stemming, and word weighting; the weighted data are then clustered with K-Means. Clustering starts by choosing the initial centroids with Variance Initialization, then computes the distance between each data point and each centroid using Euclidean distance, repeating until a suitable grouping is reached. Accuracy is calculated with a confusion matrix. To check the suitability of the resulting system, new data are then added and processed by the system, which assigns each new item to one specific theme. The case study, conducted at Masdha Radio Yogyakarta, used 400 data items divided into four clusters: love, friendship, religion, and fighting. Grouping song lyrics by theme worked well, reaching 93.25% accuracy with a maximum unique-word frequency of 121 and a minimum of 0.
Keywords – K-Means clustering, Text Operation, Variance Initialization, Confusion Matrix.
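The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and several details are assumptions because the abstract does not specify them: the stopword list is hypothetical, stemming is omitted, word weighting is taken to be raw term frequency, and Variance Initialization is approximated by one common variance-based scheme (sort the data along the highest-variance feature, split it into k equal parts, and use the middle item of each part as an initial centroid). Accuracy is scored by mapping each cluster to its majority theme, which is one possible way to read a confusion matrix for clustering output. All function names and the sample lyrics are illustrative.

```python
import math
from collections import Counter

# Hypothetical stopword list; the list used in the paper is not given.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "my", "you", "i"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords (no stemming here)."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [w for w in words if w not in STOPWORDS]

def tf_vectors(docs):
    """Assumed weighting: raw term frequency over a shared, sorted vocabulary."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    vectors = []
    for t in tokens:
        counts = Counter(t)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def variance_init(vectors, k):
    """Assumed scheme: sort by the highest-variance feature, split into k parts,
    and take the middle row of each part as an initial centroid."""
    n, dims = len(vectors), len(vectors[0])

    def variance(j):
        col = [v[j] for v in vectors]
        mean = sum(col) / n
        return sum((x - mean) ** 2 for x in col) / n

    best = max(range(dims), key=variance)
    ordered = sorted(vectors, key=lambda v: v[best])
    size = n / k
    return [list(ordered[int(i * size + size / 2)]) for i in range(k)]

def kmeans(vectors, k, max_iter=100):
    """Standard K-Means loop: assign to nearest centroid, then recompute means."""
    centroids = variance_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def accuracy(true_themes, labels):
    """Count (cluster, theme) pairs, map each cluster to its majority theme,
    and return the share of items that match that theme."""
    matrix = Counter(zip(labels, true_themes))
    correct = sum(max(n for (cl, _), n in matrix.items() if cl == c)
                  for c in set(labels))
    return correct / len(labels)

def assign(text, vocab, centroids):
    """Place a new lyric in the cluster whose centroid is nearest."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    return min(range(len(centroids)), key=lambda c: euclidean(vec, centroids[c]))

# Tiny made-up example with two themes instead of the paper's four.
docs = ["love love heart", "heart love kiss",
        "friend friend together", "together friend trust"]
themes = ["love", "love", "friendship", "friendship"]
vocab, vectors = tf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
score = accuracy(themes, labels)
new_cluster = assign("love kiss", vocab, centroids)
print(labels, score, new_cluster)
```

The sketch uses only the standard library so that each step stays visible. With real data, a stemmer and a weighting scheme such as TF-IDF would likely replace the simplified steps here.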
DOI: https://doi.org/10.31326/jisa.v3i2.658
Copyright (c) 2020 Dionisia Bhisetya Rarasati
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
JOURNAL IDENTITY
Journal Name: JISA (Jurnal Informatika dan Sains)
e-ISSN: 2614-8404, p-ISSN: 2776-3234
Publisher: Program Studi Teknik Informatika Universitas Trilogi