A Grouping of Song-Lyric Themes Using K-Means Clustering

Dionisia Bhisetya Rarasati

Abstract


K-Means clustering is one method that can be used to group themes automatically. In this research, a song's theme is derived from the text of its lyrics. The aim of this study is to develop a system that automatically groups song lyrics by theme and to measure the accuracy of that grouping. Processing begins with text mining of the lyrics, whose first stage is text operation: tokenizing, stopword removal, stemming, and word weighting. The weighted terms are then processed with K-Means clustering. In the clustering process, the initial centroids are chosen using Variance Initialization, and the distance from each data point to each centroid is computed using Euclidean distance, repeating until the grouping is stable. Accuracy is calculated using a confusion matrix. To check that the resulting system behaves as intended, new data is added and processed by the system, which then assigns it to one specific theme. In a case study at Masdha Radio Yogyakarta, the 400 available songs were divided into four clusters: love, friendship, religion, and fighting. Grouping song lyrics by theme works well, reaching 93.25% accuracy with the unique-word frequency set to a maximum of 121 and a minimum of 0.
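The pipeline summarized above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the stopword list and the suffix-stripping `stem` are hypothetical placeholders, word weighting is assumed to be TF-IDF, Variance Initialization is assumed to mean the commonly described scheme (sort the points on the highest-variance dimension, cut them into k equal groups, and use each group's median point as a centroid), and accuracy is computed by crediting each cluster with its majority theme in the confusion matrix. All function names and the six toy lyrics are invented, and the sketch uses two themes instead of the study's four to keep it short.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the study's own list is not given.
STOPWORDS = {"i", "you", "my", "me", "oh", "are", "is", "with", "the", "a", "and", "to"}

def stem(word):
    """Placeholder stemmer (assumption): strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def text_operation(lyric):
    """Text operation: tokenizing, stopword removal, then stemming."""
    tokens = re.findall(r"[a-z]+", lyric.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

def vectorize(tokens, vocab, idf):
    """TF-IDF weights of one document over a fixed vocabulary."""
    tf = Counter(tokens)
    return [tf[t] * idf[t] for t in vocab]

def tfidf(docs):
    """Word weighting, assumed to be TF-IDF with idf = ln(N / df)."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    return vocab, idf, [vectorize(d, vocab, idf) for d in docs]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def variance_init(vectors, k):
    """Sort points on the highest-variance dimension, cut them into k
    equal groups, and use each group's median point as a centroid."""
    def variance(j):
        col = [v[j] for v in vectors]
        mean = sum(col) / len(col)
        return sum((x - mean) ** 2 for x in col) / len(col)

    best = max(range(len(vectors[0])), key=variance)
    order = sorted(range(len(vectors)), key=lambda i: vectors[i][best])
    size = len(order) / k
    groups = [order[int(c * size):int((c + 1) * size)] for c in range(k)]
    return [list(vectors[g[len(g) // 2]]) for g in groups]

def kmeans(vectors, k, max_iter=100):
    centroids = variance_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assign each song to its nearest centroid by Euclidean distance.
        new = [min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors]
        if new == labels:
            break  # no assignment changed: converged
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def confusion_accuracy(themes, labels, k):
    """Cluster-by-theme counts; each cluster is credited with its majority theme."""
    matrix = [Counter() for _ in range(k)]
    for theme, c in zip(themes, labels):
        matrix[c][theme] += 1
    correct = sum(max(row.values(), default=0) for row in matrix)
    return matrix, correct / len(themes)

def assign_new(lyric, vocab, idf, centroids):
    """Place an unseen lyric in the cluster with the nearest centroid."""
    vec = vectorize(text_operation(lyric), vocab, idf)
    return min(range(len(centroids)), key=lambda c: euclidean(vec, centroids[c]))

# Invented toy lyrics, not the 400-song corpus from the study.
songs = [
    ("love", "I love you, love you, love you, my love"),
    ("love", "my love, oh my love"),
    ("love", "you are my love"),
    ("religion", "god is with me"),
    ("religion", "god, my god"),
    ("religion", "oh god, god, god"),
]
themes = [t for t, _ in songs]
vocab, idf, vectors = tfidf([text_operation(lyric) for _, lyric in songs])
labels, centroids = kmeans(vectors, k=2)
matrix, accuracy = confusion_accuracy(themes, labels, k=2)
cluster_theme = {c: row.most_common(1)[0][0] for c, row in enumerate(matrix) if row}
print(labels, accuracy, cluster_theme[assign_new("oh my god", vocab, idf, centroids)])
# prints: [1, 1, 1, 0, 0, 0] 1.0 religion
```

On this toy corpus the love and religion lyrics separate cleanly, and the unseen lyric "oh my god" lands in the religion cluster. Mapping clusters to themes by majority vote is one simple choice; it can credit two clusters with the same theme, so a one-to-one matching (for example, the Hungarian algorithm) is a stricter alternative.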

Keywords – K-Means clustering, Text Operation, Variance Initialization, Confusion Matrix.




DOI: https://doi.org/10.31326/jisa.v3i2.658



Copyright (c) 2020 Dionisia Bhisetya Rarasati

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


JOURNAL IDENTITY

Journal Name: JISA (Jurnal Informatika dan Sains)
e-ISSN: 2614-8404, p-ISSN: 2776-3234
Publisher: Program Studi Teknik Informatika Universitas Trilogi
Publication Schedule: June and December 
Language: English
APC: The journal charges fees for publishing.
Indexing: EBSCO, DOAJ, Google Scholar, Arsip Relawan Jurnal Indonesia, Directory of Research Journals Indexing, Index Copernicus International, PKP Index, Science and Technology Index (SINTA, S4), Garuda Index
OAI address: http://trilogi.ac.id/journal/ks/index.php/JISA/oai
Contact: jisa@trilogi.ac.id
Sponsored by: DOI – Digital Object Identifier Crossref, Universitas Trilogi

In Collaboration With: Indonesian Artificial Intelligent Ecosystem (IAIE), Relawan Jurnal Indonesia, Jurnal Teknologi dan Sistem Komputer (JTSiskom)

 

 

