A Grouping of Song-Lyric Themes Using K-Means Clustering

Dionisia Bhisetya Rarasati

doi:10.31326/jisa.v3i2.658

Abstract

K-Means clustering is one method that can be used to group themes automatically. In this research, the theme of a song is derived from the text of its lyrics. The aim of this study is to develop a system that automatically groups song lyrics by theme and to measure the accuracy of that grouping. Processing begins with text mining, whose first stage is the text operation: tokenizing, stopword removal, stemming, and word weighting. The weighted lyrics are then clustered with K-Means. The clustering process initializes the centroids with Variance Initialization and then computes the Euclidean distance between each data point and the centroids, repeating until a proper grouping is reached. Accuracy is calculated with a confusion matrix. To test how well the finished system works, new data is added and processed by the system, which then decides which theme the new data belongs to. In a case study conducted at Masdha Radio Yogyakarta, 400 song lyrics were divided into four clusters: love, friendship, religion, and fighting. Grouping the song lyrics by theme worked well, reaching 93.25% accuracy with a maximum unique-word frequency of 121 and a minimum of 0.
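The text operation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the stopword list is a tiny hypothetical sample, `stem` is a naive English suffix stripper standing in for whatever stemmer the study used (the abstract does not name one), and word weighting is assumed to be TF-IDF, since the abstract says only "word weighting". All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword sample for illustration; the study's list is not given.
STOPWORDS = {"the", "a", "an", "and", "for", "i", "you", "me", "my",
             "your", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase a lyric and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens found in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Naive suffix stripper used as a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Text operation: tokenize, remove stopwords, then stem."""
    return [stem(t) for t in remove_stopwords(tokenize(text))]


def tf_idf(docs):
    """Assumed weighting tf * log(N / df); returns (vocabulary, one vector per doc)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


lyrics = [
    "I love you and you love me",
    "Praying to the Lord",
    "Fighting for freedom",
]
docs = [preprocess(text) for text in lyrics]
vocab, vectors = tf_idf(docs)
print(docs)  # [['love', 'love'], ['pray', 'lord'], ['fight', 'freedom']]
```

With TF-IDF, a term that appears in every lyric gets weight 0, so very common words contribute nothing to the Euclidean distances used later in clustering.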
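The clustering stage described in the abstract can also be sketched. The abstract names Variance Initialization but does not define it; the code below assumes one common variance-based seeding scheme (sort the points on the dimension with the largest variance, split them into k equal groups, and use each group's median point as a starting centroid). Euclidean distance drives both the K-Means iterations and the placement of a new item into an existing cluster. All function names are hypothetical, and the toy data is illustrative only; the study itself used k = 4.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def variance_init(vectors, k):
    """Assumed variance-based seeding: pick the dimension with the largest
    variance, sort the points on it, split them into k groups, and use the
    median point of each group as an initial centroid."""
    dims = range(len(vectors[0]))
    best = max(dims, key=lambda d: statistics.pvariance([v[d] for v in vectors]))
    ordered = sorted(vectors, key=lambda v: v[best])
    size = len(ordered) // k
    centroids = []
    for i in range(k):
        # The last group absorbs any remainder.
        group = ordered[i * size:] if i == k - 1 else ordered[i * size:(i + 1) * size]
        centroids.append(list(group[len(group) // 2]))
    return centroids


def nearest(vector, centroids):
    """Index of the centroid closest to vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: euclidean(vector, centroids[c]))


def kmeans(vectors, k, max_iter=100):
    """Standard K-Means: assign to nearest centroid, recompute means, repeat."""
    centroids = variance_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # No assignment changed, so the grouping is stable.
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 2-D data standing in for weighted lyric vectors.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(points, k=2)
print(labels)                       # [0, 0, 1, 1]
print(nearest([9, 12], centroids))  # 1
```

`nearest` mirrors the abstract's final step: once the centroids are fixed, a new lyric vector is placed in the theme whose centroid is closest.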
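Measuring accuracy with a confusion matrix can be illustrated as follows. The abstract does not say how clusters were matched to the four themes, so this sketch assumes each cluster is labeled with the majority theme among its members; accuracy is then the number of correctly grouped lyrics divided by the total. As an arithmetic check on the reported figure, 93.25% of 400 lyrics corresponds to 373 correctly grouped lyrics (373 / 400 = 0.9325). The function names and sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(themes, clusters):
    """Count (true theme, assigned cluster) pairs."""
    return Counter(zip(themes, clusters))


def accuracy(themes, clusters):
    """Label each cluster with its majority theme (assumption), then return
    the share of items whose theme matches their cluster's label."""
    best = {}  # cluster -> (theme, count) with the largest count seen so far
    for (theme, cluster), count in confusion_matrix(themes, clusters).items():
        if cluster not in best or count > best[cluster][1]:
            best[cluster] = (theme, count)
    correct = sum(count for _, count in best.values())
    return correct / len(themes)


themes = ["love", "love", "love", "religion", "religion", "fighting"]
clusters = [0, 0, 1, 1, 1, 2]
print(accuracy(themes, clusters))  # 5 of 6 lyrics correct
print(373 / 400)                   # 0.9325
```

Majority labeling is only one way to read a confusion matrix for clustering; it can assign the same theme to two clusters, which a one-to-one matching would rule out.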

Keywords – K-Means clustering, Text Operation, Variance Initialization, Confusion Matrix.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JOURNAL IDENTITY

Journal Name: JISA (Jurnal Informatika dan Sains)
e-ISSN: 2614-8404, p-ISSN: 2776-3234
Publisher: Program Studi Teknik Informatika Universitas Trilogi
Publication Schedule: June and December
Language: English
APC: The Journal Charges Fees for Publishing
Indexing: EBSCO, DOAJ, Google Scholar, Arsip Relawan Jurnal Indonesia, Directory of Research Journals Indexing, Index Copernicus International, PKP Index, Science and Technology Index (SINTA, S4), Garuda Index
OAI address: http://trilogi.ac.id/journal/ks/index.php/JISA/oai
Contact: jisa@trilogi.ac.id
Sponsored by: DOI – Digital Object Identifier Crossref, Universitas Trilogi

In Collaboration With: Indonesian Artificial Intelligent Ecosystem (IAIE), Relawan Jurnal Indonesia, Jurnal Teknologi dan Sistem Komputer (JTSiskom)
