Implementation of TF-IDF Algorithm and K-mean Clustering Method to Predict Words or Topics on Twitter

Muhammad Darwis, Gatot Tri Pranoto, Yusuf Eka Wicaksana, Yaddarabullah Yaddarabullah

Abstract


The social media timeline, especially on Twitter, remains interesting to follow. The tweets posted by the public are informative and varied. This information can be put to further use by identifying the conversation topics that are trending at a given time. In this paper, the authors cluster tweet data using TF-IDF term weighting and the K-means method, implemented in the Python programming language. The clustering results indicate the likely topics of conversation that are being widely discussed by netizens. Finally, the results can be used to support decisions that draw on community sentiment towards an event, as expressed through social media such as Twitter.
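The abstract names the pipeline (TF-IDF weighting followed by K-means clustering in Python) but gives no implementation details. The following is a minimal sketch of that general technique using only the standard library, not the authors' code. The sample tweets, the function names, the smoothed IDF variant `log(N / df) + 1`, and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized docs."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so no term gets zero weight.
    idf = {term: math.log(n_docs / doc_freq[term]) + 1.0 for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against tweets with no tokens
        vec = [counts[term] / length * idf[term] for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=20):
    """Plain k-means with farthest-first initialization (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest chosen centroid.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=2):
    """Highest-weighted terms of a centroid, read as the cluster's topic."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [term for _, term in ranked[:n]]


# Hypothetical tweets, invented for this example.
tweets = [
    "Flood warning issued for Jakarta",
    "Jakarta flood closes main roads",
    "Heavy rain brings flood to Jakarta again",
    "Football final match kicks off",
    "Great goal wins the football match",
    "Football fans celebrate after match",
]

vocab, vectors = tfidf_vectors([tokenize(t) for t in tweets])
labels, centroids = kmeans(vectors, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
for cluster_id, centroid in enumerate(centroids):
    print(cluster_id, top_terms(vocab, centroid))
```

The highest-weighted terms of each centroid serve as a rough label for the topic that cluster represents. On real tweet data, stop-word removal, stemming, and a principled choice of `k` would normally be needed; a library such as scikit-learn offers equivalents of both steps.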

 

 


Keywords


data mining; clustering; K-means method; TF-IDF algorithm; Twitter; prediction






DOI: https://doi.org/10.31326/jisa.v3i2.831



Copyright (c) 2020 Muhammad Darwis, Gatot Tri Pranoto, Yusuf Eka Wicaksana, Yaddarabullah Yaddarabullah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


JOURNAL IDENTITY

Journal Name: JISA (Jurnal Informatika dan Sains)
e-ISSN: 2614-8404, p-ISSN: 2776-3234
Publisher: Program Studi Teknik Informatika Universitas Trilogi
Publication Schedule: June and December 
Language: English
APC: The Journal Charges Fees for Publishing 
Indexing: EBSCO, DOAJ, Google Scholar, Arsip Relawan Jurnal Indonesia, Directory of Research Journals Indexing, Index Copernicus International, PKP Index, Science and Technology Index (SINTA, S4), Garuda Index
OAI address: http://trilogi.ac.id/journal/ks/index.php/JISA/oai
Contact: jisa@trilogi.ac.id
Sponsored by: DOI – Digital Object Identifier Crossref, Universitas Trilogi

In Collaboration With: Indonesian Artificial Intelligent Ecosystem (IAIE), Relawan Jurnal Indonesia, Jurnal Teknologi dan Sistem Komputer (JTSiskom)

 

 


JISA (Jurnal Informatika dan Sains) is Published by Program Studi Teknik Informatika, Universitas Trilogi under Creative Commons Attribution-ShareAlike 4.0 International License.