Application of Information Gain to Select Attributes in Improving Naïve Bayes Accuracy in Predicting Customer's Payment Capability
Abstract
Customers are the main factor in the operation of PT. XYZ, and a good understanding of customers is essential for predicting their capability to pay. Credit collectibility is used to determine the quality of customer credit; one of its criteria is the customer's capability to pay interest and principal on time. Accurately predicting a customer's capability to make credit payments by manual means is very difficult. Data mining with the Naïve Bayes algorithm was chosen to classify customers, find patterns, and analyze and predict payment capability, because the algorithm performs well and is efficient and simple. However, Naïve Bayes is sensitive to a large number of attributes, which lowers its accuracy. To address this problem, this study applies the Information Gain method to select the attributes that most influence the label, in order to increase the accuracy of the Naïve Bayes algorithm. Based on the Information Gain method, the research produces a new dataset with seven attributes (TENOR, SALARY, DOWN PAYMENT, INSTALLMENT, APPROVAL, OTR CLASS, and AGE), with Status as the label and the ID number as the identifier. A comparison of the old and new datasets on 995 data records showed an increase in accuracy, precision, and AUC with the new dataset, but a t-test at α = 0.05 showed that the difference was not statistically significant. In the evaluation process, performance increased substantially with the new dataset, with the following improvements: accuracy 8%, precision 18.42%, recall 17.65%, and AUC 0.057%. The study obtained an AUC of 0.876, an accuracy of 87.88%, a precision of 61.90%, and a recall of 76.47%, which falls into the good classification category.
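The abstract combines three steps: ranking attributes by Information Gain, training Naïve Bayes on the selected subset, and scoring the result with accuracy, precision, recall, and AUC. The following is a minimal, standard-library-only Python sketch of that pipeline, not the authors' implementation. It assumes all attributes are categorical (continuous attributes such as salary would first need to be discretized) and uses Laplace smoothing; the function names, the toy records, the "pay"/"default" labels, and the choice of "default" as the positive class are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy H(Y) = -sum_y p(y) * log2 p(y) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, j):
    """IG(Y; A_j) = H(Y) - sum_v |S_v|/|S| * H(Y | A_j = v) for categorical attribute j."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[j]].append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_attributes(rows, labels, k):
    """Score every attribute by information gain and return the indices of the top k."""
    scores = {j: information_gain(rows, labels, j) for j in range(len(rows[0]))}
    return sorted(scores, key=scores.get, reverse=True)[:k], scores


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_attrs = len(rows[0])
        # counts[c][j][v]: training rows of class c whose attribute j equals v
        self.counts = {c: [Counter() for _ in range(n_attrs)] for c in self.class_counts}
        self.values = [set() for _ in range(n_attrs)]
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[y][j][v] += 1
                self.values[j].add(v)
        return self

    def log_scores(self, row):
        """Unnormalised log P(c) + sum_j log P(x_j | c) for every class c."""
        scores = {}
        for c, n_c in self.class_counts.items():
            s = math.log(n_c / self.n)
            for j, v in enumerate(row):
                s += math.log((self.counts[c][j][v] + 1) / (n_c + len(self.values[j])))
            scores[c] = s
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)

    def predict_proba(self, row, positive):
        """Posterior probability of the positive class (softmax over the log scores)."""
        scores = self.log_scores(row)
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        return weights[positive] / sum(weights.values())


def evaluate(y_true, y_pred, probs, positive):
    """Accuracy, precision, recall and AUC for a binary problem with a chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # AUC = probability that a random positive outscores a random negative (ties count 1/2)
    pos = [s for t, s in zip(y_true, probs) if t == positive]
    neg = [s for t, s in zip(y_true, probs) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": wins / (len(pos) * len(neg)) if pos and neg else float("nan"),
    }


# Hypothetical, already-discretised toy records (NOT the study's 995 records).
NAMES = ["tenor", "salary_band", "noise"]
ROWS = [
    ("short", "high", "a"), ("short", "high", "b"), ("short", "mid", "a"), ("long", "low", "a"),
    ("long", "low", "b"), ("long", "mid", "b"), ("short", "low", "b"), ("long", "high", "a"),
]
LABELS = ["pay", "pay", "pay", "default", "default", "default", "pay", "default"]

keep, scores = select_attributes(ROWS, LABELS, k=2)
print({NAMES[j]: round(g, 3) for j, g in scores.items()})
reduced = [tuple(row[j] for j in keep) for row in ROWS]
model = CategoricalNaiveBayes().fit(reduced, LABELS)
preds = [model.predict(r) for r in reduced]
probs = [model.predict_proba(r, positive="default") for r in reduced]
metrics = evaluate(LABELS, preds, probs, positive="default")
print(metrics)
```

On these toy rows, the perfectly predictive tenor column receives the maximal gain of 1.0 bit and the random noise column receives 0.0, so keeping the top two attributes drops the noise column. The sketch scores the model on its own training rows for brevity, which overstates performance. A fair comparison of an old and a new attribute set would use a hold-out split or cross-validation.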
DOI: https://doi.org/10.31326/jisa.v4i2.1044
Copyright (c) 2021 Herfandi Herfandi, Mohammad Taufan Asri Zaen, Yuliadi Yuliadi, M. Julkarnain, Fahri Hamdani
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
JOURNAL IDENTITY
Journal Name: JISA (Jurnal Informatika dan Sains)
e-ISSN: 2614-8404, p-ISSN: 2776-3234
Publisher: Program Studi Teknik Informatika Universitas Trilogi