Komparasi Metode KNN Imputation Dan Random Forest Untuk Hasil Klasifikasi Data UMKM (Comparison of KNN Imputation and Random Forest Methods for MSME Data Classification Results)

Authors

  • Antonius Sudrajat, Universitas MDP
  • Idham Cholid, Universitas Multi Data Palembang

Keywords:

KNN Imputation, Missing Value, Metode Imputation, Random Forest

Abstract

Micro, small, and medium enterprises (MSMEs) play an important role in Indonesia's economic growth. Government efforts to improve MSMEs rely on accurate data, but incomplete records are a recurring problem in managing MSME data, so handling missing values is important. Imputation is a common way to handle missing data, and researchers have applied a variety of imputation methods. This study compares K-Nearest Neighbor Imputation (KNNI) with Random Forest imputation for handling missing data in an MSME dataset from one of the districts in South Sumatra. The methods are evaluated using accuracy score and mean absolute percentage error (MAPE). The results show that Random Forest imputation consistently outperforms KNN imputation across various scenarios. Specifically, Random Forest achieved an accuracy score of 0.9958, while KNN achieved 0.9916. Measured by MAPE, Random Forest also has a lower average error rate of 0.41%. Future research should further improve accuracy by optimizing each method.
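The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind KNN imputation and the two evaluation measures it names. The function names, the distance measure (root mean squared difference over the features both rows have observed), the choice of k, and the toy data are all assumptions made for illustration; they are not taken from the study. Random Forest imputation is not shown.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Assumption: distance is the root mean squared difference over the
    features that both rows have observed.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor row must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap as None if no donor row exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def accuracy(true_labels, predicted_labels):
    """Fraction of labels predicted correctly."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


# Hypothetical toy data: the second feature of the third row is missing.
toy_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]]
imputed = knn_impute(toy_rows, k=2)
```

In a comparison like the one described, each imputation method would be applied to values that were deliberately hidden, MAPE would be computed between the hidden true values and the imputed ones, and accuracy would be computed for a classifier trained on each imputed dataset.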

Published

2024-07-25