ANALYSIS OF CLASS IMBALANCE IN CLASSIFICATION MODEL DEVELOPMENT

Keywords: Imbalance Data, Learning Model, Minority Class, SMOTE

Abstract

Class imbalance is a common problem in machine learning, and many approaches and methods have been proposed to address it. This study works with a dataset whose classes are imbalanced and addresses the imbalance by merging the minority classes. The merged minority classes are then evaluated with machine learning models built on eight different algorithms: AdaBoost, Gradient Boosting, kNN, Naïve Bayes, Neural Network, Random Forest, SVM, and Decision Tree. In addition to the evaluation with these eight algorithms, oversampling and undersampling are also applied to the data. The experiment is expected to show how the machine learning models perform under each treatment and to provide insight into how effective oversampling and undersampling are at improving model performance on a class-imbalanced dataset. The results should give a clearer picture of how to handle class imbalance in a specific context, along with a deeper understanding of how machine learning models perform on this dataset.
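The two data-level steps named in the abstract, merging minority classes and then resampling, can be illustrated with a minimal sketch. The abstract does not state the merge criterion or the resampling method, so everything specific here is an assumption made for illustration: the `min_count` threshold, the `"other"` label, the function names, and the toy labels are all hypothetical. Plain random resampling stands in for whichever technique the study used; it is not SMOTE, which appears in the keywords.

```python
import random
from collections import Counter


def merge_minority_classes(labels, min_count, merged_label="other"):
    """Relabel every class with fewer than min_count samples as merged_label.

    The threshold rule is an assumption; the abstract does not say how
    the minority classes were chosen for merging.
    """
    counts = Counter(labels)
    return [y if counts[y] >= min_count else merged_label for y in labels]


def random_resample(rows, labels, target, seed=0):
    """Bring every class to exactly `target` rows by random resampling.

    Classes larger than `target` are undersampled without replacement;
    classes smaller than `target` are oversampled with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)

    out_rows, out_labels = [], []
    for y, items in by_class.items():
        if len(items) >= target:
            picked = rng.sample(items, target)          # undersampling
        else:
            extra = rng.choices(items, k=target - len(items))
            picked = items + extra                      # oversampling
        out_rows.extend(picked)
        out_labels.extend([y] * target)
    return out_rows, out_labels


# Toy data (hypothetical): one majority class and three small classes.
labels = ["a"] * 8 + ["b"] * 2 + ["c"] * 1 + ["d"] * 1
rows = list(range(len(labels)))

merged = merge_minority_classes(labels, min_count=3)
bal_rows, bal_labels = random_resample(rows, merged, target=6)

print(Counter(merged))
print(Counter(bal_labels))
```

In this toy case the three small classes collapse into a single merged class of four samples, after which the majority class is undersampled and the merged class is oversampled to the same size. Any of the eight classifiers could then be trained on the original, merged, and resampled variants for comparison.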

Published
2024-01-15
How to Cite
Redo, M., & Perdana, A. (2024). ANALISIS KETIDAKSEIMBANGAN KELAS DALAM PENGEMBANGAN MODEL KLASIFIKASI. Proceedings of the National Conference on Electrical Engineering, Informatics, Industrial Technology, and Creative Media, 3(1), 602-610. Retrieved from https://centive.ittelkom-pwt.ac.id/index.php/centive/article/view/137