Missing Value Imputation in Data MCAR for Classification of Type 2 Diabetes Mellitus and its Complications

Andriani, Anik and Hartati, Sri and Afiahayati, Afiahayati and Danawati, Cornelia Wahyu (2024) Missing Value Imputation in Data MCAR for Classification of Type 2 Diabetes Mellitus and its Complications. International Journal of Advanced Computer Science and Applications, 15 (8). pp. 459-466. ISSN 2158-107X

Abstract

Type 2 diabetes mellitus (T2DM) is a disease that carries a risk of many complications. Previous research on the prognosis of T2DM and its complications has been limited to the impact of T2DM on one particular disease. The Guidebook for T2DM Management in Indonesia lists eight categories of T2DM complications. The purpose of this study is to classify T2DM prognosis into eight categories: one controlled class and seven classes of aggravating disorders. The classification was based on medical record data from T2DM patients at Panti Rapih Hospital in Yogyakarta between 2017 and 2022. However, the medical record data contain numerous missing values (MV): 29% of the dataset's values were missing, and the missingness was classified as Missing Completely at Random (MCAR). This study therefore imputed the dataset prior to classification. Several imputation methods were applied, and their accuracy was measured using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The best imputation results were used to update the dataset, which was then classified using several classification methods. The classification results were compared to determine the method with the highest accuracy in this scenario. The Decision Tree method with stratified k-fold cross-validation emerged as the best-performing method for this classification, with an average accuracy of 0.8529.
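The abstract's imputation step is scored with MAE and RMSE. One common way to obtain those scores is to hide known values completely at random, impute them, and compare against the hidden truth. The sketch below illustrates that idea only; it is not the authors' pipeline. The data are synthetic, column-mean imputation is a stand-in for whichever methods the paper compared, and the function names are hypothetical. The 0.29 masking rate mirrors the 29% figure in the abstract.

```python
import math
import random


def mask_mcar(rows, rate, rng):
    """Hide each cell independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def impute_column_mean(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def mae_rmse(truth, masked, imputed):
    """Compute MAE and RMSE over the hidden cells only."""
    errors = [imputed[i][j] - truth[i][j]
              for i, row in enumerate(masked)
              for j, v in enumerate(row) if v is None]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Synthetic two-column dataset (assumed values, not from the paper).
rng = random.Random(0)
truth = [[rng.gauss(100, 15), rng.gauss(7, 1)] for _ in range(200)]

masked = mask_mcar(truth, 0.29, rng)      # 29% missingness, as in the abstract
imputed = impute_column_mean(masked)
mae, rmse = mae_rmse(truth, masked, imputed)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

Because only the masked cells are scored, a lower MAE or RMSE indicates an imputation method that recovers the hidden values more closely. Several candidate methods can be ranked by running each on the same masked copy.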

Item Type: Article
Uncontrolled Keywords: decision tree; missing completely at random; missing value; prognosis of diabetes mellitus
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department
Depositing User: Masrumi Fathurrohmah
Date Deposited: 13 Feb 2025 00:55
Last Modified: 13 Feb 2025 00:55
URI: https://ir.lib.ugm.ac.id/id/eprint/14667
