Muhammad Kh.T.Q, Fadhlullah and Wahyono, Wahyono (2024) Classification of Tuberculosis Based on Chest X-Ray Images for Imbalance Data using SMOTE. International Journal of Computing and Digital Systems, 16 (1). 981-993. ISSN 2210142X
Full text not available from this repository.
Abstract
This research addresses the challenge of class imbalance when classifying Chest X-Ray (CXR) images from the TBX11K dataset. To this end, the study applies Random Forest (RF) and XGBoost (XGB) classifiers, each with and without the Synthetic Minority Over-sampling Technique (SMOTE). The primary objective is to evaluate how SMOTE affects the performance of these models on TBX11K CXR images. SMOTE is applied to the RF and XGB models to increase the number of minority-class samples (TB positive) and thereby reduce the imbalance with the majority-class samples (TB negative). To ensure a consistent comparison, every model is assessed with the same set of evaluation metrics: accuracy, precision, recall, and F1 score. The findings indicate that applying SMOTE to both RF and XGB models effectively mitigates class imbalance in the dataset. Specifically, the RF model achieves an accuracy of approximately 93.33% without SMOTE and 92.72% with SMOTE, while the XGB model achieves 94.11% without SMOTE and 94.33% with SMOTE. Although SMOTE enhances overall model performance, accurately predicting the minority classes 'altb' and 'ltb' remains challenging. These difficulties are attributed to the less representative features of these minority classes, which are hard to overcome even with resampling techniques. Based on the experimental results, the XGB model with SMOTE is the best-performing model for classifying TBX11K images. Despite this improvement, further work is needed to raise prediction accuracy for the minority classes, which suggests that additional techniques or more sophisticated models may be required to address the issue fully.
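The abstract names SMOTE but does not state its parameters or how the CXR images were turned into feature vectors, so the following is only an illustrative sketch, not the authors' code. It shows the standard SMOTE idea in plain NumPy: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest same-class neighbours (Euclidean distance). The function name `smote`, the default k = 5, and the choice to oversample every non-majority class up to the majority-class count are assumptions made for illustration. In practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used, and resampling should be applied to the training split only, so that synthetic samples never leak into the evaluation data.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Illustrative SMOTE sketch (hypothetical helper, not the study's code).

    Oversamples every non-majority class up to the majority-class count by
    interpolating between a class sample and one of its k nearest
    same-class neighbours. Original rows are kept first, synthetic rows
    are appended after them.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()  # assumption: balance every class to the majority count

    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        n_new = target - n
        if n_new == 0:
            continue  # majority class: nothing to generate
        X_c = X[y == cls]
        if len(X_c) < 2:
            raise ValueError(f"class {cls!r} needs at least 2 samples for SMOTE")
        k_eff = min(k, len(X_c) - 1)  # cannot use more neighbours than exist

        # Pairwise squared Euclidean distances within the class;
        # the diagonal is set to inf so a sample is never its own neighbour.
        d = ((X_c[:, None, :] - X_c[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d, np.inf)
        neighbours = np.argsort(d, axis=1)[:, :k_eff]  # shape (n, k_eff)

        # Pick a random base sample and one of its k nearest neighbours.
        base = rng.integers(0, len(X_c), size=n_new)
        nb = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        synthetic = X_c[base] + gap * (X_c[nb] - X_c[base])

        X_parts.append(synthetic)
        y_parts.append(np.full(n_new, cls, dtype=y.dtype))

    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    # Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (90, 4)), rng.normal(3, 1, (10, 4))])
    y = np.array([0] * 90 + [1] * 10)
    X_res, y_res = smote(X, y, k=5)
    print(np.bincount(y_res))  # → [90 90]
```

The within-class distance matrix costs O(n²) memory per class, which is acceptable for a small sketch but not for large datasets; a nearest-neighbour index (as used by library implementations) scales better. Because synthetic points are interpolations of existing ones, SMOTE cannot add information that the minority-class features do not already contain, which is consistent with the abstract's observation that the 'altb' and 'ltb' classes remain hard to predict after resampling.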
Item Type: | Article |
---|---|
Uncontrolled Keywords: | machine learning; Random Forest; VGG16; XGBoost |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department |
Depositing User: | Wiyarsih Wiyarsih |
Date Deposited: | 29 Apr 2025 08:33 |
Last Modified: | 29 Apr 2025 08:33 |
URI: | https://ir.lib.ugm.ac.id/id/eprint/16189 |