A Huber M-Estimator Algorithm and Decision Tree Regression Approach to Improve the Prediction Performance of Datasets with Outlier

Basalamah, Salsabila and Sihabuddin, Agus (2024) A Huber M-Estimator Algorithm and Decision Tree Regression Approach to Improve the Prediction Performance of Datasets with Outlier. Intelligent Network and Systems Society, 17 (1). pp. 1-9. ISSN 2185310X

3.844 A-Huber-MEstimator-Algorithm-and-Decision-Tree-Regression-Approach-to-Improve-the-Prediction-Performance-of-Datasets-with-OutlierInternational-Journal-of-Intelligent-Engineering-and-Systems.pdf - Published Version
Restricted to Registered users only

Abstract

Outliers can bias the results of an analysis. Two approaches to dealing with outliers are removing them or modifying the method used. Commonly used machine learning (ML) methods often require enhanced robustness in the presence of outliers. One such method is decision tree regression (DTR). However, DTR does not account for outliers and makes predictions at leaf nodes based on central values of the data, which can introduce bias into the results. One algorithm that retains outliers is the M-estimator from robust regression. This study proposes a modification of the M-estimator for DTR that uses Huber weights on leaf nodes for DTR predictions. We used five regression datasets sourced from UCI. The results show that retaining the outliers gives better predictions on the concrete, superconductivity, Boston, and airfoil datasets, with best mean absolute errors (MAE) of 3.963, 9.140, 2.021, and 1.644, respectively. The only exception is the QSAR fish toxicity dataset, where the best MAE of 0.522 is obtained on the outlier-removed dataset.
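The idea of applying Huber weights at a leaf node can be illustrated with a minimal sketch. It is not the authors' implementation, whose details are not given in the abstract. It assumes the leaf prediction is a Huber M-estimate of location over the target values that fall in the leaf, computed by iterative reweighting. The function name `huber_location`, the use of the median absolute deviation (MAD) as the scale, and the tuning constant c = 1.345 (the conventional choice for Huber weights) are assumptions made for illustration.

```python
from statistics import median


def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location (hypothetical helper, illustration only).

    Each value gets weight 1 if its scaled residual is at most c,
    and c / |residual| otherwise, so distant points are kept but down-weighted.
    """
    values = list(values)
    mu = median(values)  # robust starting point
    # Assumed scale: MAD rescaled to match the standard deviation under normality.
    mad = median([abs(v - mu) for v in values])
    scale = mad / 0.6745 if mad > 0 else 1.0
    for _ in range(max_iter):
        weights = []
        for v in values:
            r = abs(v - mu) / scale
            weights.append(1.0 if r <= c else c / r)
        # Weighted mean with the current Huber weights.
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Target values in one hypothetical leaf; 50.0 plays the role of an outlier.
leaf = [10.0, 11.0, 9.0, 10.5, 9.5, 50.0]
plain_mean = sum(leaf) / len(leaf)
robust = huber_location(leaf)
print(f"mean={plain_mean:.3f}  huber={robust:.3f}")
```

In this sketch the outlier stays in the leaf but contributes with a small weight, so the prediction stays close to the bulk of the data instead of being pulled toward 50.0 as the plain mean is.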

Item Type: Article
Uncontrolled Keywords: Decision tree regression; Huber weights; M-estimator; Outliers; Robust regression
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department
Depositing User: Masrumi Fathurrohmah
Date Deposited: 04 Mar 2025 06:51
Last Modified: 04 Mar 2025 06:51
URI: https://ir.lib.ugm.ac.id/id/eprint/15484
