Analysis of Synthetic Data Utilization with Generative Adversarial Network in Flood Classification using K-Nearest Neighbor Algorithm

Afriza, Wahyu and Riasetiawan, Mardhani and Tyas, Dyah Aruming (2023) Analysis of Synthetic Data Utilization with Generative Adversarial Network in Flood Classification using K-Nearest Neighbor Algorithm. International Journal of Advanced Computer Science and Applications, 14 (12). pp. 678-683. ISSN 2158107X

Abstract

Indonesia has a tropical climate with high rainfall, compounded by uncertain weather and climate conditions. Given this uncertainty, the limited predictive information on flooding, and the scarcity of data on the causes of flooding, this study compares synthetic data generated from the limited data available from BMKG (temperature, humidity, rainfall, and wind speed) with synthetic data generated from annual rainfall data obtained from the Kaggle online platform. The research aims to compare synthetic data generated from different datasets, using the results of a K-Nearest Neighbor (KNN) classification system as the benchmark and a confusion matrix for accuracy evaluation. The research uses climate data from the BMKG DI Yogyakarta Climatology Station covering 20 months, the Geophysical Station covering 12 months, and Kerala data covering 1901–2018. Synthetic data are generated with the Conditional Tabular Generative Adversarial Network (CTGAN) model. CTGAN produces reasonably good data in terms of distribution and data differences when the original dataset is large and the amount of synthetic data generated is small. The KNN classification system on the BMKG data overfitted, as indicated by evaluation accuracy rising within the range of 85–94% while validation accuracy fell from 89% to 65%. This is attributed to the lack of distinctive patterns in the data and to the small amount of original data used to generate synthetic data, which makes it difficult for the classification system to identify CTGAN-generated data that differ substantially in distance and value. For the Kerala data, evaluation accuracy is in the range of 92–95% and validation is in the range of 0.7–0.83%, with Classifier k1 being the best-performing system.
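
As a rough illustration of the evaluation step the abstract describes (KNN classification scored with a confusion matrix), the following is a minimal pure-Python sketch. It is not the authors' implementation: the two features, the toy values, the label names, and the choice of k=1 are all hypothetical and chosen only to keep the example small. It also does not cover CTGAN generation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Return the majority label among the k training points nearest to query."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] counts samples whose true label is labels[i] and predicted label is labels[j]."""
    index = {label: n for n, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical toy data (NOT from BMKG or Kaggle): (rainfall_mm, humidity_pct).
train_X = [(5.0, 60.0), (10.0, 65.0), (12.0, 70.0),
           (80.0, 90.0), (95.0, 92.0), (110.0, 95.0)]
train_y = ["no_flood", "no_flood", "no_flood", "flood", "flood", "flood"]
test_X = [(8.0, 62.0), (100.0, 93.0), (20.0, 72.0), (70.0, 88.0)]
test_y = ["no_flood", "flood", "no_flood", "flood"]

predictions = [knn_predict(train_X, train_y, x, k=1) for x in test_X]
cm = confusion_matrix(test_y, predictions, labels=["no_flood", "flood"])
acc = accuracy(cm)
```

In practice the features would usually be scaled before computing distances, since KNN is sensitive to differing units; the sketch skips that step for brevity.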

Item Type: Article
Additional Information: Library Dosen
Uncontrolled Keywords: Classification; GAN; KNN; rainfall; synthetic data
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department
Depositing User: Masrumi Fathurrohmah
Date Deposited: 22 Aug 2024 07:03
Last Modified: 22 Aug 2024 07:03
URI: https://ir.lib.ugm.ac.id/id/eprint/2846
