A medoid-based deviation ratio index to determine the number of clusters in a dataset

Kariyam, Kariyam and Abdurakhman, Abdurakhman and Effendie, Adhitya Ronnie (2023) A medoid-based deviation ratio index to determine the number of clusters in a dataset. MethodsX, 10: 102084. ISSN 22150161


Abstract

Most existing methods for determining the number of groups apply to particular data types or are calculated from the distance matrix for all object pairs. In this paper, we propose a medoid-based Deviation Ratio Index (DRI) to determine the number of clusters. The DRI is calculated from the distance matrix between each object and the k final medoids. These final medoids are produced by the block-based k-medoids algorithm (BlockD-KM). We choose a specific transformation and a suitable distance for certain variables before executing BlockD-KM. We illustrate the detailed stages of the DRI on secondary data from the 2022 environmental index of Asia Pacific countries so that they are easy to reproduce. We use eight real datasets, namely Breast Cancer, Heart Disease, Iris, Wine, Soybean, Ionosphere, Vote, and Credit Approval, to validate the DRI method. We compare the DRI method with the Calinski-Harabasz (CH) index and the Silhouette index. The experimental results show that the DRI predicts the number of clusters correctly for 100% of these datasets, whereas the CH index is correct for 62.5% and the Silhouette index for 75%. We also generated three kinds of artificial data to evaluate the proposed method, and 76.7% of the experiments were predicted correctly.

• The medoid-based deviation ratio index helps researchers determine the number of clusters
• The DRI method is applicable to any medoid-based partitioning algorithm
• The method is suitable for all data types (categorical, numerical, and mixed)
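The abstract notes that the DRI works from the distances between each object and the k final medoids, rather than from all object pairs. The sketch below illustrates only that input: an n x k object-to-medoid distance matrix and the nearest-medoid assignment it implies. It is not the DRI formula, which the abstract does not state. The toy points, the choice of medoids, the Euclidean distance, and the function names are all assumptions made for illustration.

```python
import math

def medoid_distance_matrix(points, medoid_indices):
    """Return an n x k matrix of distances from each object to each medoid.

    Euclidean distance is an assumption here; the paper selects a suitable
    distance per variable type.
    """
    medoids = [points[i] for i in medoid_indices]
    return [[math.dist(p, m) for m in medoids] for p in points]

def assign_to_nearest(dist_matrix):
    """Label each object with the column index of its nearest medoid."""
    return [min(range(len(row)), key=row.__getitem__) for row in dist_matrix]

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]

# Assumed final medoids (row indices into `points`), as any k-medoids
# algorithm might return them for k = 2.
medoid_indices = [0, 3]

dist = medoid_distance_matrix(points, medoid_indices)   # 6 rows, 2 columns
labels = assign_to_nearest(dist)

# Sum of each object's distance to its own medoid: the usual k-medoids
# objective, shown for context only (not the DRI).
total_deviation = sum(row[lab] for row, lab in zip(dist, labels))

print(labels)
print(total_deviation)
```

For n objects and k medoids this matrix has n x k entries instead of the n x n entries of a full pairwise distance matrix, which is the saving the abstract points to.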

Item Type: Article
Uncontrolled Keywords: Deviation ratio index; K-medoids based on block deviation; The number of clusters
Subjects: Q Science > QA Mathematics
Divisions: Faculty of Mathematics and Natural Sciences > Mathematics Department
Depositing User: Ismu WIDARTO
Date Deposited: 25 Sep 2024 07:33
Last Modified: 25 Sep 2024 07:33
URI: https://ir.lib.ugm.ac.id/id/eprint/7543
