Integrating Decision Tree and BIRCH Clustering Algorithms of BERTopic for Analyzing Public Sentiment on Dirtyvote Movie

Muhajir, Muhammad and Rosadi, Dedi and Danardono, Danardono (2024) Integrating Decision Tree and BIRCH Clustering Algorithms of BERTopic for Analyzing Public Sentiment on Dirtyvote Movie. Mathematical Modelling of Engineering Problems, 11 (12). pp. 3391-3401. ISSN 23690739


Abstract

This study performs sentiment analysis and topic modeling on YouTube comments about the politically charged film Dirtyvote during Indonesia's election period. To address the lack of robust methods for unstructured Indonesian-language social media data, it proposes an integrative framework with two components: a Decision Tree algorithm using the Gini Index for interpretable sentiment classification, and BERTopic modified with BIRCH clustering to enhance stability and efficiency in large-scale topic modeling. The dataset comprises 76,502 YouTube comments, which were preprocessed to handle noise, informal language, and linguistic variations. In sentiment analysis, the Decision Tree with Gini Index achieved an accuracy of 98.72% and an F1-score of 96%, outperforming other methods such as SVM and Naïve Bayes. In topic modeling, BERTopic with BIRCH clustering achieved higher coherence scores (e.g., CV, U_Mass, and NPMI) than standard BERTopic and K-Means clustering, indicating more robust topic generation. Methodologically, this research contributes a scalable and interpretable framework for analyzing unstructured text data in Indonesian. Practically, it offers insights into public opinion dynamics on socio-political issues and highlights the role of media in shaping perceptions. The findings underline the framework's potential for broader applications in sentiment analysis and topic modeling within diverse socio-political contexts.
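As a rough illustration of the Gini Index criterion named in the abstract, the sketch below computes the Gini impurity of a set of sentiment labels and the impurity reduction from one candidate split. It is not the authors' implementation: the function names, the label values, and the toy data are all hypothetical, and a real decision tree would evaluate many candidate splits over text features.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Size-weighted average impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical toy labels (assumption, not data from the study).
parent = ["pos"] * 4 + ["neg"] * 4
left = ["pos", "pos", "pos", "neg"]   # e.g. comments containing some term
right = ["neg", "neg", "neg", "pos"]  # e.g. comments without that term

# Impurity reduction achieved by this split; a tree grower would pick
# the candidate split that maximizes this quantity.
gain = gini_impurity(parent) - weighted_gini(left, right)
```

A split that separates the classes perfectly would drive the weighted child impurity to zero, so its gain would equal the parent impurity.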

Item Type: Article
Uncontrolled Keywords: BERTopic; BIRCH clustering; Decision Tree; Dirtyvote; Gini Index; sentiment; topic modeling
Subjects: Q Science > QA Mathematics
Divisions: Faculty of Mathematics and Natural Sciences > Mathematics Department
Depositing User: Ismu WIDARTO
Date Deposited: 24 Jun 2025 04:07
Last Modified: 24 Jun 2025 04:07
URI: https://ir.lib.ugm.ac.id/id/eprint/19017
