Enhancing Spam Comment Detection on Social Media With Emoji Feature and Post-Comment Pairs Approach Using Ensemble Methods of Machine Learning

Chrismanto, Antonius Rachmat and Sari, Anny Kartika and Suyanto, Yohanes (2023) Enhancing Spam Comment Detection on Social Media With Emoji Feature and Post-Comment Pairs Approach Using Ensemble Methods of Machine Learning. IEEE Access, 11. pp. 80246-80265. ISSN 2169-3536

Abstract

Whenever a well-known public figure posts on social media, many users respond with comments. Not all of these comments are relevant to the post: some are spam, which disrupts the overall flow of information. This research employed two strategies to address issues in text spam detection on social media. The first was to use emojis, which many studies discard even though many social media users rely on them to convey their intentions. The second was to use stacked post-comment pairs, in contrast to the many spam detection systems that focus solely on comment-only data. The post-comment pairs were needed to determine whether a comment was relevant to the post context (not spam) or irrelevant to it (spam). This research used the SpamID-Pair dataset, derived from social media, for Indonesian spam comment detection. After a comprehensive investigation, the emoji-text feature, the stacked post-comment pairs, and ensemble voting could boost detection performance in terms of accuracy and F1. Adding manual features also improved detection performance. Based on the experiments, the best stand-alone method for spam comment detection is the SVM (RBF kernel), while the soft voting ensemble method gives the best average performance.
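
The two ideas named in the abstract, keeping emojis as features and classifying a post-comment pair rather than the comment alone, can be illustrated with a minimal Python sketch, together with soft voting. This is not the authors' implementation: the emoji ranges, the `[SEP]` separator, the reading of "stacked" as simple concatenation, and the function names `tokenize`, `stack_pair`, and `soft_vote` are all assumptions made for illustration, and the probabilities in the usage note are made up.

```python
import re

# Word tokens, or a single emoji from two common Unicode ranges.
# The ranges are an illustrative assumption, not the paper's tokenizer.
TOKEN = re.compile("\\w+|[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def tokenize(text):
    """Lowercase the text and keep each emoji as its own token."""
    return TOKEN.findall(text.lower())


def stack_pair(post, comment, sep="[SEP]"):
    """Join post and comment tokens into one sequence (assumed pairing)."""
    return tokenize(post) + [sep] + tokenize(comment)


def soft_vote(prob_lists):
    """Average per-class probabilities from several classifiers.

    prob_lists holds one [p_not_spam, p_spam] list per classifier.
    Returns the winning class index and the averaged probabilities.
    """
    n = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[i] for p in prob_lists) / n for i in range(n_classes)]
    return avg.index(max(avg)), avg
```

For example, `stack_pair("Selamat pagi 😊", "Promo murah 🔥🔥")` keeps the three emojis as separate tokens on either side of the separator, and `soft_vote([[0.6, 0.4], [0.2, 0.8], [0.4, 0.6]])` picks class 1 (spam) because the averaged spam probability is the larger one, even though one classifier voted the other way.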

Item Type: Article
Uncontrolled Keywords: emoji feature; ensemble method; post-comment pair; social media; spam detection
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department
Depositing User: Ismu WIDARTO
Date Deposited: 23 Sep 2024 06:37
Last Modified: 23 Sep 2024 06:37
URI: https://ir.lib.ugm.ac.id/id/eprint/7438
