CLICK-ID: A novel dataset for Indonesian clickbait headlines

William, Andika and Sari, Yunita (2020) CLICK-ID: A novel dataset for Indonesian clickbait headlines. Data in Brief, 32. ISSN 2352-3409

Full text not available from this repository.

Abstract

News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news has gained attention in recent years [1, 2]. However, most of this work has focused on English news, for which rich, representative resources already exist. For other languages, such as Indonesian, resources for clickbait tasks are still lacking. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It comprises 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model that achieves favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection and for other experiments in NLP.

Item Type: Article
Uncontrolled Keywords: Indonesian; Natural Language Processing; News articles; Clickbait; Text-classification
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department
Depositing User: Sri JUNANDI
Date Deposited: 20 May 2025 01:44
Last Modified: 20 May 2025 01:44
URI: https://ir.lib.ugm.ac.id/id/eprint/17142
