William, Andika and Sari, Yunita (2020) CLICK-ID: A novel dataset for Indonesian clickbait headlines. Data in Brief, 32. ISSN 2352-3409
Full text not available from this repository.

Abstract
News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news has gained attention in recent years [1, 2]. However, most of this work has focused on English news, for which rich, representative resources already exist. For other languages, such as Indonesian, resources for clickbait tasks are still lacking. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It comprises 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model that achieves favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection and for other experiments in NLP.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Indonesian; Natural Language Processing; News articles; Clickbait; Text-classification |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Divisions: | Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department |
| Depositing User: | Sri JUNANDI |
| Date Deposited: | 20 May 2025 01:44 |
| Last Modified: | 20 May 2025 01:44 |
| URI: | https://ir.lib.ugm.ac.id/id/eprint/17142 |
