Puspitasari, Devi Ambarwati and Sutrisno, Adi (2023) Identify Fake Author in Indonesia Crime Cases: A Forensic Authorship Analysis Using N-gram and Stylometric Features. In: ICADEIS 2023 - International Conference on Advancement in Data Science, E-Learning and Information Systems: Data, Intelligent Systems, and the Applications for Human Life, Proceeding, 2 August 2023, Bali, Indonesia.
Identify_Fake_Author_in_Indonesia_Crime_Cases_A_Forensic_Authorsip_Analysis_Using_N-gram_and_Stylometric_Features.pdf - Published Version
Abstract
Cases of violations of the Law on Electronic Information and Transactions (UU ITE) are dominated by cases related to hacking and fake documents. Notably, a number of these cases involve evidence pointing to the falsification of texts and of a person's authorship. In Indonesia, the handling of authorship disputes has not yet extended to authorship analysis, because it is difficult to identify personal identity in electronic texts, especially short texts with a limited number of characters and words. This study examines an Indonesian text set to investigate and describe linguistic profiles based on n-gram analysis and stylistic characteristics. The data source is a corpus of electronic texts from 50 unique authors, including 8 authors and evidence from criminal cases, with texts limited to 2000 characters or 500 words. All texts are personal texts collected from volunteers and from case documents to which access was permitted. Data analysis was carried out by determining and counting n-grams at both the character level and the word level, and by extracting stylometric features with the Natural Language Toolkit (NLTK) library. The results show that, lexically, character-level n-gram analysis, which operates on the smallest unit, captures important elements of authorship attribution, such as the use of alphabetic and non-alphabetic characters, capitalization, and punctuation. Diction is a significant factor in identifying an author's profile and in distinguishing one author from another. The results also show that a small text set can support authorship attribution: using the various stylistic features, classification accuracy ranged between 92% and 98.5%.
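To make the feature types named in the abstract concrete, the following is a minimal sketch of character-level and word-level n-gram counting together with a few surface style ratios (alphabetic and non-alphabetic characters, capitalization, punctuation). It is not the authors' pipeline: the abstract states that NLTK was used, whereas this sketch relies only on the Python standard library so that it runs without extra dependencies. The function names (`char_ngrams`, `word_ngrams`, `style_features`), the tokenization rule, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter
import re
import string


def char_ngrams(text, n):
    """Count overlapping character n-grams, keeping case and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams over lowercased word tokens.

    Tokenization by the regex \\w+ is an assumption of this sketch,
    not a detail taken from the paper.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def style_features(text):
    """Return simple per-character ratios as illustrative style features."""
    total = len(text) or 1  # avoid division by zero on empty input
    return {
        "alpha_ratio": sum(c.isalpha() for c in text) / total,
        "non_alpha_ratio": sum(
            (not c.isalpha()) and (not c.isspace()) for c in text
        ) / total,
        "upper_ratio": sum(c.isupper() for c in text) / total,
        "punct_ratio": sum(c in string.punctuation for c in text) / total,
    }


if __name__ == "__main__":
    # Hypothetical short message, standing in for one text in a corpus.
    sample = "Halo, apa kabar? Halo!"
    print(char_ngrams(sample, 2).most_common(3))
    print(word_ngrams(sample, 1).most_common(3))
    print(style_features(sample))
```

Counts and ratios of this kind could be assembled into one feature vector per text and passed to a classifier. The abstract does not say which classifier or feature weighting the study used, so that step is left out here.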
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Uncontrolled Keywords: | Authorship Attribution; Authorship Identification; N-Grams; Short text; Stylometry |
Subjects: | P Language and Literature > P Philology. Linguistics |
Divisions: | Faculty of Cultural Sciences > English Literature Department |
Depositing User: | OKTAVIANA DWI P |
Date Deposited: | 06 Sep 2024 06:22 |
Last Modified: | 06 Sep 2024 06:22 |
URI: | https://ir.lib.ugm.ac.id/id/eprint/6700 |