A Systematic Literature Review of Text Classification: Datasets and Methods

Riduan, Gusti Muhammad and Soesanti, Indah and Adji, Teguh Bharata (2021) A Systematic Literature Review of Text Classification: Datasets and Methods. In: IEEE International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2021.


Abstract

We study the literature in major journals and conferences on the use of shallow learning and deep learning methods for text classification. Shallow learning techniques such as Naive Bayes, Support Vector Machine, and Random Forests were initially widely used to solve text classification problems. However, these techniques generally require a precise feature extraction model, which is often very complex to build if high accuracy is to be achieved. For this reason, researchers continue to look for learning techniques that are more efficient and provide a significant increase in accuracy. As a result, deep learning methods such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are currently more widely used for text classification. This literature study aimed to identify and assess the research methods and datasets used in text classification studies from 2016 to the present. Seventy-three text classification research articles published between January 2016 and July 2021 were retained and selected for further analysis based on the established inclusion and exclusion criteria. The review was conducted systematically. A systematic literature review is a method for identifying, evaluating, and interpreting all available research material in order to answer specific research questions. The following diagram depicts the overall distribution of text classification methods. Public datasets were used in 85 percent of the studies, whereas private datasets were used in the remaining 15 percent. Twenty different methods were used, and the eight most commonly used approaches in text classification were identified among them.
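
The abstract notes that shallow learners such as Naive Bayes depend on an explicit feature-extraction step. The following is a minimal sketch, not a method taken from the reviewed studies. It assumes a simple lowercase, whitespace-split bag-of-words representation and a four-document toy dataset invented purely for illustration. It shows a multinomial Naive Bayes classifier with Laplace smoothing built on top of that hand-made feature extraction.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed, deliberately simple feature extraction: lowercase + whitespace split.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab: set[str] = set()
        self.word_counts = {c: Counter() for c in self.classes}
        docs_per_class = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n = len(docs)
        # log P(class), estimated from class frequencies in the training set
        self.log_prior = {c: math.log(docs_per_class[c] / n) for c in self.classes}
        # total number of word tokens observed per class
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[c][tok]
                # smoothed log P(word | class)
                score += math.log((count + self.alpha) / (self.total[c] + self.alpha * v))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
train_docs = ["great movie loved it", "wonderful acting great plot",
              "terrible movie hated it", "awful plot boring acting"]
train_labels = ["pos", "pos", "neg", "neg"]
clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict("loved the great acting"))  # → pos
```

The quality of such a classifier depends heavily on the tokenizer and feature representation. That dependence is the feature-extraction burden the abstract contrasts with deep learning models such as CNN and LSTM, which learn representations from the data.
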
Researchers recommended integrating various machine learning methods, employing an improved algorithm, adding feature selection, and applying parameter optimization to some classifiers to improve the accuracy of machine learning classifiers for text classification. The findings of this study also revealed that are frequently mentioned and thus significant in the field of text classification. © 2021 IEEE.
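
One of the recommendations above is adding feature selection. The abstract does not say which selection criteria the reviewed studies used. As an illustrative assumption, the sketch below uses the chi-square statistic, a common choice for text, to score each term's association with a target class from document-level presence counts. It then keeps the top-k terms. The function names `chi_square_scores` and `select_top_k` and the toy data are hypothetical.

```python
def chi_square_scores(docs: list[str], labels: list[str], positive: str) -> dict[str, float]:
    """Chi-square score of each term against the class `positive` (term presence per document)."""
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_sets)
    n = len(docs)
    scores: dict[str, float] = {}
    for term in vocab:
        # 2x2 contingency table: a = in class & has term, b = other class & has term,
        # c = in class & lacks term, d = other class & lacks term
        a = b = c = d = 0
        for tokens, label in zip(doc_sets, labels):
            has_term = term in tokens
            in_class = label == positive
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy data, for illustration only.
docs = ["great movie loved it", "wonderful acting great plot",
        "terrible movie hated it", "awful plot boring acting"]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels, positive="pos")
print(select_top_k(scores, 3))  # → ['great', 'awful', 'boring']
```

Terms that occur equally often in both classes (here "movie", "it", "plot", "acting") score zero and would be dropped first. This shrinks the feature space that a shallow classifier has to handle.
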

Item Type: Conference or Workshop Item (Paper)
Additional Information: Cited by: 3
Uncontrolled Keywords: Convolutional neural networks; Decision trees; Feature extraction; Learning algorithms; Long short-term memory; Support vector machines; Text processing; Classification datasets; Classification methods; Dataset; Learning methods; Learning techniques; Literature reviews; Method; System literature review; Systematic literature review; Text classification; Classification (of information)
Subjects: T Technology > T Technology (General)
T Technology > TK Electrical engineering. Electronics Nuclear engineering
Divisions: Faculty of Engineering > Electrical and Information Technology Department
Depositing User: Sri JUNANDI
Date Deposited: 25 Oct 2024 08:38
Last Modified: 25 Oct 2024 08:38
URI: https://ir.lib.ugm.ac.id/id/eprint/8636
