Optimization of AES using BERT and BiLSTM for Grading the Online Exams

Azhari, Azhari and Santoso, Agus and Ratna, Anak Agung Putri and Prestiliano, Jasson (2024) Optimization of AES using BERT and BiLSTM for Grading the Online Exams. International Journal of Intelligent Engineering and Systems, 17 (5). pp. 395-411. ISSN 2185-310X

Official URL: https://oaji.net/articles/2023/3603-1723961238.pdf

Abstract

Essays are among the most widely used instruments for assessing students. Universitas Terbuka Indonesia (Open University) conducts three online essay exams within a week for each first-year course, accounting for 30% of the mid-test score. The university has over 500,000 students and hundreds of courses. However, the limited number of graders makes checking and scoring each student’s essay response time-consuming and ineffective. Scores can also be subjective and unfair, lack detailed feedback for students, miss contextual and creative aspects, and be less reliable for complex or non-standard writing. This research proposes a hybrid approach of deep learning models and natural semantic grammar to improve and optimize an automated essay scoring (AES) system, with the following steps. First, the datasets were collected from hundreds of students’ answers, each dataset representing a single question. The datasets were pre-processed and augmented to increase the quantity and variety of the original data and scores. Second, BERT was used to transform each text dataset into vector feature spaces using pre-trained model weights. Finally, a score prediction model was built using the BiLSTM method. The experimental results show that the model achieved an average Cohen’s Kappa score of 0.749 and a highest Cohen’s Kappa score of 0.91. This BERT-BiLSTM optimization model also has a better average Cohen’s Kappa score (0.820) than the ATT-CNN-LSTM, BERT-XLNET, R2BERT, and CNN-BiLSTM models. In a test with 46 lecturers, the average time taken to examine one course for each student decreased to 1 minute and 2 seconds. Additionally, 92.75% of the lecturers found the process of checking responses to be fairer and more objective. In a trial with 200 students, the mean percentage across students for the question indicators related to UI/UX, fairness, responsiveness, rubric, feedback, and transparency was 93.72%.
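
The abstract reports model quality as Cohen’s Kappa, which measures agreement between two raters (here, a human grader and the model) after correcting for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement. The following is a minimal Python sketch of the unweighted form. It is an assumption that the unweighted variant applies, since the abstract does not say whether a weighted variant was used; the function name cohen_kappa and the sample scores are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(rater_a)

    # p_o: fraction of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # p_e: agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Both raters used a single identical label: agreement is trivially perfect.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical essay scores on a 1-4 scale (illustrative only).
human_scores = [3, 4, 2, 4, 3, 1, 2, 4]
model_scores = [3, 4, 2, 3, 3, 1, 2, 4]
print(round(cohen_kappa(human_scores, model_scores), 3))
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance, which is the scale on which the reported scores of 0.749 and 0.91 should be read.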

Item Type:	Article
Uncontrolled Keywords:	Automated essay score; BERT; BiLSTM; Cohen’s kappa score; Machine learning; Online exam
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department
Depositing User:	Masrumi Fathurrohmah
Date Deposited:	13 Feb 2025 07:57
Last Modified:	13 Feb 2025 07:57
URI:	https://ir.lib.ugm.ac.id/id/eprint/14691
