Solihah, Binti and Azhari, Azhari and Musdholifah, Aina (2020) Enhancement of conformational B-cell epitope prediction using CluSMOTE. PEERJ COMPUTER SCIENCE. ISSN 2376-5992
Abstract
Background. A conformational B-cell epitope is one of the main components of
vaccine design. It consists of segments that are separate in the sequence but spatially close
in the antigen chain. The availability of Ag-Ab complex data in the Protein Data Bank
allows for the development of predictive methods. Several epitope prediction models
have been developed, including learning-based methods. However, the performance of
these models is still not optimal. The main problem in learning-based prediction models
is class imbalance.
Methods. This study proposes CluSMOTE, which is a combination of a cluster-based
undersampling method and the Synthetic Minority Oversampling Technique (SMOTE). The
approach is used to generate additional sample data to ensure that the dataset of the
conformational epitope is balanced. The Hierarchical DBSCAN algorithm is applied
to identify the clusters in the majority class. Randomly selected data are
taken from each cluster, considering the oversampling degree, and combined with the
minority class data. The balanced data are used as the training dataset to develop a
conformational epitope prediction model. Furthermore, two binary classification methods,
Support Vector Machine and Decision Tree, are separately used to develop prediction
models and to evaluate the performance of CluSMOTE in predicting conformational
B-cell epitopes. The experiment focuses on determining the best parameters for an optimal
CluSMOTE. Two independent datasets are used to compare the proposed prediction
model with state-of-the-art methods. The first and the second datasets represent the
general protein and the glycoprotein antigens, respectively.
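
The sampling steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cluster labels of the majority class have already been computed (for example by Hierarchical DBSCAN), it allocates the undersampling quota in proportion to cluster size, and it uses a plain SMOTE-style interpolation between nearest minority neighbours. The function names, the `target` class size, and the toy data are all hypothetical.

```python
import math
import random
from collections import defaultdict


def undersample_by_cluster(majority, labels, n_keep, rng):
    """Randomly keep about n_keep majority points, spread over the clusters
    in proportion to cluster size (an assumed allocation rule)."""
    clusters = defaultdict(list)
    for point, label in zip(majority, labels):
        clusters[label].append(point)
    kept = []
    for members in clusters.values():
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


rng = random.Random(0)
# Toy data: 100 majority points in two groups, 10 minority points.
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
majority += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(40)]
labels = [0] * 60 + [1] * 40  # assumed output of a clustering step
minority = [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(10)]

target = 30  # hypothetical size of each class after resampling
kept_majority = undersample_by_cluster(majority, labels, target, rng)
synthetic = smote(minority, target - len(minority), k=3, rng=rng)
balanced_minority = minority + synthetic
print(len(kept_majority), len(balanced_minority))
```

In this sketch both classes end up with `target` samples; how the oversampling degree is chosen and how it drives the per-cluster selection in CluSMOTE is not specified in the abstract, so the proportional quota here is only one plausible choice.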
Results. The experimental results show that CluSMOTE Decision Tree outperformed the
Support Vector Machine in terms of AUC and Gmean as performance measurements.
The mean AUCs of CluSMOTE Decision Tree on the Kringelum and the SEPPA 3 test
sets are 0.83 and 0.766, respectively. This shows that CluSMOTE Decision Tree is better
than other methods on the general protein antigens, though comparable with SEPPA 3
on the glycoprotein antigens.
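
The two performance measurements named above can be computed as in the following sketch. The abstract does not define them, so this assumes the usual definitions: Gmean as the geometric mean of sensitivity and specificity, and AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. The labels, scores, and the 0.45 threshold are made-up illustration values, not data from the study.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Imbalanced toy example: 2 epitope residues (1) and 4 non-epitope residues (0).
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.45 else 0 for s in scores]

auc_value = auc(y_true, scores)      # 7 of 8 pairs ranked correctly: 0.875
gmean_value = g_mean(y_true, y_pred)  # sqrt(0.5 * 0.75)
print(auc_value, round(gmean_value, 3))
```

Both measures account for the minority class explicitly, which is why they are commonly preferred over plain accuracy on imbalanced data.
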
Item Type: | Article |
---|---|
Additional Information: | Library Dosen |
Uncontrolled Keywords: | Cluster-based undersampling; SMOTE; Class imbalance; Hybrid sampling; Hierarchical DBSCAN; Vaccine design |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department |
Depositing User: | Sri JUNANDI |
Date Deposited: | 11 Jun 2025 01:45 |
Last Modified: | 11 Jun 2025 01:45 |
URI: | https://ir.lib.ugm.ac.id/id/eprint/17536 |