Heryawan, Lukman and Novitaningrum, Dian and Nastiti, Kartika Rizqi and Mahmudah, Salsabila Nurulfarah (2024) Medical Record Document Search with TF-IDF and Vector Space Model (VSM). International Journal on Advanced Science, Engineering and Information Technology, 14 (3). pp. 847-852. ISSN 20885334
3.295 Medical-Record-Document-Search-with-TFIDF-and-Vector-Space-Model-VSMInternational-Journal-on-Advanced-Science-Engineering-and-Information-Technology.pdf - Published Version
Abstract
The number of medical record documents grows over time, as does the variety of diseases and required therapies. However, this growth has not been matched by an effective and efficient search process. This study addresses searches that often take a long time and return results that do not necessarily meet expectations by building a search model for medical record documents using the vector space model (VSM) and TF-IDF methods. The VSM method allows retrieval of results that do not exactly match the search query entered by the user but are still expected to be relevant to the user’s needs. The model was developed from the data in the FS_ANAMNESA and FS_DIAGNOSA columns. Preprocessing consisted of deleting blank lines, lowercasing, removing punctuation marks, HTML tags, stop words, and excess spaces between words, and normalizing misspelled words. A TF-IDF matrix was then formed based on the frequency of occurrence of each word feature, and the similarity between the search query and each medical record document was calculated using the cosine similarity formula. The retrieval results comprised all columns of each matching medical record document, sorted by similarity and limited to the 10 rows with the highest similarity values. The model was evaluated on 1000 medical record documents with 20 search queries, giving an average precision of 0.548 and an average recall of 0.796.
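The pipeline described in the abstract (preprocessing, TF-IDF weighting, cosine similarity ranking, and precision/recall evaluation) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The stop-word list, the sample documents, the smoothed IDF variant, and all function names are assumptions made for the example, and the typo-normalization step is omitted because the abstract does not describe how it works.

```python
import math
import re
import string
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"the", "a", "an", "and", "of", "with", "for", "in", "on", "is"}
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, strip HTML tags and punctuation, collapse spaces, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    text = text.translate(PUNCT_TO_SPACE)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf_vector(tokens, idf):
    """Map each known term to (raw term frequency) * (inverse document frequency)."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def build_index(documents):
    """Return (idf, vectors), with one sparse TF-IDF vector per document."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, one common variant; the paper's exact weighting is an assumption.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return idf, [tfidf_vector(tokens, idf) for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, idf, vectors, top_k=10):
    """Return up to top_k (document index, similarity) pairs, best match first."""
    q = tfidf_vector(preprocess(query), idf)
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    scored = [(i, s) for i, s in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up sample records for illustration only; not data from the study.
docs = [
    "Patient reports fever, cough and sore throat.",
    "<p>Fracture of the left arm after a fall</p>",
    "Persistent cough with chest pain; suspected bronchitis.",
]
idf, vectors = build_index(docs)
results = search("cough and fever", idf, vectors)
print(results)
print(precision_recall([i for i, _ in results], relevant=[0]))
```

In this toy run the query shares two terms with the first record and one with the third, so those two are returned in that order, while the second record shares no terms and is dropped. A partial match like the third record illustrates how the VSM can return documents that do not exactly match the query.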
Item Type: | Article |
---|---|
Uncontrolled Keywords: | cosine similarity; evaluation metric; Medical records; preprocessing; TF-IDF |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department |
Depositing User: | Masrumi Fathurrohmah |
Date Deposited: | 19 Feb 2025 08:30 |
Last Modified: | 19 Feb 2025 08:30 |
URI: | https://ir.lib.ugm.ac.id/id/eprint/14757 |