UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis

Kontou, Eftychia E. and Walter, Axel and Alka, Oliver and Pfeuffer, Julianus and Sachsenberg, Timo and Mohite, Omkar S. and Nuhamunada, Matin and Kohlbacher, Oliver and Weber, Tilmann (2023) UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis. Journal of Cheminformatics, 15 (1). pp. 1-12. ISSN 1758-2946

s13321-023-00724-w.pdf - Published Version
Restricted to Repository staff only
Available under License Creative Commons Attribution.

Abstract

Metabolomics experiments generate highly complex datasets whose manual inspection is time- and labor-intensive and sometimes error-prone. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, molecular formula and structural predictions, and an integration with the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, and development, the workflow is also implemented in Jupyter notebooks using the Python programming language and pyOpenMS, a set of Python bindings to the OpenMS algorithms. Finally, UmetaFlow is also offered as a web-based graphical user interface for parameter optimization and for processing smaller datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as with commercial standards; it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking: UmetaFlow detected more than 90% of all ground-truth features and performed exceptionally well in quantification and in the selection of discriminating markers. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets.
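The benchmarking result in the abstract (the share of ground-truth features that were detected) is a recall. The sketch below illustrates one common way such a recall can be computed: a ground-truth feature counts as detected if some detected feature lies within an m/z tolerance (in ppm) and a retention-time tolerance. This is an assumption for illustration only; the `Feature` type, the `feature_recall` helper, the tolerance values, and the example numbers are all hypothetical and are not taken from UmetaFlow or from the MTBLS733/MTBLS736 benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal LC-MS feature: m/z and retention time (seconds)."""
    mz: float
    rt: float


def ppm_diff(observed: float, reference: float) -> float:
    """Absolute m/z difference in parts per million, relative to the reference."""
    return abs(observed - reference) / reference * 1e6


def feature_recall(detected, ground_truth, ppm_tol=10.0, rt_tol=6.0):
    """Fraction of ground-truth features matched by at least one detected feature.

    Assumed matching rule (illustrative, not UmetaFlow's): a detected feature
    matches if it is within ppm_tol in m/z AND within rt_tol seconds in RT.
    """
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for gt in ground_truth
        if any(
            ppm_diff(d.mz, gt.mz) <= ppm_tol and abs(d.rt - gt.rt) <= rt_tol
            for d in detected
        )
    )
    return hits / len(ground_truth)


# Hypothetical example: two of three ground-truth features are recovered.
truth = [Feature(301.1410, 120.0), Feature(455.2900, 300.0), Feature(611.1600, 480.0)]
found = [Feature(301.1415, 121.5), Feature(455.2920, 302.0), Feature(700.0000, 500.0)]
print(round(feature_recall(found, truth), 3))  # 0.667
```

Tightening `ppm_tol` or `rt_tol` lowers the reported recall, so a benchmark figure of this kind is only comparable across tools when the same matching tolerances are used.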

Item Type: Article
Additional Information: Lecturer library (Library Dosen)
Uncontrolled Keywords: Untargeted metabolomics; Processing; Analysis; High-throughput workflow; Software
Subjects: Biology
Divisions: Faculty of Biology > Doctoral Program in Biology
Depositing User: Rusna Nur Aini Aini
Date Deposited: 10 Dec 2024 02:50
Last Modified: 10 Dec 2024 02:50
URI: https://ir.lib.ugm.ac.id/id/eprint/9637
