An interpretable machine learning model for COVID-19 screening
Keywords: COVID-19, machine learning, artificial intelligence, pandemic
Introduction: Coronavirus disease 2019 (COVID-19) is a viral disease that the WHO has declared a pandemic. Diagnostic tests are expensive and not always available. Studies applying machine learning (ML) to the diagnosis of SARS-CoV-2 infection have been proposed in the literature to reduce costs and allow better control of the pandemic.
Objective: We aim to develop a machine learning model that predicts whether a patient has COVID-19 from epidemiological data and clinical features.
Methods: We used six ML algorithms to screen for COVID-19 through diagnostic prediction and performed an interpretive analysis using SHAP (SHapley Additive exPlanations) and feature importances.
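As an illustration of the kind of attribution that SHAP approximates, the sketch below computes exact Shapley values by brute force for a single patient. It is a minimal sketch under stated assumptions, not the study's pipeline: `toy_risk`, its coefficients, and the three binary features are hypothetical, and a single all-zeros baseline stands in for the background dataset that SHAP implementations normally average over.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x` relative to `baseline`.

    Brute force over all feature subsets: exponential cost, so this is
    only practical for a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take the patient's values; the rest keep
        # the baseline values.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score (NOT the paper's model)."""
    fever, cough, travel = z
    return 0.1 + 0.3 * fever + 0.2 * cough + 0.2 * fever * travel


patient = [1, 1, 1]   # fever, cough, recent international travel (assumed)
baseline = [0, 0, 0]  # reference patient with none of the three features

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(["fever", "cough", "travel"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that makes per-patient explanations additive. In this toy model, the fever-travel interaction term is split equally between the two features.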
Results: Our best model was XGBoost (XGB), which obtained an area under the ROC curve of 0.752, a sensitivity of 90%, a specificity of 40%, a positive predictive value (PPV) of 42.16%, and a negative predictive value (NPV) of 91.0%. The best predictors, in order of importance, were fever, cough, international travel within the previous 14 days, male gender, and nasal congestion.
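The four screening metrics reported above all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions. The counts are invented for illustration and are not the study's confusion matrix; `screening_metrics` is a hypothetical helper name.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        # Share of infected patients the model flags as positive.
        "sensitivity": tp / (tp + fn),
        # Share of non-infected patients the model clears as negative.
        "specificity": tn / (tn + fp),
        # Probability that a positive prediction is a true infection.
        "ppv": tp / (tp + fp),
        # Probability that a negative prediction is truly non-infected.
        "npv": tn / (tn + fn),
    }


# Hypothetical counts chosen only to illustrate a high-sensitivity,
# low-specificity screening model.
m = screening_metrics(tp=90, fp=120, tn=80, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, most positive predictions are false alarms (low PPV), while a negative prediction is reliable (high NPV). This is the usual trade-off for a screening tool tuned for sensitivity. Unlike sensitivity and specificity, PPV and NPV also depend on how common the disease is in the tested population.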
Conclusion: We conclude that ML is an important screening tool, with high sensitivity compared with rapid tests, and that it can be used to improve clinical precision in COVID-19, a disease whose symptoms are highly nonspecific.
Copyright (c) 2022 Pinasco GC, de Mattos Farina EMJ, Barcellos Filho FN, Fiorotti WF, Ferreira MCM, Souza Cruz SC, Colodette AL, Loureiro LR, Comério T, Farias DCS, Lima EFA, Manhambusque KV
This work is licensed under a Creative Commons Attribution 4.0 International License.