An interpretable machine learning model for covid-19 screening

Authors

  • Gustavo Carreiro Pinasco Universidade Federal do Espírito Santo – UFES, Brazil;
  • Eduardo Moreno Júdice de Mattos Farina Universidade Federal de São Paulo – UNIFESP, Brazil;
  • Fabiano Novaes Barcellos Filho Escola Superior de Ciências da Santa Casa de Misericórdia de Vitória – EMESCAM, Brazil;
  • Willer França Fiorotti Escola Superior de Ciências da Santa Casa de Misericórdia de Vitória – EMESCAM, Brazil;
  • Matheus Coradini Mariano Ferreira Prefeitura Municipal de Vitória, Brazil;
  • Sheila Cristina de Souza Cruz Prefeitura Municipal de Vitória, Brazil;
  • Andre Louzada Colodette Escola Superior de Ciências da Santa Casa de Misericórdia de Vitória – EMESCAM, Brazil;
  • Luciene Rossati Loureiro Prefeitura Municipal de Vitória, Brazil;
  • Tatiane Comério Prefeitura Municipal de Vitória, Brazil;
  • Dilzilene Cunha Sivirino Farias Prefeitura Municipal de Vitória, Brazil;
  • Eliane de Fátima Almeida Lima Universidade Federal do Espírito Santo – UFES, Brazil;
  • Katia Valéria Manhambusque Universidade Federal do Espírito Santo – UFES, Brazil.

DOI:

https://doi.org/10.36311/jhgd.v32.13324

Keywords:

COVID-19, machine learning, artificial intelligence, pandemic

Abstract

Introduction: Coronavirus Disease 2019 (COVID-19) is a viral disease that has been declared a pandemic by the WHO. Diagnostic tests are expensive and not always available. Studies using machine learning (ML) approaches to diagnose SARS-CoV-2 infection have been proposed in the literature to reduce costs and allow better control of the pandemic.

Objective: we aim to develop a machine learning model that predicts whether a patient has COVID-19 from epidemiological data and clinical features.

Methods: we used six ML algorithms for COVID-19 screening through diagnostic prediction and performed an interpretability analysis using SHAP (SHapley Additive exPlanations) and feature importances.
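To make the interpretability step concrete, the sketch below computes exact Shapley values, the quantity that SHAP approximates, by brute-force enumeration of feature subsets. It is a minimal illustration and not the authors' pipeline: the function `shapley_values`, the `toy_model` score, its weights, the background rows, and the example patient are all hypothetical and chosen only to keep the example small.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, background):
    """Exact Shapley values for one instance by enumerating feature subsets.

    The value of a subset S is the mean model output when the features in S
    take the instance's values and the remaining features are taken from
    each background row in turn.
    """
    n = len(instance)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [instance[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical toy score over three binary inputs: fever, cough, recent travel.
# The weights are invented for illustration and are not taken from the study.
def toy_model(x):
    fever, cough, travel = x
    return 0.1 + 0.4 * fever + 0.2 * cough + 0.3 * fever * travel


# Hypothetical background sample and patient (fever and cough, no travel).
background = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1]]
patient = [1, 1, 0]

phi = shapley_values(toy_model, patient, background)
baseline = sum(toy_model(row) for row in background) / len(background)
print(dict(zip(["fever", "cough", "travel"], phi)))
```

The contributions in `phi` sum to the difference between the patient's score and the background average `baseline`, which is the additivity property that makes per-feature attributions of this kind readable. Enumeration is exponential in the number of features, so it is only practical for very small feature sets.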

Results: our best model was XGBoost (XGB), which obtained an area under the ROC curve of 0.752, a sensitivity of 90%, a specificity of 40%, a positive predictive value (PPV) of 42.16%, and a negative predictive value (NPV) of 91.0%. The best predictors, in decreasing order of importance, were fever, cough, history of international travel within the previous 14 days, male gender, and nasal congestion.
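For readers less familiar with these screening metrics, the sketch below shows how sensitivity, specificity, PPV, and NPV follow from confusion-matrix counts. The function name `screening_metrics` and the counts are hypothetical: the counts were invented to give a sensitivity of 0.90 and a specificity of 0.40, and because the class balance of the study's test set is not stated here, the resulting PPV and NPV are not expected to match the reported values.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all infected
        "specificity": tn / (tn + fp),  # true negatives among all non-infected
        "ppv": tp / (tp + fp),          # infected among all positive predictions
        "npv": tn / (tn + fn),          # non-infected among all negative predictions
    }


# Hypothetical counts, for illustration only (not data from the study).
metrics = screening_metrics(tp=90, fp=120, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the model itself, whereas PPV and NPV also depend on how common the disease is in the screened population, so the same model can yield different predictive values in a different setting.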

Conclusion: ML is an important screening tool with high sensitivity compared to rapid tests, and it can be used to improve clinical precision in COVID-19, a disease whose symptoms are highly nonspecific.

 



Published

2022-06-23

Section

ORIGINAL ARTICLES