An interpretable machine learning model for COVID-19 screening


  • Gustavo Carreiro Pinasco, Universidade Federal do Espírito Santo – UFES, Brazil;
  • Eduardo Moreno Júdice de Mattos Farina, Universidade Federal de São Paulo – UNIFESP, Brazil;
  • Fabiano Novaes Barcellos Filho, Escola Superior de Ciências da Santa Casa de Misericórdia de Vitória – EMESCAM, Brazil;
  • Willer França Fiorotti, Escola Superior de Ciências da Santa Casa de Misericórdia de Vitória – EMESCAM, Brazil;
  • Matheus Coradini Mariano Ferreira, Prefeitura Municipal de Vitória, Brazil;
  • Sheila Cristina de Souza Cruz, Prefeitura Municipal de Vitória, Brazil;
  • Andre Louzada Colodette, Escola Superior de Ciências da Santa Casa de Misericórdia de Vitória – EMESCAM, Brazil;
  • Luciene Rossati Loureiro, Prefeitura Municipal de Vitória, Brazil;
  • Tatiane Comério, Prefeitura Municipal de Vitória, Brazil;
  • Dilzilene Cunha Sivirino Farias, Prefeitura Municipal de Vitória, Brazil;
  • Eliane de Fátima Almeida Lima, Universidade Federal do Espírito Santo – UFES, Brazil;
  • Katia Valéria Manhambusque, Universidade Federal do Espírito Santo – UFES, Brazil.



Keywords: COVID-19, machine learning, artificial intelligence, pandemic


Introduction: Coronavirus Disease 2019 (COVID-19), caused by SARS-CoV-2, is a viral disease that the World Health Organization (WHO) has declared a pandemic. Diagnostic tests are expensive and not always available. Machine learning (ML) approaches to diagnosing SARS-CoV-2 infection have been proposed in the literature as a way to reduce costs and improve control of the pandemic.

Objective: We aim to develop a machine learning model that predicts whether a patient has COVID-19 from epidemiological data and clinical features.

Methods: We used six ML algorithms to screen for COVID-19 through diagnostic prediction and performed an interpretability analysis using SHAP (SHapley Additive exPlanations) values and feature importances.
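The Shapley values that SHAP is built on can be illustrated with a small, self-contained sketch. The code below computes exact Shapley values by enumerating every feature coalition for a hypothetical risk score with one fever-cough interaction. The feature names echo the predictors listed in the Results, but the weights, the `toy_risk_score` function and the all-zero baseline are invented for illustration and are not the study's XGBoost model. Features outside a coalition are simply set to a single baseline value, a simplification of the background-data averaging that SHAP implementations typically perform.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

# Hypothetical feature order, echoing the predictors named in the abstract.
FEATURES = ["fever", "cough", "travel_14d", "male", "nasal_congestion"]


def toy_risk_score(v: Sequence[float]) -> float:
    """Hypothetical risk score with one interaction; NOT the study's model."""
    fever, cough, travel, male, congestion = v
    return (0.30 * fever + 0.20 * cough + 0.25 * travel
            + 0.05 * male + 0.10 * congestion
            + 0.10 * fever * cough)  # interaction term


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to a single baseline input.

    Features outside a coalition take their baseline value. Enumerating all
    coalitions costs O(n * 2^n) model calls, so this only suits tiny n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


if __name__ == "__main__":
    patient = [1, 1, 0, 1, 0]    # fever, cough, no travel, male, no congestion
    reference = [0, 0, 0, 0, 0]  # hypothetical "no findings" baseline
    for name, value in zip(FEATURES, shapley_values(toy_risk_score, patient, reference)):
        print(f"{name:>16}: {value:+.3f}")
```

For this invented patient, fever receives 0.35 and cough 0.25, because the 0.10 interaction is split equally between them, and the attributions sum to the difference between the patient's score and the baseline score. Brute-force enumeration is exponential in the number of features; libraries such as `shap` use faster, model-specific algorithms for tree ensembles.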

Results: Our best model was XGBoost (XGB), which achieved an area under the ROC curve (AUC) of 0.752, a sensitivity of 90%, a specificity of 40%, a positive predictive value (PPV) of 42.16%, and a negative predictive value (NPV) of 91.0%. The strongest predictors, in descending order of importance, were fever, cough, a history of international travel within the previous 14 days, male gender, and nasal congestion.
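The metrics reported above derive from a confusion matrix and, for the AUC, from ranked model scores. The sketch below uses invented counts (90 true positives, 10 false negatives, 80 true negatives, 120 false positives), chosen only so that sensitivity and specificity match the reported 90% and 40%; they are not the study's data. It also shows, via Bayes' theorem, why PPV and NPV depend on disease prevalence while sensitivity and specificity do not, and computes AUC as the probability that a randomly chosen positive case scores above a randomly chosen negative one. The helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionMatrix:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    cm = ConfusionMatrix(tp=90, fp=120, tn=80, fn=10)  # invented counts
    prevalence = (cm.tp + cm.fn) / (cm.tp + cm.fp + cm.tn + cm.fn)
    print(cm.sensitivity(), cm.specificity(), cm.ppv(), cm.npv())
    print(predictive_values(cm.sensitivity(), cm.specificity(), prevalence))
```

With these invented counts the prevalence is one third, giving a PPV of about 42.9% and an NPV of about 88.9%. The same sensitivity and specificity applied to a population with a different prevalence would shift both predictive values.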

Conclusion: We conclude that ML is an important screening tool, offering high sensitivity compared with rapid tests, and that it can improve clinical precision in COVID-19, a disease whose symptoms are highly nonspecific.


