Prediction of COVID-19 Using a Clinical Dataset With Machine Learning Approaches.

Authors

  • A. Suruliandi, Manonmaniam Sundaranar University.
  • R. Ame Rayan, Manonmaniam Sundaranar University.
  • S. P. Raja, Vellore Institute of Technology.

DOI:

https://doi.org/10.9781/ijimai.2025.01.003

Keywords:

Blood Samples, Classification, COVID-19 Prediction, Feature Selection, Machine Learning

Abstract

COVID-19 is an infectious disease that spreads quickly from one person to another. The pandemic, which spread worldwide over time, presents serious risks of blood clotting, breathing problems, and heart attacks, sometimes with fatal consequences if the disease is not detected early. PCR tests, CT scans, X-rays, and blood tests are commonly used to detect the disease, with the PCR test considered the gold standard. The U.S. Centers for Disease Control and Prevention (CDC) reports that the PCR test has an 80% accuracy rate. An alternative to the PCR test is clinical data, which is less expensive and easy to collect, and offers better accuracy. Machine learning, with its rich feature selection and classification methods, helps detect COVID-19 at the earliest stages using clinical test results. This research proposes a clinical dataset and offers a comparative analysis of feature selection and classification algorithms for detecting COVID-19. Filter-based feature selection methods (ANOVA-F, chi-square, mutual information, and Pearson correlation) and wrapper-based methods (Recursive Feature Elimination (RFE) and Sequential Forward Selection (SFS)) were used to choose a subset of features from the feature set. The selected features were then supplied to the Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (K-NN), and Logistic Regression (LR) classification algorithms to detect COVID-19. The experimental results of the comparative study show that the clinical dataset provides better accuracy, at 94.8%, with mutual information feature selection and the SVM classifier.
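As a rough illustration of the filter-based step mentioned in the abstract, the sketch below ranks features by their mutual information with the class label. It is not the authors' pipeline: the data are made up, the feature names (`marker_a`, `marker_b`, `marker_c`) and the helper functions are hypothetical, and the features are assumed to be already discretised (0 = normal, 1 = abnormal), whereas real laboratory values are continuous and would need binning or a different estimator.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in nats, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels, names):
    """Score every feature column against the labels, highest score first."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = mutual_information(column, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data (assumption): three discretised markers per patient.
names = ["marker_a", "marker_b", "marker_c"]
rows = [
    (1, 0, 1),
    (1, 1, 0),
    (1, 0, 0),
    (1, 1, 1),
    (0, 0, 1),
    (0, 1, 0),
    (0, 0, 0),
    (0, 1, 1),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative (hypothetical)

ranking = rank_features(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy set `marker_a` matches the label exactly and therefore ranks first, while the other two markers are independent of the label and score zero. A filter method would keep the top-ranked features and pass only those to a classifier such as an SVM.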

Published

2025-08-29

How to Cite

Suruliandi, A., Ame Rayan, R., and Raja, S. P. (2025). Prediction of COVID-19 Using a Clinical Dataset With Machine Learning Approaches. International Journal of Interactive Multimedia and Artificial Intelligence, 9(4), 82–98. https://doi.org/10.9781/ijimai.2025.01.003