A Combination of EDA, Machine Learning, and Artificial Neural Networks for Accurate Prediction of Heart Disease

https://doi.org/10.22105/ahse.v2i1.29

Abstract

Cardiovascular diseases cause millions of deaths each year, and the number of cases continues to rise, making early prediction increasingly important. Data science and Artificial Intelligence (AI) have already been applied to this problem, but further work that improves predictive accuracy and generalization remains crucial, because more reliable early prediction can significantly reduce mortality rates and healthcare costs. This study combines Exploratory Data Analysis (EDA), a range of conventional Machine Learning (ML) algorithms, and an Artificial Neural Network (ANN) to predict heart disease accurately and to address gaps in prior research. It uses a Kaggle dataset with 1025 training samples and 303 test samples described by 14 attributes: 13 predictive variables and a binary target indicating the presence of heart disease. Normalization, feature importance analysis, K-fold cross-validation, and grid search were applied to improve model performance, generalization, and robustness. With these methods, most models achieved 100% accuracy, precision, recall, and F1-score on the test data, with no signs of overfitting, data leakage, or bias. Principal Component Analysis (PCA) was also conducted to assess the richness of the features and their potential for dimensionality reduction. Finally, the study discusses its outcomes, compares its results with the most closely related studies, and examines their real-world applicability.
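The abstract lists normalization, K-fold cross-validation, and grid search as the steps used to improve generalization and avoid data leakage. The sketch below shows one common way to combine them. The scaler is refit on each training fold only, so validation data never influences preprocessing, and each hyperparameter value is scored by its mean cross-validated F1. This is not the authors' code. Min-max scaling, the k-nearest-neighbours classifier, the grid of `k` values, the 5 folds, and the synthetic two-feature data (`make_toy_data`) are all illustrative assumptions. The example uses plain Python so that it runs without extra libraries.

```python
import random
from statistics import mean

def min_max_fit(rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def min_max_apply(rows, lo, hi):
    """Map each feature's training range to [0, 1]; validation values may fall
    slightly outside it. Constant features are mapped to 0.0."""
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(r, lo, hi)]
            for r in rows]

def knn_predict(train_X, train_y, x, k):
    """Majority vote of the k nearest training points (squared Euclidean distance).
    Labels are 0/1; use odd k to avoid ties."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0

def stratified_folds(y, n_folds, seed=0):
    """Split sample indices into folds, dealing each class round-robin so that
    class ratios stay similar across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def cv_score(X, y, k, n_folds=5):
    """Mean validation F1 over stratified folds. The scaler is refit inside
    every fold, which keeps validation statistics out of preprocessing."""
    scores = []
    for val_idx in stratified_folds(y, n_folds):
        val_set = set(val_idx)
        tr_idx = [i for i in range(len(y)) if i not in val_set]
        lo, hi = min_max_fit([X[i] for i in tr_idx])
        X_tr = min_max_apply([X[i] for i in tr_idx], lo, hi)
        X_val = min_max_apply([X[i] for i in val_idx], lo, hi)
        y_tr = [y[i] for i in tr_idx]
        preds = [knn_predict(X_tr, y_tr, x, k) for x in X_val]
        scores.append(f1_score([y[i] for i in val_idx], preds))
    return mean(scores)

def grid_search(X, y, k_grid=(1, 3, 5, 7)):
    """Score every candidate k by cross-validation and return the best one."""
    results = {k: cv_score(X, y, k) for k in k_grid}
    best = max(results, key=results.get)
    return best, results

def make_toy_data(n=120, seed=42):
    """Hypothetical data: an age-like and a cholesterol-like feature on very
    different scales; label 1 when a noisy combined score exceeds 1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, c = rng.uniform(30, 80), rng.uniform(150, 350)
        X.append([a, c])
        score = (a - 30) / 50 + (c - 150) / 200 + rng.gauss(0, 0.1)
        y.append(1 if score > 1.0 else 0)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    best_k, scores = grid_search(X, y)
    print("CV F1 per k:", {k: round(s, 3) for k, s in scores.items()})
    print("best k:", best_k)
```

In practice, scikit-learn packages the same pattern as a `Pipeline` containing a `MinMaxScaler` and an estimator, evaluated with `GridSearchCV` over `StratifiedKFold` splits. Placing the scaler inside the pipeline is what confines each fold's statistics to its training split. A held-out test set would then be scored only once, after the grid search has finished.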

Keywords:

Heart disease, Exploratory data analysis, Machine learning, Artificial intelligence, Healthcare

Published

2025-03-10

How to Cite

Rashedi Gazari, A., Rokhva, S., Khatibi, T., Akhondzade Noughabi, E., & Teimourpour, B. (2025). A Combination of EDA, Machine Learning, and Artificial Neural Networks for Accurate Prediction of Heart Disease. Annals of Healthcare Systems Engineering, 2(1), 38-46. https://doi.org/10.22105/ahse.v2i1.29