A Combination of EDA, Machine Learning, and Artificial Neural Networks for Accurate Prediction of Heart Disease
Abstract
Cardiovascular diseases cause millions of deaths each year, and the number of cases continues to rise, which makes early prediction increasingly important. Although data science and Artificial Intelligence (AI) have been applied to this problem, further studies that improve predictive performance and generalization remain necessary, since earlier and more reliable detection can reduce mortality rates and healthcare costs. This study combines Exploratory Data Analysis (EDA), a range of conventional Machine Learning (ML) algorithms, and an Artificial Neural Network (ANN) to predict heart disease accurately and to address gaps in prior research. A dataset from Kaggle was used, containing 1025 training samples and 303 test samples with 14 attributes: 13 predictive variables and a binary target indicating the presence of heart disease. Normalization, feature importance analysis, K-fold cross-validation, and grid search were applied to improve model performance, generalization, and robustness. With these methods, most models achieved 100% accuracy, precision, recall, and F1-score on the test data, with no observed signs of overfitting, data leakage, or bias. Principal Component Analysis (PCA) was also conducted to evaluate the richness of the features and their potential for dimensionality reduction. Finally, the discussion clarifies the study’s outcomes, compares the results with the most closely related studies, and examines real-world applicability.
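As an illustration of how normalization, K-fold cross-validation, and grid search fit together, the following is a minimal sketch and not the study's actual implementation. It assumes min-max normalization, uses a k-nearest-neighbours classifier as a stand-in for the models in the study, and runs on synthetic two-feature data rather than the Kaggle dataset. All function names (`kfold_indices`, `fit_minmax`, `apply_minmax`, `knn_predict`, `cv_accuracy`, `grid_search`, `make_synthetic`) are hypothetical. The scaling parameters are fitted on the training folds only, which is one common way to avoid leaking information from the held-out fold.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_minmax(rows):
    """Learn per-feature (min, max) from the given rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_minmax(rows, params):
    """Scale features with previously fitted (min, max) pairs."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, params)] for row in rows]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (binary labels)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(row, x)), label)
                   for row, label in zip(train_X, train_y))
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def cv_accuracy(X, y, k_neighbors, n_folds=5, seed=0):
    """Mean accuracy over n_folds; the scaler is refitted inside each fold."""
    folds = kfold_indices(len(X), n_folds, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        params = fit_minmax([X[j] for j in train_idx])  # training folds only
        tr_X = apply_minmax([X[j] for j in train_idx], params)
        te_X = apply_minmax([X[j] for j in test_idx], params)
        tr_y = [y[j] for j in train_idx]
        correct = sum(knn_predict(tr_X, tr_y, x, k_neighbors) == y[j]
                      for x, j in zip(te_X, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(X, y, grid, n_folds=5):
    """Evaluate every candidate in the grid and return the best one."""
    results = {k: cv_accuracy(X, y, k, n_folds) for k in grid}
    best = max(results, key=results.get)
    return best, results


def make_synthetic(n=200, seed=1):
    """Synthetic binary data (an assumption; not the heart disease dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(50 + 10 * label, 5),
                  rng.gauss(1.0 + 0.5 * label, 0.3)])
        y.append(label)
    return X, y


X, y = make_synthetic()
best_k, results = grid_search(X, y, grid=[1, 3, 5, 7])
print("best k:", best_k)
print("cross-validated accuracy per k:", results)
```

In a sketch like this, the final evaluation would still be done once on a separate test set that played no part in the grid search, so that the reported scores are not biased by hyperparameter selection.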
Keywords:
Heart disease, Exploratory data analysis, Machine learning, Artificial intelligence, Healthcare