Optimization and Validation of Artificial Intelligence Models in Cardiovascular Disease Diagnosis
Abstract
Cardiovascular diseases are among the leading causes of death worldwide. Early diagnosis with machine learning techniques can help reduce risk and improve treatment quality. This article examines and compares standard methods for predicting heart disease on the UCI Heart Disease dataset, which contains 920 records and 16 features. Baseline methods such as Random Forest, applied without advanced feature engineering, reach an accuracy of about 75%. The proposed approach adds engineered features (a composite risk index, age grouping, the heart rate-to-age ratio, and an estimated BMI) and tunes the model with GridSearchCV inside an automated pipeline, achieving over 85% accuracy. The engineered features reveal hidden patterns in the data, while permutation importance and cross-validation are used to reduce model uncertainty. The results show a 10% improvement in F1-score and a significant reduction in false negatives. Applying similar techniques to other heart-disease datasets is recommended to support the development of more accurate and reliable clinical decision-support systems.
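The workflow summarized in the abstract (engineered features, a pipeline tuned with GridSearchCV, and permutation importance) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the column subset, the age cut points, the definition of the composite risk index, the parameter grid, and the synthetic data are all assumptions made for the example. The BMI estimate is left out because the abstract does not say how it is computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Assumed column order of the raw matrix (a hypothetical subset of the UCI features).
RAW_COLUMNS = ["age", "trestbps", "chol", "thalach"]


def add_engineered_features(X):
    """Append illustrative engineered columns to the raw feature matrix."""
    X = np.asarray(X, dtype=float)
    age, trestbps, chol, thalach = X.T
    # Age grouping: ordinal bins with assumed cut points at 40, 55 and 70 years.
    age_group = np.digitize(age, bins=[40, 55, 70])
    # Heart rate-to-age ratio, as named in the abstract.
    hr_age_ratio = thalach / age
    # Composite risk index: an assumed count of elevated risk factors.
    risk_index = (trestbps > 140).astype(float) + (chol > 240) + (age > 55)
    return np.column_stack([X, age_group, hr_age_ratio, risk_index])


def make_synthetic_data(n_samples=300, seed=0):
    """Generate stand-in data; the real study uses the UCI Heart Disease dataset."""
    rng = np.random.default_rng(seed)
    age = rng.integers(29, 78, n_samples)
    trestbps = rng.normal(130, 18, n_samples)
    chol = rng.normal(245, 50, n_samples)
    thalach = rng.normal(150, 23, n_samples)
    X = np.column_stack([age, trestbps, chol, thalach])
    # Synthetic label driven by the heart rate-to-age ratio plus noise.
    score = thalach / age + rng.normal(0, 0.5, n_samples)
    y = (score < np.median(score)).astype(int)
    return X, y


# Feature engineering lives inside the pipeline so it is re-applied in every CV fold.
pipeline = Pipeline([
    ("features", FunctionTransformer(add_engineered_features)),
    ("model", RandomForestClassifier(random_state=0)),
])
# Assumed, deliberately small hyperparameter grid.
param_grid = {"model__n_estimators": [50, 100], "model__max_depth": [3, None]}

X, y = make_synthetic_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 5-fold cross-validated grid search, scored on F1.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# Permutation importance on held-out data; it shuffles the raw input columns,
# so one importance value is reported per entry in RAW_COLUMNS.
importances = permutation_importance(
    search.best_estimator_, X_test, y_test,
    scoring="f1", n_repeats=5, random_state=0,
)
for name, mean in zip(RAW_COLUMNS, importances.importances_mean):
    print(f"{name}: {mean:.3f}")
```

Placing the feature step inside the pipeline means the whole transformation is refit and scored consistently within each cross-validation fold, which is the usual reason for combining GridSearchCV with a pipeline.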
Keywords:
Heart disease prediction, Feature engineering, Machine learning, Model optimization, Uncertainty reduction

