Classification models that use data from patients admitted to the ICU for Heart Failure to predict whether a patient will experience in-hospital mortality.
Overview
The primary objective of this project is to develop a classification model that predicts the likelihood of in-hospital mortality for patients admitted to the Intensive Care Unit for Heart Failure. The project follows a standard data science workflow: data loading, preprocessing, cleaning, exploratory data analysis, feature selection (using Variance Inflation Factor, LASSO, and Random Forest), principal component analysis, modeling, and hypothesis testing.
Several models were trained and compared, including Logistic Regression, Decision Tree, and K-Nearest Neighbors, to identify the best-performing model for this task.
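As a rough illustration of how such a comparison can be set up, here is a minimal sketch using scikit-learn. It is not the project's actual code: the data is a synthetic stand-in generated with `make_classification`, and the hyperparameters, the 5-fold split, and the ROC AUC metric are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the ICU data (assumption): roughly 15% positive class.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)

# Distance- and gradient-based models get a scaler; the tree does not need one.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Mean ROC AUC over 5 stratified folds for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean ROC AUC = {score:.3f}")
```

Wrapping the scaler in a pipeline keeps it fitted on the training folds only, which avoids leaking information from the validation fold into the comparison.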
Technical Details
Tech Stack: Python (pandas, numpy, sklearn, seaborn, statsmodels)
- Data loading and preprocessing with pandas, including categorical feature encoding, data cleaning, and resampling to address class imbalance.
- Exploratory Data Analysis (EDA) to gain insight into the data and problem being addressed.
- Feature selection techniques, including Variance Inflation Factor, LASSO, and Random Forest, to reduce overfitting and computational/memory requirements.
- Training and evaluation of models, with the K-Nearest Neighbors Classifier identified as the best-performing model.
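The preprocessing and feature selection steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the column names (`age`, `heart_rate`, `gender`, `died`, and so on) are hypothetical, the data is randomly generated, VIF is computed directly from its definition 1 / (1 - R²) rather than through statsmodels, and LASSO is applied as a Lasso regression on the 0/1 label purely to see which coefficients shrink to zero.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Hypothetical toy data (assumption): not the project's real features.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.normal(70, 10, n),
    "heart_rate": rng.normal(85, 15, n),
    "noise": rng.normal(0, 1, n),
    "gender": rng.choice(["F", "M"], n),
})
df["age_copy"] = df["age"] + rng.normal(0, 0.5, n)  # deliberately collinear
logit = 0.08 * (df["age"] - 70) + 0.03 * (df["heart_rate"] - 85) - 2.0
df["died"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Categorical encoding: one-hot encode gender, dropping one level.
X = pd.get_dummies(df.drop(columns="died"), columns=["gender"],
                   drop_first=True, dtype=float)

# Resampling: oversample the minority class to match the majority.
# In practice this should be applied to the training split only.
data = pd.concat([X, df["died"]], axis=1)
majority = data[data["died"] == 0]
minority = data[data["died"] == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up])
X_bal, y_bal = balanced.drop(columns="died"), balanced["died"]

def vif(frame):
    """VIF per column: regress it on the others, return 1 / (1 - R^2)."""
    out = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2 = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)

vif_scores = vif(X)  # large values flag multicollinear features

# LASSO: features whose coefficient is not shrunk to zero are kept.
X_scaled = StandardScaler().fit_transform(X_bal)
lasso = Lasso(alpha=0.02).fit(X_scaled, y_bal)
lasso_kept = [c for c, w in zip(X_bal.columns, lasso.coef_) if w != 0]

# Random Forest: rank features by impurity-based importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_bal, y_bal)
importances = pd.Series(forest.feature_importances_,
                        index=X_bal.columns).sort_values(ascending=False)

print(vif_scores.round(1))
print("LASSO kept:", lasso_kept)
print(importances.round(3))
```

The three techniques answer different questions: VIF flags redundant predictors, LASSO drops features with little linear signal, and Random Forest importances can surface non-linear effects, so it is reasonable to compare their outputs rather than rely on one.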
Results
Please see the presentation for screenshots and results: Slide Deck
Want to connect?
Connect with me through LinkedIn, or reach out to me via email or phone number.