Using Machine Learning Models to Predict Stroke

Machine Learning

Project Overview

We built a stroke prediction model from existing physical health data on stroke patients and healthy people. The model estimates a person's risk of stroke, so that they can change their lifestyle in time to lower that risk, or be warned early enough to prepare for a stroke and reduce mortality.

My Contributions

I handled missing data in the dataset using two distinct methods, generated count plots for each parameter, and used scatter plots to compare parameters. I encoded the categorical columns with OneHotEncoder() and trained several prediction models, including RandomForest and XGBoost, to obtain accurate predictions.
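The preprocessing and training steps described above could look roughly like the sketch below. It is not the project's actual code: the data is synthetic, the column names (age, bmi, gender, smoking_status, stroke) are hypothetical, and the two missing-data methods shown (dropping rows and median imputation) are assumptions, since the page does not name the methods used. Only scikit-learn's RandomForestClassifier is shown; XGBoost is left out to keep the example dependency-light.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(20, 90, size=n).astype(float),
    "bmi": rng.normal(28, 5, size=n),
    "gender": rng.choice(["Male", "Female"], size=n),
    "smoking_status": rng.choice(["never", "formerly", "smokes"], size=n),
})
df["stroke"] = (df["age"] + rng.normal(0, 10, size=n) > 70).astype(int)
# Blank out 20 bmi values to simulate missing data.
df.loc[rng.choice(n, size=20, replace=False), "bmi"] = np.nan

# Missing-data method 1 (assumed): drop rows with a missing value.
df_dropped = df.dropna(subset=["bmi"])

# Missing-data method 2 (assumed): median imputation inside the pipeline,
# combined with one-hot encoding of the categorical columns.
numeric_cols = ["age", "bmi"]
categorical_cols = ["gender", "smoking_status"]
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X = df.drop(columns="stroke")
y = df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Rows after dropping missing bmi: {len(df_dropped)}")
print(f"Held-out accuracy: {accuracy:.3f}")
```

Keeping imputation and encoding inside a Pipeline means both are fitted on the training split only, which avoids leaking information from the test split.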

According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths. In this notebook, we visualize some key indicators that can lead to stroke. The data is sampled from a wide range of age groups, genders, habits, and health-related issues. Most of the visualizations are self-explanatory; we stick to simple but effective methods that convey as much of the information as possible.

After reviewing the visualizations and comparing the performance of all the models, we tuned the hyperparameters with GridSearch to get the best models. We concluded that RandomForestClassifier is the best model for this dataset.
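As an illustration of the tuning step, a grid search over a RandomForestClassifier could be set up as in the sketch below. The data comes from make_classification rather than the stroke dataset, and the parameter grid, the 3-fold cross-validation, and the F1 scoring metric are all assumptions chosen for the example, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in data (roughly 80% negative, 20% positive).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; the values searched in the project are not stated.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="f1",  # assumed metric; F1 is a common choice for imbalanced labels
)
search.fit(X_train, y_train)

best_params = search.best_params_
# GridSearchCV.score uses the same scorer (F1) on the refitted best model.
test_score = search.score(X_test, y_test)
print("Best parameters:", best_params)
print(f"Held-out F1: {test_score:.3f}")
```

GridSearchCV refits the best configuration on the full training split by default, so `search` can be used directly for prediction afterwards.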

Background & Conclusion

Dec 2022

These plots serve as representative visuals that effectively convey key insights throughout the prediction process:

Want to work together?

If you like what you see and want to work together, get in touch!

jin.xiaoya@northeastern.edu