Data Science & Analytics Internship Task 3 | Customer Churn Prediction: Decoding Why Bank Customers Leave
Welcome to my Customer Churn Prediction Project! This project dives deep into the world of banking, customer behavior, and predictive intelligence, where data reveals the unseen patterns behind customer loyalty and attrition.
In today's hyper-competitive financial landscape, retaining existing customers is often more cost-effective than acquiring new ones. Banks thrive when customers stay and suffer when they silently walk away. This project transforms raw banking data into strategic intelligence. Through statistical analysis, machine learning, and visual storytelling, I uncover why customers leave, who is at risk, and how banks can prevent churn before it happens. Just like detective work, churn prediction reveals the hidden signals inside customer behavior, turning data into decisions and decisions into retention power.
The Customer Churn Prediction Project is an end-to-end machine learning initiative where I explore, preprocess, analyze, model, and interpret customer data from a bank, with the goal of predicting which customers are likely to exit and why. From encoding features to training ML models and visualizing patterns, this project showcases the full workflow of predictive analytics applied to financial data.
The dataset used is the widely respected Churn_Modelling Dataset, representing real customer profiles from a global bank.
- Total Records: ~10,000
- Total Features: 14
- Target Variable: Exited (0 = Stayed, 1 = Left)
- Credit Score
- Geography
- Gender
- Age
- Tenure
- Balance
- Number of Products
- Has Credit Card
- Is Active Member
- Estimated Salary
This dataset is ideal for understanding how demographics, financial health, and product engagement influence whether a customer stays loyal or decides to churn.
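As a rough illustration of the dataset's shape and target column, the sketch below reads a three-row excerpt with pandas. The column names (`CreditScore`, `NumOfProducts`, `Exited`, and so on) are assumed to match the commonly distributed `Churn_Modelling.csv` file, and every value in the excerpt is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt; all values are invented for illustration.
SAMPLE_CSV = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,10000001,Doe,620,France,Female,42,2,0.0,1,1,1,101000.0,1
2,10000002,Roe,610,Spain,Female,41,1,83000.0,1,0,1,112000.0,0
3,10000003,Poe,500,Germany,Male,39,8,159000.0,3,1,0,113000.0,1
"""

# With the real file this would be something like
# pd.read_csv("Churn_Modelling.csv") (the path is an assumption).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

n_rows, n_cols = df.shape
churn_share = df["Exited"].mean()  # fraction of rows labeled 1 (left)

print(f"{n_rows} rows, {n_cols} columns")
print(f"Share of churned customers: {churn_share:.2f}")
```

On the full dataset the same two checks give the record count and the class balance of `Exited`, which is worth knowing before choosing evaluation metrics.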
Before building powerful ML models, the raw data undergoes careful preparation to ensure accuracy, reliability, and fairness.
- Checked for missing values & duplicates
- Dropped irrelevant IDs
- Encoded categorical features (Geography, Gender)
- Standardized numerical features
- Created train-test split
- Explored distributions & patterns through EDA
Proper preprocessing helps the model learn from clean, consistently structured data, which improves prediction quality and interpretability.
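The preprocessing steps listed above can be sketched with pandas and scikit-learn as follows. This is a minimal sketch, not the project's actual code: the data is a random stand-in, the column names are assumed, and only a subset of the features is included to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the bank data (random values, assumed column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "RowNumber": np.arange(1, n + 1),
    "CustomerId": np.arange(n) + 15_000_000,
    "Surname": ["X"] * n,
    "CreditScore": rng.integers(350, 851, n),
    "Geography": rng.choice(["France", "Germany", "Spain"], n),
    "Gender": rng.choice(["Female", "Male"], n),
    "Age": rng.integers(18, 80, n),
    "Balance": rng.uniform(0, 200_000, n),
    "Exited": rng.integers(0, 2, n),
})

# 1. Check for missing values and duplicates.
n_missing = int(df.isna().sum().sum())
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates()

# 2. Drop identifier columns that carry no predictive signal.
X = df.drop(columns=["RowNumber", "CustomerId", "Surname", "Exited"])
y = df["Exited"]

# 3. Split first, so encoder and scaler statistics come from training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 4. One-hot encode categorical features and standardize numerical ones.
categorical = ["Geography", "Gender"]
numerical = ["CreditScore", "Age", "Balance"]
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numerical),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(n_missing, n_duplicates)
print(X_train_prepared.shape, X_test_prepared.shape)
```

The sketch fits the scaler after the split rather than before it. That ordering is a design choice made here to keep test-set statistics out of training, and it may differ from the order used in the project.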
Visualization breathes life into data, and in this project dark-themed charts illuminate hidden patterns behind customer churn.
Key Visual Insights (20+ Visuals Created):
- Churn Distribution: understanding the imbalance
- Age vs Churn: which age groups leave most?
- Geography vs Churn: regions with higher attrition
- Gender Patterns: comparative churn behavior
- Active Member Status: engagement vs loyalty
- Balance Distribution: does money influence churn?
- Products Held vs Churn: the loyalty power of product bundles
- Credit Score Analysis: risk profiles
- Salary vs Churn: income dynamics
- Correlation Heatmap: how numerical features relate
- Customer Tenure Trends: experience vs loyalty
- Confusion Matrix Heatmap: model performance
- ROC Curve: prediction strength
- Feature Importance Bar Plot: what drives churn
- Probability Distribution of Predictions
- Pairwise Relationships (Pairplot)
- Box Plots, Count Plots, Histograms & Violin Charts
- Model Comparison Charts
Visualization reveals behavioral clues, showing that age, geography, credit score, and activity level play major roles in churn behavior.
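One of the charts above, churn rate by geography, might be drawn along these lines. This is only a sketch on made-up toy records with assumed column names; it uses Matplotlib's built-in `dark_background` style as a stand-in for the project's dark theme, whose exact styling is not specified here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy records; the real project would use the full dataset.
df = pd.DataFrame({
    "Geography": ["France", "France", "Germany", "Germany",
                  "Spain", "Spain", "France", "Germany"],
    "Exited": [0, 0, 1, 1, 0, 1, 1, 0],
})

# The mean of a 0/1 target per group is that group's churn rate.
churn_rate = (
    df.groupby("Geography")["Exited"].mean().sort_values(ascending=False)
)

with plt.style.context("dark_background"):
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar(churn_rate.index, churn_rate.values)
    ax.set_xlabel("Geography")
    ax.set_ylabel("Churn rate")
    ax.set_title("Churn rate by geography (toy data)")
    fig.tight_layout()
```

Calling `fig.savefig("churn_by_geography.png")` afterwards would write the chart to disk; the file name is arbitrary.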
This project implements multiple classification models to predict churn:
- Logistic Regression
- Random Forest Classifier
- XGBoost Classifier
- Decision Tree
- Support Vector Machine
- K-Nearest Neighbors
Each model is evaluated on:
- Accuracy
- Precision
- Recall
- F1 Score
- ROC-AUC
- Confusion Matrix
Tree-based models (Random Forest & XGBoost) emerged as top performers, combining strong predictive power with feature-importance scores that help explain what drives churn.
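A comparison loop over the listed metrics could look roughly like the sketch below. It is an illustration under stated assumptions: the data comes from scikit-learn's `make_classification` with an imbalanced target rather than from the bank dataset, only two of the six models are included, and the hyperparameters are placeholders, so the scores it produces say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary problem standing in for the churn data.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard 0/1 predictions
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()
                 if k != "confusion_matrix"})
```

Because churners are the minority class, recall, F1, and ROC-AUC are usually more informative here than accuracy alone. ROC-AUC is computed from predicted probabilities rather than hard labels.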
- Customers aged 40 to 60 show significantly higher churn.
- Customers from Germany churn more than other regions.
- Inactive customers have a much higher probability of leaving.
- Lower credit score customers are more likely to churn.
- Users with only 1 product churn more, showing reduced loyalty.
- Higher balance does not necessarily mean higher retention.
Churn is influenced by a combination of financial, behavioral, and demographic attributes, making predictive modeling essential for proactive retention strategies.
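Segment-level findings like those above can be checked with a simple group-by on the target column. The sketch below uses ten made-up records and assumed column names, with age bands chosen for illustration, so its numbers demonstrate the technique only and are not results from the bank dataset.

```python
import pandas as pd

# Ten invented records; values are chosen only to make the example readable.
df = pd.DataFrame({
    "Age": [25, 33, 45, 52, 58, 38, 47, 29, 61, 55],
    "IsActiveMember": [1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
    "Exited": [0, 0, 1, 1, 0, 0, 1, 0, 0, 1],
})

# Bucket ages into bands; intervals are right-inclusive: (17, 39], (39, 60], (60, 120].
df["AgeBand"] = pd.cut(
    df["Age"], bins=[17, 39, 60, 120], labels=["18-39", "40-60", "61+"]
)

# The mean of the 0/1 target within each group is the group's churn rate.
by_age = df.groupby("AgeBand", observed=True)["Exited"].mean()
by_activity = df.groupby("IsActiveMember")["Exited"].mean()

print(by_age)
print(by_activity)
```

The same pattern applies to `Geography` or the number of products: group by the column, take the mean of `Exited`, and compare rates across groups.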
- Python
- Pandas, NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost
- Imbalanced-learn (if needed)
From data cleaning to ML modeling and visualization, the project follows a structured, professional data science pipeline.
This Customer Churn Prediction Project highlights the transformative power of data analytics in the banking sector. By predicting churn before it happens, banks can:
- Strengthen customer relationships
- Build loyalty programs
- Increase revenue retention
- Offer personalized services
This project is not just about prediction. It's about understanding human behavior, financial patterns, and the strategies that help businesses connect better with customers.
Every customer carries a unique story, and churn prediction helps banks listen to those stories before they lose valuable relationships. Data doesn't just inform decisions; it empowers businesses to evolve.
"Retention begins with understanding, and understanding begins with data."




















