Customer Churn Prediction

Predicting bank customer attrition using machine learning on 10,000 European credit card customers

Executive Summary

Business Problem: A bank experiences 20.37% annual customer churn, resulting in significant revenue loss. This project develops a machine learning model to identify at-risk customers before they leave.

87% ROC-AUC Score
86.9% Model Accuracy
20.37% Overall Churn Rate

The Challenge

Banks invest heavily in customer acquisition, but retaining existing customers is more cost-effective. The dataset revealed:

Customer Churn Distribution

Customer Churn Distribution: 79.63% stayed vs 20.37% churned

Key Findings

Finding #1: Customers aged 50+ have a 44.6% churn rate - more than double the overall average of 20.37%.
Finding #2: Female customers churn at 25.1% compared to 16.5% for males - an 8.6 percentage point gap that warrants investigation.
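Finding #3: The Gradient Boosting model achieved a ROC-AUC of 0.87, meaning it ranks a randomly chosen churner above a randomly chosen non-churner about 87% of the time. This measures ranking quality, not the share of at-risk customers identified.

Age Distribution by Churn Status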

Age Distribution: Customers aged 40-50 show highest churn rates (red bars)
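
The comparisons quoted in the findings above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the percentages reported in this write-up; the variable names are illustrative.

```python
# Churn rates quoted in the findings above (percent).
overall = 20.37
age_50_plus = 44.6
female = 25.1
male = 16.5

# Relative risk: how many times the overall rate the 50+ segment churns at.
age_ratio = age_50_plus / overall

# Absolute gap in percentage points, and the female-to-male rate ratio.
gender_gap_pp = female - male
gender_ratio = female / male

print(f"50+ vs overall: {age_ratio:.2f}x")
print(f"Gender gap: {gender_gap_pp:.1f} pp ({gender_ratio:.2f}x)")
```

The 50+ segment churns at roughly 2.2 times the overall rate, which is what "more than double" refers to.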

Technical Approach

1. Data Preprocessing

2. Feature Engineering

# Created new features for better prediction:
# - BalanceToSalary: Ratio of account balance to estimated salary
# - TenureAgeRatio: Relationship between tenure and age
# - IsZeroBalance: Binary flag for zero balance accounts
# - AgeGroup: Categorical age brackets (Young, Middle, Senior, Elderly)
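
The features listed above could be derived roughly as follows. This is a sketch under stated assumptions, not the project's actual code: the input keys `EstimatedSalary` and `Tenure` are assumed column names, `TenureAgeRatio` is assumed to be a plain division, and the age bracket cut-offs are hypothetical, since the write-up names the groups but not their boundaries. It uses plain dictionaries so that it runs without dependencies; the same expressions carry over to DataFrame columns.

```python
def engineer_features(customer: dict) -> dict:
    """Return a copy of `customer` with the four derived features added.

    Assumed input keys: Balance, EstimatedSalary, Tenure, Age.
    """
    out = dict(customer)
    salary = customer["EstimatedSalary"]
    age = customer["Age"]

    # Guard against division by zero; how such rows are handled is an assumption.
    out["BalanceToSalary"] = customer["Balance"] / salary if salary else 0.0
    out["TenureAgeRatio"] = customer["Tenure"] / age if age else 0.0
    out["IsZeroBalance"] = int(customer["Balance"] == 0)

    # Hypothetical cut-offs: the write-up names the groups only.
    if age < 30:
        out["AgeGroup"] = "Young"
    elif age < 45:
        out["AgeGroup"] = "Middle"
    elif age < 60:
        out["AgeGroup"] = "Senior"
    else:
        out["AgeGroup"] = "Elderly"
    return out


# Made-up customer record for illustration.
example = engineer_features(
    {"Balance": 0.0, "EstimatedSalary": 50000.0, "Tenure": 5, "Age": 50}
)
print(example)
```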

3. Model Selection & Comparison

Tested three machine learning algorithms:

ROC Curves Comparison

ROC Curves: Gradient Boosting (green) achieves highest AUC of 0.870
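
An AUC value such as 0.870 can be read as the probability that the model scores a randomly chosen churner higher than a randomly chosen non-churner. The sketch below computes AUC directly from that definition. The labels and scores are toy values invented for illustration, not project data, and `roc_auc` is a hypothetical helper written for this example.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (churner, non-churner) pairs ranked correctly.

    A tie between a churner and a non-churner counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data, invented for illustration (1 = churned, 0 = stayed).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of customers, so it is only meant to show the definition; library implementations use a sort-based method.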

4. Model Evaluation

Used ROC-AUC instead of accuracy because:
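
One reason that follows directly from the class split reported above (79.63% stayed vs 20.37% churned): on imbalanced data, accuracy rewards a model that never predicts churn. The sketch below uses the counts implied by this write-up (10,000 customers, 2,037 churners) to show what such a baseline scores.

```python
# Class counts implied by the write-up: 10,000 customers, 20.37% churned.
n_total = 10_000
n_churned = 2_037
n_stayed = n_total - n_churned

# Baseline that predicts "stayed" (0) for every customer.
labels = [1] * n_churned + [0] * n_stayed
predictions = [0] * n_total

accuracy = sum(p == y for p, y in zip(predictions, labels)) / n_total
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / n_churned

print(f"Baseline accuracy: {accuracy:.2%}")
print(f"Baseline recall on churners: {recall:.2%}")
```

The baseline reaches 79.63% accuracy while catching no churners at all. Because it gives every customer the same score, its ROC-AUC is 0.5, so ROC-AUC exposes a weakness that accuracy hides.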

Results & Impact

Model Performance

Top Predictive Features (from Gradient Boosting)

  1. Age - Most important predictor (36.5% importance)
  2. Number of Products - Second highest (29.8% importance)
  3. IsActiveMember - Customer engagement indicator (11.4% importance)
  4. Balance - Account balance level (6.3% importance)
  5. Geography (Germany) - Location-based risk (5.7% importance)

Feature Importance Chart

Top 10 Feature Importances: Age (36.5%) and NumOfProducts (29.8%) are strongest predictors

Business Recommendations

Immediate Actions Based on Data:

  1. Target 50+ age group with specialized retention campaigns - this segment shows 44.6% churn rate compared to overall 20.37%
  2. Investigate gender disparity - Female customers churn at 25.1% while males churn at 16.5%. Understanding this 8.6 percentage point gap is critical
  3. Deploy predictive scoring - Use the Gradient Boosting model (0.87 ROC-AUC, 86.9% accuracy) to score all customers monthly and identify at-risk individuals before they leave
  4. Focus on product optimization - Number of products is the 2nd most important feature (29.8% importance), suggesting product strategy significantly impacts retention
  5. Priority intervention - Female customers aged 50+ belong to both of the high-churn segments identified above and are a natural first target for retention programs
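
The monthly scoring step in the recommendations above might look like the following. This is a minimal sketch: `flag_at_risk`, the customer IDs, the probabilities, and the 0.5 threshold are all hypothetical. In practice the probabilities would come from the trained model, and the threshold would be tuned to the retention budget.

```python
def flag_at_risk(churn_probabilities: dict, threshold: float = 0.5) -> list:
    """Return customer IDs at or above `threshold`, highest risk first."""
    flagged = [
        (customer_id, p)
        for customer_id, p in churn_probabilities.items()
        if p >= threshold
    ]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in flagged]


# Hypothetical model outputs for one monthly run.
monthly_scores = {"C001": 0.12, "C002": 0.81, "C003": 0.55, "C004": 0.49}
at_risk = flag_at_risk(monthly_scores, threshold=0.5)
print(at_risk)
```

Sorting by probability lets a retention team work from the top of the list when it cannot contact every flagged customer.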

Long-term Strategy:

Expected Business Impact:

Challenges & Learning

Key Challenges:

Technical Skills Gained:

Future Improvements

Project Resources

