Customer Churn

Understanding Customer Attrition in Banking

Understanding and predicting customer churn is pivotal for banks in proactively managing customer relationships, improving satisfaction, and reducing revenue loss. By identifying customers at higher risk of churning, banks can implement targeted retention strategies, personalized marketing, and enhanced customer service to foster loyalty.

The customer churn rate serves as a crucial business metric for banks, indicating the rate at which customers disengage from banking services. This metric matters because retaining existing customers costs notably less than acquiring new ones. Consequently, prioritizing customer retention not only minimizes revenue loss but also fuels profit growth. In essence, effective management of customer churn is a linchpin of sustained customer satisfaction and financial success for banks.
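One common definition makes this concrete: over a given period, churn rate = customers lost during the period divided by customers at the start of the period. In a snapshot dataset like the one analysed below, a binary `churn` column is 1 for customers who left and 0 otherwise, so the rate reduces to the mean of that column. The helper `churn_rate` and the toy data are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def churn_rate(df: pd.DataFrame, churn_col: str = "churn") -> float:
    """Share of customers flagged as churned (assumes a 0/1 churn column)."""
    if df.empty:
        raise ValueError("cannot compute a churn rate for an empty DataFrame")
    # The mean of a 0/1 column equals (number of 1s) / (number of rows).
    return float(df[churn_col].mean())


# Toy example (hypothetical data): 2 of 8 customers churned.
toy = pd.DataFrame({"churn": [0, 1, 0, 0, 1, 0, 0, 0]})
print(f"Churn rate: {churn_rate(toy):.1%}")  # Churn rate: 25.0%
```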

Exploratory Analysis of Distribution


In this section, we'll take a closer look at the distribution of various columns in our bank churn dataset. Utilizing Python code, we'll examine key variables such as credit scores, ages, account balances, and product usage. This analytical approach allows us to uncover patterns and insights that go beyond raw numbers, providing a clear understanding of customer behaviors and setting the foundation for informed decision-making.

							
	import csv
	import pandas as pd
	import matplotlib.pyplot as plt
	import seaborn as sns
	
	# Load Data
	filename = 'data/BankCustomerChurnPrediction.csv'
	df = pd.read_csv(filename)  # read_csv already returns a DataFrame
	pd.set_option("display.max_columns", None)
	pd.set_option("display.max_rows", None)
	
	# Initial data exploration + null value check
	print(df.isnull().sum())
	""">> customer_id         0
	credit_score        0
	country             0
	gender              0
	age                 0
	tenure              0
	balance             0
	products_number     0
	credit_card         0
	active_member       0
	estimated_salary    0
	churn               0
	"""
	
	# Check the number of rows and columns
	print(df.shape)
	# >> (10000, 12)
			
	df.info()  # info() prints its summary itself and returns None, so print() is not needed

	print(df.describe())
	
	# Cursory glance at snippet of table
	print(df.head(15))
	
	# Complete list of column names
	with open(filename) as f:
		reader = csv.reader(f)
		header_row = next(reader)
	print(header_row)


	# Check the headers of columns
	print(df.columns)
	"""Index(['customer_id', 'credit_score', 'country', 'gender', 'age', 'tenure',
	'balance', 'products_number', 'credit_card', 'active_member',
	'estimated_salary', 'churn'],
	dtype='object')"""
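The null check above covers missing values, but the code does not test for duplicates. Below is a minimal sketch of how both could be confirmed with pandas. It assumes `customer_id` is meant to identify each customer uniquely. The helper name `data_quality_report` and the toy frame are hypothetical.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame, id_col: str = "customer_id") -> dict:
    """Count missing values, fully duplicated rows and repeated IDs."""
    return {
        # Total number of missing cells across all columns.
        "missing_values": int(df.isnull().sum().sum()),
        # Rows identical across every column (the first occurrence is not counted).
        "duplicate_rows": int(df.duplicated().sum()),
        # IDs that appear more than once, even if other columns differ.
        "duplicate_ids": int(df[id_col].duplicated().sum()),
    }


# Toy example (hypothetical data): the last row repeats the first exactly.
toy = pd.DataFrame({
    "customer_id": [101, 102, 103, 101],
    "age": [42, 35, None, 42],
    "churn": [1, 0, 0, 1],
})
print(data_quality_report(toy))
# {'missing_values': 1, 'duplicate_rows': 1, 'duplicate_ids': 1}
```

If the dataset is as clean as described, calling `data_quality_report(df)` on the loaded churn data should return zero for all three counts. Checking `customer_id` separately catches repeated customers whose other fields differ, which `df.duplicated()` alone would miss.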

This dataset is already clean: it contains no missing values or duplicate records, so it is ready for a basic look at its distributions. Below is an example of the code used to visualise the distributions of important variables with Matplotlib and Seaborn.

						
	# Visualisation of data
	# Age Histogram
	plt.hist(df['age'], bins=20, color='skyblue', edgecolor='black')
	plt.xlabel('Age')
	plt.ylabel('Count')
	plt.title('Distribution of Age')
	plt.show()
	
	# Gender bar Graph
	plt.figure(figsize=(8, 6))
	sns.countplot(x='gender', data=df, palette='pastel')
	plt.title('Distribution of Gender')
	plt.show()
	
	# Country plot
	plt.figure(figsize=(8, 6))
	sns.countplot(x='country', data=df, palette='pastel')
	plt.title('Distribution of Country')
	plt.show()
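The code above covers age, gender and country. The figures that follow also show balance, credit score and estimated salary. Below is a sketch of one way to draw those remaining histograms in a single figure with Matplotlib subplots. This is an assumption about how they could be produced, not the original code. The helper `plot_numeric_histograms` and the random toy data are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_numeric_histograms(df, columns, bins=20):
    """Draw one histogram per column side by side and return the figure."""
    fig, axes = plt.subplots(1, len(columns), figsize=(5 * len(columns), 4), squeeze=False)
    for ax, col in zip(axes[0], columns):
        pretty = col.replace("_", " ").title()  # e.g. "credit_score" -> "Credit Score"
        ax.hist(df[col], bins=bins, color="skyblue", edgecolor="black")
        ax.set_xlabel(pretty)
        ax.set_ylabel("Count")
        ax.set_title(f"Distribution of {pretty}")
    fig.tight_layout()
    return fig


# Toy data (hypothetical values), standing in for the real churn DataFrame.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "balance": rng.uniform(0, 200_000, 500),
    "credit_score": rng.normal(650, 95, 500).round(),
    "estimated_salary": rng.uniform(10_000, 200_000, 500),
})
fig = plot_numeric_histograms(toy, ["balance", "credit_score", "estimated_salary"])
# In an interactive session, call plt.show() to display the figure.
```

With the real data, `plot_numeric_histograms(df, ['balance', 'credit_score', 'estimated_salary'])` followed by `plt.show()` would display the panels. The helper returns the figure rather than showing it, which makes it easy to save or test.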

Some of the more interesting resulting distributions are provided below.

[Figures: Distribution of Age; Distribution of Gender; Distribution of Balance; Distribution of Country; Distribution of Credit Score; Distribution of Estimated Salary]

Because this data describes banking customers and products, it's important to consider how that context may shape the distributions observed. These distributions indicate a mostly male customer base from Europe, concentrated between the ages of 30 and 40. Credit scores are also skewed towards the higher (better) end of the range.
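These observations are read off the plots, but they can also be checked numerically. Below is a minimal sketch that assumes the column names used throughout (`gender`, `country`, `age`, `credit_score`). The helper `describe_population` and the toy frame are hypothetical. A distribution with most values at the high end and a longer tail toward low values typically has negative skewness, which is what a credit-score distribution bunched toward higher scores would show here.

```python
import pandas as pd


def describe_population(df: pd.DataFrame) -> dict:
    """Summarise the demographic shares and shape statistics discussed above."""
    return {
        # Share of each gender and country (each set of proportions sums to 1).
        "gender_share": df["gender"].value_counts(normalize=True).to_dict(),
        "country_share": df["country"].value_counts(normalize=True).to_dict(),
        # Fraction of customers aged 30 to 40 inclusive.
        "share_aged_30_40": float(df["age"].between(30, 40).mean()),
        # Negative skew: long tail toward low scores, bulk toward high scores.
        "credit_score_skew": float(df["credit_score"].skew()),
    }


# Toy data (hypothetical values), standing in for the real churn DataFrame.
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Male"],
    "country": ["France", "Spain", "France", "Germany"],
    "age": [31, 38, 45, 29],
    "credit_score": [850, 820, 790, 500],
})
print(describe_population(toy))
```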

Relational Analysis

The following section continues to visualise the bank customer data by comparing customers who churned with those who did not.


To better understand which features are associated with a higher likelihood of customer attrition, correlations and distributions were examined further using the Python code below, among other analyses.

						
	# Stacked bar plot for Number of Products vs. Churn
	products_churn_cross = pd.crosstab(index=df['products_number'], columns=df['churn'])
	products_churn_cross.plot(kind='bar', stacked=True, color=['lightblue', 'salmon'])
	plt.title('Churn by Number of Products')
	plt.xlabel('Number of Products')
	plt.ylabel('Count')
	plt.show()
	
	
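	# Stacked bar plot for Gender vs. Churn
	gender_churn_cross = pd.crosstab(df['gender'], df['churn'])
	# Pass figsize to .plot(); a separate plt.figure() call would leave an empty extra figure
	gender_churn_cross.plot(kind='bar', stacked=True, color=['lightblue', 'salmon'], figsize=(8, 6))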
	plt.title('Churn by Gender')
	plt.xlabel('Gender')
	plt.ylabel('Count')
	plt.show()
	
	# Stacked bar plot for Gender and Credit Card vs. Churn
	gender_credit_churn_cross = pd.crosstab(index=[df['gender'], df['credit_card']], columns=df['churn'])
	gender_credit_churn_cross.plot(kind='bar', stacked=True, color=['lightblue', 'salmon'])
	plt.title('Churn by Gender and Credit Card')
	plt.xlabel('Gender, Credit Card')
	plt.xticks(rotation=0)  # Rotate x-axis tick labels to be horizontal
	plt.ylabel('Count')
	plt.show()
	

The bar graphs below show a higher likelihood of attrition among customers who use more than two products, and among female customers.

[Figures: Churn by Number of Products; Churn by Gender]
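Stacked bars of raw counts can make groups of different sizes hard to compare, because a small group with a high churn rate still draws a short bar. Normalising each row of the cross-tabulation with `pd.crosstab(..., normalize='index')` gives the churn rate within each category directly. Below is a sketch in which the helper `churn_rate_by` and the toy data are hypothetical stand-ins. On the real data it would be called as `churn_rate_by(df, 'products_number')` or `churn_rate_by(df, 'gender')`.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, col: str, churn_col: str = "churn") -> pd.Series:
    """Proportion of churned customers within each category of `col`."""
    # normalize="index" divides each row by its total, so each row sums to 1.
    shares = pd.crosstab(df[col], df[churn_col], normalize="index")
    # Column 1 holds the share of churned customers (assumes churn is coded 0/1).
    return shares[1].rename("churn_rate")


# Toy data (hypothetical values), standing in for the real churn DataFrame.
toy = pd.DataFrame({
    "products_number": [1, 1, 2, 2, 3, 3, 4, 1],
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"],
    "churn": [1, 0, 0, 0, 1, 1, 1, 0],
})
print(churn_rate_by(toy, "products_number"))
print(churn_rate_by(toy, "gender"))
```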

Looking more closely at the distributions of tenure and credit score, it's plausible that both the newest and the longest-standing customers have an increased likelihood of attrition. This is suggested by the much larger interquartile range of tenure among customers who churned. No clear differences in credit score were seen between customers who churned and those who didn't.

[Figures: box plots of Tenure by Churn; Credit Score by Churn]
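The interquartile ranges compared in these box plots can also be computed directly, which puts a number on "much larger". Below is a minimal sketch using `groupby` and `quantile`. The helper `iqr_by_churn` and the toy tenure values are hypothetical.

```python
import pandas as pd


def iqr_by_churn(df: pd.DataFrame, col: str, churn_col: str = "churn") -> pd.DataFrame:
    """Quartiles and interquartile range (Q3 - Q1) of `col` for each churn group."""
    # One row per churn value, one column per requested quantile.
    quartiles = df.groupby(churn_col)[col].quantile([0.25, 0.75]).unstack()
    quartiles.columns = ["q1", "q3"]
    quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
    return quartiles


# Toy data (hypothetical values): churners sit at both ends of tenure.
toy = pd.DataFrame({
    "tenure": [4, 5, 5, 6, 0, 1, 9, 10],
    "churn": [0, 0, 0, 0, 1, 1, 1, 1],
})
print(iqr_by_churn(toy, "tenure"))
```

On the real data, `iqr_by_churn(df, 'tenure')` and `iqr_by_churn(df, 'credit_score')` would give the spreads behind the two box plots.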

While comparing churned and retained customers one variable at a time can reveal useful distinctions, the analysis becomes more informative when additional demographic variables are brought in. Let's revisit credit scores and tenure. Below is the code used for these more comprehensive visualizations.

						

	# Distribution of Credit Scores for Churned and Not Churned customers
	plt.figure(figsize=(12, 6))
	# fill=True replaces the deprecated shade=True argument in recent seaborn versions
	sns.kdeplot(df.loc[df['churn'] == 0, 'credit_score'], label='Not Churned', fill=True)
	sns.kdeplot(df.loc[df['churn'] == 1, 'credit_score'], label='Churned', fill=True)
	plt.xlabel('Credit Score')
	plt.ylabel('Density')
	plt.title('Distribution of Credit Scores by Churn')
	plt.legend()
	plt.show()
	
	# Scatter plot of Tenure vs. Age with Hue by Churn
	plt.figure(figsize=(12, 8))
	sns.scatterplot(x='tenure', y='age', hue='churn', data=df, palette='coolwarm', alpha=0.7)
	plt.title('Scatter Plot of Tenure vs. Age with Hue by Churn')
	plt.xlabel('Tenure')
	plt.ylabel('Age')
	plt.show()

[Figure: Distribution of Credit Scores by Churn]
Viewing the distributions this way reveals a subtle but potentially meaningful difference: the credit-score density for churned customers is shifted slightly toward lower scores. This suggests that a lower credit score is associated with customers leaving the bank, although no formal significance test was run.
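The plots alone cannot say whether this gap is larger than chance variation. One way to check is a permutation test on the difference in mean credit score; this technique is swapped in here and is not part of the original analysis. The test shuffles the pooled scores many times, splits them into groups of the original sizes, and counts how often the shuffled difference is at least as large as the observed one. The function `permutation_test_mean_diff`, the permutation counts, and the simulated toy data are assumptions.

```python
import numpy as np


def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b.

    Returns (observed mean(a) - mean(b), approximate p-value).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        shuffled = rng.permutation(pooled)
        diff = shuffled[: len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy example with simulated (hypothetical) credit scores.
rng = np.random.default_rng(1)
churned = rng.normal(640, 100, 200)
retained = rng.normal(655, 95, 800)
diff, p = permutation_test_mean_diff(churned, retained, n_permutations=2_000)
print(f"Mean difference: {diff:.1f}, p = {p:.3f}")
```

With the real data, one would pass `df.loc[df['churn'] == 1, 'credit_score']` and `df.loc[df['churn'] == 0, 'credit_score']`. A small p-value suggests the gap is unlikely to be chance alone. With thousands of customers, however, even a practically small difference can come out significant.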

[Figure: Scatter Plot of Tenure vs. Age with Hue by Churn]
Pairing tenure with age shows that customers aged roughly 40 to 65 are more likely to churn, and this association persists across years of tenure with the bank. […] analysis of any longitudinal populations.
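The scatter plot shows the pattern visually, and a table of churn rates by age band and tenure band puts numbers on it. The band edges below (ages 18 to 30, 31 to 40, 41 to 50, 51 to 65 and 66+; tenure 0 to 3, 4 to 7 and 8 to 10 years) are illustrative choices that line up with the 40 to 65 range mentioned above. They are not bands from the original analysis. The helper `churn_by_age_and_tenure` and the toy data are hypothetical.

```python
import pandas as pd


def churn_by_age_and_tenure(df: pd.DataFrame) -> pd.DataFrame:
    """Churn rate for each (age band, tenure band) cell.

    Assumes ages between 18 and 100 and tenure between 0 and 10 years;
    values outside those ranges fall outside every band and are dropped.
    """
    banded = df.assign(
        age_band=pd.cut(df["age"], bins=[18, 30, 40, 50, 65, 100],
                        labels=["18-30", "31-40", "41-50", "51-65", "66+"],
                        include_lowest=True),
        tenure_band=pd.cut(df["tenure"], bins=[0, 3, 7, 10],
                           labels=["0-3", "4-7", "8-10"], include_lowest=True),
    )
    # The mean of the 0/1 churn column within each cell is that cell's churn rate.
    # observed=True keeps only band combinations that actually occur in the data.
    return banded.pivot_table(index="age_band", columns="tenure_band",
                              values="churn", aggfunc="mean", observed=True)


# Toy data (hypothetical values), standing in for the real churn DataFrame.
toy = pd.DataFrame({
    "age": [25, 28, 35, 38, 45, 48, 55, 60],
    "tenure": [1, 8, 2, 9, 1, 8, 2, 9],
    "churn": [0, 0, 0, 0, 1, 1, 1, 0],
})
print(churn_by_age_and_tenure(toy))
```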

A quick way of objectively checking some of these associations is through a correlation matrix.

					

	# Heatmap for selected numerical variable correlations, including churn itself
	plt.figure(figsize=(10, 8))
	selected_numerical_cols = ['age', 'balance', 'products_number', 'estimated_salary', 'churn']
	selected_corr_matrix = df[selected_numerical_cols].corr()
	# Fix the colour scale to [-1, 1] so colours are comparable across plots
	sns.heatmap(selected_corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
	plt.title('Correlation Matrix for Selected Numerical Variables')
	plt.show()

[Figure: Correlation Matrix for Selected Numerical Variables]
Among the selected variables, the correlation matrix replicates the earlier trend: age is positively correlated with churn, so older customers are more likely to leave.
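Because `churn` is coded 0/1, its Pearson correlation with a numeric feature is the point-biserial correlation. The correlation of each feature with `churn` can therefore be computed and ranked on its own. Below is a sketch using `DataFrame.corrwith`; the helper `correlations_with_churn` and the toy data are hypothetical. A linear correlation can understate non-monotonic relationships, such as churn rising at both ends of tenure, so it complements rather than replaces the plots above.

```python
import pandas as pd


def correlations_with_churn(df: pd.DataFrame, features, churn_col: str = "churn") -> pd.Series:
    """Pearson (point-biserial) correlation of each feature with the 0/1 churn flag."""
    corr = df[features].corrwith(df[churn_col])
    # Order by strength of association, ignoring the sign.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy data (hypothetical values), standing in for the real churn DataFrame.
toy = pd.DataFrame({
    "age": [25, 30, 35, 45, 50, 60],
    "estimated_salary": [50, 20, 80, 30, 70, 40],
    "churn": [0, 0, 0, 1, 1, 1],
})
print(correlations_with_churn(toy, ["age", "estimated_salary"]))
```

On the real data, a call such as `correlations_with_churn(df, ['age', 'balance', 'products_number', 'estimated_salary', 'credit_score', 'tenure'])` would rank all of the numeric features discussed in this analysis.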

Conclusion

Though brief, this analysis outlined several factors associated with increased bank customer attrition.

Older customers were more likely to churn, and this held regardless of tenure.
A lower credit score was associated with increased churn.
Female customers were more likely to churn.
Customers using more than two products were much more likely to churn.