This segment delves into the Exploratory Data Analysis (EDA) phase, where we scrutinize the data to uncover patterns and trends related to customer churn. It entails selecting columns by type, categorizing numerical variables by churn, and visualizing the distribution of categorical features by churn. Through probability density function plots, we discern potential correlations between numerical features and churn, shedding light on the differences in distribution between churned and not churned customers. Additionally, analyzing the distribution of categorical features reveals intriguing patterns offering valuable insights into potential factors influencing churn. Here are several noteworthy observations:
1- City tier 3 exhibits a higher churn rate compared to other city types, prompting questions about potential contributing factors.
2- A disparity in churn rate between men and women warrants further investigation into potential gender-related factors influencing churn.
3- The category ‘Laptop & Accessory’ displays a lower churn rate than ‘Mobile Phone’, prompting examination of contributing factors.
4- Single individuals demonstrate a higher churn rate compared to other marital statuses, leading to inquiries about underlying reasons.
5- ‘Cash on Delivery’ payment mode indicates a higher churn rate than other payment modes, prompting investigation into influencing factors.
Moreover, features such as “Complain”, “Tenure”, “Days since last order”, and “Cashback amount” show more pronounced differences between the distributions of churned and non-churned customers. This suggests a potential correlation with churn, which makes them valuable candidates for predictive modeling. Examining the correlation of every feature with churn then quantifies these relationships and informs the development of the predictive model. Kernel Density Estimation (KDE) plots are not ideal for discrete variables, because the smoothing spreads density across values that cannot occur (for example, between the two levels of “Complain”). Even so, they still offer valuable insights for this analysis.
Subsequently, we compute the correlation matrix, visualize it as a heatmap, and extract the correlation of every feature with ‘Churn’, sorted in descending order. These values describe the relationship between each feature and churn, aiding in the identification of influential factors for predictive modeling. Features with larger absolute correlation coefficients, whether positive or negative, are generally the more useful candidates for the churn prediction model. However, a linear correlation coefficient can miss non-linear relationships.
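The ranking step can be sketched as follows. The snippet computes a Pearson coefficient (the default method of pandas `DataFrame.corr`) between each feature and a 0/1 churn label. It then sorts by absolute value, so that strong negative relationships such as ‘Tenure’ are not pushed to the bottom. The records are hypothetical toy values, not the article's dataset, and plain Python stands in for whatever tooling the original analysis used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_by_churn_correlation(features, churn):
    """Return (feature, r) pairs sorted by |r|, strongest relationship first."""
    scores = {name: pearson(values, churn) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy records (not the article's data): 0 = retained, 1 = churned.
churn = [1, 0, 0, 1, 0, 0, 1, 0]
features = {
    "Tenure":   [0, 12, 9, 1, 15, 20, 2, 8],
    "Complain": [1, 0, 0, 1, 1, 0, 0, 0],
    "CityTier": [3, 1, 1, 3, 2, 1, 1, 3],
}

for name, r in rank_by_churn_correlation(features, churn):
    print(f"{name:10s} {r:+.2f}")
```

Sorting on `abs(r)` rather than `r` is a design choice. A plain descending sort, as described above, lists the strongest positive correlations first and the strongest negative ones last.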
Among the features with the largest absolute correlation values, ‘Tenure’ exhibits the strongest negative correlation (-0.34) with churn, indicating that longer tenure is associated with lower churn rates. Conversely, ‘Complain’ shows the highest positive correlation (0.25) with churn, implying that customers who complain are more likely to churn. Additionally, ‘DaySinceLastOrder’ and ‘CashbackAmount’ demonstrate negative correlations (-0.16 and -0.15, respectively) with churn. ‘NumberOfDeviceRegistered’, ‘SatisfactionScore’, and ‘CityTier’ exhibit positive correlations (0.11, 0.11, and 0.08, respectively).
While the positive correlation of ‘Complain’ and negative correlations of ‘Tenure’ and ‘CashbackAmount’ with churn align with intuitive expectations, the correlations of ‘DaySinceLastOrder’ and ‘CityTier’ with churn warrant further investigation due to their less obvious rationale. Moreover, the positive correlation of ‘SatisfactionScore’ with churn is intriguing and merits closer examination to uncover its implications for customer retention strategies. These insights from the correlation analysis inform our next steps in refining the predictive modeling approach for customer churn.
Let’s take a look at the analysis of some key features.
City Tier
The analysis dives deeper into understanding the impact of ‘CityTier’ on customer churn. By grouping the data based on ‘CityTier’ and ‘Churn’, we observe notable differences in churn ratios among the three city tiers. Tier 3 stands out with the highest churn ratio of 21.37%, followed by Tier 2 with 19.83%, and Tier 1 with the lowest churn ratio of 14.51%. This disparity underscores the importance of considering geographical factors in customer retention strategies.
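The per-tier churn ratios come from a simple group-and-average over a 0/1 churn flag. A minimal sketch with hypothetical toy records follows. In pandas the equivalent is roughly `df.groupby("CityTier")["Churn"].mean() * 100`, assuming a 0/1 ‘Churn’ column.

```python
from collections import defaultdict

def churn_ratio_by_group(groups, flags):
    """Percentage of rows with flag == 1 within each group value."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, flag in zip(groups, flags):
        totals[group] += 1
        positives[group] += flag
    return {group: 100 * positives[group] / totals[group] for group in sorted(totals)}

# Hypothetical toy records (not the article's data): one entry per customer.
city_tier = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
churn     = [0, 0, 0, 1, 0, 1, 0, 1, 1, 0]

ratios = churn_ratio_by_group(city_tier, churn)
for tier, pct in ratios.items():
    print(f"Tier {tier}: {pct:.2f}% churn")
```

Passing the 0/1 ‘Complain’ flags instead of churn yields the per-tier complaint ratio in the same way.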
To gain further insights into the relationship between ‘CityTier’ and churn, we explore the distribution of ‘Tenure’ values for each city tier using violin plots. ‘Tenure’ exhibits the highest absolute correlation value with churn (-0.34), making it a crucial factor in understanding churn dynamics. The plots reveal variations in median ‘Tenure’ values across city tiers, with Tier 3 displaying the lowest median tenure (8) and Tier 2 showing the highest median tenure (11). This suggests that differences in customer tenure may contribute to the variance in churn ratios among city tiers.
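The medians read off the violin plots can also be computed directly. The sketch below groups tenure values by city tier and takes the median of each group. It assumes missing tenure values are stored as `None` and skipped. That mirrors pandas' default NaN handling, but it is an assumption about how the article treated them. The data are hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_by_group(groups, values):
    """Median of `values` within each group; None entries (missing) are skipped."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        if value is not None:
            buckets[group].append(value)
    return {group: median(vals) for group, vals in sorted(buckets.items())}

# Hypothetical toy records, not the article's data.
city_tier = [1, 1, 1, 2, 2, 2, 3, 3, 3]
tenure    = [10, 9, None, 11, 14, 7, 8, 2, 12]
print(median_by_group(city_tier, tenure))
```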
Additionally, we analyze the association between ‘Complain’ and ‘CityTier’, considering ‘Complain’ as the second feature with the highest absolute correlation value with churn (0.25). The bar chart illustrates the distribution of complaints across city tiers, with Tier 3 exhibiting the highest complaint ratio of 28.92%, followed closely by Tier 1 with 28.48%, and Tier 2 with 25.62%. This suggests a potential link between customer complaints and churn rates, particularly in Tier 3 cities.
Overall, the analysis reveals that Tier 3 cities experience higher churn ratios compared to other tiers, possibly due to a combination of factors such as shorter customer tenure and higher complaint ratios. Further investigation is warranted to identify the underlying reasons behind these trends and devise targeted retention strategies to mitigate churn in Tier 3 cities. While Tier 1 customers constitute the largest customer base with the lowest churn ratio, efforts should also focus on maintaining customer satisfaction and loyalty in this segment to ensure sustained business growth.
Gender
The analysis of gender-based churn ratios reveals a slight difference between male and female customers. Male customers exhibit a churn rate of 17.73%, while female customers churn at a slightly lower rate of 15.49%. The spread between the two ratios is small (mean 16.61%, standard deviation 1.12 percentage points). However, this spread alone is not a significance test. A test that accounts for group sizes, such as a chi-square or two-proportion z-test, would be needed to confirm that the difference is not statistically significant. On this descriptive evidence, gender does not appear to be a major factor in determining churn in this dataset, suggesting that other features may have a stronger influence on customer attrition. Further investigation into these features is warranted to identify key drivers of churn and develop targeted retention strategies.
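The mean and standard deviation quoted for gender are reproduced by taking the population standard deviation of the two ratios. That spread ignores how many customers sit in each group. A two-proportion z-test, swapped in here as a standard check rather than the article's own method, is one way to test the difference formally. The counts in the example call are hypothetical placeholders, not the dataset's group sizes.

```python
import math
from statistics import mean, pstdev

# Churn ratios (%) quoted in the text for male and female customers.
ratios = [17.73, 15.49]
print(round(mean(ratios), 2), round(pstdev(ratios), 2))

def two_proportion_z_test(churned_a, total_a, churned_b, total_b):
    """Two-sided z-test for a difference between two churn proportions."""
    p_a = churned_a / total_a
    p_b = churned_b / total_b
    pooled = (churned_a + churned_b) / (total_a + total_b)
    # Assumes 0 < pooled < 1; otherwise the standard error is zero.
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical counts for illustration only; substitute the real group sizes.
z, p = two_proportion_z_test(churned_a=120, total_a=1000, churned_b=100, total_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```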
Preferred Order Category
The analysis of the ‘PreferedOrderCat’ feature unveils interesting insights into customer preferences and their impact on churn rates. Among the various categories, ‘Laptop & Accessory’ stands out with a below-average churn rate of 10.24%, positioning it as one of the categories with the lowest churn rates. In contrast, ‘Mobile Phone’ and ‘Fashion’ categories represent a significant portion of churned customers, collectively constituting 43.90% of the churned population. This observation aligns with preliminary findings and underscores the importance of understanding customer preferences in mitigating churn.
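The 43.90% figure is a share of the churned population, which differs from a within-category churn rate. A large category can contribute many churners while having only a moderate churn rate. The sketch below computes both numbers side by side on hypothetical toy records.

```python
from collections import Counter

def churn_share_and_rate(categories, churn):
    """Per category: (% of all churners in it, % of its own customers who churned)."""
    totals = Counter(categories)
    churned = Counter(cat for cat, c in zip(categories, churn) if c == 1)
    n_churned = sum(churned.values())  # assumes at least one churner
    return {
        cat: (100 * churned[cat] / n_churned, 100 * churned[cat] / totals[cat])
        for cat in totals
    }

# Hypothetical toy records, not the article's data.
cats  = ["Mobile Phone"] * 6 + ["Fashion"] * 2 + ["Grocery"] * 2
churn = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]

for cat, (share, rate) in churn_share_and_rate(cats, churn).items():
    print(f"{cat:12s} share of churners {share:5.1f}%  churn rate {rate:5.1f}%")
```

In this toy example ‘Mobile Phone’ supplies two thirds of the churners but has a lower churn rate than ‘Fashion’.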
Examining the distribution of tenure across different order categories further corroborates these findings. Categories with higher churn rates tend to have shorter median tenures, indicating a potential correlation between tenure and churn. Additionally, when exploring the relationship between order categories and customer complaints, categories like ‘Grocery’ and ‘Fashion’ emerge with the highest complaint ratios despite their lower churn rates. This may reflect characteristics of these products, such as non-standardization and the need for personalized interaction, which could influence customer satisfaction and churn behavior. These insights highlight the complexity of customer preferences and their implications for churn management strategies, emphasizing the need for targeted approaches tailored to specific product categories.
Marital Status
In this section of the analysis, we focused on exploring the relationship between ‘MaritalStatus’ and churn within the dataset of the ecommerce company. The examination revealed distinct patterns across different marital statuses concerning customer churn. Notably, single individuals exhibited the highest churn rate at 26.73%, followed by divorced individuals at 14.62%, and married individuals at 11.52%.
Further investigation through visualizations, such as violin plots comparing tenure distributions across marital statuses, showed a consistent ordering. Ranking the statuses by median tenure (lowest first) gives the same order as ranking them by churn ratio (highest first). This observation suggests a potential link between shorter customer tenure and increased likelihood of churn, particularly among single customers.
Additionally, we analyzed the relationship between marital status and customer complaints. Surprisingly, complaint ratios did not show significant variations across different marital statuses, with divorced, married, and single individuals exhibiting comparable complaint ratios of approximately 29%. This indicates that while marital status may influence churn rates, customer complaints alone may not be a decisive factor in explaining these differences.
Overall, these insights provide valuable guidance for managers in developing targeted retention strategies. For instance, recognizing the higher churn propensity among single customers, the company could implement personalized retention initiatives tailored to this segment. Additionally, efforts should be made to address underlying factors contributing to churn across all marital statuses, such as enhancing overall customer satisfaction and improving product or service offerings.
Preferred Payment Mode
The analysis of ‘PreferredPaymentMode’ sheds light on the varying churn rates associated with different payment methods. Customers who prefer ‘Cash on Delivery’ exhibit the highest churn rate at 24.90%, followed closely by ‘E-wallet’ users at 22.80%. Conversely, customers who use ‘Credit Card’ and ‘Debit Card’ demonstrate relatively lower churn rates at 14.21% and 15.38%, respectively. The mean churn ratio across all payment methods stands at 18.94%, with a median churn ratio of 17.39% and a standard deviation of 4.19%.
Examining the distribution of tenure across different payment methods reveals a consistent pattern, with all payment methods exhibiting similar median tenure values ranging between 8 and 9. Despite slight variations in the first and third quartiles, the overall distribution of tenure appears to be comparable across all payment methods.
Furthermore, when analyzing the relationship between payment methods and customer complaints, no significant differences in complaint ratios emerge among the various payment methods. Regardless of the preferred payment mode, customer complaint ratios hover around 30%, indicating that customer complaints may not be a decisive factor in explaining differences in churn rates across payment methods.
Overall, while ‘PreferredPaymentMode’ appears to influence churn rates to some extent, other factors beyond customer complaints and tenure may contribute to the observed variations in churn rates among different payment methods. Further investigation into these factors is essential for developing targeted strategies to manage churn effectively across different payment methods.
Satisfaction Score
The analysis of ‘SatisfactionScore’ provides intriguing insights into the relationship between satisfaction levels and churn rates. Surprisingly, there appears to be a positive correlation between churn and satisfaction scores. Lower satisfaction scores, particularly scores 1 and 2, correspond to lower churn ratios of 11.51% and 12.63%, respectively. Conversely, higher satisfaction scores, such as score 5, exhibit the highest churn ratio at 23.83%. Scores 3 and 4 fall in between, with churn ratios of 17.20% and 17.13%, respectively.
This unexpected pattern challenges conventional assumptions about the relationship between customer satisfaction and churn. Typically, higher satisfaction levels are associated with lower churn rates, as satisfied customers are more likely to remain loyal. However, the data suggests the opposite trend, where higher satisfaction scores are linked to higher churn ratios.
One potential explanation for this counterintuitive finding could be a data collection error or an inverse interpretation of satisfaction scores, wherein higher scores indicate lower satisfaction with the product or service. Without additional context or information about the satisfaction survey methodology, it is challenging to draw definitive conclusions about the observed pattern.
Nevertheless, score 5’s churn ratio deviates markedly, sitting roughly 1.7 standard deviations above the unweighted mean of the five score-level ratios (16.46%). This underscores the importance of further investigation into the underlying factors driving customer churn across different satisfaction levels. Identifying the root causes of this unexpected relationship is crucial for devising effective retention strategies and improving overall customer satisfaction and loyalty.
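The size of score 5's deviation can be checked from the five churn ratios quoted above. This sketch assumes an unweighted mean and a population standard deviation across the five score levels. The article does not state which convention it used.

```python
from statistics import mean, pstdev

# Churn ratio (%) by satisfaction score, as quoted in the text.
churn_by_score = {1: 11.51, 2: 12.63, 3: 17.20, 4: 17.13, 5: 23.83}

ratios = list(churn_by_score.values())
mu = mean(ratios)        # unweighted mean over the five score levels
sigma = pstdev(ratios)   # population standard deviation
z_scores = {score: (r - mu) / sigma for score, r in churn_by_score.items()}
print(f"mean = {mu:.2f}%, sd = {sigma:.2f} pp, z(score 5) = {z_scores[5]:.2f}")
```

With the sample standard deviation (`statistics.stdev`) instead, the z-score of score 5 drops to about 1.5, so the exact figure depends on the convention.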
Tenure
The analysis of the ‘Tenure’ feature sheds light on the distribution of customer tenure within the dataset. The distribution plot illustrates that the majority of clients have tenure values ranging between 2 and 16. However, there is a notable concentration of data points at tenure levels 0 and 1, indicating a large proportion of customers who joined the platform only recently.
The quartile distribution plot further examines the distribution of tenure across quartiles and its relationship with churn. Clients are divided into quartiles based on their tenure values, with Quartile 1 representing the lowest tenure and Quartile 4 representing the highest.
Upon closer inspection, it becomes evident that a considerable portion of churned clients belongs to the group with the lowest tenure levels, specifically tenure levels 0 and 1. Approximately 46.52% of churned clients have a tenure of either 0 or 1. This highlights the importance of early customer retention efforts, as clients with lower tenure levels appear to be more prone to churn.
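The quartile split and the tenure 0-1 share can be sketched as follows. The sketch assumes quartile cut points from linear interpolation (`method="inclusive"`, matching pandas' default quantile interpolation) and right-inclusive bins in the spirit of `pandas.qcut`. Because tenure has many ties at 0 and 1, the quartile groups can end up with unequal sizes. The toy data are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles

def churn_ratio_by_quartile(values, churn):
    """Churn ratio (%) per quartile, using right-inclusive bins like pandas.qcut."""
    cuts = quantiles(values, n=4, method="inclusive")  # [Q1, Q2, Q3]
    totals, churned = defaultdict(int), defaultdict(int)
    for value, c in zip(values, churn):
        q = bisect_left(cuts, value) + 1  # 1 = lowest tenure, 4 = highest
        totals[q] += 1
        churned[q] += c
    return {q: 100 * churned[q] / totals[q] for q in sorted(totals)}

def share_of_churners_at_or_below(values, churn, threshold):
    """Percentage of churned customers whose value is <= threshold."""
    churned_values = [v for v, c in zip(values, churn) if c == 1]
    return 100 * sum(v <= threshold for v in churned_values) / len(churned_values)

# Hypothetical toy records, not the article's data.
tenure = [0, 0, 1, 1, 2, 5, 8, 10, 15, 20, 25, 30]
churn  = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(churn_ratio_by_quartile(tenure, churn))
print(share_of_churners_at_or_below(tenure, churn, 1))
```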
Understanding the tenure distribution and its impact on churn provides valuable insights for customer retention strategies. By identifying clients with low tenure levels and implementing targeted retention initiatives, such as on-boarding programs or personalized engagement campaigns, businesses can mitigate churn risk and foster long-term customer loyalty.
Complain
The examination of the ‘Complain’ feature provides insights into its distribution and its relationship with churn within the dataset. The statistics reveal that the majority of clients, approximately 83% of the dataset, do not have any recorded complaints. Meanwhile, the remaining 17% of clients have registered at least one complaint.
Analyzing the distribution of complaints in relation to churn, it becomes evident that clients who have filed complaints are more likely to churn. About 53.6% of churned clients (508 of 948 churned cases) have previously filed complaints, far above the roughly 17% of all clients who have complained. This over-representation highlights the significant role of the ‘Complain’ feature as an indicator of churn within the dataset.
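The over-representation can be made explicit as a lift. Divide the share of complainants among churners by the share of complainants in the whole customer base; a ratio above 1 means complainants appear among churners more often than their base rate would predict. By Bayes' rule, the same ratio equals the churn rate among complainants divided by the overall churn rate. The sketch below simply plugs in the figures quoted in this section. Because the 17% figure is rounded, the lift is approximate.

```python
# Figures quoted in the text: 508 of 948 churned clients complained,
# and about 17% of all clients have at least one complaint (rounded).
churned_with_complaint = 508
total_churned = 948
overall_complaint_rate = 0.17

share_among_churners = churned_with_complaint / total_churned
lift = share_among_churners / overall_complaint_rate
print(f"{share_among_churners:.1%} of churners complained; lift = {lift:.2f}")
```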
Given the imbalanced nature of the dataset, where non-churned clients represent the majority, the presence of complaints emerges as a notable predictor of churn. Despite not exhibiting a strong correlation with other features, the ‘Complain’ feature stands out as a crucial indicator of customer dissatisfaction and potential churn.
Understanding the relationship between complaints and churn provides valuable insights for proactive churn prevention strategies. By identifying clients who have filed complaints and implementing targeted resolution measures, businesses can mitigate churn risk and enhance overall customer satisfaction. Additionally, leveraging customer feedback from complaints can inform product or service improvements, fostering long-term customer loyalty and retention.
Day Since Last Order
The analysis of the ‘DaySinceLastOrder’ feature provides valuable insights into customer behavior and its relationship with churn within the dataset.
The statistics reveal that the majority of customers have relatively recent purchase activity, with 75% of clients having made a purchase within the past 7 days. However, there are 62 outliers: values lying beyond the 1.5 × IQR (interquartile range) fences, which here means unusually long gaps since the last order. These outliers could indicate irregular purchasing patterns or data anomalies.
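The outlier count follows the usual 1.5 × IQR rule (Tukey's fences). The sketch below assumes linear-interpolation quartiles, as pandas uses by default; other quartile conventions can shift the count slightly. The values are hypothetical, and the same function would apply to the ‘CashbackAmount’ outliers discussed later.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical toy values of DaySinceLastOrder, not the article's data.
days = [0, 1, 1, 2, 3, 3, 4, 5, 7, 8, 30, 46]
print(iqr_outliers(days))
```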
Visualizing the distribution of ‘DaySinceLastOrder’ through a violin plot illustrates the concentration of values within the lower range, indicating recent purchase activity among the majority of clients.
Further analysis groups the ‘DaySinceLastOrder’ values into custom intervals to assess their relationship with churn. The results point to a clear association between recent purchase activity and churn. Specifically, 883 of the 948 churned clients (about 93%) have relatively low values of ‘DaySinceLastOrder’, meaning only a short time has passed since their last purchase.
Examining specific intervals, the interval between 0 and 3 days since the last order has the highest churn ratio, with 21.54% of clients in this range churning. This suggests that clients who purchased recently but subsequently churn may require targeted retention efforts to understand and address their specific needs or concerns.
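The custom intervals are not listed in the text, so the edges below (0-3, 4-7, 8-14 and 15+ days) are hypothetical, as are the toy records. The sketch bins whole-day values with `bisect` and reports the churn ratio in each bin. `pandas.cut` with explicit `bins` would be the usual dataframe equivalent.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical interval edges: inclusive upper bounds of the first three bins.
EDGES = [3, 7, 14]
LABELS = ["0-3", "4-7", "8-14", "15+"]

def churn_ratio_by_interval(days, churn):
    """Churn ratio (%) per custom DaySinceLastOrder interval (whole days)."""
    totals, churned = defaultdict(int), defaultdict(int)
    for d, c in zip(days, churn):
        label = LABELS[bisect_left(EDGES, d)]  # d <= 3 -> "0-3", 4..7 -> "4-7", ...
        totals[label] += 1
        churned[label] += c
    return {lab: 100 * churned[lab] / totals[lab] for lab in LABELS if totals[lab]}

# Hypothetical toy records, not the article's data.
days  = [0, 1, 2, 3, 5, 6, 9, 12, 20]
churn = [1, 1, 1, 0, 1, 0, 0, 0, 0]
print(churn_ratio_by_interval(days, churn))
```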
Overall, understanding the relationship between the recency of purchases and churn provides actionable insights for customer retention strategies. By identifying clients with recent purchase activity who are at risk of churning, businesses can implement proactive measures to engage and retain these valuable customers, ultimately fostering long-term loyalty and profitability.
Cashback Amount
The analysis of the ‘CashbackAmount’ feature sheds light on its relationship with churn within the dataset.
Upon examining the statistics, it’s evident that most customers receive cashback amounts within a relatively narrow range: the middle 50% of clients (the interquartile range) fall between 145.77 and 196.39 units. However, there is a large number of outliers, amounting to 438 instances, which may affect the accuracy of the analysis.
Visualizing the distribution of ‘CashbackAmount’ by churn status through a violin plot reveals interesting insights. Churned clients tend to have lower median cashback amounts compared to non-churned clients, indicating a potential correlation between lower cashback values and churn.
Further analysis categorizes the ‘CashbackAmount’ values into quartiles to assess their relationship with churn. The results highlight that clients in the first quartile, representing lower cashback amounts, have the highest churn ratio. This suggests that clients receiving lower cashback amounts may be more inclined to churn, potentially due to factors such as perceived value for money or dissatisfaction with rewards.
It’s worth noting that the presence of outliers in the dataset may slightly skew the analysis. These outliers represent extreme values that deviate significantly from the typical range of cashback amounts, and their influence should be considered when interpreting the results.
In conclusion, understanding the impact of cashback amounts on churn provides valuable insights for optimizing customer retention strategies. By identifying clients receiving lower cashback amounts who are at a higher risk of churning, businesses can tailor their rewards programs and incentives to better meet customer expectations and enhance loyalty.
EDA KEY TAKEAWAYS
1- Tier 3 cities have the highest churn rate (21.37%) and the highest complaint ratio (28.92%), suggesting a possible link between complaints, geographic differences and churn. However, Tier 1 has a similar complaint ratio (28.48%) alongside the lowest churn rate. Managers should customize retention strategies, such as localized promotions and improved support, for these areas.
2- ‘Mobile Phone’ and ‘Fashion’ together account for 43.90% of churned customers. Managers should prioritize understanding customer issues in these categories to reduce churn.
3- Despite lower churn rates, ‘Grocery’ and ‘Fashion’ categories have high complaint ratios. Managers should address product quality and service issues to improve customer satisfaction.
4- Single individuals exhibit the highest churn rate at 26.73%, followed by divorced individuals at 14.62% and married individuals at 11.52%, indicating an association between marital status and churn. Personalized retention initiatives targeted specifically at single customers, such as onboarding programs or loyalty rewards, may help reduce churn.
5- ‘Cash on Delivery’ and ‘E-wallet’ users have the highest churn rates. Managers should focus on retaining customers using high-churn payment methods.
6- Surveys targeting high-churn payment method users can provide insights into their concerns. Managers can use this feedback to implement targeted improvements and enhance retention.
7- Higher satisfaction scores surprisingly correlate with higher churn rates, challenging conventional assumptions about customer loyalty. This unexpected pattern suggests the need for further investigation into the interpretation and collection of satisfaction scores. Managers should review survey methodologies and customer feedback processes to ensure accurate data collection.
8- The majority of clients have tenure values between 2 and 16, with a concentration at levels 0 and 1, indicating many new customers. Managers can focus on enhancing the onboarding experience for new customers to increase retention.
9- Quartile distribution analysis reveals a correlation between low tenure levels and churn. Developing personalized engagement campaigns tailored to new customers can strengthen retention strategies.
10- 83% of clients have no recorded complaints, while 17% have filed at least one complaint. Managers can study customers without complaints to learn what sustains their loyalty and experience.
11- Clients who complained are more likely to churn: about 53.6% of churned clients (508 of 948) filed complaints. Prioritizing complaint resolution can reduce churn and improve satisfaction.
12- The interval between 0 and 3 days since the last order has the highest churn ratio (21.54%). Prioritize retention strategies for clients in this interval to reduce churn risk.
13- Churned clients tend to have lower median cashback amounts compared to non-churned clients. Explore ways to incentivize higher cashback amounts for at-risk customers to reduce churn.
14- Clients in the first quartile of cashback amounts have the highest churn ratio. Target retention efforts towards clients receiving lower cashback amounts to improve customer satisfaction.
15- Understanding the relationship between cashback amounts and churn enables optimization of customer retention strategies. Customize rewards programs and incentives to better align with customer preferences and increase loyalty.