SaaS Churn Prediction Documentation

1. Executive Summary

The Churn Prediction for SaaS Platform project delivers an end-to-end machine learning solution that forecasts customer churn so the retention team can act before customers leave. It ingests customer data from MySQL, engineers features from metrics such as tenure, usage frequency, and support tickets, builds ensemble models with RandomForest and XGBoost, uses SHAP for explainability, and provides an interactive Streamlit dashboard for the retention team. The system achieves 87%+ accuracy, with XGBoost reaching 0.89 AUC-ROC, and is estimated to reduce churn by 25% while keeping predictions interpretable. The project ran for 7.5 months, from April 2025 to November 2025.

2. Architecture Overview

The architecture is a linear pipeline: ETL processes extract data from MySQL; preprocessing applies feature engineering and class balancing; ensemble ML models (RandomForest and XGBoost) are trained on the result; SHAP values explain the predictions; and a Streamlit dashboard presents them for interactive analytics. The design aims for efficiency, scalability, and integration with existing infrastructure, and centers on three things: a clear churn definition (e.g., 30-day inactivity), model predictions, and actionable insights for retention teams.
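As an illustration of the churn definition mentioned above, the 30-day inactivity rule could be expressed as a small labeling function. This is a minimal sketch: the function name, the use of calendar dates, and the choice to treat exactly 30 days of inactivity as churned are assumptions, not details taken from the project.

```python
from datetime import date, timedelta

# Assumed inactivity window, taken from the "30-day inactivity" example.
INACTIVITY_WINDOW_DAYS = 30


def is_churned(last_active: date, as_of: date,
               window_days: int = INACTIVITY_WINDOW_DAYS) -> bool:
    """Return True if the customer has been inactive for at least window_days.

    Hypothetical helper: counting exactly `window_days` of inactivity
    as churn is an assumption made for this sketch.
    """
    return (as_of - last_active) >= timedelta(days=window_days)


# Example: evaluate two customers on the same snapshot date.
snapshot = date(2025, 5, 31)
inactive_30_days = is_churned(date(2025, 5, 1), snapshot)
inactive_29_days = is_churned(date(2025, 5, 2), snapshot)
```

Whatever the final rule is, fixing it in one function keeps training labels and dashboard logic consistent.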

3. Technology Stack

The system uses Python for data processing and development, Scikit-Learn for RandomForest modeling and evaluation metrics, XGBoost for gradient boosting, LIME/SHAP for explainability, and MySQL for relational data storage and querying. Additional tools include Pandas for data manipulation, Streamlit for the interactive dashboard, and Matplotlib for SHAP visualizations.

4. Churn Model and Features

The churn model employs ensemble methods, with RandomForest for robustness and XGBoost for complex feature interactions, trained on stratified train/validation/test splits (70/15/15). Features include tenure (days since signup), usage frequency (logins per period), and support tickets (count of open/resolved), plus additional metrics like amount deviation and one-hot encoded categoricals. SMOTE handles class imbalance. SHAP provides global and local explanations, which highlight tenure and support tickets as the key drivers of churn.
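The 70/15/15 stratified split can be sketched with two calls to Scikit-Learn's train_test_split. The helper name, the random seed, and the synthetic data below are illustrative assumptions; the project's actual split code is not shown in this document.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, seed=42):
    """Split into 70% train, 15% validation, 15% test, preserving class ratios.

    Hypothetical helper written for this sketch.
    """
    # First carve off 30% of the data as a holdout (validation + test).
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    # Then halve the holdout: 15% validation and 15% test of the total.
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=seed
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Synthetic stand-in data: 1000 customers, 20% labeled as churners.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([1] * 200 + [0] * 800)

train, val, test = stratified_split(X, y)
```

Oversampling such as SMOTE is normally fitted on the training portion only, so that the validation and test sets keep the real class ratio.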

5. Data Processing

Data processing extracts records from MySQL using SQL queries and Pandas, engineers features (e.g., tenure calculation, frequency aggregation), preprocesses them with scaling and encoding, and handles class imbalance via SMOTE. Models are trained with hyperparameter tuning and early stopping. Predictions are stored back in MySQL, and explanations are generated via SHAP. Throughout, the pipeline enforces data quality checks, anonymizes data for privacy, and keeps queries efficient for dashboard integration.
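The feature engineering step (tenure calculation and frequency aggregation) might look like the following Pandas sketch. The table layouts, column names (customer_id, signup_date, login_at), the 30-day login window, and the function name are all assumptions made for illustration; in the real pipeline these frames would come from MySQL queries rather than inline data.

```python
import pandas as pd


def build_features(customers, logins, tickets, as_of):
    """Build one feature row per customer as of a snapshot date.

    Hypothetical helper; all column names are assumed for this sketch.
    """
    as_of = pd.Timestamp(as_of)
    feats = customers[["customer_id"]].copy()

    # Tenure: days between signup and the snapshot date.
    feats["tenure_days"] = (as_of - pd.to_datetime(customers["signup_date"])).dt.days

    # Usage frequency: logins in the 30 days up to the snapshot (assumed window).
    login_ts = pd.to_datetime(logins["login_at"])
    window_start = as_of - pd.Timedelta(days=30)
    recent = logins[(login_ts > window_start) & (login_ts <= as_of)]
    login_counts = recent.groupby("customer_id").size()

    # Support tickets: total count per customer (open and resolved together).
    ticket_counts = tickets.groupby("customer_id").size()

    # Customers with no logins or tickets get 0 instead of NaN.
    feats["logins_30d"] = feats["customer_id"].map(login_counts).fillna(0).astype(int)
    feats["ticket_count"] = feats["customer_id"].map(ticket_counts).fillna(0).astype(int)
    return feats


# Tiny inline example standing in for data extracted from MySQL.
customers = pd.DataFrame(
    {"customer_id": [1, 2], "signup_date": ["2025-01-01", "2025-03-01"]}
)
logins = pd.DataFrame(
    {"customer_id": [1, 1, 1], "login_at": ["2025-03-15", "2025-03-20", "2025-01-10"]}
)
tickets = pd.DataFrame({"customer_id": [2, 2], "status": ["open", "resolved"]})

features = build_features(customers, logins, tickets, as_of="2025-04-01")
```

Computing every feature relative to a single snapshot date helps avoid leaking information from after the prediction point into the training data.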

6. Project Timeline (7.5 Months)

  • 📅 Month 1: Planning & Data Prep (Define churn, extract dataset from MySQL).
  • 📅 Month 1.5-3: Feature Engineering (Prepare balanced dataset).
  • 📅 Month 3-5: Model Development (Train/tune RandomForest and XGBoost).
  • 📅 Month 5-6: Explainability (Integrate SHAP for visualizations).
  • 📅 Month 6-7: Dashboard (Develop Streamlit interactive tool).
  • 📅 Month 7-7.5: Testing & Deployment (Validation and rollout).

7. Testing & Deployment

Testing includes unit validation of features and models, integration checks of the pipeline flow, performance tuning toward an AUC-ROC above 0.85, and usability testing of the dashboard (<5s response). Deployment integrates the models and dashboard with MySQL through a phased rollout, with anonymization for privacy, bias checks via balanced sampling, and a rollback path that reverts to baseline models if needed.
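The AUC-ROC target above could be enforced as an automated check before a model is promoted. The sketch below uses Scikit-Learn's roc_auc_score; the function name, the constant, and the idea of gating promotion on this check are assumptions, and the labels and scores are toy values rather than project results.

```python
from sklearn.metrics import roc_auc_score

# Performance target stated in the testing criteria.
AUC_THRESHOLD = 0.85


def passes_auc_gate(y_true, y_score, threshold=AUC_THRESHOLD):
    """Return (passed, auc) for held-out labels and predicted churn scores.

    Hypothetical helper: a model passes only if its AUC-ROC is
    strictly above the threshold.
    """
    auc = float(roc_auc_score(y_true, y_score))
    return bool(auc > threshold), auc


# Toy example: one churner is ranked below a non-churner, so the gate fails.
passed, auc = passes_auc_gate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# A perfectly ranked set of scores passes.
passed_perfect, auc_perfect = passes_auc_gate([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9])
```

A failed gate is a natural trigger for the rollback path, keeping the baseline model in place.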

8. Monitoring & Maintenance

Post-go-live, track model accuracy and drift, retraining periodically on new data, and monitor dashboard usage logs and MySQL query performance, aiming for >99% uptime and <5s responses. Maintenance includes quarterly updates to SHAP explanations, monthly bias audits, and cost controls. Alerts on low-engagement patterns trigger retention interventions.
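The document does not say how drift is measured. One common option is the Population Stability Index (PSI), which compares the distribution of a feature or score at training time with its current distribution. The sketch below is an assumption-laden illustration, not the project's method: the function name, the ten quantile bins, and the synthetic data are all hypothetical.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a current sample of one feature.

    Hypothetical helper; bins are quantiles of the reference sample.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges from the reference distribution's quantiles.
    inner_edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]

    # Assign each value to a bin index in [0, bins - 1] and count.
    exp_idx = np.searchsorted(inner_edges, expected, side="right")
    act_idx = np.searchsorted(inner_edges, actual, side="right")
    exp_frac = np.bincount(exp_idx, minlength=bins) / len(expected)
    act_frac = np.bincount(act_idx, minlength=bins) / len(actual)

    # Floor the fractions to avoid log(0) for empty bins.
    eps = 1e-6
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))


# Synthetic example: the current data is shifted relative to the reference.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)
shifted = rng.normal(loc=1.0, scale=1.0, size=5000)

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but any threshold used to trigger retraining should be validated against the project's own data.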

9. Roles & Responsibilities

  • 🛠️ Data Engineers: Manage ETL and MySQL integration.
  • 🤖 ML Engineers: Develop models and features with SHAP.
  • 📊 BI Developers: Build the Streamlit dashboard.
  • 🚀 DevOps: Manage deployment and monitoring.
  • 💼 Project Manager: Handles Agile sprints and stakeholder feedback.