Healthcare Readmission Prediction Documentation

1. Executive Summary

The Healthcare Readmission Prediction project delivers an end-to-end machine learning solution that forecasts 30-day patient readmission risk from EMR/EHR data, so that hospitals can intervene early and reduce costs. It preprocesses datasets containing ICD codes, trains a Keras neural network for binary classification, uses SHAP for interpretability, and presents the results to hospital management through Power BI dashboards. The model reaches an AUC-ROC of 0.86, and the project reports a reduction in readmissions of roughly 20%. Data handling is designed for HIPAA compliance. The work ran for 7.5 months, from April to November 2025.

2. Architecture Overview

The architecture is a single pipeline with four stages. EHR data is ingested and preprocessed (ICD mapping and class balancing). A Keras neural network, an MLP with dense and dropout layers, predicts the readmission probability. SHAP values and plots explain the predictions. Power BI presents the results in interactive dashboards (risk scores, trends, cohorts). The design targets robustness against bias, scalability to 10,000+ patients, and secure deployment, and it focuses on clinical factors such as age and comorbidities so that the insights are actionable for the hospital.

3. Technology Stack

The system uses Python for development and scripting, TensorFlow/Keras for neural network modeling, scikit-learn for preprocessing (imputation, scaling) and metrics, and Power BI for visualization and dashboards. Additional libraries include Pandas for data handling and SHAP for interpretability. SMOTE is not part of scikit-learn itself; it is typically taken from the companion imbalanced-learn package. On the BI side, the tooling supports DAX queries and automated exports for integration.

4. Readmission Model and Features

The readmission model is a Keras MLP with layers of 128, 64, and 1 units (ReLU activations in the hidden layers, sigmoid on the output, dropout 0.3) for binary classification. It is trained with the Adam optimizer and binary cross-entropy loss for 50 epochs on a stratified 80/20 train/test split. Features include ICD codes (one-hot encoded or embedded), age, and other numeric fields (imputed and scaled); class imbalance is handled with SMOTE. SHAP provides summary, beeswarm, and dependence plots, which highlight age and ICD-coded comorbidities as the top predictors. The model achieves an AUC-ROC of 0.86.
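
The model described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `build_readmission_mlp`, the placement of a dropout layer after each hidden layer, the AUC metric, and the input width of 40 features are all assumptions made for the example.

```python
# Sketch only: dropout placement and input width are assumptions.
from tensorflow import keras
from tensorflow.keras import layers


def build_readmission_mlp(n_features: int) -> keras.Model:
    """Build a 128-64-1 MLP for binary readmission classification."""
    model = keras.Sequential(
        [
            keras.Input(shape=(n_features,)),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.3),  # assumed: dropout after each hidden layer
            layers.Dense(64, activation="relu"),
            layers.Dropout(0.3),
            layers.Dense(1, activation="sigmoid"),  # readmission probability
        ]
    )
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[keras.metrics.AUC(name="auc")],  # tracks AUC-ROC during training
    )
    return model


model = build_readmission_mlp(n_features=40)  # 40 is a placeholder width
```

Training would then be a call along the lines of `model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))`, where the arrays come from the stratified 80/20 split.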

5. Data Processing

Data processing starts by ingesting records from CSV files or SQL with Pandas. Preprocessing covers ICD mapping and one-hot encoding, imputation (median or KNN), scaling (StandardScaler), outlier handling, and class balancing (SMOTE). After the model is trained and evaluated, SHAP values are computed against a background dataset, and predictions are exported to CSV/SQL for Power BI. Records are de-identified to protect privacy, and the pipeline is built to handle structured EHR data and its hierarchies robustly and efficiently.
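
The imputation, scaling, and one-hot encoding steps can be illustrated with a small scikit-learn sketch. The records below are fully synthetic, and the column names (`age`, `length_of_stay`, `primary_icd`, `readmitted_30d`) are hypothetical stand-ins rather than the project's real schema. Only median imputation is shown; KNN imputation, outlier handling, and SMOTE are left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy, fully synthetic records; the column names are hypothetical.
records = pd.DataFrame(
    {
        "age": [71, 54, np.nan, 80, 63, 47, 77, 59, 68, 52],
        "length_of_stay": [5, 2, 9, np.nan, 4, 1, 7, 3, 6, 2],
        "primary_icd": ["I50", "E11", "I50", "J44", "E11",
                        "N18", "J44", "I50", "N18", "E11"],
        "readmitted_30d": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    }
)
numeric_cols = ["age", "length_of_stay"]
icd_cols = ["primary_icd"]

X = records[numeric_cols + icd_cols]
y = records["readmitted_30d"]

# Stratified 80/20 split so both sets keep the same readmission rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

numeric_steps = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]
)
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        # Codes unseen during fitting are encoded as all zeros.
        ("icd", OneHotEncoder(handle_unknown="ignore"), icd_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Fit on the training split only, then reuse the fitted statistics on the test split.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
```

Fitting the transformers on the training split alone keeps test-set statistics out of the medians and scaling parameters. For the same reason, any oversampling step such as SMOTE would normally be applied to the prepared training data only, never to the held-out split.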

6. Project Timeline (7.5 Months)

  • 📅 Month 1: Planning & Data Prep (Secure access, initial ICD preprocessing).
  • 📅 Months 2-3: Feature Engineering (Encode ICD, prepare balanced dataset).
  • 📅 Months 4-5: Model Development (Train Keras network, evaluate metrics).
  • 📅 Month 6: Interpretability (Integrate SHAP for plots/explanations).
  • 📅 Month 7: Dashboard (Build/integrate Power BI visuals).
  • 📅 Months 7-7.5: Testing & Deployment (UAT and secure handover).

7. Testing & Deployment

Testing covers four areas: unit tests for preprocessing and model functions, integration tests for the pipeline flow, a performance check requiring AUC-ROC above 0.80, and bias checks supported by balanced training. For deployment, predictions are exported to a secure SQL database that Power BI queries in real time. Rollout is phased and uses de-identified data, and model versioning allows rollback if issues arise.
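
The AUC-ROC > 0.80 performance check could be automated with a small helper like the one below. This is a sketch under stated assumptions: the helper names `auc_by_group` and `passes_gate` are hypothetical, the scores are synthetic, and evaluating AUC per subgroup (here an invented age band) is one possible way to make a bias check concrete, not necessarily how the project implements it.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

AUC_THRESHOLD = 0.80  # acceptance gate stated in the testing plan


def auc_by_group(y_true, y_score, groups):
    """Return AUC-ROC overall and for each subgroup label."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    results = {"overall": roc_auc_score(y_true, y_score)}
    for label in np.unique(groups):
        mask = groups == label
        # AUC is undefined when a subgroup contains only one class.
        if np.unique(y_true[mask]).size == 2:
            results[str(label)] = roc_auc_score(y_true[mask], y_score[mask])
    return results


def passes_gate(results, threshold=AUC_THRESHOLD):
    """True when every reported AUC-ROC exceeds the threshold."""
    return all(auc > threshold for auc in results.values())


# Synthetic hold-out labels and scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.35, 0.80, 0.65, 0.20, 0.90, 0.40, 0.70]
age_band = ["<65", "<65", "<65", "<65", "65+", "65+", "65+", "65+"]

results = auc_by_group(y_true, y_score, age_band)
print(results, "gate passed:", passes_gate(results))
```

A check of this kind can run in the integration test suite so that a model version falling below the gate, overall or for any subgroup, is blocked before rollout.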

8. Monitoring & Maintenance

Post-deployment, the team monitors prediction accuracy and data drift, dashboard usage, and SHAP explanations (through periodic audits), and retrains the model periodically on new EHR data. Targets are >99% uptime and robust predictions. Maintenance includes quarterly updates to ICD mappings and features, monthly compliance and bias reviews, and cost controls. Alerts on high-risk trends trigger interventions.
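
One way to put a number on drift is the Population Stability Index (PSI), which compares the distribution of recent risk scores with a reference distribution from training time. The text does not say which drift measure the project uses, so PSI is an assumption here, as are the function name, the bin count, and the synthetic score samples.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples of predicted risk scores in [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ref_counts, _ = np.histogram(reference, bins=edges)
    cur_counts, _ = np.histogram(current, bins=edges)
    # Convert counts to proportions; eps avoids log(0) for empty bins.
    ref_pct = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_pct = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


rng = np.random.default_rng(0)
train_scores = rng.beta(2, 5, size=5000)    # synthetic reference scores
stable_scores = rng.beta(2, 5, size=5000)   # drawn from the same distribution
shifted_scores = rng.beta(4, 3, size=5000)  # deliberately shifted upward

psi_stable = population_stability_index(train_scores, stable_scores)
psi_shifted = population_stability_index(train_scores, shifted_scores)
print(f"stable: {psi_stable:.4f}, shifted: {psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Thresholds for raising an alert or scheduling retraining would need to be agreed with the clinical and compliance teams.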

9. Roles & Responsibilities

  • 📂 Data Engineers: Manage ingestion and preprocessing, including ICD mapping.
  • 🧠 ML Engineers: Develop Keras models and SHAP integration.
  • 📊 BI Developers: Build Power BI dashboards.
  • 🛡️ DevOps: Ensures secure deployment and compliance.
  • 💼 Project Manager: Oversees Agile sprints and expert feedback.