The Healthcare Readmission Prediction (Machine Learning) project delivers an end-to-end solution that forecasts 30-day patient readmission risk from EMR/EHR data, so that hospitals can intervene early and reduce costs. It preprocesses datasets containing ICD codes, trains a Keras neural network for binary classification, uses SHAP for interpretability, and presents the results to hospital management through Power BI dashboards. The system reaches an AUC-ROC of 0.86, cuts readmissions by roughly 20%, and maintains HIPAA compliance. The project ran for about 7.5 months, from April to November 2025.
The architecture is a four-stage pipeline. EHR data is ingested and preprocessed, including ICD mapping and class balancing. The prepared features are fed into a Keras neural network (an MLP with dense and dropout layers) that outputs a readmission probability. SHAP values and plots explain those predictions, and the results are integrated into Power BI as interactive dashboards covering risk scores, trends, and cohorts. The design targets robustness against bias, scalability to 10,000+ patients, and secure deployment, and it focuses on clinical factors such as age and comorbidities so that the insights are actionable for hospital staff.
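The system uses Python for development and scripting, TensorFlow/Keras for neural network modeling, Scikit-Learn for preprocessing and evaluation (imputation, scaling, metrics), and Power BI for visualization and dashboards. Class balancing uses SMOTE, which comes from the imbalanced-learn package (it follows the Scikit-Learn API but is not part of Scikit-Learn itself). Additional libraries include Pandas for data handling and SHAP for interpretability. On the Power BI side, DAX queries and automated prediction exports handle the integration.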
The readmission model is a Keras MLP for binary classification: two hidden dense layers of 128 and 64 units with ReLU activations, dropout of 0.3, and a single sigmoid output unit. It is trained with the Adam optimizer and binary cross-entropy loss for 50 epochs on a stratified 80/20 train/test split. Features include ICD codes (one-hot encoded or embedded), age, and other numeric variables (imputed and scaled), and SMOTE is used to handle class imbalance. SHAP summary, beeswarm, and dependence plots highlight age and ICD-coded comorbidities as the top predictors. The model reaches an AUC-ROC of 0.86.
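The architecture described above could be sketched in Keras roughly as follows. This is a minimal illustration, not the project's actual code: the function name `build_readmission_mlp`, the feature count of 20, and the placement of a dropout layer after each hidden layer are assumptions, since the description only gives the layer sizes, activations, dropout rate, optimizer, and loss.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_readmission_mlp(n_features: int) -> keras.Model:
    """Build a 128-64-1 MLP for binary readmission classification.

    Assumption: dropout (rate 0.3) follows each hidden layer; the source
    states the rate but not the placement.
    """
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # readmission probability
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[keras.metrics.AUC(name="auc")],  # AUC-ROC by default
    )
    return model


# Hypothetical feature count, chosen only for the example.
model = build_readmission_mlp(n_features=20)

# Untrained forward pass on a dummy batch, to show the output shape.
dummy_batch = np.zeros((2, 20), dtype="float32")
risk = model.predict(dummy_batch, verbose=0)

# Training would then look like this (X_train, y_train are placeholders):
# model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))
```

Tracking `AUC` as a compile-time metric keeps the training logs aligned with the AUC-ROC figure the project reports, rather than relying on accuracy, which is misleading on imbalanced readmission labels.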
Data processing ingests records from CSV or SQL sources using Pandas, then preprocesses them with ICD mapping and one-hot encoding, imputation (median or KNN), scaling (StandardScaler), outlier handling, and balancing (SMOTE). Models are then trained and evaluated, SHAP values are computed against a background sample of the data, and predictions are exported to CSV or SQL for Power BI. Records are de-identified throughout to protect privacy, and the pipeline is built to handle structured EHR data and its code hierarchies robustly and efficiently.
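The imputation, scaling, and encoding steps above can be sketched with Pandas and Scikit-Learn as shown here. The toy records and the column names (`age`, `length_of_stay`, `primary_icd`, `readmitted_30d`) are hypothetical and stand in for the real EHR schema, which the description does not give. SMOTE is left out of the sketch; in a setup like this it would normally be applied to the training split only, after the split and before fitting the model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy records; not the project's data or schema.
records = pd.DataFrame({
    "age": [71, 54, np.nan, 80, 63, 47, 90, 58, 66, 75],
    "length_of_stay": [5, 2, 7, np.nan, 3, 1, 9, 4, 6, 8],
    "primary_icd": ["I50", "E11", "I50", "J44", "E11",
                    "J44", "I50", "E11", "J44", "I50"],
    "readmitted_30d": [1, 0, 1, 1, 0, 0, 1, 0, 0, 1],
})

numeric_cols = ["age", "length_of_stay"]
categorical_cols = ["primary_icd"]

# Median imputation + standard scaling for numerics, one-hot for ICD codes.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("icd", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = records[numeric_cols + categorical_cols]
y = records["readmitted_30d"]

# Stratified 80/20 split, as described for model training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit on the training split only, so test data never leaks into the
# imputation medians or scaling statistics.
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)
```

Fitting the transformer on the training split alone is the main design point: the same fitted object can later be reused unchanged when scoring new patients for export.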
Testing covers unit tests for the preprocessing and model functions, integration tests for the pipeline flow, a performance check that requires AUC-ROC above 0.80, and bias checks via balanced training. For deployment, predictions are exported to a secure SQL database that Power BI queries in real time. Rollout is phased and uses de-identified data, and model versioning supports rollback if issues arise.
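The performance check can be expressed as a small gate function that a test suite calls on held-out predictions. This is only a sketch: the helper name `passes_auc_gate` and the toy labels and scores are illustrative, and only the 0.80 threshold comes from the testing criteria above.

```python
from sklearn.metrics import roc_auc_score

AUC_GATE = 0.80  # minimum AUC-ROC from the testing criteria


def passes_auc_gate(y_true, y_score, gate=AUC_GATE):
    """Return (auc, passed) for held-out labels and predicted probabilities."""
    auc = float(roc_auc_score(y_true, y_score))
    return auc, bool(auc > gate)


# Illustrative held-out labels and model scores (not project results).
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

auc, passed = passes_auc_gate(y_true, y_score)
```

In this toy case 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9 and the gate passes.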
After deployment, the team monitors accuracy and drift, retrains periodically on new EHR data, tracks dashboard usage, and audits SHAP explanations, with targets of more than 99% uptime and stable predictions. Maintenance includes quarterly updates to ICD mappings and features, monthly compliance and bias reviews, and cost controls. Alerts on high-risk trends trigger interventions.
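The description does not say how drift is measured. One common option is the population stability index (PSI), which compares a feature's distribution at training time with its distribution in recent data. The sketch below is an assumption about how such a check might look, not the project's method: the function name, the simulated age samples, and the 0.25 alert threshold (a widely used rule of thumb) are all illustrative.

```python
import numpy as np


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between a reference sample and a current sample of one feature."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges come from quantiles of the reference sample,
    # so each reference bin holds roughly the same share of records.
    interior = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]

    ref_counts = np.bincount(
        np.searchsorted(interior, reference, side="right"), minlength=bins
    )
    cur_counts = np.bincount(
        np.searchsorted(interior, current, side="right"), minlength=bins
    )

    # Clip to avoid log(0) when a bin is empty.
    ref_frac = np.clip(ref_counts / reference.size, eps, None)
    cur_frac = np.clip(cur_counts / current.size, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Simulated patient ages (hypothetical, for illustration only).
rng = np.random.default_rng(0)
training_age = rng.normal(65, 10, 5000)
stable_age = rng.normal(65, 10, 5000)   # same population
shifted_age = rng.normal(75, 10, 5000)  # older incoming population

psi_stable = population_stability_index(training_age, stable_age)
psi_shifted = population_stability_index(training_age, shifted_age)

# Assumed rule of thumb: PSI above 0.25 signals a shift worth reviewing.
needs_review = psi_shifted > 0.25
```

A check like this could run per feature on each new batch of EHR data, with a high PSI prompting a closer look before the next scheduled retraining.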