Sales Forecasting System Documentation

1. Executive Summary

The Sales Forecasting System with ML + Prophet project delivers a hybrid time-series solution for sales prediction that accounts for seasonal and holiday effects as well as exogenous factors, with the goal of optimizing inventory and planning. It builds Prophet models for time-series decomposition and XGBoost models for feature-driven regression (weather, promotions, marketing spend), compares their performance, and stores the outputs in Snowflake fact tables for analytics. The system achieves 92% accuracy (MAPE <8%), reduces overstock by 35%, and handles 1M+ records daily. It was completed over 10 months, from January to November 2025, and targets retail/e-commerce efficiency.

2. Architecture Overview

The architecture is an end-to-end pipeline. Historical sales data is ingested from source systems and merged with exogenous features (weather APIs, promo calendars, marketing logs) using Python/Pandas. Forecasts are then produced by Prophet, which captures seasonality and holidays, and by XGBoost, which handles feature-driven regression. The two sets of forecasts are compared using metrics and visuals, and the results are persisted in Snowflake schemas (raw, processed, fact tables) for querying and BI integration. The design supports scalability, automation through scripts or Airflow, and hybrid model selection for predictions in dynamic markets.
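The merge stage of the pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names (date, sales, temp_c, promo_flag), the fill rules for missing values, and the function names are all assumptions made for the example.

```python
import pandas as pd


def merge_features(sales, weather, promos):
    """Left-join exogenous features onto daily sales using the date key."""
    # validate="many_to_one" raises if a feature table has duplicate dates.
    merged = sales.merge(weather, on="date", how="left", validate="many_to_one")
    merged = merged.merge(promos, on="date", how="left", validate="many_to_one")
    # Assumption: dates absent from the promo calendar are non-promo days.
    merged["promo_flag"] = merged["promo_flag"].fillna(0).astype(int)
    # Assumption: short weather gaps are forward-filled from the previous day.
    merged["temp_c"] = merged["temp_c"].ffill()
    return merged


def to_prophet_frame(merged):
    """Rename the date and target columns to the ds/y names Prophet expects."""
    return merged.rename(columns={"date": "ds", "sales": "y"})


# Tiny hypothetical inputs, for illustration only.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-03"]),
    "sales": [100.0, 120.0, 90.0],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-01", "2025-01-03"]),
    "temp_c": [5.0, 7.0],
})
promos = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-02"]),
    "promo_flag": [1],
})

merged = merge_features(sales, weather, promos)
prophet_frame = to_prophet_frame(merged)
```

A left join keeps every sales row even when a feature source has no entry for that date, which makes gaps visible as missing values instead of silently dropping days.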

3. Technology Stack

The system uses Python for scripting and integration, Facebook Prophet for time-series forecasting, XGBoost for gradient-boosted regression, and Snowflake for cloud data warehousing and storage. Additional libraries include Pandas for merging and processing, Scikit-learn for metrics (MAE/RMSE/MAPE), and the Snowflake connector for ingestion. External data such as weather is retrieved through APIs.

4. Forecasting Model and Features

The forecasting model combines two components. Prophet fits an additive time-series model with yearly and weekly seasonality, custom holidays (e.g., a Black Friday dataframe), and extra regressors. XGBoost fits a regression on lagged sales, weather (temperature/rain), promo flags, marketing spend, and binary holiday indicators, and is trained with the squared-error objective, a learning rate of 0.05, and 1,000 estimators. The two models are compared using train/test splits, cross-validation metrics (MAE/RMSE), and forecast-vs-actual plots, and the hybrid approach averages 92% accuracy. Features are merged on date keys, and the merge is validated.
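The model setup described above might look roughly like the sketch below. The helper names, lag choices (1 and 7 days), holiday windows, and the sales column name are assumptions for illustration. The Prophet and XGBoost imports sit inside the builder functions so the feature helpers can be used without those packages installed.

```python
import pandas as pd

# Hyperparameters stated in this section, in XGBoost's parameter names.
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "learning_rate": 0.05,
    "n_estimators": 1000,
}


def black_friday_holidays(years):
    """Build a Prophet-style holidays dataframe for Black Friday."""
    dates = []
    for year in years:
        november = pd.date_range(f"{year}-11-01", f"{year}-11-30")
        thursdays = november[november.dayofweek == 3]
        # Black Friday is the day after the fourth Thursday of November.
        dates.append(thursdays[3] + pd.Timedelta(days=1))
    return pd.DataFrame({
        "holiday": "black_friday",
        "ds": dates,
        "lower_window": 0,
        "upper_window": 1,  # assumption: effect spills into the next day
    })


def add_lag_features(df, lags=(1, 7)):
    """Add lagged sales columns; rows containing any missing value are dropped."""
    out = df.copy()
    for k in lags:
        out[f"sales_lag_{k}"] = out["sales"].shift(k)
    return out.dropna().reset_index(drop=True)


def build_prophet(holidays, regressors=()):
    """Prophet model with yearly/weekly seasonality, holidays, and regressors."""
    from prophet import Prophet  # imported lazily; requires the prophet package

    model = Prophet(
        holidays=holidays,
        yearly_seasonality=True,
        weekly_seasonality=True,
    )
    for name in regressors:
        model.add_regressor(name)
    return model


def build_xgboost():
    """XGBoost regressor using the hyperparameters listed above."""
    from xgboost import XGBRegressor  # imported lazily; requires xgboost

    return XGBRegressor(**XGB_PARAMS)
```

Shifting by k days means each row only sees sales from earlier dates, which avoids leaking the target into its own features.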

5. Data Processing

Data processing ingests sales from CSV/SQL sources and merges exogenous data (weather APIs, promos, spend) using Pandas time-based joins. It then handles missing values, outliers, and normalization, and formats the data for each model (ds/y columns for Prophet, a feature matrix for XGBoost). Forecasts are generated (e.g., from a 365-day future dataframe with pre-filled regressors), compared, and batch-inserted through the connector into Snowflake fact tables (date, product_id, forecast_prophet/xgboost, actual). This supports daily updates, versioning, and efficient handling of large datasets.
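The batch insert into the fact table could be sketched as below. The table name fact_sales_forecast, the batch size, and the function names are hypothetical; the column list follows the one given above. The function only assumes a DB-API style cursor with executemany, which the Snowflake Connector for Python provides, so it can be exercised without a live connection.

```python
from datetime import date

# Hypothetical table name; columns follow the fact-table layout in this section.
INSERT_SQL = (
    "INSERT INTO fact_sales_forecast "
    "(date, product_id, forecast_prophet, forecast_xgboost, actual) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def chunked(rows, size):
    """Yield consecutive slices of rows with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_forecasts(cursor, rows, batch_size=10_000):
    """Insert forecast rows in batches and return the number of rows sent."""
    total = 0
    for batch in chunked(rows, batch_size):
        cursor.executemany(INSERT_SQL, batch)
        total += len(batch)
    return total


# Example row: the actual value is unknown at forecast time, so it is None.
example_rows = [(date(2025, 11, 28), "SKU-1", 1520.0, 1487.5, None)]
```

In a real run the cursor would come from snowflake.connector.connect(...).cursor(), with credentials supplied by the deployment environment. Batching keeps each statement bounded in size when daily volumes reach the 1M+ range.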

6. Project Timeline (10 Months)

  • 📅 Month 1-1.5: Planning & Data Prep (Integrate sources, Snowflake setup).
  • 📅 Month 1.5-3.5: Feature Engineering (Merge features, dataset prep).
  • 📅 Month 3.5-6: Model Development (Implement Prophet and XGBoost).
  • 📅 Month 6-7.5: Comparison & Optimization (Evaluate metrics, tune params).
  • 📅 Month 7.5-9: Integration & Storage (Automate pipeline in Snowflake).
  • 📅 Month 9-10: Testing & Deployment (End-to-end tests, handover).

7. Testing & Deployment

Testing includes unit tests for the merging and model functions, integration tests for the pipeline flow, accuracy tests (MAPE <10%, plus RMSE/MAE, on held-out test sets), and load tests with 1M+ records. Deployment is automated with Python scripts scheduled through cron or Airflow, connects to Snowflake for storage and querying, uses a phased rollout with validation scripts, and supports rollback to earlier model versions if issues arise.
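The accuracy check can be illustrated with a small, dependency-free sketch of the three metrics and the MAPE <10% gate. The function names and sample numbers are made up for the example; in the project itself these metrics are computed with Scikit-learn.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        # MAPE is undefined when an actual value is zero.
        raise ValueError("MAPE is undefined for zero actuals")
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


def passes_accuracy_gate(actual, forecast, max_mape=10.0):
    """True when MAPE on the test set is strictly below the threshold."""
    return mape(actual, forecast) < max_mape


# Hypothetical test-set values.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
```

With these sample values the absolute percentage errors are 10%, 5%, and 0%, so MAPE is 5% and the gate passes. Reading accuracy as 100% minus MAPE, a MAPE below 8% corresponds to the 92% accuracy figure quoted earlier.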

8. Monitoring & Maintenance

Post-deployment monitoring covers forecast accuracy and drift (via daily metrics stored in Snowflake), pipeline runs, and feature alignment, with targets of >99% uptime and <30 minutes of daily processing. Maintenance includes quarterly retraining on new data, monthly data quality and validation checks, and cost controls (elastic Snowflake compute). Alerts on high MAPE deviations trigger reviews.
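One simple way to turn daily MAPE values into a review alert is sketched below. The 7-day window, the 2-point tolerance, and the function name are assumptions chosen for illustration; the document does not specify the actual alerting rule.

```python
from statistics import mean


def needs_review(daily_mape, baseline_mape, tolerance=2.0, window=7):
    """Flag drift when the recent average MAPE exceeds baseline + tolerance.

    daily_mape: daily MAPE values in percent, oldest first.
    baseline_mape: MAPE in percent measured at deployment time.
    """
    if len(daily_mape) < window:
        # Not enough history yet to judge drift.
        return False
    recent = mean(daily_mape[-window:])
    return recent > baseline_mape + tolerance
```

Averaging over a window keeps a single bad day, such as an unplanned promotion, from triggering a review, while a sustained rise still does.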

9. Roles & Responsibilities

  • 📂 Data Engineers: Manage ingestion, merging, and Snowflake schemas.
  • 🤖 ML Engineers: Develop Prophet/XGBoost models and comparisons.
  • 🚀 DevOps: Handle automation and Airflow deployment.
  • 📊 Analysts: Evaluate metrics and forecasting risks.
  • 💼 Project Manager: Oversees Agile sprints and stakeholder reviews.