Cummins PreventTech — Connected Diagnostics
Industrial IoT predictive diagnostics platform for connected engine fleets at Cummins. PySpark feature engineering at scale, Azure ML model training, and Databricks orchestration powering real-time fleet health scoring.
Overview
At Cummins, I contributed to the PreventTech / Connected Diagnostics platform, an industrial IoT system that provides real-time health scoring and predictive diagnostics for connected engine fleets operated by customers worldwide.
Scale
- Millions of telemetry data points per day from connected diesel engines
- Fleet sizes ranging from a handful of vehicles to thousands of commercial trucks
- Real-time fault prediction and remaining useful life (RUL) estimation
Technical Contributions
Data Engineering
- PySpark feature engineering pipelines on Azure Databricks that turn raw CAN bus telemetry into ML-ready feature sets
- Streaming ingestion via Azure Event Hubs
- Data quality validation and schema enforcement at ingestion
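The validation and feature engineering steps above can be illustrated with a minimal sketch. This is plain Python rather than PySpark, so that it runs anywhere, and it is not the production code: the schema, the field names (`engine_id`, `ts`, `coolant_temp_c`), the value ranges, and the rolling-mean feature are all hypothetical. At fleet scale the same idea would typically be expressed with Spark schemas and window functions.

```python
from collections import defaultdict, deque
from typing import Any, Dict, List

# Hypothetical schema: field name -> (expected type, min, max).
# None means "no bound". These names and ranges are illustrative only.
SCHEMA = {
    "engine_id": (str, None, None),
    "ts": (int, 0, None),
    "coolant_temp_c": (float, -40.0, 150.0),
}


def is_valid(record: Dict[str, Any]) -> bool:
    """Return True only if every schema field is present, typed, and in range."""
    for field, (ftype, lo, hi) in SCHEMA.items():
        value = record.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, ftype):
            return False
        if lo is not None and value < lo:
            return False
        if hi is not None and value > hi:
            return False
    return True


def rolling_mean_features(
    records: List[Dict[str, Any]], window: int = 3
) -> List[Dict[str, Any]]:
    """Drop invalid records, then add a per-engine rolling mean of coolant temperature."""
    history = defaultdict(lambda: deque(maxlen=window))
    valid = sorted(
        (r for r in records if is_valid(r)),
        key=lambda r: (r["engine_id"], r["ts"]),
    )
    features = []
    for rec in valid:
        buf = history[rec["engine_id"]]
        buf.append(rec["coolant_temp_c"])
        features.append({**rec, "coolant_temp_c_roll_mean": sum(buf) / len(buf)})
    return features
```

Rejecting bad records before computing features keeps a single out-of-range sensor reading from skewing the windowed statistics for that engine.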
ML Platform
- Azure ML for model training, experiment tracking, and deployment
- Gradient boosting and LSTM models for fault code prediction and RUL estimation
- Model versioning and A/B testing framework
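One common way to route traffic in a model A/B test is deterministic hashing, so that a given engine always sees the same model version for the duration of an experiment. The sketch below assumes that approach; it is not necessarily how the platform's framework worked, and the function name, the variant labels (`champion`, `challenger`), and the experiment key are hypothetical.

```python
import hashlib


def assign_variant(engine_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign an engine to a model variant.

    Hashing the experiment name together with the engine ID gives a stable,
    roughly uniform bucket in [0, 1) without storing any assignment state.
    """
    digest = hashlib.sha256(f"{experiment}:{engine_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 bits, scaled to [0, 1)
    return "challenger" if bucket < treatment_share else "champion"
```

Including the experiment name in the hash means a new experiment reshuffles the fleet, so the same engines are not always in the treatment group.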
Orchestration
- Databricks job orchestration for daily batch scoring across the full fleet
- Alerting pipeline: model output → fleet health dashboard → customer-facing recommendations
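The last step of the alerting pipeline, turning a model score into a dashboard status and a recommendation, can be sketched as a simple rule layer. Everything here is an assumption for illustration: the `Score` fields, the thresholds, the status labels, and the recommendation wording are hypothetical and not taken from the actual dashboard.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Score:
    """Hypothetical per-engine model output from a batch scoring run."""
    engine_id: str
    fault_probability: float  # predicted probability of a fault, 0.0 to 1.0
    rul_days: float           # estimated remaining useful life in days


def recommend(
    score: Score,
    prob_threshold: float = 0.8,      # illustrative threshold
    rul_threshold_days: float = 7.0,  # illustrative threshold
) -> Tuple[str, str]:
    """Map a score to a (health status, maintenance recommendation) pair."""
    if score.fault_probability >= prob_threshold or score.rul_days <= rul_threshold_days:
        return "critical", "Schedule service before next dispatch"
    if score.fault_probability >= prob_threshold / 2:
        return "watch", "Inspect at next planned maintenance"
    return "healthy", "No action needed"
```

Keeping the thresholds as parameters rather than constants makes it straightforward to tune alert sensitivity per fleet without changing the scoring models.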
Business Impact
- Enabled early detection of faults, days before engine failure events
- Reduced unplanned road breakdowns for commercial fleet customers
- Provided actionable maintenance recommendations through the Connected Diagnostics dashboard
Tech Stack
PySpark · Azure ML · Databricks · Azure Event Hubs · XGBoost · PyTorch · Python · SQL