Surgical Robot Fleet — Predictive Maintenance MLOps

End-to-end MLOps system for predictive maintenance of a global surgical robot fleet at Intuitive Surgical. Achieved 25% reduction in unplanned downtime and $700K+ in savings through CI/CD pipelines, automated model monitoring, and real-time alerting.

Overview

At Intuitive Surgical, I designed and deployed a production MLOps system for predictive maintenance of the da Vinci surgical robot fleet — a globally distributed set of high-value medical devices where unplanned downtime has direct patient care consequences.

Business Impact

  • 25% reduction in unplanned downtime across the global surgical robot fleet
  • $700,000+ in annual cost savings from prevented emergency service dispatches and extended component life
  • Shifted maintenance posture from reactive → condition-based → predictive

MLOps Architecture

Data Pipeline

  • Real-time telemetry ingestion from robot IoT sensors (vibration, temperature, motor current, error codes)
  • PySpark-based feature engineering at scale
  • Delta Lake on Azure for reliable data versioning

Model Layer

  • Failure prediction models: XGBoost, LSTM for temporal patterns
  • Multi-label classification across failure modes (mechanical, electrical, software)
  • MLflow experiment tracking, model registry, and artifact versioning

CI/CD for ML

  • GitHub Actions pipelines: data validation → training → evaluation → staging → production promotion
  • Automated retraining triggers on data drift detection
  • Quality gates: F1, AUC, and calibration thresholds enforced before any model promotion

Monitoring

  • Evidently for feature drift and data quality monitoring
  • Azure ML Monitor for production model performance tracking
  • PagerDuty alerting for threshold violations
  • Weekly monitoring reports auto-generated and distributed to engineering leadership

Tech Stack

Python · PySpark · MLflow · Azure ML · Delta Lake · XGBoost · PyTorch/LSTM · Evidently · GitHub Actions · Docker