RAG-Based Predictive Maintenance System
Production RAG pipeline for industrial predictive maintenance using ChromaDB, XGBoost, and a Claude-powered recommendation layer — built on the AI4I 2020 dataset.
Overview
This project is a production-grade RAG (Retrieval-Augmented Generation) pipeline for industrial predictive maintenance, built from scratch on the AI4I 2020 machine failure dataset.
The system combines classical ML (XGBoost failure prediction) with a vector retrieval layer (ChromaDB) and an LLM-powered recommendation engine (Claude) to deliver actionable maintenance alerts with contextual justification.
Architecture
Sensor Data → Feature Engineering → XGBoost Failure Classifier
↓
ChromaDB Vector Store
(historical failure patterns)
↓
Claude Recommendation Layer
↓
Maintenance Alert + Justification
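The flow above can be sketched as plain function composition. This is only an illustrative wiring under stated assumptions, not the project's actual code: `Reading`, `engineer_features`, `run_pipeline`, and the three toy stages are hypothetical names, the derived features are guesses at what a feature-engineering step might compute, and the classifier, retriever, and recommender are stand-ins for XGBoost, ChromaDB, and Claude respectively.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Reading:
    """One sensor reading; fields loosely mirror AI4I 2020 columns (assumed names)."""
    air_temp_k: float
    process_temp_k: float
    rpm: float
    torque_nm: float
    tool_wear_min: float


@dataclass
class Alert:
    failure_probability: float
    context: List[str]
    recommendation: str


def engineer_features(r: Reading) -> Dict[str, float]:
    """Hypothetical derived features; the real feature set may differ."""
    return {
        "temp_delta_k": r.process_temp_k - r.air_temp_k,
        # Mechanical power = torque * angular velocity (rpm converted to rad/s).
        "power_w": r.torque_nm * r.rpm * 2 * math.pi / 60,
        "tool_wear_min": r.tool_wear_min,
    }


def run_pipeline(
    reading: Reading,
    classify: Callable[[Dict[str, float]], float],
    retrieve: Callable[[Dict[str, float], int], List[str]],
    recommend: Callable[[float, List[str]], str],
    k: int = 3,
) -> Alert:
    """Run the stages in diagram order: features, classifier, retrieval, recommendation."""
    features = engineer_features(reading)
    p_fail = classify(features)
    context = retrieve(features, k)
    return Alert(p_fail, context, recommend(p_fail, context))


# Toy stand-ins so the sketch runs without any external service.
def toy_classifier(features: Dict[str, float]) -> float:
    return min(1.0, features["tool_wear_min"] / 250.0)


def toy_retriever(features: Dict[str, float], k: int) -> List[str]:
    history = ["past event A", "past event B", "past event C", "past event D"]
    return history[:k]


def toy_recommender(p_fail: float, context: List[str]) -> str:
    return f"Failure probability {p_fail:.2f}; {len(context)} similar past events retrieved."


alert = run_pipeline(
    Reading(298.1, 308.6, 1551.0, 42.8, 200.0),
    toy_classifier, toy_retriever, toy_recommender,
)
```

Passing each stage in as a callable keeps the wiring testable in isolation; in a real deployment the three toy functions would be replaced by wrappers around the trained model, the vector store query, and the LLM call.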
Key Components
- Ingestion pipeline — batch simulation of sensor telemetry with realistic noise and drift
- XGBoost classifier — trained on the AI4I 2020 dataset, using tool wear, heat dissipation, torque, and rotational speed features
- ChromaDB vector store — embeds historical failure events for semantic retrieval
- Claude recommendation layer — synthesizes classifier output + retrieved context into natural-language maintenance recommendations
- Retrieval evaluation module — Hit Rate@K, MRR, and NDCG metrics to benchmark retrieval quality
- MLflow tracking — experiment logging, model registry, and artifact storage
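To make the retrieval metrics concrete, here is a minimal sketch of Hit Rate@K, MRR, and NDCG@K. It assumes binary relevance (a retrieved document is either relevant or not) and unique document IDs within each ranked list; the function names and the `(ranked_ids, relevant_ids)` input shape are illustrative choices, not the project's actual evaluation module.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def hit_rate_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # The ideal ranking places all relevant documents (up to k) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(queries: List[Tuple[Sequence[str], Set[str]]], k: int) -> Dict[str, float]:
    """Average each metric over all (ranked_ids, relevant_ids) query pairs."""
    n = len(queries)
    return {
        "hit_rate": sum(hit_rate_at_k(r, rel, k) for r, rel in queries) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in queries) / n,
        "ndcg": sum(ndcg_at_k(r, rel, k) for r, rel in queries) / n,
    }


# Two toy queries: the first finds its relevant document at rank 2,
# the second retrieves nothing relevant.
scores = evaluate([(["a", "b", "c"], {"b"}), (["x", "y", "z"], {"q"})], k=2)
```

Hit Rate only asks whether anything relevant was retrieved, MRR rewards placing the first relevant result early, and NDCG accounts for the position of every relevant result, so the three are usually reported together.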
Results
- Retrieval Hit Rate@5: 0.87
- XGBoost F1 (failure class): 0.91
- End-to-end latency (recommendation): < 2s
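For reference, a failure-class F1 score is the harmonic mean of precision and recall computed on the positive (failure) label only. The sketch below shows that calculation on toy labels; `failure_class_f1` is a hypothetical helper, and the labels are made up for illustration rather than taken from the project's evaluation.

```python
from typing import Sequence


def failure_class_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = failure): 2 true positives, 1 false positive, 1 false negative.
f1 = failure_class_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting the score for the failure class alone, rather than overall accuracy, keeps the metric meaningful when failures are far rarer than normal operation.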
Tech Stack
Python · LangChain · ChromaDB · XGBoost · MLflow · Pandas · Anthropic API · Streamlit