FleetPulse: Proactive AMR Fleet Command Center

Elevator Pitch

  • Real-time AI monitoring for autonomous robots that predicts failures before they happen, visualizes issues on a live map, and alerts operators instantly.

Inspiration

  • A robotics deployment engineer in a Singapore warehouse told us robots generate 500MB+ of logs daily, yet failure diagnosis remains manual and slow.
  • Engineers still scroll through 90,000+ lines of CSV to spot a single battery voltage spike: futuristic robots, legacy tools.
  • In hospitals and airports deploying heterogeneous fleets (cleaning, delivery, security), this blind spot is risky. We built a “Single Pane of Glass” that turns raw, noisy telemetry into an intuitive, real-time, predictive command center.

How We Built It

  • Streaming-friendly ingestion and analysis, end-to-end:
    • Frontend: Next.js 14 + React 18 + TypeScript; Tailwind CSS; Leaflet.js map; Recharts for trends; WebSocket client for instant updates.
    • Backend: FastAPI + Uvicorn; Pydantic models for validation; REST ingestion; WebSocket broadcast; per-robot IsolationForest for anomaly detection; health scoring and RUL estimation.
    • Simulation: Python fleet simulator generates realistic telemetry and injects controlled failures to test detection and alerting.
    • Notifications: Telegram Bot API for Markdown-formatted critical alerts.

Streaming ETL Concepts

  • We treat telemetry as streams, not static files. Data is processed in chunks, with rolling windows for normalization and trend analysis.
  • Z-Score Normalization (for sensor noise filtering):

$$ z = \frac{x - \mu}{\sigma} $$

  • \(x\): current reading; \(\mu\): rolling mean (e.g., last 60 s); \(\sigma\): rolling standard deviation. If \(|z| > 3\), flag a critical anomaly.
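
  • A minimal sketch of the rolling z-score check, using only the Python standard library. The class name `RollingZScore`, the `min_samples` warm-up guard, and the sample voltages are illustrative assumptions, not the project's actual code. The window here counts samples, so it matches a 60 s window only if telemetry arrives at roughly 1 Hz.

```python
from collections import deque
from statistics import fmean, pstdev
from typing import Tuple


class RollingZScore:
    """Scores each new reading against the previous `window` readings."""

    def __init__(self, window: int = 60, threshold: float = 3.0,
                 min_samples: int = 5) -> None:
        self.values = deque(maxlen=window)  # oldest readings fall off automatically
        self.threshold = threshold
        self.min_samples = min_samples      # hypothetical warm-up guard

    def update(self, x: float) -> Tuple[float, bool]:
        """Return (z, is_anomaly) for x, then add x to the window."""
        z = 0.0
        if len(self.values) >= self.min_samples:
            mu = fmean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0:
                z = (x - mu) / sigma
        self.values.append(x)
        return z, abs(z) > self.threshold


detector = RollingZScore(window=60)
# Made-up battery voltages; the last value simulates a sudden sag.
readings = [24.0, 24.1, 23.9, 24.0, 24.2, 23.8, 24.1, 18.0]
results = [detector.update(v) for v in readings]
```

  • Scoring the reading before appending it keeps a spike from inflating its own baseline. The warm-up guard avoids false alarms while the window holds too few samples for a stable standard deviation.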

Architecture Overview

  • Client → Presentation → Intelligence → Simulation → Notification
  • Flow:
    • Simulator → FastAPI (/telemetry) → Health/ML → WebSocket broadcast → Next.js dashboard
    • Critical conditions → Telegram alert
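
  • The fan-out step in this flow can be sketched with plain `asyncio`, leaving FastAPI and real WebSocket connections out. `BroadcastHub`, the queue size, and the event fields are hypothetical; each subscriber queue stands in for one connected dashboard or alert worker.

```python
import asyncio


class BroadcastHub:
    """Fans each telemetry event out to every subscriber queue."""

    def __init__(self) -> None:
        self.subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=100)
        self.subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self.subscribers.discard(q)

    def publish(self, event: dict) -> None:
        for q in self.subscribers:
            if q.full():
                q.get_nowait()  # drop the oldest event for a slow consumer
            q.put_nowait(event)


async def demo() -> list:
    hub = BroadcastHub()
    dashboard = hub.subscribe()  # stands in for a WebSocket client
    alerts = hub.subscribe()     # stands in for the Telegram alert worker
    hub.publish({"robot_id": "amr-1", "health": 42})
    return [await dashboard.get(), await alerts.get()]


received = asyncio.run(demo())
```

  • Bounded queues that drop the oldest event are one way to keep a slow client from stalling ingestion; other back-pressure policies are equally plausible.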

Data Engineering and ML

  • Ingestion:
    • Validates telemetry and maintains short rolling histories per robot.
    • Normalizes signals and aggregates for efficient visualization.
  • Anomaly detection:
    • IsolationForest (unsupervised) per robot on multivariate features (battery, temperature, CPU, velocity), producing an anomaly score and risk level.
  • Health scoring:
    • Compresses multivariate signals into one actionable metric and color-coded status.
  • RUL (Remaining Useful Life):
    • Estimates time-to-critical threshold from recent trend slopes (e.g., battery decay).
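
  • One way to turn "recent trend slopes" into a time-to-critical estimate is a least-squares fit over the rolling history. This is a sketch under assumptions: the function names, the 20% critical level, and the sample history are invented for illustration, and the project may compute its slope differently.

```python
from typing import List, Optional, Tuple


def slope_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of value against time (in hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    num = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


def battery_rul_hours(samples: List[Tuple[float, float]],
                      critical: float = 20.0) -> Optional[float]:
    """Hours until the latest reading reaches `critical`; None if not decaying."""
    rate = slope_per_hour(samples)
    if rate >= 0:
        return None  # charging or flat: no finite estimate
    latest = samples[-1][1]
    return max(0.0, (latest - critical) / abs(rate))


# (hours since start, battery %) pairs; made-up values decaying at 10 %/h.
history = [(0.0, 80.0), (0.5, 75.0), (1.0, 70.0), (1.5, 65.0)]
rul = battery_rul_hours(history)
```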

Challenges We Faced

  • Real-time consistency and concurrency:
    • Coordinating REST ingestion and WebSocket broadcasting without race conditions or stale views.
    • Pattern: clear state boundaries, short-lived buffers, and event-driven broadcasts.
  • Browser performance with high-frequency data:
    • Rendering thousands of points can stutter.
    • Solution: server-side aggregation and event-level summaries; selective UI updates.
  • Environment compatibility:
    • Installing binary wheels for NumPy, SciPy, and scikit-learn on Windows required careful pinning of Python and package versions.
  • Alert fatigue vs responsiveness:
    • Thresholds and hysteresis reduce noisy alerts while keeping operators informed.
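
  • The hysteresis idea can be sketched as a two-threshold latch: an alert fires when health drops below a trip level and cannot fire again until health has recovered past a higher reset level. The class name and the 40/50 thresholds are hypothetical, chosen only to illustrate the pattern.

```python
class HysteresisAlert:
    """Fires once below `trip`; re-arms only after health rises above `reset`."""

    def __init__(self, trip: float = 40.0, reset: float = 50.0) -> None:
        if reset <= trip:
            raise ValueError("reset must be above trip")
        self.trip = trip
        self.reset = reset
        self.active = False

    def update(self, health: float) -> bool:
        """Return True only on the transition into the alert state."""
        if not self.active and health < self.trip:
            self.active = True
            return True
        if self.active and health > self.reset:
            self.active = False  # recovered: allow future alerts
        return False


alert = HysteresisAlert()
scores = [55, 39, 41, 38, 45, 52, 39]
fired = [alert.update(s) for s in scores]
```

  • In this run only the first drop to 39 and the drop after the recovery to 52 produce notifications; the wobble between 38 and 45 stays silent.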

ROS 2 / Open-RMF Roadmap

  • Current backend is FastAPI with simulated telemetry; roadmap includes ROS 2 ingestion via rclpy and rmf_fleet_msgs FleetState subscriptions.
  • Bridge pattern:
    • ROS thread pushes messages to a thread-safe queue.
    • API/alerting workers consume from the queue without blocking.
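
  • A standard-library sketch of the bridge pattern. The producer below is a stub that fabricates messages in place of a real rclpy subscription callback, and the message fields, queue size, and sentinel are assumptions for illustration.

```python
import queue
import threading

telemetry_queue = queue.Queue(maxsize=1000)  # thread-safe hand-off point
STOP = object()  # sentinel telling the consumer to exit


def ros_thread_stub(n_messages: int) -> None:
    """Stands in for a ROS 2 subscriber thread; fabricates messages here."""
    for i in range(n_messages):
        msg = {"robot": f"amr-{i % 2}", "battery": 90.0 - i}
        try:
            telemetry_queue.put_nowait(msg)
        except queue.Full:
            pass  # drop rather than block the producer thread
    telemetry_queue.put(STOP)


def consume() -> list:
    """Drains the queue until the sentinel arrives."""
    handled = []
    while True:
        item = telemetry_queue.get()
        if item is STOP:
            break
        handled.append(item)
    return handled


producer = threading.Thread(target=ros_thread_stub, args=(5,), daemon=True)
producer.start()
handled = consume()
producer.join()
```

  • In an async API process the consumer would typically run in its own worker thread so that the blocking `get()` never stalls the event loop.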

What We Learned

  • Open-RMF interoperability:
    • Standardizing on rmf_fleet_msgs enables vendor-agnostic monitoring across facilities.
  • UX as a safety feature:
    • Clear health bars and pulsing alerts reduce cognitive load under stress more than raw error codes.
  • Data engineering > model complexity:
    • Robust pipelines (validation, normalization, aggregation) matter more than complex models for reliability.

Math Notes (LaTeX)

  • IsolationForest risk mapping:

$$ r = \sigma\!\left(\beta \, (s_{\text{thr}} - s(x))\right), \quad \sigma(z) = \frac{1}{1 + e^{-z}} $$
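
  • A small sketch of this logistic mapping. It assumes lower scores mean "more anomalous", as with scikit-learn's `IsolationForest.decision_function`, where negative values indicate outliers. The default threshold of 0 and the steepness `beta = 10` are illustrative choices, not tuned values from the project.

```python
import math


def risk(score: float, threshold: float = 0.0, beta: float = 10.0) -> float:
    """Map an anomaly score to a risk in [0, 1] via sigmoid(beta * (threshold - score))."""
    z = beta * (threshold - score)
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


at_threshold = risk(0.0)   # exactly on the threshold
outlier = risk(-0.3)       # well below the threshold
inlier = risk(0.3)         # well above the threshold
```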

  • Health Score (illustrative):

$$ H = 100 - \alpha_b \, \phi(b) - \alpha_T \, \phi(T) - \alpha_C \, \phi(C) - \alpha_v \, \phi(|v - \bar{v}|) $$
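
  • The formula leaves the penalty function and weights open, so the sketch below fills them with assumptions: a clipped linear penalty in [0, 1] for each signal, and weights of 40, 25, 20, and 15 that sum to 100. All thresholds are hypothetical.

```python
def penalty(value: float, ok: float, bad: float) -> float:
    """Clipped linear penalty: 0 at `ok`, 1 at `bad`, in either direction."""
    if ok == bad:
        raise ValueError("ok and bad must differ")
    frac = (value - ok) / (bad - ok)
    return min(1.0, max(0.0, frac))


def health_score(battery: float, temp_c: float, cpu_pct: float,
                 speed: float, mean_speed: float) -> float:
    """H = 100 minus weighted penalties, floored at 0."""
    h = 100.0
    h -= 40.0 * penalty(battery, ok=60.0, bad=15.0)   # low battery is bad
    h -= 25.0 * penalty(temp_c, ok=45.0, bad=80.0)    # high temperature is bad
    h -= 20.0 * penalty(cpu_pct, ok=60.0, bad=100.0)  # high CPU load is bad
    h -= 15.0 * penalty(abs(speed - mean_speed), ok=0.2, bad=1.0)
    return max(0.0, h)


healthy = health_score(90.0, 40.0, 30.0, 1.0, 1.0)
draining = health_score(37.5, 40.0, 30.0, 1.0, 1.0)
failing = health_score(10.0, 85.0, 100.0, 3.0, 1.0)
```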

  • Battery RUL:

$$ \text{RUL}_{\text{hours}} \approx \frac{b - b_{\text{crit}}}{\left|\frac{db}{dt}\right|} \quad \text{for } \frac{db}{dt} < 0 $$
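
  • As a worked example with purely illustrative numbers (not measurements): a battery at 40% with a critical level of 20%, draining at 5 percentage points per hour, gives

$$ \text{RUL}_{\text{hours}} \approx \frac{40 - 20}{\left|-5\right|} = 4 $$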

Built With

  • Languages
    • TypeScript, JavaScript, Python
  • Frontend
    • Next.js 14, React 18, Tailwind CSS, Leaflet.js, Recharts
  • Backend
    • FastAPI, Uvicorn, Pydantic
  • ML
    • scikit-learn (IsolationForest), NumPy, SciPy, joblib, threadpoolctl
  • Realtime & Transport
    • WebSocket, REST API
  • Notifications
    • Telegram Bot API (Markdown alerts)
  • Platforms
  • Data & Storage
    • In-memory rolling buffers for demo; roadmap: Nginx/HAProxy (load balancing)

Why It Matters

  • Demonstrates complete engineering: frontend, backend, ML, real-time ops.
  • Highly demoable: green-to-red map, health drop, instant phone alert.
  • Practical value: reduces downtime, speeds triage, supports predictive maintenance.
  • Clear path to scale: auth/TLS, distributed state, historical analytics, Open-RMF integration.

Quick Start (Local)

  • Backend: run FastAPI
    python -m uvicorn backend:app --host 0.0.0.0 --port 8000
  • Frontend: run Next.js
    npm run dev, then open http://localhost:3000/
  • Optional simulation
    python sim_fleet.py to stream telemetry and trigger live updates
