Skip to main content

Observability Services Integration Summary

Overview

Added complete observability stack to NoETL with ClickHouse, Qdrant, and NATS JetStream, with unified activation/deactivation controls.

Files Created (20 total)

ClickHouse (7 files)

  1. ci/manifests/clickhouse/namespace.yaml
  2. ci/manifests/clickhouse/crds.yaml
  3. ci/manifests/clickhouse/operator.yaml
  4. ci/manifests/clickhouse/clickhouse-cluster.yaml
  5. ci/manifests/clickhouse/mcp-server.yaml
  6. ci/manifests/clickhouse/observability-schema.yaml
  7. ci/manifests/clickhouse/README.md

Qdrant (3 files)

  1. ci/manifests/qdrant/namespace.yaml
  2. ci/manifests/qdrant/qdrant.yaml
  3. ci/manifests/qdrant/README.md

NATS (3 files)

  1. ci/manifests/nats/namespace.yaml
  2. ci/manifests/nats/nats.yaml
  3. ci/manifests/nats/README.md

Playbooks (4 files)

  1. automation/infrastructure/clickhouse.yaml - ClickHouse deployment and management
  2. automation/infrastructure/qdrant.yaml - Qdrant deployment and management
  3. automation/infrastructure/nats.yaml - NATS deployment and management
  4. automation/infrastructure/observability.yaml - Unified control playbook

Documentation (3 files)

  1. docs/observability_services.md - Complete guide
  2. docs/clickhouse_observability.md - ClickHouse usage guide
  3. docs/clickhouse_integration_summary.md - Implementation summary

Integration Updates

Bootstrap Playbook (automation/setup/bootstrap.yaml)

  • Updated verification to check all observability services
  • Activates all services and displays all ports

Services

ClickHouse

Purpose: OLAP database for logs, metrics, traces

Access:

  • HTTP: localhost:30123 (NodePort)
  • Native: localhost:30900 (NodePort)
  • MCP: localhost:8124

Features:

  • OpenTelemetry schema (logs, metrics, traces, noetl_events)
  • Materialized views for analytics
  • ZSTD compression, bloom filter indexes
  • 30-90 day TTL policies

Resources: 512Mi-2Gi memory, 500m-2000m CPU, 6Gi storage

Qdrant

Purpose: Vector database for embeddings and semantic search

Access:

  • HTTP: localhost:30633 (NodePort)
  • gRPC: localhost:30634 (NodePort)

Features:

  • Vector similarity search
  • Extended filtering
  • On-disk payload storage
  • Collection-based organization

Resources: 512Mi-2Gi memory, 250m-1000m CPU, 5Gi storage

NATS JetStream

Purpose: Messaging and key-value store

Access:

  • Client: localhost:30422 (NodePort)
  • Monitoring: localhost:30822 (NodePort)

Features:

  • Stream persistence (5GB)
  • Key-value store
  • Message replay
  • Consumer groups
  • Default credentials: noetl/noetl

Resources: 512Mi-2Gi memory, 250m-1000m CPU, 5Gi storage

Playbook Usage

Unified Control

# Activate all services
noetl run automation/infrastructure/observability.yaml --set action=activate-all

# Deactivate all services
noetl run automation/infrastructure/observability.yaml --set action=deactivate-all

# Status check
noetl run automation/infrastructure/observability.yaml --set action=status-all

# Health check
noetl run automation/infrastructure/observability.yaml --set action=health-all

# Restart all
noetl run automation/infrastructure/observability.yaml --set action=restart-all

Individual Service Control

# ClickHouse
noetl run automation/infrastructure/clickhouse.yaml --set action=deploy
noetl run automation/infrastructure/clickhouse.yaml --set action=status
noetl run automation/infrastructure/clickhouse.yaml --set action=connect
noetl run automation/infrastructure/clickhouse.yaml --set action=query --set query="SELECT 1"
noetl run automation/infrastructure/clickhouse.yaml --set action=health
noetl run automation/infrastructure/clickhouse.yaml --set action=logs

# Qdrant
noetl run automation/infrastructure/qdrant.yaml --set action=deploy
noetl run automation/infrastructure/qdrant.yaml --set action=status
noetl run automation/infrastructure/qdrant.yaml --set action=health
noetl run automation/infrastructure/qdrant.yaml --set action=collections
noetl run automation/infrastructure/qdrant.yaml --set action=logs

# NATS
noetl run automation/infrastructure/nats.yaml --set action=deploy
noetl run automation/infrastructure/nats.yaml --set action=status
noetl run automation/infrastructure/nats.yaml --set action=health
noetl run automation/infrastructure/nats.yaml --set action=streams
noetl run automation/infrastructure/nats.yaml --set action=logs

Port Forwarding

noetl run automation/infrastructure/clickhouse.yaml --set action=port-forward    # HTTP:8123, Native:9000
noetl run automation/infrastructure/clickhouse.yaml --set action=port-forward-mcp # MCP:8124
noetl run automation/infrastructure/qdrant.yaml --set action=port-forward # HTTP:6333, gRPC:6334
noetl run automation/infrastructure/nats.yaml --set action=port-forward # Client:4222, Monitoring:8222

Bootstrap Integration

Automatic Deployment

All services automatically deploy with:

noetl run automation/setup/bootstrap.yaml

Service Endpoints

After bootstrap, shows all endpoints:

NoETL started!
UI: http://localhost:8083
Grafana: kubectl port-forward -n vmstack svc/vmstack-grafana 3000:80
ClickHouse HTTP: localhost:30123 (NodePort)
ClickHouse Native: localhost:30900 (NodePort)
Qdrant HTTP: localhost:30633 (NodePort)
Qdrant gRPC: localhost:30634 (NodePort)
NATS Client: localhost:30422 (NodePort)
NATS Monitoring: localhost:30822 (NodePort)

Total Resources

Memory: ~3.5-6Gi (1.5-2Gi per service) CPU: ~2-4 cores (~1 core per service) Storage: ~16Gi (ClickHouse 6Gi, Qdrant 5Gi, NATS 5Gi)

Quick Reference

Health Checks

# All services
noetl run automation/infrastructure/observability.yaml --set action=health-all

# Individual
curl http://localhost:30123 # ClickHouse HTTP
clickhouse-client --host localhost --port 30900 # ClickHouse Native
curl http://localhost:30633 # Qdrant HTTP
curl http://localhost:30822/healthz # NATS Monitoring

Logs

noetl run automation/infrastructure/clickhouse.yaml --set action=logs
noetl run automation/infrastructure/qdrant.yaml --set action=logs
noetl run automation/infrastructure/nats.yaml --set action=logs

Restart

noetl run automation/infrastructure/observability.yaml --set action=restart-all
# or individual:
noetl run automation/infrastructure/clickhouse.yaml --set action=restart
noetl run automation/infrastructure/qdrant.yaml --set action=restart
noetl run automation/infrastructure/nats.yaml --set action=restart

API Examples

ClickHouse Query

noetl run automation/infrastructure/clickhouse.yaml --set action=query --set query="SELECT COUNT(*) FROM observability.logs"

Qdrant Create Collection

curl -X PUT "http://localhost:30633/collections/embeddings" \
-H "Content-Type: application/json" \
-d '{"vectors": {"size": 384, "distance": "Cosine"}}'

NATS Publish

nats -s nats://noetl:noetl@localhost:30422 pub events.test "message"

Documentation References

  • docs/observability_services.md - Complete guide with examples
  • docs/clickhouse_observability.md - ClickHouse detailed usage
  • ci/manifests/clickhouse/README.md - ClickHouse manifests
  • ci/manifests/qdrant/README.md - Qdrant manifests
  • ci/manifests/nats/README.md - NATS manifests

Testing

Verify deployment:

# Deploy all
noetl run automation/infrastructure/observability.yaml --set action=activate-all

# Check status
noetl run automation/infrastructure/observability.yaml --set action=status-all

# Health check
noetl run automation/infrastructure/observability.yaml --set action=health-all

# Test ClickHouse
noetl run automation/infrastructure/clickhouse.yaml --set action=test

# Test Qdrant
noetl run automation/infrastructure/qdrant.yaml --set action=test

# Test NATS
noetl run automation/infrastructure/nats.yaml --set action=test

Cleanup

Remove all services:

noetl run automation/infrastructure/observability.yaml --set action=deactivate-all

Next Steps

  1. Configure OpenTelemetry collector to send data to ClickHouse
  2. Create Grafana dashboards for observability data
  3. Implement vector embedding pipeline for Qdrant
  4. Set up NATS streams for NoETL events
  5. Add automated backup procedures
  6. Configure production-grade cluster settings