ClickHouse Integration Implementation Summary
Overview
Added a complete ClickHouse observability stack to the NoETL CI infrastructure: a Kubernetes operator, MCP server integration, and the ClickStack observability schema.
Files Created
Kubernetes Manifests (ci/manifests/clickhouse/)
- namespace.yaml - ClickHouse namespace definition
- crds.yaml - Custom Resource Definitions for ClickHouse operator
- operator.yaml - Altinity ClickHouse Kubernetes Operator deployment
- clickhouse-cluster.yaml - Single-node ClickHouse cluster for local dev
- mcp-server.yaml - Model Context Protocol server deployment
- observability-schema.yaml - OpenTelemetry-compatible schema with tables and views
- README.md - Manifest documentation
Playbook Automation
- automation/infrastructure/clickhouse.yaml - Complete playbook with deployment and management actions:
- Deployment: deploy, deploy-namespace, deploy-crds, deploy-operator, deploy-cluster, deploy-schema, deploy-mcp-server
- Management: undeploy, restart, restart-operator, restart-mcp
- Monitoring: status, logs, logs-operator, logs-mcp, health
- Connection: connect, query, test, port-forward, port-forward-mcp
- Maintenance: clean-data, optimize
Documentation
- docs/clickhouse_observability.md - Complete usage guide with:
- Architecture overview
- Schema documentation
- Common queries
- Maintenance procedures
- Troubleshooting guide
Integration Updates
- automation/setup/bootstrap.yaml - Added ClickHouse to dev:start and verification
Key Features
ClickHouse Operator
- Manages ClickHouse clusters using Kubernetes CRDs
- Handles scaling, rolling upgrades, and backups
- Based on Altinity operator (production-grade)
ClickHouse Cluster
- Single-node deployment optimized for local development
- HTTP interface: NodePort 30123
- Native protocol: NodePort 30900
- Users: default (no password), admin (password: admin)
- Storage: 5GB data volume, 1GB log volume
MCP Server
- Model Context Protocol server for AI agent integration
- Connects to ClickHouse cluster via ClusterIP service
- Port 8124 for MCP protocol
- Configuration via ConfigMap
Observability Schema
Four main tables with OpenTelemetry compatibility:
- observability.logs
  - OpenTelemetry log format
  - TraceId/SpanId correlation
  - Severity indexing
  - 30-day TTL
- observability.metrics
  - Gauge, Sum, Histogram, Summary types
  - Attributes for metadata
  - 90-day TTL
- observability.traces
  - Full OpenTelemetry trace model
  - Parent-child relationships
  - Events and links
  - Duration indexing
  - 30-day TTL
- observability.noetl_events
  - NoETL-specific execution events
  - Step-level tracking
  - Error capture
  - 90-day TTL
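To make the table layout concrete, the logs table can be sketched as DDL like the following. This is an illustration, not the shipped schema: the exact column names, codecs, and index granularities are assumptions modeled on common OpenTelemetry ClickHouse exporters.

```sql
CREATE TABLE IF NOT EXISTS observability.logs
(
    Timestamp      DateTime64(9) CODEC(Delta, ZSTD),
    TraceId        String CODEC(ZSTD),
    SpanId         String CODEC(ZSTD),
    SeverityText   LowCardinality(String),
    SeverityNumber UInt8,
    ServiceName    LowCardinality(String),
    Body           String CODEC(ZSTD),
    LogAttributes  Map(LowCardinality(String), String),
    -- Bloom filter on the high-cardinality trace id, set index on severity
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_severity SeverityText TYPE set(25) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)           -- date partitions make TTL drops cheap
ORDER BY (ServiceName, Timestamp)
TTL toDateTime(Timestamp) + INTERVAL 30 DAY;
```

Date partitioning means the 30-day TTL can drop whole partitions instead of rewriting parts, which keeps expiry close to free.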
Three materialized views:
- error_rate_by_service - Hourly error rates
- avg_duration_by_span - Span performance stats
- noetl_execution_stats - Execution metrics
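As an illustration of how these views aggregate, error_rate_by_service might be defined roughly as below. This is a hypothetical sketch: the real view's target engine, severity threshold, and column names may differ.

```sql
CREATE MATERIALIZED VIEW IF NOT EXISTS observability.error_rate_by_service
ENGINE = SummingMergeTree
ORDER BY (ServiceName, Hour)
AS
SELECT
    ServiceName,
    toStartOfHour(Timestamp) AS Hour,
    countIf(SeverityNumber >= 17) AS Errors,  -- OTel severity 17+ maps to ERROR
    count() AS Total
FROM observability.logs
GROUP BY ServiceName, Hour;
```

A SummingMergeTree target lets ClickHouse fold the hourly counters at merge time, so the hourly error rate is a cheap `Errors / Total` at query time.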
Performance Optimizations
- ZSTD compression with Delta encoding
- Bloom filter indexes on high-cardinality fields
- Set indexes on low-cardinality fields
- Date partitioning for TTL efficiency
- Materialized views for common analytics
Usage Examples
Deploy Complete Stack
noetl run automation/infrastructure/clickhouse.yaml --set action=deploy
Check Status
noetl run automation/infrastructure/clickhouse.yaml --set action=status
Connect to CLI
noetl run automation/infrastructure/clickhouse.yaml --set action=connect
Execute Query
noetl run automation/infrastructure/clickhouse.yaml --set action=query --set query="SELECT COUNT(*) FROM observability.logs"
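The query action can run any SQL, so cross-signal correlation is a plain join. For example, assuming OTel-style TraceId, Duration, and SpanName columns (an assumption, since the exact schema is defined in observability-schema.yaml):

```sql
-- Hypothetical: the 10 slowest spans of the last hour with their log lines
SELECT t.SpanName, t.Duration, l.Body
FROM observability.traces AS t
LEFT JOIN observability.logs AS l ON l.TraceId = t.TraceId
WHERE t.Timestamp > now() - INTERVAL 1 HOUR
ORDER BY t.Duration DESC
LIMIT 10;
```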
Port Forward
# ClickHouse HTTP and Native
noetl run automation/infrastructure/clickhouse.yaml --set action=port-forward
# MCP Server
noetl run automation/infrastructure/clickhouse.yaml --set action=port-forward-mcp
View Logs
noetl run automation/infrastructure/clickhouse.yaml --set action=logs # ClickHouse server
noetl run automation/infrastructure/clickhouse.yaml --set action=logs-operator # Operator
noetl run automation/infrastructure/clickhouse.yaml --set action=logs-mcp # MCP server
Health Check
noetl run automation/infrastructure/clickhouse.yaml --set action=health
Maintenance
noetl run automation/infrastructure/clickhouse.yaml --set action=optimize # Optimize tables
noetl run automation/infrastructure/clickhouse.yaml --set action=clean-data # Clean data (keep schema)
noetl run automation/infrastructure/clickhouse.yaml --set action=undeploy # Remove stack
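The maintenance actions presumably map to statements along these lines (illustrative only; the playbook defines the authoritative commands):

```sql
TRUNCATE TABLE observability.logs;        -- clean-data: drop rows, keep schema
OPTIMIZE TABLE observability.logs FINAL;  -- optimize: force a merge of all parts
```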
Integration Points
Bootstrap Process
ClickHouse is now included in the main bootstrap:
noetl run automation/setup/bootstrap.yaml
- Main bootstrap includes ClickHouse deployment
- Verification checks the ClickHouse operator and cluster
Architecture Decisions
Single-Node Cluster
Chose single-node for local development to minimize resource usage. Production deployments should use multi-node clusters with replication.
OpenTelemetry Schema
Standard OTel format ensures compatibility with existing observability tools and collectors.
MCP Server Placeholder
The MCP server deployment references a placeholder image. Users can build one from source:
git clone https://github.com/ClickHouse/mcp-clickhouse.git
cd mcp-clickhouse
docker build -t clickhouse/mcp-server:latest .
kind load docker-image clickhouse/mcp-server:latest --name noetl-cluster
NodePort Access
Used NodePort (30123, 30900) for easy local access. Production should use LoadBalancer or Ingress.
Testing
Tested the deployment process end to end:
- CRD installation
- Operator deployment
- Cluster creation
- Schema initialization
- MCP server deployment
- Connection verification
- Query execution
- Health checks
All components integrate cleanly with existing NoETL infrastructure.
Next Steps
Immediate
- Build and publish official MCP server image
- Test OpenTelemetry collector integration
- Create Grafana dashboards for ClickHouse data
- Add NoETL event ingestion
Future Enhancements
- Multi-node cluster configuration
- Horizontal scaling based on load
- S3 backups for disaster recovery
- Tiered storage (hot/cold data)
- Query optimization recommendations
- Alerting integration
- Custom aggregation functions
- Real-time streaming ingestion
References
- ClickHouse Documentation
- ClickHouse Operator
- ClickHouse MCP Server
- OpenTelemetry
- ClickHouse Observability
Related Documentation
- ci/manifests/clickhouse/README.md - Manifest documentation
- docs/clickhouse_observability.md - Usage guide
- automation/infrastructure/clickhouse.yaml - Playbook reference