Semantic Execution Pipeline
NoETL includes an AI-powered semantic pipeline that enables intelligent analysis of workflow executions using embeddings, vector search, and LLM reasoning.
Pipeline Overview
┌──────────────────────────────┐
│ Business User UI │
│ (GraphQL/Gateway/API Client) │
└───────────────┬──────────────┘
│
▼
┌──────────────────────┐
│ NoETL Server │
_________________│ - Validates playbook │
| │ - Creates workload │
| │ - Publishes commands │
| └─────────────┬────────┘
│(NATS JetStream) |
| |
┌────────▼──────────────────┐ ┌-─────▼─────────────────────┐
│ JetStream │ │ NoETL Workers │
│ │ │ - Pull commands │
│ - NOETL_COMMANDS Stream │◀────│ - Run tools / tasks │
└──────────────────────────-┘ │ - Emit results + logs │
└──────────-┬────────────────┘
(event log messages)│
│
┌─────────────────────────────┐ │
│ NoETL Server Event Handler │◀─────────────┘
│ - Collects events |
| - Execution Events │
│ - Normalizes + indexes │
│ - Stores metadata │
└──────────────┬──────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ Embedding + Semantic Layer │
│--------------------------------------------------│
│ │
│ 1. Convert events/workloads/logs to embeddings │
│ using local/OpenAI embedding models │
│ │
│ 2. Store vectors in Qdrant (vector database) │
│ - Similar executions │
│ - Error clusters │
│ - Semantic search index │
│ │
└───────────────┬──────────────────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ Qdrant Vector DB │
│ - HNSW vector search │
│ - Top-K nearest neighbors │
│ - Semantic relevance ranking │
└───────────────┬────────────────────────┘
│ (retrieved context)
▼
┌────────────────────────────┐
│ LLM │
│ (OpenAI / Local Model) │
│----------------------------│
│ - Root-cause analysis │
│ - Explain execution flows │
│ - Recommend next actions │
│ - Optimize retries/loops │
│ - Generate workflow steps │
└───────────────┬────────────┘
│
▼
┌───────────────────────────┐
│ Insights / AI Assistant │
│ - Why did this fail? │
│ - Show similar workflows │
│ - Predict bottlenecks │
│ - Recommend improvements │
└───────────────────────────┘
Components
1. NoETL Server
- Validates playbooks against schema
- Creates workload instances
- Publishes commands to NATS JetStream
2. NoETL Workers
- Pull tasks from
NOETL_COMMANDSstream - Execute tasks (python, http, postgres, etc.)
- Emit detailed events back to server
3. Event Processor
- Normalizes events (
task_start,task_end,error,retries) - Builds structured execution traces
- Indexes metadata for search
4. Embedding Pipeline
For each execution event:
- Extract message text, error descriptions, metadata
- Convert to embedding vectors using embedding models
- Store vectors in Qdrant with metadata reference
5. Semantic Search (Qdrant)
Enables intelligent queries:
- Find similar failures
- Cluster executions by behavior
- Show similar playbooks
- Detect anomalies
6. LLM Reasoning Layer
Retrieves top-k relevant context from Qdrant and produces:
- Explanations: Why did this step fail?
- Recommendations: Fix missing credential, increase batch size
- Optimization: Parallelize steps X and Y
- Auto-generation: Suggested retry logic adjustments
Use Cases
Root Cause Analysis
When a workflow fails:
- Embed the error message and context
- Search for similar past failures
- LLM analyzes patterns and suggests fixes
Workflow Optimization
Based on execution history:
- Identify slow steps across executions
- Find similar successful workflows
- Recommend parallelization or batching
Anomaly Detection
Monitor for unusual patterns:
- Embed execution metrics
- Detect outliers in vector space
- Alert on significant deviations
Infrastructure
Qdrant Vector Database
- HTTP API: http://localhost:30633
- gRPC: localhost:30634
- Storage: 5GB default allocation
Deployment
# Deploy observability stack (includes Qdrant)
task observability:activate-all
# Check status
task observability:status-all
See Also
- Observability Services - Full observability stack
- ClickHouse Integration - Analytics database
- Architecture - System overview