NoETL Metrics Table Schema
Overview
The noetl.metric table has been added to the NoETL PostgreSQL schema to centralize metrics collection from both workers and servers. The live implementation uses a partitioned parent table (daily partitions by created_at) with a 1‑day default TTL and a fast cleanup routine that drops expired partitions. This provides a unified storage location for all NoETL component metrics with efficient retention management before potential migration to dedicated time-series storage.
Table Structure (legacy non-partitioned example)
CREATE TABLE IF NOT EXISTS noetl.metric (
metric_id BIGSERIAL PRIMARY KEY,
runtime_id BIGINT NOT NULL REFERENCES noetl.runtime(runtime_id) ON DELETE CASCADE,
metric_name TEXT NOT NULL,
metric_type TEXT NOT NULL CHECK (metric_type IN ('counter', 'gauge', 'histogram', 'summary')),
metric_value DOUBLE PRECISION NOT NULL,
labels JSONB,
help_text TEXT,
unit TEXT,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
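For contrast with the legacy example above, here is a minimal sketch of the partitioned parent table described in the overview, assuming daily RANGE partitioning on created_at and an expires_at column for the TTL; the exact column definitions in the live schema may differ:
-- Sketch only: columns mirror the legacy example plus expires_at.
CREATE TABLE IF NOT EXISTS noetl.metric (
metric_id BIGINT GENERATED ALWAYS AS IDENTITY,
runtime_id BIGINT NOT NULL,
metric_name TEXT NOT NULL,
metric_type TEXT NOT NULL CHECK (metric_type IN ('counter', 'gauge', 'histogram', 'summary')),
metric_value DOUBLE PRECISION NOT NULL,
labels JSONB,
help_text TEXT,
unit TEXT,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ NOT NULL DEFAULT now() + interval '1 day',
PRIMARY KEY (metric_id, created_at)  -- the partition key must be part of the primary key
) PARTITION BY RANGE (created_at);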
Schema Design
Key Fields
- metric_id: Auto-incrementing primary key
- runtime_id: Foreign key to the noetl.runtime table (links metrics to registered components)
- metric_name: Name of the metric (e.g., noetl_jobs_processed_total, cpu_usage_percent)
- metric_type: Prometheus-compatible metric type:
  - counter: Monotonically increasing values
  - gauge: Values that can go up or down
  - histogram: Distribution of values
  - summary: Similar to histogram, with quantiles
- metric_value: Numeric value of the metric
- labels: JSONB field for metric labels/dimensions (e.g., {"worker_pool": "cpu-01", "job_type": "http"})
- help_text: Description of what the metric measures
- unit: Metric unit (e.g., bytes, seconds, requests)
- timestamp: When the metric was recorded
- created_at: When the record was inserted
Indexes
- Primary key: metric_id
- Runtime lookup: idx_metrics_runtime_id
- Metric name queries: idx_metrics_name
- Time-series queries: idx_metrics_timestamp
- Combined queries: idx_metrics_runtime_name (runtime_id + metric_name)
- Label searches: idx_metrics_labels (GIN index on the JSONB labels column), as sketched below
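A minimal DDL sketch of these indexes on the non-partitioned example table; the names follow the list above, but the exact definitions in the live schema may differ:
CREATE INDEX IF NOT EXISTS idx_metrics_runtime_id ON noetl.metric (runtime_id);
CREATE INDEX IF NOT EXISTS idx_metrics_name ON noetl.metric (metric_name);
CREATE INDEX IF NOT EXISTS idx_metrics_timestamp ON noetl.metric ("timestamp");  -- quoted to avoid clashing with the type keyword
CREATE INDEX IF NOT EXISTS idx_metrics_runtime_name ON noetl.metric (runtime_id, metric_name);
CREATE INDEX IF NOT EXISTS idx_metrics_labels ON noetl.metric USING GIN (labels);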
Integration with Runtime Table
The metrics table leverages the existing noetl.runtime table as a service catalog:
Runtime Table Structure
-- Runtime table serves as component registry
CREATE TABLE noetl.runtime (
runtime_id BIGINT PRIMARY KEY,
name TEXT NOT NULL, -- Component name
component_type TEXT NOT NULL, -- 'worker_pool', 'server_api', 'broker'
base_url TEXT, -- API endpoint
status TEXT NOT NULL, -- 'ready', 'busy', 'offline'
labels JSONB, -- Component labels
capabilities JSONB, -- What the component can do
capacity INTEGER, -- Max concurrent jobs
runtime JSONB, -- Runtime info (type, version, etc.)
last_heartbeat TIMESTAMPTZ, -- Last activity
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
Relationship
- Each metric record references a component in the runtime table (see the join sketch below)
- Workers and servers register in runtime table, then report metrics
- Cascade delete ensures metrics are cleaned up when components are removed
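For illustration, a query like the following (a sketch, not a query that ships with NoETL) joins recent metrics back to the component that reported them:
-- Recent metrics with the name and type of the reporting component
SELECT r.name, r.component_type, m.metric_name, m.metric_value, m.timestamp
FROM noetl.metric m
JOIN noetl.runtime r ON r.runtime_id = m.runtime_id
WHERE m.timestamp > now() - interval '15 minutes'
ORDER BY m.timestamp DESC;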
Usage Patterns
Workers Reporting Metrics
- Worker registers in the runtime table during startup
- Worker periodically reports metrics to the server API
- Server inserts metrics into the metric table with the worker's runtime_id
Server Self-Reporting
- Server has its own entry in the runtime table
- Server reports its own metrics (API requests, system resources)
- Server inserts its metrics with its own runtime_id, as in the insert sketch below
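A sketch of such an insert; the runtime_id and values are illustrative only and mirror the worker example further below:
INSERT INTO noetl.metric (runtime_id, metric_name, metric_type, metric_value, labels, help_text, unit)
VALUES (
    123,  -- runtime_id assigned to the reporting component at registration (example value)
    'noetl_jobs_processed_total',
    'counter',
    45,
    '{"worker_pool": "cpu-01", "status": "completed"}'::jsonb,
    'Total number of jobs processed by worker',
    'jobs'
);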
API Endpoints (To Implement)
- POST /api/metrics - Bulk insert metrics from workers/servers
- GET /api/metrics - Query metrics with filtering
- GET /api/runtime/{runtime_id}/metrics - Get metrics for a specific component (see the query sketch below)
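Because these endpoints are still to be implemented, the following is only a sketch of the kind of query that could back GET /api/runtime/{runtime_id}/metrics; the filter values are placeholders:
SELECT m.metric_name, m.metric_type, m.metric_value, m.labels, m.timestamp
FROM noetl.metric m
WHERE m.runtime_id = 123                        -- {runtime_id} path parameter
  AND m.timestamp > now() - interval '1 hour'   -- optional time-range filter
ORDER BY m.timestamp;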
Example Metrics
Worker Metrics
{
"runtime_id": 123,
"metric_name": "noetl_jobs_processed_total",
"metric_type": "counter",
"metric_value": 45,
"labels": {"worker_pool": "cpu-01", "status": "completed"},
"help_text": "Total number of jobs processed by worker",
"unit": "jobs"
}
Server Metrics
{
"runtime_id": 124,
"metric_name": "noetl_api_requests_per_second",
"metric_type": "gauge",
"metric_value": 12.5,
"labels": {"endpoint": "/api/queue/lease", "method": "POST"},
"help_text": "Current API request rate",
"unit": "req/sec"
}
System Metrics
{
"runtime_id": 123,
"metric_name": "system_cpu_usage_percent",
"metric_type": "gauge",
"metric_value": 65.2,
"labels": {"core": "all"},
"help_text": "CPU utilization percentage",
"unit": "percent"
}
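Records like these can be filtered by their labels with the JSONB containment operator, which the GIN index on labels can serve. A sketch:
-- Latest gauge readings carrying a specific worker_pool label
SELECT m.metric_name, m.metric_value, m.timestamp
FROM noetl.metric m
WHERE m.metric_type = 'gauge'
  AND m.labels @> '{"worker_pool": "cpu-01"}'::jsonb
ORDER BY m.timestamp DESC
LIMIT 100;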
Migration Path
Current State: PostgreSQL Storage
- All metrics stored in the noetl.metric parent table with daily partitions
- Good for development and small/medium deployments
- Leverages existing PostgreSQL infrastructure
Future State: Time-Series Database
- Export metrics to VictoriaMetrics/Prometheus
- Keep PostgreSQL for short-term retention
- Use dedicated TSDB for long-term storage and advanced queries
Migration Strategy
- Phase 1: Implement metrics collection in PostgreSQL
- Phase 2: Add metrics export to VictoriaMetrics
- Phase 3: Implement retention policies (keep recent in PG, historical in TSDB)
- Phase 4: Optional migration to pure TSDB setup
Benefits
Centralized Collection
- Single point for all NoETL component metrics
- Consistent schema across workers and servers
- Simplified monitoring setup
Flexible Storage
- JSONB labels support arbitrary dimensions
- Easy to query with SQL
- Compatible with existing PostgreSQL tooling
Integration Ready
- Links to existing runtime registry
- Supports current heartbeat/registration flow
- Easy to extend with new metric types
Migration Friendly
- Can export to Prometheus format
- Structured for time-series analysis
- Retention policies can be implemented
Implementation Notes
- Bulk Inserts: Use batch inserts for performance (see the multi-row insert sketch below)
- Retention: Implement cleanup jobs for old metrics
- Partitioning: Consider table partitioning by timestamp for large datasets
- Monitoring: Monitor the metrics table size and performance
- Export: Implement Prometheus export endpoint for existing monitoring tools
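As a sketch of a batch insert, a single multi-row statement (values are illustrative) avoids one round trip per metric:
INSERT INTO noetl.metric (runtime_id, metric_name, metric_type, metric_value, labels, unit)
VALUES
    (123, 'noetl_jobs_processed_total', 'counter', 45, '{"worker_pool": "cpu-01"}'::jsonb, 'jobs'),
    (123, 'system_cpu_usage_percent', 'gauge', 65.2, '{"core": "all"}'::jsonb, 'percent'),
    (124, 'noetl_api_requests_per_second', 'gauge', 12.5, '{"endpoint": "/api/queue/lease"}'::jsonb, 'req/sec');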
Partitioning and TTL Cleanup (current implementation)
- Parent table: noetl.metric is partitioned by RANGE on created_at with 1-day intervals (sketched below)
- Default TTL: records expire after 1 day via an expires_at default; retention is enforced by dropping whole partitions older than 1 day
- Benefits: fast cleanup via DROP TABLE, immediate disk space reclaim, better query performance via partition pruning
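A minimal sketch of what this looks like in DDL; the partition naming convention here is an assumption, and the live schema may differ:
-- Daily child partition holding rows created on 2025-01-15 (example date)
CREATE TABLE IF NOT EXISTS noetl.metric_2025_01_15
    PARTITION OF noetl.metric
    FOR VALUES FROM ('2025-01-15') TO ('2025-01-16');

-- TTL cleanup: dropping an expired partition removes all of its rows at once
DROP TABLE IF EXISTS noetl.metric_2025_01_13;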
Database functions
- noetl.initialize_metric_partitions() - Creates the initial set of partitions (yesterday, today, next 7 days)
- noetl.create_metric_partition(partition_date date) - Creates a single daily partition for the supplied date
- noetl.create_metric_partitions_ahead(days_ahead int) - Creates daily partitions from tomorrow up to N days ahead
- noetl.cleanup_expired_metrics() returns text[] - Drops partitions older than 1 day; returns a list of dropped partition table names (example calls below)
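The functions above can be called directly from SQL, for example from psql (a sketch; results depend on the current partition state):
SELECT noetl.initialize_metric_partitions();            -- yesterday, today, and the next 7 days
SELECT noetl.create_metric_partition(CURRENT_DATE + 1); -- a single partition for tomorrow
SELECT noetl.create_metric_partitions_ahead(3);         -- partitions from tomorrow up to 3 days ahead
SELECT noetl.cleanup_expired_metrics();                 -- drops partitions older than 1 day and returns their names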
Example (Python async call):
cd . && .venv/bin/python -c "
import asyncio
from noetl.core.common import get_async_db_connection

async def test_cleanup():
    async with get_async_db_connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute('SELECT noetl.cleanup_expired_metrics()')
            result = await cur.fetchone()
            print('Dropped partitions:', (result or [None])[0])

asyncio.run(test_cleanup())
"
API endpoints
- POST /api/metrics/cleanup - Triggers cleanup (partition drops) and returns the list of dropped partitions
  - Example: curl -s -X POST http://localhost:30082/api/metrics/cleanup | jq
- POST /api/metrics/partitions/create?days_ahead=3 - Creates partitions from tomorrow up to 3 days ahead
  - Example: curl -s -X POST "http://localhost:30082/api/metrics/partitions/create?days_ahead=3" | jq
Notes
- Cleanup typically drops nothing until at least one full day has elapsed since initial partitions were created/populated.
- You can safely call partition creation multiple times; it will create missing partitions idempotently.
- The legacy content above shows a non-partitioned example schema; the live implementation is partitioned with 1-day TTL.