NoETL Metrics Table Schema

Overview

The noetl.metric table has been added to the NoETL PostgreSQL schema to centralize metrics collection from both workers and servers. The live implementation uses a partitioned parent table (daily partitions by created_at) with a 1-day default TTL and a fast cleanup routine that drops expired partitions. This gives all NoETL component metrics a single, consistently structured storage location with efficient retention management, ahead of a potential migration to dedicated time-series storage.

Table Structure (legacy non-partitioned example)

CREATE TABLE IF NOT EXISTS noetl.metric (
    metric_id     BIGSERIAL PRIMARY KEY,
    runtime_id    BIGINT NOT NULL REFERENCES noetl.runtime(runtime_id) ON DELETE CASCADE,
    metric_name   TEXT NOT NULL,
    metric_type   TEXT NOT NULL CHECK (metric_type IN ('counter', 'gauge', 'histogram', 'summary')),
    metric_value  DOUBLE PRECISION NOT NULL,
    labels        JSONB,
    help_text     TEXT,
    unit          TEXT,
    timestamp     TIMESTAMPTZ NOT NULL DEFAULT now(),
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

Schema Design

Key Fields

  • metric_id: Auto-incrementing primary key
  • runtime_id: Foreign key to noetl.runtime table (links metrics to registered components)
  • metric_name: Name of the metric (e.g., noetl_jobs_processed_total, cpu_usage_percent)
  • metric_type: Prometheus-compatible metric types:
    • counter: Monotonically increasing values
    • gauge: Values that can go up or down
    • histogram: Distribution of values
    • summary: Similar to histogram with quantiles
  • metric_value: Numeric value of the metric
  • labels: JSONB field for metric labels/dimensions (e.g., {"worker_pool": "cpu-01", "job_type": "http"})
  • help_text: Description of what the metric measures
  • unit: Metric unit (e.g., bytes, seconds, requests)
  • timestamp: When the metric was recorded
  • created_at: When the record was inserted
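
To illustrate how these fields combine in practice, here is a minimal query sketch (not taken from NoETL itself) that returns the most recent gauge reading per component and metric name:

-- Sketch: latest recorded value of each gauge metric per component.
SELECT DISTINCT ON (m.runtime_id, m.metric_name)
       m.runtime_id, m.metric_name, m.metric_value, m.labels, m.unit, m.timestamp
FROM noetl.metric AS m
WHERE m.metric_type = 'gauge'
ORDER BY m.runtime_id, m.metric_name, m.timestamp DESC;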

Indexes

  • Primary: metric_id
  • Runtime lookup: idx_metrics_runtime_id
  • Metric name queries: idx_metrics_name
  • Time-series queries: idx_metrics_timestamp
  • Combined queries: idx_metrics_runtime_name (runtime_id + metric_name)
  • Label searches: idx_metrics_labels (GIN index on JSONB)
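
The DDL behind these index names is not shown in this document; a sketch against the non-partitioned example table above, assuming the usual CREATE INDEX form, would be:

-- Hedged sketch: index definitions matching the names listed above.
CREATE INDEX IF NOT EXISTS idx_metrics_runtime_id   ON noetl.metric (runtime_id);
CREATE INDEX IF NOT EXISTS idx_metrics_name         ON noetl.metric (metric_name);
CREATE INDEX IF NOT EXISTS idx_metrics_timestamp    ON noetl.metric (timestamp);
CREATE INDEX IF NOT EXISTS idx_metrics_runtime_name ON noetl.metric (runtime_id, metric_name);
CREATE INDEX IF NOT EXISTS idx_metrics_labels       ON noetl.metric USING GIN (labels);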

Integration with Runtime Table

The metrics table leverages the existing noetl.runtime table as a service catalog:

Runtime Table Structure

-- Runtime table serves as component registry
CREATE TABLE noetl.runtime (
    runtime_id      BIGINT PRIMARY KEY,
    name            TEXT NOT NULL,      -- Component name
    component_type  TEXT NOT NULL,      -- 'worker_pool', 'server_api', 'broker'
    base_url        TEXT,               -- API endpoint
    status          TEXT NOT NULL,      -- 'ready', 'busy', 'offline'
    labels          JSONB,              -- Component labels
    capabilities    JSONB,              -- What the component can do
    capacity        INTEGER,            -- Max concurrent jobs
    runtime         JSONB,              -- Runtime info (type, version, etc.)
    last_heartbeat  TIMESTAMPTZ,        -- Last activity
    created_at      TIMESTAMPTZ,
    updated_at      TIMESTAMPTZ
);

Relationship

  • Each metric record references a component in the runtime table
  • Workers and servers register in runtime table, then report metrics
  • Cascade delete ensures metrics are cleaned up when components are removed
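
A minimal query sketch of that relationship, joining metrics back to the component that reported them (column names are taken from the two table definitions above):

-- Sketch: recent metrics joined with the reporting component from the runtime registry.
SELECT r.name, r.component_type, m.metric_name, m.metric_value, m.timestamp
FROM noetl.metric AS m
JOIN noetl.runtime AS r ON r.runtime_id = m.runtime_id
WHERE m.timestamp > now() - interval '15 minutes'
ORDER BY m.timestamp DESC;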

Usage Patterns

Workers Reporting Metrics

  1. Worker registers in runtime table during startup
  2. Worker periodically reports metrics to server API
  3. Server inserts metrics into metric table with worker's runtime_id

Server Self-Reporting

  1. Server has its own entry in runtime table
  2. Server reports its own metrics (API requests, system resources)
  3. Server inserts its metrics with its own runtime_id
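
In both flows the server-side write reduces to an insert keyed by the reporter's runtime_id; a hedged sketch, with values borrowed from the worker example further down this page:

-- Sketch: inserting one reported metric for the component with runtime_id 123.
INSERT INTO noetl.metric
    (runtime_id, metric_name, metric_type, metric_value, labels, help_text, unit)
VALUES
    (123, 'noetl_jobs_processed_total', 'counter', 45,
     '{"worker_pool": "cpu-01", "status": "completed"}'::jsonb,
     'Total number of jobs processed by worker', 'jobs');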

API Endpoints (To Implement)

  • POST /api/metrics - Bulk insert metrics from workers/servers
  • GET /api/metrics - Query metrics with filtering (see the query sketch after this list)
  • GET /api/runtime/{runtime_id}/metrics - Get metrics for specific component
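
Because these endpoints are still to be implemented, the exact filter parameters are an assumption; a filtered GET /api/metrics call would likely map to a query along these lines:

-- Sketch: possible backing query for a filtered GET /api/metrics call.
SELECT m.metric_id, m.runtime_id, m.metric_name, m.metric_type, m.metric_value, m.labels, m.timestamp
FROM noetl.metric AS m
WHERE m.metric_name = 'noetl_jobs_processed_total'
  AND m.timestamp >= now() - interval '1 hour'
  AND m.labels @> '{"worker_pool": "cpu-01"}'::jsonb
ORDER BY m.timestamp DESC
LIMIT 100;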

Example Metrics

Worker Metrics

{
    "runtime_id": 123,
    "metric_name": "noetl_jobs_processed_total",
    "metric_type": "counter",
    "metric_value": 45,
    "labels": {"worker_pool": "cpu-01", "status": "completed"},
    "help_text": "Total number of jobs processed by worker",
    "unit": "jobs"
}

Server Metrics

{
    "runtime_id": 124,
    "metric_name": "noetl_api_requests_per_second",
    "metric_type": "gauge",
    "metric_value": 12.5,
    "labels": {"endpoint": "/api/queue/lease", "method": "POST"},
    "help_text": "Current API request rate",
    "unit": "req/sec"
}

System Metrics

{
    "runtime_id": 123,
    "metric_name": "system_cpu_usage_percent",
    "metric_type": "gauge",
    "metric_value": 65.2,
    "labels": {"core": "all"},
    "help_text": "CPU utilization percentage",
    "unit": "percent"
}

Migration Path

Current State: PostgreSQL Storage

  • All metrics stored in noetl.metric parent table with daily partitions
  • Good for development and small/medium deployments
  • Leverages existing PostgreSQL infrastructure

Future State: Time-Series Database

  • Export metrics to VictoriaMetrics/Prometheus
  • Keep PostgreSQL for short-term retention
  • Use dedicated TSDB for long-term storage and advanced queries

Migration Strategy

  1. Phase 1: Implement metrics collection in PostgreSQL
  2. Phase 2: Add metrics export to VictoriaMetrics
  3. Phase 3: Implement retention policies (keep recent in PG, historical in TSDB)
  4. Phase 4: Optional migration to pure TSDB setup

Benefits

Centralized Collection

  • Single point for all NoETL component metrics
  • Consistent schema across workers and servers
  • Simplified monitoring setup

Flexible Storage

  • JSONB labels support arbitrary dimensions
  • Easy to query with SQL
  • Compatible with existing PostgreSQL tooling

Integration Ready

  • Links to existing runtime registry
  • Supports current heartbeat/registration flow
  • Easy to extend with new metric types

Migration Friendly

  • Can export to Prometheus format (see the SQL sketch after this list)
  • Structured for time-series analysis
  • Retention policies can be implemented
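
As a rough illustration of the export claim above (not an existing NoETL export path), rows could be rendered into Prometheus-style exposition lines directly in SQL:

-- Sketch: render recent samples as Prometheus-style exposition lines.
SELECT format('%s{%s} %s', m.metric_name, coalesce(l.label_pairs, ''), m.metric_value) AS line
FROM noetl.metric AS m
LEFT JOIN LATERAL (
    SELECT string_agg(format('%s="%s"', key, value), ',') AS label_pairs
    FROM jsonb_each_text(m.labels)
) AS l ON true
WHERE m.timestamp > now() - interval '5 minutes';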

Implementation Notes

  1. Bulk Inserts: Use batch inserts for performance (see the sketch after this list)
  2. Retention: Implement cleanup jobs for old metrics
  3. Partitioning: Consider table partitioning by timestamp for large datasets
  4. Monitoring: Monitor the metrics table size and performance
  5. Export: Implement Prometheus export endpoint for existing monitoring tools
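
A sketch of the batch-insert pattern from note 1, sending several samples in one statement (values are borrowed from the examples earlier on this page):

-- Sketch: batching several metric samples into one INSERT round trip.
INSERT INTO noetl.metric
    (runtime_id, metric_name, metric_type, metric_value, labels, unit)
VALUES
    (123, 'noetl_jobs_processed_total',    'counter', 45,   '{"worker_pool": "cpu-01"}'::jsonb,        'jobs'),
    (123, 'system_cpu_usage_percent',      'gauge',   65.2, '{"core": "all"}'::jsonb,                  'percent'),
    (124, 'noetl_api_requests_per_second', 'gauge',   12.5, '{"endpoint": "/api/queue/lease"}'::jsonb, 'req/sec');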

Partitioning and TTL Cleanup (current implementation)

  • Parent table: noetl.metric partitioned by RANGE on created_at with 1-day intervals
  • Default TTL: records expire after 1 day via expires_at default; retention is enforced by dropping whole partitions older than 1 day
  • Benefits: fast cleanup via DROP TABLE, immediate disk space reclaim, better query performance via partition pruning
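
For illustration, a daily partition produced by the helper functions below looks roughly like this; the exact partition naming convention is an assumption:

-- Sketch: a daily partition of the parent table, bounded by created_at.
CREATE TABLE IF NOT EXISTS noetl.metric_2025_01_15
    PARTITION OF noetl.metric
    FOR VALUES FROM ('2025-01-15') TO ('2025-01-16');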

Database functions

  • noetl.initialize_metric_partitions()
    • Creates initial set of partitions (yesterday, today, next 7 days)
  • noetl.create_metric_partition(partition_date date)
    • Creates a single daily partition for the supplied date
  • noetl.create_metric_partitions_ahead(days_ahead int)
    • Creates daily partitions from tomorrow up to N days ahead
  • noetl.cleanup_expired_metrics() returns text[]
    • Drops partitions older than 1 day; returns a list of dropped partition table names
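
The same functions can be called directly from psql or any SQL client:

-- Ensure partitions exist for the next 3 days, then drop any expired ones.
SELECT noetl.create_metric_partitions_ahead(3);
SELECT noetl.cleanup_expired_metrics();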

Example (Python async call):

cd . && .venv/bin/python -c "
import asyncio
from noetl.core.common import get_async_db_connection

async def test_cleanup():
    async with get_async_db_connection() as conn:
        async with conn.cursor() as cur:
            await cur.execute('SELECT noetl.cleanup_expired_metrics()')
            result = await cur.fetchone()
            print('Dropped partitions:', (result or [None])[0])

asyncio.run(test_cleanup())
"

API endpoints

  • POST /api/metrics/cleanup

    • Triggers cleanup (partition drops) and returns the list of dropped partitions
    • Example:
      curl -s -X POST http://localhost:30082/api/metrics/cleanup | jq
  • POST /api/metrics/partitions/create?days_ahead=3

    • Creates partitions from tomorrow up to 3 days ahead
    • Example:
      curl -s -X POST "http://localhost:30082/api/metrics/partitions/create?days_ahead=3" | jq

Notes

  • Cleanup typically drops nothing until at least one full day has elapsed since initial partitions were created/populated.
  • You can safely call partition creation multiple times; it will create missing partitions idempotently.
  • The legacy content above shows a non-partitioned example schema; the live implementation is partitioned with 1-day TTL.