NoETL Global Cloud Hybrid Supercluster Blueprint
Catalog-driven, serverless-first, sharded AWS/GCP/Azure/datacenter architecture for NoETL business operating system and quantum cloud workloads
Draft: 2026-05-29 | Owner: NoETL
| Field | Value |
|---|---|
| Document purpose | Blueprint for building NoETL as a global, multitenant, regionalized cloud computation platform. |
| Primary audience | NoETL maintainers, platform engineers, data engineers, security architects, and early design partners. |
| Scope | Event log lifecycle, NATS shard model, sharded Postgres projections, object-store archive, global resource location, multitenancy, sharing, scheduling, and development roadmap. |
| Status | Architecture draft for implementation planning. |
Table of Contents
- 1. Executive Summary
- 2. Design Goals and Non-Goals
- 3. Target Architecture
- 4. Regional Cell and Shard Model
- 5. Event Log Lifecycle
- 6. Projection and Query Model
- 7. Object Store Archive and Result Bundles
- 8. NoETL Resource Locator
- 9. Resource Catalog and Resource Registry
- 10. Multitenancy, Isolation, and Sharing
- 11. Cross-Region Computation and Data Locality
- 12. Control Plane Services
- 13. Runtime Protocols and Data Contracts
- 14. Global Runtime Fabric: Cloudflare, NATS Supercluster, KEDA, and Arrow
Distributed Runtime Spec Alignment
The global cloud blueprint is aligned with the NoETL Distributed Runtime + Event-Sourced Shared Memory Spec. That spec defines event-sourced system state, worker-side loop batching, Arrow IPC shared-memory data-plane hints, decentralized projection, and cloud-agnostic event, payload, and projection store abstractions. The blueprint extends those runtime contracts into a multi-cloud and hybrid regional-cell operating model.
Important correction to the earlier planning snapshot: the public distributed-runtime spec now records phases 0-6 as complete for the v2 distributed runtime close-out. The architecture should therefore use the phase table as the implementation baseline, not as a future-only plan.
| Runtime spec area | Blueprint interpretation |
|---|---|
| Event-sourced state | The active log remains authoritative for replay; completed execution bundles move to object storage with manifest, checksum, payload references, and replay validation evidence. |
| Frame-shaped cursor loops | Workers claim frame-sized units of work, preserve stage/frame lifecycle events, and let the planner fan out/reduce work without losing deterministic replay. |
| Transactional outbox | Event writers append canonical events and enqueue distribution envelopes in the same transaction; publishers drain the outbox to the configured fabric. |
| Arrow IPC Tier 1.5 | Frame outputs may use Arrow/Feather bytes for low-serialization movement while durable payload_ref/locator remains mandatory for replay after cache eviction. |
| Replay ports | ReplayEventReader and ReplayPayloadResolver keep replay independent of Postgres, object-store brand, or message-fabric choice. |
| Topology metadata | Worker locality, region, zone, cluster, node, worker locator, source locality, and placement evaluation become replayable metadata, not scheduler side effects. |
Replaceable Event and Queue Fabric
NATS JetStream and NATS supercluster should remain the reference implementation because they support the low-latency active event fabric, durable consumers, account isolation, and gateway-meshed regional clusters. However, NoETL must treat the fabric as a port, not as a permanent dependency. The queue/event-fabric adapter must be replaceable by Kafka, Redis Streams, or cloud-native services when customer cost, compliance, or existing infrastructure requires it.
| Fabric option | Best use in NoETL | Notes / guardrails |
|---|---|---|
| NATS JetStream / Supercluster | Reference active event fabric, command queue, projector notifications, edge/datacenter federation. | Use as default for NoETL-owned regional cells and low-latency supercluster operation. |
| Apache Kafka / Redpanda | Enterprise event log and high-throughput stream processing where Kafka already exists. | Good for large enterprises; keep EventStore/CommandQueue ports clean to avoid Kafka-specific semantics leaking into workflow logic. |
| Redis Streams | Lightweight local queue, edge cell queue, dev/test, or short-lived burst buffer. | Do not rely on Redis as the long-term audit archive; persist canonical events/payloads elsewhere. |
| Google Pub/Sub | GCP-native command/fanout adapter and serverless trigger source. | Use for delivery and recent replay; keep canonical event archive in Postgres/object storage or another durable event store. |
| AWS SQS/SNS/EventBridge | AWS-native serverless command queue, event routing, and integration bus. | Use SQS for work queues, SNS/EventBridge for routing and integrations; archive completed execution state to S3. |
| Azure Event Hubs / Service Bus / Event Grid | Azure-native streaming, command messaging, and event notification. | Use Event Hubs for stream ingestion, Service Bus for commands, Event Grid for notifications; archive to Azure Blob or ADLS. |
| Cloudflare Queues / Workers / Durable Objects | Global edge ingress buffering, lightweight edge coordination, customer-facing entrypoint logic. | Useful at the edge; not the primary replay authority for scientific/regulated execution history. |
Cloud-Run-First Compute Plane
NoETL should prefer serverless container compute whenever the workload can be expressed as a stateless or bounded-duration container invocation. Kubernetes remains important, but it should not be the only compute substrate. A cost-effective NoETL global cloud should route simple, bursty, and tenant-isolated workloads to Cloud Run-style services, and use Kubernetes/KEDA only when the workload needs long-lived workers, custom networking, GPU/TPU/QPU adjacency, local shared memory, on-prem connectivity, or supercluster-managed stateful services.
| Compute substrate | NoETL use case | When to choose it |
|---|---|---|
| Google Cloud Run services/jobs | HTTP task endpoints, batch jobs, bursty workers, project-scoped isolated compute. | Default on GCP for short-lived CPU tasks and cost-controlled tenant jobs. |
| AWS Lambda / ECS Fargate / AWS Batch / App Runner | Serverless functions, containerized task workers, batch pools, public API workers. | Default on AWS when avoiding persistent EKS cost or when customers already standardize on AWS serverless. |
| Azure Container Apps / Azure Functions / Azure Batch | Serverless containers, event-driven jobs, enterprise Azure workloads. | Default on Azure when jobs can scale to zero and do not require a dedicated AKS cluster. |
| Kubernetes + KEDA | Durable worker pools, projectors, NATS/Postgres sidecars, GPU/TPU-adjacent jobs, on-prem cells, regulated environments. | Choose when workers need persistent pull consumers, local cache/shared memory, strict topology, or datacenter deployment. |
| Slurm / Ray / Dask / Spark / HPC scheduler adapters | University lab clusters, HPC, distributed simulations, large fanout/reduce jobs. | Use as external compute providers behind NoETL adapters; return result manifests through the Resource Registry. |
| Quantum provider / QPU adapters | Quantum circuit execution, hybrid classical/quantum loops, lab instrument orchestration. | Use NoETL tasks as orchestration envelopes around provider APIs or on-prem QPU gateways; store inputs/results as immutable resources. |
Multi-Store Projection, Analytics, and Payload Options
A cross-region NoETL business operating system should not require one database to serve every purpose. The control plane should define ports for EventStore, ProjectionStore, PayloadStore, ResourceCatalog, VectorStore, AnalyticsStore, and SearchStore. A tenant or regional cell may use several of these together. The Resource Registry records which resource lives where, which store owns it, which policy applies, and which compute plane can process it.
| Store / engine | Role in NoETL | Cross-region supercomputer usage |
|---|---|---|
| Postgres / Citus / Aurora PostgreSQL / Cloud SQL / AlloyDB / Azure Database for PostgreSQL | Authoritative catalog, active projections, control-plane metadata, tenant grants, replay indexes. | Default relational control store and shard-local projection store. |
| ClickHouse | High-volume analytical projections, telemetry, runtime observability, cost/accounting facts, event analytics. | Global analytics plane for execution metrics, shard health, tenant cost, frame throughput, and large result summaries. |
| Object storage: S3 / GCS / Azure Blob / Ceph / SeaweedFS (NOT MinIO — see § 7 policy) | PayloadStore and cold archive for completed executions, Arrow/Parquet/Iceberg data files, validation bundles. | Primary cross-region data substrate with sharded bucket naming and immutable manifests. |
| Apache Iceberg + Polaris / Gravitino | Open table format and catalog layer for tabular scientific and enterprise datasets. | Allows NoETL-generated datasets to be consumed by warehouses, Spark, Trino, Flink, Databricks, and notebooks. |
| BigQuery / Redshift / Synapse / Microsoft Fabric / Snowflake / Databricks SQL | Customer-native warehouse projections and managed analytics backends. | Use as optional projection targets, not as the only runtime catalog. |
| DuckDB | Local/vectorized post-processing, test fixtures, small/medium result packaging, notebook workflows. | Useful inside workers and archive validators; not a global coordination store. |
| Qdrant / pgvector / OpenSearch / Elasticsearch | Embedding, vector, search, semantic index, and document retrieval projections. | Supports AI-native retrieval across execution archives, lab artifacts, models, and documentation. |
| Redis / Memcached / NATS KV/Object | Hot cache, short-lived lookup cache, edge cache, coordination hints. | Cache only; replay and historical correctness must come from durable event/payload stores. |
ClickHouse in Cross-Region Supercomputer Mode
ClickHouse is a strong optional component for the global analytics plane. It should not replace the Resource Registry or canonical replay authority, but it can become the high-throughput query engine for runtime facts and business operating system dashboards.
Recommended ClickHouse projection domains:
- execution_fact: one row per execution lifecycle transition, with tenant, project, region, shard, status, duration, cost, and resource counters.
- frame_fact: one row per frame lifecycle or aggregated frame batch, with stage, worker locality, rows processed, payload shape, and retry information.
- resource_fact: resource creation, replication, archive, access, grant, and checksum verification events.
- compute_fact: CPU/GPU/TPU/QPU allocation, provider, region, queue wait, runtime, and cost metadata.
- tenant_cost_fact: per-tenant chargeback/showback across compute, storage, egress, and managed service usage.
- supercluster_health_fact: regional NATS/Kafka/PubSub lag, KEDA scale activity, Cloudflare routing decisions, and failover events.
For multi-region deployments, ClickHouse may run per-region with replicated/distributed tables or be treated as an asynchronous analytics sink. No task should block on ClickHouse writes. Runtime correctness remains based on events, payload references, registry records, and projection checkpoints.
Next-Generation NoETL Business Operating System for Quantum Cloud
The long-term product shape is a cross-region, multi-tenant business operating system for computation. In this model, NoETL is not only a workflow runner. It becomes the system of record for who requested computation, which resources were used, which data and models were produced, how results can be shared, and how cost, policy, provenance, and replay evidence are attached to every output.
| Capability | Business operating system function |
|---|---|
| Tenant/project hierarchy | Universities, laboratories, departments, teams, companies, and shared collaborations receive isolated namespaces with explicit grants. |
| Resource market / resource sharing | Datasets, models, execution results, embeddings, tables, and instruments can be published privately or shared across projects under policy. |
| Compute marketplace | CPU, GPU, TPU, QPU, HPC, serverless, Kubernetes, and on-prem pools become addressable compute capabilities with cost and locality metadata. |
| Global resource location | noetl:// and URN-derived subjects resolve logical resources to physical locations, replicas, credentials, and preferred compute regions. |
| Policy and accounting | Every execution has tenant, region, cost, quota, grant, classification, audit, and retention metadata. |
| Scientific/enterprise provenance | Every result can be traced to events, inputs, code version, payload checksums, worker locality, and replay validation bundles. |
| Cross-region supercomputer mode | NoETL coordinates regional cells as one logical computation plane while preserving data residency, tenant isolation, and locality-aware scheduling. |
All-Together Deployment Modes
NoETL should support three composable deployment modes. The same control-plane contracts should apply in all modes; only adapters and placement policies change.
| Mode | Description | Typical stack |
|---|---|---|
| Managed serverless cell | Cost-optimized cloud cell for bursty customers and isolated tenant jobs. | Cloudflare entrypoint, Cloud Run/Fargate/Container Apps, managed Pub/Sub/SQS/Event Hubs, managed Postgres, object storage. |
| Regional Kubernetes cell | Always-on regional runtime for low-latency queues, projectors, GPU/TPU adjacency, and durable workers. | Cloudflare entrypoint, NATS JetStream, KEDA, Kubernetes, Postgres, object storage, optional ClickHouse/Qdrant. |
| Hybrid datacenter cell | University/lab/company cell connected to the global control plane with local data and compute sovereignty. | On-prem Kubernetes or HPC scheduler, NATS/Kafka bridge, S3-compatible object store, local Postgres/ClickHouse, outbound-only secure control link. |
| Cross-region supercomputer mode | Multiple cloud and datacenter cells federated as a logical computation fabric for scientific and AI/quantum workloads. | Resource Registry, topology service, NATS/Kafka/cloud queues, Cloudflare routing, object-store replication, ClickHouse analytics, QPU/GPU/HPC adapters. |
- 15. Operational Model
- 16. Security Model
- 17. Development Roadmap
- 18. Minimum Viable Implementation
- 19. Risks and Mitigations
- 20. References
1. Executive Summary
NoETL should be built as a regionalized, sharded, multitenant computation cloud. The active execution path should keep commands and events close to the workers that execute them. Historical data should move to durable object storage. Queryable state should live in projection stores, primarily sharded Postgres for the control plane and operational views.
| Area | Blueprint decision |
|---|---|
| Hot active log | NATS JetStream streams per regional cell and shard. These streams keep active execution events available for replay, retry, monitoring, and projection catch-up. |
| Historical log | Immutable event and result archives in S3, GCS, Azure Blob Storage, or an S3-compatible in-house object store. The archive is the long-term reference for processed executions. |
| Projections | Sharded Postgres stores queryable execution state, indexes, tenant metadata, resource metadata, and projection checkpoints. |
| Large data/results | Never inline large payloads in events. Store payloads as objects; events and projections carry content-addressed references. |
| Global sharing | Share computation outputs through governed NoETL Resource Locators, grants, replication policies, and data-local scheduling. |
| Global topology | A lightweight global catalog maps tenant/project/resource/shard identifiers to regional cells, NATS clusters, Postgres shards, object locations, and access policies. |
Core architectural decision
NATS JetStream = hot/active event log and command fabric
Object storage = immutable historical execution archive and large result store
Postgres = shard-local projections, indexes, resource catalog, and checkpoints
Global catalog = metadata, routing, identity, policy, and resource-location resolution
This model lets NoETL scale independently along the dimensions that matter: event throughput, object payload size, regional isolation, tenant isolation, and query workload. It avoids forcing Postgres to be the only event path while also avoiding the mistake of treating a message bus as the only historical system of record.
2. Design Goals and Non-Goals
Design goals
- Support active executions with durable, replayable, shard-local event streams.
- Archive completed executions and result manifests to cloud or in-house object storage for long-term reference.
- Make every event, result, model, dataset, log segment, and projection snapshot addressable through a unified NoETL Resource Locator.
- Support multitenant isolation for universities, laboratories, companies, teams, projects, and experiments.
- Allow controlled sharing of computation results between tasks, projects, tenants, and regions.
- Run across regional datacenters without requiring a single global database in the data path.
- Allow NATS clusters, object buckets/prefixes, and Postgres shards to scale independently.
- Preserve auditability, reproducibility, and data lineage for scientific and enterprise workloads.
- Support Cloudflare entrypoints for global ingress, tenant-aware request routing, and region/cell selection.
- Support Kubernetes event-driven autoscaling with KEDA based on NATS JetStream consumer lag and backlog.
- Support Apache Arrow IPC as a Tier 1.5 payload format for frame-shaped data movement between outbox, projector, and worker runtimes.
- Support cloud-native and hybrid adapters for queues, object stores, event stores, payload stores, and projection stores.
Non-goals
- Do not build a single globally consistent event database for all regions. NoETL should prefer regional cells and explicit replication over a global write bottleneck.
- Do not store large binary or tabular results directly in NATS or Postgres event rows. Store references and manifests instead.
- Do not make cross-tenant sharing implicit. Sharing must be grant-based and auditable.
- Do not assume one cloud provider. The architecture must work with AWS, GCP, Azure, on-premises S3-compatible stores, and hybrid deployments.
3. Target Architecture
NoETL global cloud is a federation of regional cells. A cell is a deployable unit containing the control-plane services, NATS JetStream cluster, worker pools, Postgres projection shards, and object-store connectivity required to execute and observe workloads in that region.
+---------------------------------------+
| Global NoETL Catalog |
| identity | topology | policy | routing |
+-------------------+-------------------+
|
+------------------------+------------------------+
| |
+-----------v------------+ +-----------v------------+
| Regional Cell: usw2-a | | Regional Cell: euw1-a |
|------------------------| |------------------------|
| NATS JetStream shards | | NATS JetStream shards |
| Postgres projections | | Postgres projections |
| worker pools CPU/GPU | | worker pools CPU/GPU |
| object-store adapters | | object-store adapters |
+-----------+------------+ +-----------+------------+
| |
v v
S3/GCS/Azure/on-prem object stores S3/GCS/Azure/on-prem object stores
| Layer | Primary responsibility | Scale unit |
|---|---|---|
| Global catalog | Tenant registry, topology, policy, resource locator resolution, control metadata. | Globally replicated metadata service; not the high-volume data path. |
| Regional control plane | Accept workload submissions, append execution events, schedule tasks, manage projections, expose APIs. | Regional cell. |
| NATS JetStream | Hot event log, command streams, durable consumers, fanout to projections and workers. | NATS cluster + stream shard. |
| Postgres | Projections, indexes, checkpoints, tenant-local operational queries, resource metadata cache. | Postgres shard or cluster per region/cell. |
| Object storage | Historical event segments, execution bundles, result payloads, logs, artifacts, model files, checkpoints. | Bucket/account/container + prefix shard. |
| Workers | Execute tasks on CPU/GPU/TPU/quantum provider adapters; publish events; read/write resources. | Worker pool by capability and region. |
4. Regional Cell and Shard Model
The shard model should be explicit. Each tenant/project/workload maps to a regional home cell and one or more shard identifiers. The shard identifier determines the NATS stream, Postgres projection partition, object-store prefix, and archive path.
Shard key
shard_key = hash(tenant_id + project_id + optional_dataset_or_execution_affinity) % shard_count
recommended starting point:
shard_count_per_region = 256 logical shards
physical_nats_clusters = 1..N per region
physical_postgres_clusters = 1..N per region
object_prefix_shards = s0000..s0255
Shard naming convention
region: usw2
cell: usw2-a
tenant: t_acme_lab
project: p_genomics_01
logical shard: s0042
NATS stream: NOETL_EVT_usw2_s0042
NATS commands: NOETL_CMD_usw2_s0042
Postgres db: noetl_usw2_s0042
Object prefix: noetl/env=prod/region=usw2/cell=usw2-a/shard=s0042/tenant=t_acme_lab/project=p_genomics_01/
| Shard component | Naming rule | Purpose |
|---|---|---|
| NATS event stream | NOETL_EVT_<region>_<shard> | Hot execution event log. |
| NATS command stream | NOETL_CMD_<region>_<shard> | Task dispatch, retries, delayed scheduling, cancellation. |
| Postgres schema/db | noetl_<region>_<shard> | Projection and checkpoint storage. |
| Object prefix | env/region/cell/shard/tenant/project/date/execution | Archive and result layout. |
| Global resource URI | noetl://<tenant>/<project>/<kind>/<id> | Stable logical identifier independent of physical storage backend. |
Cell principles
- A regional cell should be independently deployable, drainable, and replaceable.
- Workers should connect outbound to the regional NATS/control plane; cross-region worker calls should be avoided unless explicitly scheduled.
- Cell-to-cell communication should exchange resource references, manifests, and policy decisions, not raw global database writes.
- Every cell should be able to continue processing local workloads when another region is degraded, subject to local resource availability and policy.
5. Event Log Lifecycle
NoETL should distinguish active events, projected state, and archived history. Active events live in NATS JetStream; queryable operational state lives in Postgres; completed execution history and large result manifests live in object storage.
Hot, warm, cold lifecycle
| Tier | Storage | Contains | Retention intent |
|---|---|---|---|
| Hot | NATS JetStream | Active execution events, command state, recent task lifecycle events, retry/cancellation events. | Hours to weeks, or longer for active workflows; retention controlled by archive checkpoint. |
| Warm | Postgres projections | Current execution state, task states, resource metadata, indexes, checkpoint positions, audit summaries. | Operational lifetime; compacted and queryable. |
| Cold | Object storage | Immutable event segments, execution archive bundles, result files, logs, metrics summaries, lineage manifests. | Long-term historical reference and reproducibility. |
Append path:
1. Control plane publishes event to JetStream and waits for server acknowledgement.
2. Projection consumer updates Postgres projection idempotently.
3. Archive consumer writes events to object-store segments and commits an archive manifest.
4. Retention controller marks stream sequence ranges as archive-safe.
5. Old hot events may be evicted only after archive-safe status and retention policy are satisfied.
Stream separation
| Stream | Policy | Notes |
|---|---|---|
| NOETL_CMD_<region>_<shard> | Work queue semantics | Commands are requests to do work. They may be retried, delayed, canceled, or dead-lettered. |
| NOETL_EVT_<region>_<shard> | Append-only event semantics | Events are facts that happened. Do not mutate. Use compensating events. |
| NOETL_DLQ_<region>_<shard> | Dead-letter inspection | Failed commands/events requiring operator or automated remediation. |
| NOETL_ARCHIVE_ACK_<region>_<shard> | Archive control | Archive progress and retention-safety checkpoints. |
Event envelope
{
"event_id": "01J...", // globally unique, generated by NoETL
"event_type": "task.completed",
"schema_version": "noetl.event.v1",
"tenant_id": "t_acme_lab",
"project_id": "p_genomics_01",
"execution_id": "exec_20260529_000001",
"workflow_id": "wf_variant_calling",
"step_id": "align_reads",
"task_id": "gpu_bwa_mem",
"region": "usw2",
"cell": "usw2-a",
"shard": "s0042",
"causation_id": "cmd_01J...",
"correlation_id": "run_01J...",
"created_at": "2026-05-29T22:00:00Z",
"payload": {
"status": "ok",
"result_ref": "noetl://t_acme_lab/p_genomics_01/results/exec_20260529_000001/align_reads/main@v1"
},
"security": {
"classification": "tenant-confidential",
"policy_ref": "policy://t_acme_lab/p_genomics_01/default-result-policy"
}
}
Retention rule
The active stream should not be allowed to evict events simply because a max age or max bytes limit is reached. For NoETL, eviction must be gated by archive status.
A stream sequence range can be evicted only when:
archived == true
projection_checkpoint >= range_end_sequence for required projections
retention_min_age has passed
no active execution still references the range as unrecoverable hot context
6. Projection and Query Model
Postgres should serve as the primary operational projection store, not the only system of record for every payload. It keeps the control plane fast and queryable while allowing event history and large artifacts to live elsewhere.
| Projection | Description | Primary key |
|---|---|---|
| execution_projection | Current status and summary for every execution. | tenant_id, project_id, execution_id |
| task_projection | Current status and attempts for each step/task. | execution_id, step_id, task_id, attempt |
| resource_index | Searchable index of resources, result refs, manifests, checksums, locations, and policies. | resource_uri, version |
| projection_checkpoint | Per-projection stream sequence cursor. | projection_name, stream_name, shard |
| processed_event | Idempotency ledger for projection consumers. | projection_name, event_id |
| archive_manifest_index | Pointer to object-store archive bundles and segment manifests. | tenant_id, execution_id, segment_id |
| grant_index | Materialized access grants and sharing policy cache. | resource_uri, grantee, scope |
create table projection_checkpoint (
projection_name text not null,
region text not null,
shard text not null,
stream_name text not null,
last_stream_seq bigint not null,
last_event_id text,
updated_at timestamptz not null default now(),
primary key (projection_name, region, shard, stream_name)
);
create table resource_index (
resource_uri text not null,
version text not null,
tenant_id text not null,
project_id text,
owner_principal text not null,
resource_kind text not null,
content_type text,
checksum_sha256 text,
size_bytes bigint,
region text not null,
shard text not null,
storage_class text not null,
location_descriptor jsonb not null,
policy_ref text,
created_at timestamptz not null default now(),
primary key (resource_uri, version)
);
7. Object Store Archive and Result Bundles
Object storage is the durable, cost-efficient place for historical execution data and large computational outputs. The object store layer must be pluggable: S3, GCS, Azure Blob Storage, NATS Object Store for small/simple deployments, Ceph RGW, SeaweedFS, or another S3-compatible in-house store.
Object store policy: MinIO excluded from the recommended adapter list
MinIO is not part of the recommended S3-compatible adapter list for NoETL deployments. In-house S3-compatible storage SHOULD use Ceph RGW or SeaweedFS instead. The PayloadStore adapter for S3-compatible stores remains available — operators who must run MinIO can still point an s3:// adapter at a MinIO endpoint — but MinIO is not in the supported, validated, or recommended set.
MinIO retains a role only in the compliance-fixture suite (§ C.2, Appendix F.3) for parametrized adapter testing alongside Azurite, fake-gcs-server, and LocalStack / moto. Production and supported-deployment paths use Ceph RGW, SeaweedFS, S3, GCS, or Azure Blob.
Execution archive bundle
archive bundle root:
noetl/env=prod/region=usw2/cell=usw2-a/shard=s0042/tenant=t_acme_lab/project=p_genomics_01/date=2026-05-29/execution=exec_20260529_000001/
required objects:
manifest.json
events/segment-000001.jsonl.zst
events/segment-000001.sha256
results/result-manifest.json
logs/control-plane.jsonl.zst
logs/worker-events.jsonl.zst
metrics/summary.json
lineage/lineage.json
security/access-snapshot.json
signatures/manifest.sig
| Object | Purpose |
|---|---|
| manifest.json | Top-level archive descriptor: schema version, event ranges, checksums, created_at, tenant/project/execution identifiers. |
| events/*.jsonl.zst | Compressed immutable event segments exported from the active JetStream log. |
| results/result-manifest.json | References to output files, database tables, vectors, reports, models, or external artifacts. |
| logs/*.jsonl.zst | Control-plane and worker logs required for audit and debugging. |
| metrics/summary.json | Runtime metrics, durations, resource usage, cost estimates, and scheduling decisions. |
| lineage/lineage.json | Input/output graph, versions, datasets, tasks, code artifacts, containers, model versions, and environment digest. |
| security/access-snapshot.json | Policy snapshot used when execution ran; useful for audit and reproducibility. |
| signatures/manifest.sig | Optional signature over manifest and checksums. |
Object-store layout principles
- Use path partitioning that supports lifecycle policies and analytics: env, region, cell, shard, tenant, project, date, execution.
- Keep all objects immutable once archived. Corrections should create a new version or a compensating manifest, not overwrite history.
- Use checksums for every payload and manifest. Prefer content-addressing for shared results and model artifacts.
- Store small metadata in Postgres projections; store large payloads in object storage.
- Support cross-region replication as a policy attached to the resource, not a hard-coded platform behavior.
8. NoETL Resource Locator
NoETL needs a stable logical URI for every data asset that can be produced, consumed, archived, shared, replicated, or post-processed. The logical URI must be independent of the physical backend. Physical locations are resolved by the resource catalog and policy engine.
Logical URI format
noetl://<tenant>/<project>/<kind>/<logical_path>@<version>
examples:
noetl://t_acme_lab/p_genomics_01/results/exec_20260529_000001/align_reads/main@v1
noetl://t_uc_lab/p_climate/models/forecast-net/checkpoint@sha256-b7e4...
noetl://t_company_a/p_risk/datasets/market-risk/snapshots/2026-05-29@v3
noetl://t_acme_lab/p_genomics_01/events/exec_20260529_000001/archive@final
Location descriptor
{
"uri": "noetl://t_acme_lab/p_genomics_01/results/exec_20260529_000001/align_reads/main@v1",
"kind": "result",
"tenant_id": "t_acme_lab",
"project_id": "p_genomics_01",
"owner": "principal://tenant/t_acme_lab/service/noetl-control-plane",
"classification": "tenant-confidential",
"region_affinity": ["usw2"],
"preferred_region": "usw2",
"home_cell": "usw2-a",
"home_shard": "s0042",
"storage_class": "warm",
"content_type": "application/vnd.apache.parquet",
"size_bytes": 948392122,
"checksum": "sha256:...",
"locations": [
{
"backend": "s3",
"region": "us-west-2",
"uri": "s3://noetl-prod-usw2-s0042/.../results/align_reads/main.parquet",
"replica_state": "primary"
},
{
"backend": "gcs",
"region": "us-west1",
"uri": "gs://noetl-prod-usw1-s0042/.../results/align_reads/main.parquet",
"replica_state": "async-replica"
}
],
"policy_ref": "policy://t_acme_lab/p_genomics_01/results/default",
"kms_key_ref": "kms://t_acme_lab/usw2/result-key-01",
"lineage_ref": "noetl://t_acme_lab/p_genomics_01/lineage/exec_20260529_000001@final"
}
Resolution contract
resolve(noetl_uri, requester, purpose, preferred_region, required_capability) -> access plan
access plan:
- allowed or denied
- chosen physical location
- short-lived credentials or signed URL
- expected checksum
- replication/caching instruction if local copy is required
- audit event to append
| Resource kind | Typical physical backing |
|---|---|
| events | JetStream active range or object-store archive segment. |
| result | Object store object, database table, vector collection, or mounted filesystem object. |
| dataset | Object store prefix, table, Iceberg/Delta-style table, or external catalog object. |
| model | Object store artifact plus metadata; optional model registry projection. |
| log | Object-store log segment plus searchable projection. |
| projection_snapshot | Postgres snapshot export or object-store compacted view. |
| secret_ref | External secret manager reference; never resolved into plain text by resource catalog APIs. |
9. Resource Catalog and Resource Registry
NoETL should include a native Resource Registry as a first-class control-plane service. The registry is the runtime source of truth for resolving logical NoETL resources to physical regional locations, while external data catalogs remain optional integrations for human discovery and governance. This distinction is important: NoETL workers need a fast, policy-aware resolver in the execution path; humans need searchable metadata, documentation, lineage, and ownership views.
Architectural decision
NoETL Resource Registry = runtime source of truth for resources, locations, grants, and lineage
Sharded Postgres = authoritative metadata store for active catalog records
Object-store manifests = immutable cold/historical resource and execution references
NATS JetStream = catalog change events, cache invalidation, and regional propagation
Iceberg/Polaris/etc. = optional table/dataset catalog layer for analytical datasets
OpenMetadata/DataHub = optional human-facing discovery and governance layer
The registry should not be implemented as object-store listings, NATS messages, or an external data catalog alone. Object stores are durable but slow to query as metadata systems. NATS is excellent for active events and propagation, but it should not be the catalog database. OpenMetadata, DataHub, Polaris, Gravitino, or cloud provider catalogs may be integrated, but NoETL still needs its own runtime registry for deterministic resource resolution and authorization.
Registry versus catalog terminology
| Layer | Primary audience | Purpose | Typical backing system |
|---|---|---|---|
| Resource Registry | NoETL runtime, workers, schedulers, control plane | Resolve noetl:// resources, enforce grants, track locations, replicas, checksums, and lineage | Sharded Postgres plus object manifests |
| Data Catalog | Humans, stewards, governance teams | Search, documentation, ownership, quality, glossary, visual lineage, access request workflow | OpenMetadata, DataHub, cloud catalogs |
| Dataset/Table Catalog | Query engines and lakehouse tools | Manage table metadata, snapshots, partitions, schemas, and data-file manifests | Iceberg REST catalog, Polaris, Gravitino, Glue, BigLake, Unity-like catalogs |
Logical resource URI
All resources should have a stable logical URI that is independent from the current bucket, region, cluster, or database shard. The URI identifies the resource; the registry resolves it to one or more physical locations and access methods.
noetl://<tenant>/<project>/<namespace>/<resource_type>/<name>@<version>
examples:
noetl://stanford/lab-a/genomics/dataset/sample_reads@v3
noetl://ucb/quantum/experiments/result/run_2026_05_29@final
noetl://acme/risk/models/model/fraud_detector@2026-05-29
noetl://default/demo/executions/archive/exec_01HYABC@completed
Resource types
| Resource type | Description | Example physical storage |
|---|---|---|
| execution_archive | Archived event log, manifests, checksums, and lineage for a completed execution | s3://, gs://, az://, ceph://, seaweed:// (see § 7 policy: MinIO is not in the recommended adapter list) |
| result | Task or workflow output, usually referenced by manifest rather than embedded in events | Object store object, database rowset, Iceberg table snapshot |
| dataset | Reusable structured or semi-structured data asset | Iceberg table, Parquet folder, Delta-like layout, database table |
| model | ML/AI model artifact, checkpoint, adapter, embedding model, or prompt pack | Object store bundle, model registry pointer |
| index | Search, vector, graph, or embedding index produced by a workload | Qdrant, OpenSearch, pgvector, object-backed index files |
| schema | Schema, contract, EBNF/JSON Schema/OpenAPI/Avro/Protobuf definition | Catalog row plus object-store definition file |
| credential_ref | Reference to external secret or identity binding, never the raw secret | Vault, cloud secret manager, KMS envelope reference |
| worker_pool | Regional compute capacity available for CPU/GPU/TPU/quantum-adjacent tasks | Control-plane registration record |
Canonical registry records
The minimum registry implementation requires four durable metadata groups: resource identity, physical locations, access grants, and lineage. These records live in the tenant/project home shard and may be cached regionally, but Postgres remains the authoritative metadata store for active resources.
resource(resource_id, tenant_id, project_id, namespace, resource_type, logical_name, version, home_region, home_shard, status, metadata)
resource_location(location_id, resource_id, region, shard_id, storage_type, storage_uri, checksum, size_bytes, is_primary, is_replica, storage_class)
resource_grant(grant_id, resource_id, subject_type, subject_id, permission, scope, expires_at, conditions)
resource_lineage(lineage_id, output_resource_id, input_resource_id, execution_id, task_id, relationship_type)
Resolution contract
Workers, tools, and post-processing jobs should resolve resources through the NoETL registry rather than hard-coding cloud storage paths. The resolver should consider tenant policy, region affinity, data locality, replica health, storage class, and caller permissions before returning a usable location.
GET /v1/resources/resolve?uri=noetl://stanford/lab-a/genomics/dataset/sample_reads@v3
{
"resource_id": "res_01HY...",
"resource_uri": "noetl://stanford/lab-a/genomics/dataset/sample_reads@v3",
"status": "active",
"home_region": "us-west-2",
"nearest_location": {
"region": "us-west-2",
"storage_type": "s3",
"uri": "s3://noetl-usw2-lab-a/datasets/genomics/sample_reads/v3/"
},
"access": {
"allowed": true,
"credential_mode": "temporary",
"expires_in_seconds": 3600
}
}
Hot, warm, and cold registry behavior
| State | Event log | Registry / projection state | Object storage |
|---|---|---|---|
| Hot / active execution | NATS JetStream stream is retained for replay and recovery | Full active resource records, grants, locations, and execution indexes in Postgres | Large payloads and intermediate results stored by reference |
| Warm / recently completed | JetStream retained for configured recovery window | Catalog keeps searchable active metadata and archive pointer | Execution bundle, result manifests, and checksums written to object store |
| Cold / historical | JetStream can be compacted or expired after archive verification | Catalog keeps lightweight pointer, ownership, lineage summary, and retention metadata | Immutable archive bundle is the historical source of reference |
Catalog-change events
Every registry mutation should emit a catalog event to NATS JetStream. Regional caches, search indexes, governance exporters, and replication controllers subscribe to these events. This keeps the authoritative state in Postgres while preserving a distributed event-driven propagation model.
noetl.catalog.resource.registered
noetl.catalog.resource.location_added
noetl.catalog.resource.grant_created
noetl.catalog.resource.replica_available
noetl.catalog.resource.archived
noetl.catalog.resource.deleted_or_expired
Integration points
| Integration | Use when | NoETL role |
|---|---|---|
| Apache Iceberg | Structured analytical datasets need snapshots, schema evolution, partition manifests, and query-engine interoperability | Store dataset resources as Iceberg references and track lineage/grants in NoETL |
| Apache Polaris / Iceberg REST catalog | A neutral catalog service is needed for Iceberg tables across clouds and engines | Use as table catalog; NoETL remains runtime resource registry |
| Apache Gravitino or cloud catalogs | Enterprise data platforms already standardize on lakehouse catalogs | Bridge resource records and table locations |
| OpenMetadata / DataHub | Humans need discovery, docs, ownership, glossary, governance workflows, or visual lineage | Export metadata and lineage; do not put runtime resolution on the critical path |
| OPA / OpenFGA-style policy | Cross-tenant sharing requires expressive authorization and relationship-based access | Evaluate grants and sharing policies during resource resolution |
Minimum viable implementation
- Implement Resource Registry API in the NoETL control plane: register, resolve, add location, grant, lineage, archive, replica.
- Create sharded Postgres tables for resource, resource_location, resource_grant, resource_lineage, and registry checkpoints.
- Emit catalog-change events to NATS JetStream for cache invalidation and cross-region propagation.
- Write object-store archive manifests for completed executions and store registry pointers to those manifests.
- Require all tools and workers to return result manifests with resource_uri, storage_uri, checksum, size_bytes, format, and schema_ref when applicable.
- Add a resolver library to noetlctl, workers, and SDKs so playbooks reference noetl:// URIs instead of provider-specific paths.
- Add optional Iceberg resource type support before adding broader OpenMetadata/DataHub export.
Development rule
NoETL playbooks, workers, and APIs should treat physical storage URIs as resolved implementation details. Public contracts should use noetl:// logical resources wherever possible. This makes regional movement, replication, migration across cloud providers, and historical archive restoration possible without changing playbook logic.
10. Multitenancy, Isolation, and Sharing
NoETL should model tenancy as a hierarchy. Each layer adds isolation, policy, quotas, and sharing boundaries.
Organization / Tenant
Workspace / Department / Lab
Project / Study / Product
Workflow / Playbook
Execution / Run
Step / Task / Attempt
Resource / Result / Artifact
| Isolation layer | Required controls |
|---|---|
| Tenant | Dedicated policy namespace, quotas, KMS root, audit boundary, billing boundary, optional physical shard pinning. |
| Workspace/Lab | Delegated admins, resource groups, grant templates, data classifications. |
| Project | Default region, allowed worker capabilities, budget, network policy, sharing policy. |
| Execution | Runtime identity, immutable inputs, event stream partition, result namespace. |
| Resource | Owner, classification, grants, replicas, lifecycle, checksum, lineage, retention. |
Sharing model
Sharing should be explicit, time-bound where possible, and represented as first-class metadata. A task does not read another tenant’s resource because it knows the bucket path; it reads through a NoETL Resource Locator that resolves policy and returns an access plan.
grant:
resource_uri: noetl://t_acme_lab/p_genomics_01/results/exec_123/variant_table@v1
grantee: tenant://t_partner_university
scope: read
purpose: postprocessing
expires_at: 2026-12-31T23:59:59Z
allowed_regions: [usw2, use1]
allow_replication: true
max_replica_lifetime: 30d
audit_required: true
- Default policy: private to tenant and project.
- Project-level sharing: allow another project in the same tenant to consume selected resources.
- Tenant-to-tenant sharing: require explicit grant, audit, classification check, and optional legal/data-use agreement reference.
- Public or community datasets: publish a read-only resource descriptor with immutable versions and checksums.
- Compute sharing: allow a task to run in another region or tenant-owned worker pool without exposing raw data beyond the granted scope.
11. Cross-Region Computation and Data Locality
NoETL should schedule computation near data unless policy, cost, capability, or latency requires otherwise. Cross-region sharing should be based on resource references and controlled replication, not ad hoc copying.
| Scenario | Preferred behavior |
|---|---|
| Data and compute available in same region | Run task locally; read/write local resources; publish local events. |
| Data remote, compute local | Resolve resource; decide whether to replicate/cache input locally or run remote based on size, cost, policy, and worker capability. |
| GPU/TPU capability only in another region | Schedule task to capable region; output location follows policy and can replicate summary/result back to home region. |
| Tenant wants shared result reused globally | Promote result to governed shared resource; create replicas according to lifecycle/replication policy. |
| Strict data residency | Scheduler must not move raw data outside allowed regions; compute must run inside residency boundary. |
scheduling decision inputs:
- resource locations and sizes
- data classification and residency constraints
- required compute capability: CPU/GPU/TPU/quantum-provider/secure-enclave/etc.
- estimated transfer time and cost
- current region capacity
- tenant/project budget
- allowed sharing and replication policy
- cache hit probability and reuse value
Cross-region result sharing flow
- Producer task writes result to object storage in its home region and emits result.created event.
- Projection updates resource_index with the logical URI and primary physical location.
- Archive service writes final result manifest and checksum metadata.
- Sharing grant is created or inherited from project policy.
- Consumer task resolves the NoETL URI from its region.
- Policy engine decides whether to read remote, replicate to local cache, or schedule compute near the data.
- Every access and replication action emits audit events into the active event log and archive.
12. Control Plane Services
| Service | Responsibilities |
|---|---|
| Topology Service | Maps tenants, projects, executions, and resource URIs to regions, cells, shards, streams, databases, and object prefixes. |
| Event Gateway | Validates and appends events to the correct shard-local JetStream stream; returns event position. |
| Command Scheduler | Creates commands, routes to worker-capability streams, handles retries, deadlines, and cancellation. |
| Projection Service | Consumes events, updates Postgres projections, maintains idempotency and checkpoints. |
| Archive Service | Segments event streams, writes archive bundles to object storage, records archive-safe checkpoints. |
| Resource Catalog | Stores and resolves NoETL Resource Locators and location descriptors. |
| Policy Engine | Evaluates identity, grants, tenant isolation, data classification, residency, and sharing rules. |
| Replication Manager | Creates and monitors object replicas or cache copies according to policy. |
| Worker Registry | Tracks worker pools, capabilities, health, regional affinity, and tenant/project admission rules. |
| Audit Service | Records administrative, policy, access, and replication events. |
minimum control-plane API surface:
POST /v1/executions
GET /v1/executions/{execution_id}
POST /v1/events:append
GET /v1/events:read
POST /v1/resources:resolve
POST /v1/resources:grant
POST /v1/resources:replicate
GET /v1/topology/{tenant}/{project}
GET /v1/workers/capabilities
13. Runtime Protocols and Data Contracts
Command envelope
{
"command_id": "cmd_01J...",
"command_type": "task.run.requested",
"tenant_id": "t_acme_lab",
"project_id": "p_genomics_01",
"execution_id": "exec_20260529_000001",
"step_id": "align_reads",
"task_id": "gpu_bwa_mem",
"required_capability": ["gpu", "container", "object-store-read"],
"inputs": [
"noetl://t_acme_lab/p_genomics_01/datasets/reads/sample-a@sha256-..."
],
"output_contract": {
"uri": "noetl://t_acme_lab/p_genomics_01/results/exec_20260529_000001/align_reads/main@v1",
"content_type": "application/vnd.apache.parquet",
"retention": "project-default"
},
"deadline_at": "2026-05-29T23:00:00Z",
"retry_policy": {"max_attempts": 3, "backoff": "exponential"}
}
Result manifest
{
"schema_version": "noetl.result-manifest.v1",
"resource_uri": "noetl://t_acme_lab/p_genomics_01/results/exec_20260529_000001/align_reads/main@v1",
"producer": {
"execution_id": "exec_20260529_000001",
"step_id": "align_reads",
"task_id": "gpu_bwa_mem",
"worker_id": "worker_usw2_gpu_17"
},
"artifacts": [
{
"name": "aligned_reads",
"content_type": "application/vnd.apache.parquet",
"uri": "s3://noetl-prod-usw2-s0042/.../aligned_reads.parquet",
"checksum": "sha256:...",
"size_bytes": 948392122
}
],
"lineage": {
"inputs": ["noetl://t_acme_lab/p_genomics_01/datasets/reads/sample-a@sha256-..."],
"code_ref": "oci://registry/noetl/tasks/gpu-bwa-mem@sha256:...",
"environment_ref": "noetl://platform/environments/cuda-12-4@sha256-..."
}
}
14. Global Runtime Fabric: Cloudflare, NATS Supercluster, KEDA, and Arrow
NoETL should treat the global runtime fabric as a combination of edge entrypoints, regional NATS superclusters, Kubernetes-based worker pools, and a columnar payload plane. The goal is not to hide locality completely; the goal is to make locality explicit, routable, observable, and policy-controlled while keeping the user-facing API location-transparent.
Cloudflare entrypoints and regional routing
Users, API clients, agents, notebooks, and partner systems should enter NoETL through global Cloudflare entrypoints. Cloudflare should terminate public TLS, apply WAF/rate-limit rules, verify identity where appropriate, and route requests to the best NoETL regional cell. The selected cell is based on tenant home region, data residency policy, resource location, health, and load. For public SaaS entrypoints, use Cloudflare Load Balancing, health checks, geo/proximity/dynamic steering, and Workers for tenant-aware routing. For private enterprise or university datacenter connectivity, use Cloudflare Tunnel, private network load balancing, or equivalent private ingress patterns.
client / notebook / API / agent
-> Cloudflare entrypoint: api.noetl.cloud, tenant.noetl.cloud, or lab-specific domain
-> edge policy: TLS, WAF, identity, rate limits, tenant lookup, residency check
-> regional control-plane gateway: us-west, us-east, eu-west, asia, or private datacenter
-> regional NATS / Postgres / object-store shard selected by NoETL Resource Registry
| Cloudflare layer | NoETL use | Design rule |
|---|---|---|
| DNS / Load Balancing | Public API entrypoint and pool failover across regional control-plane gateways | Use health-aware routing. Do not route writes to a region that violates tenant home-region policy. |
| Workers | Lightweight edge router for auth bootstrap, tenant lookup, request normalization, and routing hints | Keep business state in NoETL control plane, not in edge code. |
| Smart Placement / Placement Hints | Place edge logic close to distributed or explicit backend regions when useful | Use for API-routing logic only; NoETL registry remains source of truth for data location. |
| Cloudflare Tunnel / private ingress | Expose university, lab, enterprise, or in-house datacenter cells without public origin IPs | Use for hybrid cells and private control-plane gateways. |
| WAF / rate limits / bot controls | Protect shared multitenant control-plane endpoints | Rate-limit by tenant, project, token, and route. |
URN-like Resource Names and NoETL Resource Locator
NoETL should support both a URL-like form and a URN-like form for logical resource identity. The URL-like form is convenient for developers and manifests; the URN-like form is useful for policy engines, identity systems, stream subjects, and cloud-neutral references. Both forms resolve to the same registry record.
URL-like form:
noetl://<tenant>/<project>/<namespace>/<resource_type>/<name>@<version>
URN-like form:
urn:noetl:<tenant>:<project>:<namespace>:<resource_type>:<name>:<version>
Example:
noetl://stanford/lab-a/genomics/dataset/sample_reads@v3
urn:noetl:stanford:lab-a:genomics:dataset:sample_reads:v3
NATS subjects should not be treated as the authoritative name of a resource. They are routing subjects derived from the resource name, tenant, project, shard, event type, and execution context. The Resource Registry remains authoritative for resolving a logical resource name to physical object locations, stream names, table snapshots, credentials, and grants.
NATS supercluster as decentralized message fabric
A NoETL global cloud should use regional NATS clusters connected through NATS gateways to form a supercluster. Each regional cluster owns local streams, local durable consumers, local queue groups, and local JetStream storage. Gateways provide cross-cluster routing and failover without requiring a single global NATS cluster or a full mesh of all nodes.
Regional cell A: nats-us-west-01, nats-us-west-02, nats-us-west-03
Regional cell B: nats-eu-west-01, nats-eu-west-02, nats-eu-west-03
Regional cell C: nats-private-lab-01, nats-private-lab-02, nats-private-lab-03
Gateway fabric:
us-west <-> eu-west <-> private-lab <-> asia
Subject patterns:
noetl.cmd.<tenant>.<project>.<region>.<shard>.<workload>
noetl.evt.<tenant>.<project>.<region>.<shard>.<execution_id>.<event_type>
noetl.catalog.<tenant>.<project>.<resource_event>
noetl.ctrl.<tenant>.<project>.<region>.<worker_pool>
| Concern | Recommended pattern |
|---|---|
| Locality | Workers should consume from local streams and local durable consumers first. Cross-region delivery is a failover or explicit sharing path, not the default hot path. |
| Subject design | Derive subjects from tenant, project, region, shard, and event type. Keep the canonical resource identifier in the event envelope. |
| Failover | If a regional cell is unhealthy, Cloudflare routes control-plane requests elsewhere and NATS gateway routing makes interested consumers reachable in another cell. |
| Sharding | Use shard-local streams and object prefixes. Avoid one global stream for all tenants. |
| Isolation | Use NATS accounts, credentials, and subject permissions aligned with tenant/project boundaries. |
KEDA autoscaling on NATS JetStream lag
Kubernetes worker pools should scale from zero or a small warm baseline based on NATS JetStream consumer lag, stream backlog, and workload-specific constraints. KEDA provides the Kubernetes event-driven scaling bridge. Each worker deployment, StatefulSet, or job queue can be scaled independently by stream, consumer, tenant class, GPU/CPU profile, and priority lane.
Example scaling target:
stream: NOETL_CMD_USW2_SHARD_007
consumer: worker-gpu-a100-durable
metric: consumer lag / pending messages
action: scale worker deployment from 0..N replicas
KEDA ScaledObject sketch:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: noetl-worker-nats-jetstream
spec:
minReplicaCount: 0
maxReplicaCount: 100
scaleTargetRef:
name: noetl-worker-pool
triggers:
- type: nats-jetstream
metadata:
natsServerMonitoringEndpoint: nats.monitoring.svc.cluster.local:8222
account: NOETL_TENANT_OR_POOL
stream: NOETL_CMD_USW2_SHARD_007
consumer: worker-pull-consumer
lagThreshold: "25"
| Worker class | Scale signal | Example use |
|---|---|---|
| CPU workers | NATS consumer lag, queue depth, CPU saturation | HTTP calls, Python functions, small ETL transforms, API orchestration |
| GPU workers | NATS lag plus GPU availability and tenant quota | ML inference, model scoring, image/video transforms, vectorization |
| TPU / accelerator workers | Backlog plus accelerator pool admission control | Large model training or batch accelerator workloads |
| Quantum-adjacent / external compute adapters | Command backlog plus external scheduler slots | Submitting jobs to specialized backends and polling results |
| Projector workers | Event-stream lag and checkpoint delay | Projection updates, resource registry updates, replay indexes |
Apache Arrow IPC Tier 1.5 payload plane
NoETL should treat Apache Arrow IPC as a Tier 1.5 payload format between JSON control metadata and durable analytical datasets. JSON remains appropriate for command envelopes, event metadata, manifests, and small payloads. Arrow IPC is the preferred format for frame-shaped intermediate results, projector payloads, and efficient interchange between Python, Rust, Java, Go, DuckDB, DataFusion, Spark-adjacent tools, and GPU-oriented data pipelines.
| Payload tier | Format | Use |
|---|---|---|
| Tier 0 | JSON / JSONB | Control envelopes, event metadata, small structured values, manifests, policies |
| Tier 1 | Object references | Large binary payloads, result bundles, archived execution logs, model files |
| Tier 1.5 | Apache Arrow IPC / Feather V2 | Frame-shaped cursor results, columnar intermediate data, projector outbox payloads, fast local interchange |
| Tier 2 | Parquet / Iceberg tables | Durable analytics, lakehouse datasets, long-term tabular storage, query-engine interoperability |
Arrow payload descriptor in a NoETL result manifest:
{
"payload_type": "arrow_ipc",
"encoding": "feather_v2",
"schema_fingerprint": "sha256:...",
"rows": 1250000,
"columns": 42,
"location": {
"type": "object_store",
"uri": "s3://noetl-usw2-shard-007/tenant=stanford/project=lab-a/execution=exec_01HY/frame_00042.arrow"
}
}
Current implementation status (Python side, as of NoETL v2.92.0+):
ArrowIpcSharedMemoryCacheships atnoetl/core/storage/ipc_cache.py— 256-MiB default budget, LRU-by-lease eviction, per-pod observability via 7 Prometheus counters +summary["ipc"]block on the projector.IpcHintmodel carriesshm_name,schema_digest,byte_length,row_count,producer,node_id,lease_expires_atand is embedded inResultRef.ipcalongside the durable ref.- Producer wiring (cursor worker) and consumer fast-path (
TempStore.get_ipc_bytes) both ship; the Phase 3 audit-table entry on line 968 records this as done.
Remaining future work for true cross-language zero-copy (still tracked):
- Apache Arrow Flight endpoint on the worker for cross-pod / cross-node IPC retrieval.
- Native Rust consumer of
IpcHint— a Rust worker that mmaps a Python-produced segment without a serialization round-trip. Covered by Appendix H (Rust Migration Path). - Broader producer adoption (promoting the
pythonandhttptool kinds to stage IPC) — out-of-spec; tracked separately.
Cloud-native and hybrid adapter model
NoETL should keep the control-plane contracts portable and implement cloud-specific or datacenter-specific behavior behind ports/adapters. This allows a university lab to run a private Ceph/SeaweedFS/NATS/Postgres cell while another tenant runs on S3/GCS/Azure Blob, Pub/Sub/Event Hubs/SQS, and managed Postgres-compatible databases. See § 7 for the object-store recommendation policy that excludes MinIO from the supported in-house adapter list.
| Port | Cloud-native adapters | Hybrid / datacenter adapters |
|---|---|---|
| EventStore | NATS JetStream, Kafka-compatible services, cloud stream services where appropriate | NATS JetStream, Kafka/Redpanda, Postgres append log |
| CommandQueue | NATS JetStream, Pub/Sub, SQS, Event Hubs, Service Bus | NATS JetStream, RabbitMQ, Kafka/Redpanda, Postgres queue |
| PayloadStore | S3, GCS, Azure Blob, cloud HDFS-compatible stores | Ceph RGW, SeaweedFS, on-prem S3-compatible store, NFS-backed object gateway (see § 7 — MinIO excluded from recommended set) |
| ProjectionStore | Cloud Postgres, AlloyDB, Cloud SQL, Aurora/RDS, BigQuery/ClickHouse for analytical views | Postgres, Citus, Timescale, ClickHouse, DuckDB for local/materialized views |
| CatalogStore | Managed Postgres-compatible shards plus object manifest archive | Self-managed Postgres shards plus object manifest archive |
Adapters should never leak cloud-specific paths into workflow logic. The workflow should refer to logical resources, and the registry/adapter layer should resolve them to physical storage, queue, and compute locations.
Failure and takeover flow
[Incoming request or command]
-> Cloudflare routes to nearest healthy policy-compliant cell
-> NoETL Resource Registry resolves tenant/project/resource and home shard
-> Command enters local NATS JetStream stream
-> KEDA observes consumer lag and scales worker pool
-> Worker writes events/results and object payload references
-> Projector updates Postgres projections and catalog records
-> Archive service moves completed execution bundle to object storage
If a cell fails:
-> Cloudflare stops sending public requests to unhealthy origin pool
-> NATS supercluster routes to interested consumers in a surviving region when configured
-> Resource Registry marks impacted locations degraded and chooses replicas
-> KEDA in the surviving region scales replacement workers from backlog
15. Operational Model
| Operational area | Blueprint |
|---|---|
| Backups | Back up Postgres projection shards; archive object stores through versioning/immutability/lifecycle rules; do not rely on NATS alone for historical retention. |
| Disaster recovery | Rebuild projections from object-store event archive or active stream. Recover active shards from JetStream replicas and archive-safe checkpoints. |
| Observability | Metrics per region/cell/shard: stream lag, projection lag, archive lag, command retries, worker utilization, object replication lag, policy denials. |
| Rebalancing | Use logical shards and topology service. New executions can move to new shards; old executions remain addressable by resource locator. |
| Schema evolution | Version every event, command, manifest, location descriptor, and projection schema. Use compatibility checks before deployment. |
| Tenant onboarding | Allocate tenant metadata, default project, KMS policy, shard assignment, object prefix, quotas, and default sharing policy. |
| Tenant offboarding | Freeze new writes, export archive manifests, revoke grants, retain/delete according to contract, emit audit events. |
Required operational metrics
- JetStream stream bytes, messages, first/last sequence, consumer lag, redelivery count, DLQ rate.
- Projection lag by shard and projection name.
- Archive lag by stream sequence range and object prefix.
- Resource resolution latency, policy-denial count, signed URL/token issuance count.
- Object-store replication lag, replica health, checksum verification failures.
- Worker pool capacity by capability, region, tenant, and task type.
- Per-tenant cost and quota utilization: CPU/GPU time, bytes stored, bytes transferred, active executions, archive volume.
16. Security Model
- All resources have an owner, tenant, project, classification, policy reference, and audit trail.
- All physical accesses should be mediated by short-lived credentials, signed URLs, or workload identity, not static keys in playbooks.
- Use separate KMS keys or key namespaces per tenant, with project-level overrides where required.
- Store secrets in external secret managers; NoETL events should contain secret references only.
- Use mTLS or equivalent strong identity between control-plane services, NATS, workers, and storage adapters.
- Make cross-tenant grants explicit and auditable; policy decisions should generate audit events.
- Support data-residency policies by region and resource classification.
- Support legal hold and immutable archive retention for regulated tenants.
| Threat | Mitigation |
|---|---|
| Tenant reads another tenant’s object path directly | Do not expose raw physical paths unless policy allows; use signed access plans scoped to the resolved resource. |
| Worker exfiltrates data outside allowed region | Worker admission policy, network policy, region constraints, signed URL scope, egress monitoring. |
| Event stream retention deletes unrecovered history | Archive-safe checkpoint gate before hot eviction; periodic recovery drills. |
| Projection corruption | Rebuild from active stream or object-store event archive; idempotent projection consumers. |
| Stale replica used for computation | Location descriptor includes replica state, generation, checksum, and freshness policy. |
| Over-permissive sharing grant | Policy review workflow, expiration, purpose limitation, audit trail, classification constraints. |
17. Development Roadmap
| Phase | Deliverable | Exit criteria |
|---|---|---|
| Phase 0: Contracts | Define event envelope, command envelope, resource locator, location descriptor, result manifest, archive manifest. | Schemas published; compatibility tests in CI. |
| Phase 1: Single regional cell | NATS JetStream active event/command streams, Postgres projections, object-store result refs. | One local/kind deployment can execute workflows and rebuild projections. |
| Phase 2: Archive service | Event segment exporter, execution archive bundle, archive manifest index, retention-safe checkpoints. | Completed execution can be replayed/rebuilt from object archive. |
| Phase 3: Shard-aware routing | Topology service, logical shard assignment, per-shard streams, per-shard Postgres schemas/databases. | New executions route by tenant/project/shard and can be queried by locator. |
| Phase 4: Resource catalog | NoETL URI resolution, location descriptors, signed access plans, resource_index projection. | Tasks consume and publish resources by NoETL URI, not raw bucket paths. |
| Phase 5: Multitenancy and grants | Tenant hierarchy, policy engine, grant model, audit events, quotas. | Cross-project and cross-tenant result sharing works under explicit grants. |
| Phase 6: Multi-region federation | Multiple cells, cross-region resolution, replication manager, data-local scheduler. | A task in one region can safely consume or request replication of a resource from another region. |
| Phase 7: Advanced compute marketplace | University/lab/company resource pools, GPU/TPU/quantum adapters, isolated worker admission. | Tenants can publish compute capacity and consume shared resources under policy. |
Current implementation status by phase
The following table records the current development state of the NoETL distributed runtime roadmap and should be used to drive repository issues, milestone planning, and acceptance criteria.
| Phase | Status | Notes |
|---|---|---|
| 0 - instrumentation + stage/frame tables + replay API | done | Stage/frame tables, replay API, replay parity gates, validation artifacts, and storage-neutral replay reader/payload resolver contracts are complete. |
| 1 - frame-shaped cursor loops | done | Frame-shaped cursor loops landed; frame lifecycle, claims, starts, commits, terminal links, and deterministic replay surfaces are now part of the runtime contract. |
| 2 - projector StatefulSet behind durable consumers | done | Projectors run behind durable consumers with shard ownership, strict startup validation, ACK/NAK/TERM metrics, decode/projection error split, and Phase 2 evidence bundles. |
| 3 - Apache Arrow IPC Tier 1.5 | done | Arrow/Feather payload bytes are supported through the outbox/projector path; durable payload references remain the replay authority. |
| 4 - URN + KEDA + NATS supercluster | done | URN/resource locator, KEDA NATS scaler, multi-cluster NATS topology generator, supercluster runtime fixes, and NATS account fix are complete. |
| 5 - port/adapter event-store + payload-store + projection-store | done | Storage-neutral ports are defined; PayloadStore adapters include filesystem, S3, GCS, Azure Blob, and S3-compatible SeaweedFS. |
| 6 - stage planner for fanout/reduce | done | Fanout/reduce planner landed as the basis for locality-aware computation, reduction, and post-processing. |
Roadmap implications
- Treat phases 0-6 as the distributed-runtime foundation; the next roadmap layer is packaging the runtime as a cloud business operating system with tenant accounting, resource markets, policy, and global placement.
- Keep NATS JetStream as the reference active fabric, but enforce adapter boundaries so Kafka, Redis Streams, Pub/Sub, SQS/SNS/EventBridge, and Azure messaging services can replace the transport when cost or enterprise integration requires it.
- Prefer serverless container compute for bursty customer workloads; reserve Kubernetes/KEDA clusters for always-on workers, GPU/TPU/QPU adjacency, stateful projectors, on-prem cells, and datacenter supercluster nodes.
- Promote ClickHouse and warehouse/lakehouse adapters as optional global analytics/projection planes rather than mandatory runtime dependencies.
18. Minimum Viable Implementation
The first implementation should be intentionally small but should use the same abstractions that the global cloud will need later.
MVI topology
local/kind or single cloud region:
- one NATS JetStream cluster
- one Postgres instance
- one object-store bucket (cloud) or Ceph RGW / SeaweedFS bucket (in-house — see § 7)
- one control-plane API
- one worker pool
- one archive service
- one projection service
- one resource catalog table
| MVI component | Implementation detail |
|---|---|
| Stream names | NOETL_EVT_dev_s0000, NOETL_CMD_dev_s0000, NOETL_DLQ_dev_s0000. |
| Object prefix | noetl/env=dev/region=local/cell=local-a/shard=s0000/tenant=<tenant>/project=<project>/... |
| Postgres schemas | projection_checkpoint, processed_event, execution_projection, task_projection, resource_index, archive_manifest_index. |
| NoETL URI | Implement logical URI parser and resolver before adding multi-region routing. |
| Archive | Archive completed executions to manifest + events JSONL.zst + result manifest. |
| Replay | CLI command: noetl replay --from-archive <execution-uri> --projection execution_projection. |
- one Cloudflare-routed public or private entrypoint
- one KEDA operator watching NATS JetStream consumer lag
- one regional NATS gateway configuration, even if single-region in MVI
- one Arrow IPC payload path for frame-shaped results
serverless-first extension:
- route stateless task containers to Cloud Run / Fargate / Container Apps when possible
- reserve Kubernetes/KEDA for durable consumers, projectors, and locality-sensitive workers
- keep NATS JetStream as reference fabric, but configure queue_adapter per region
- allow ClickHouse, BigQuery, Snowflake, Databricks, Redshift, Synapse, or DuckDB as projection sinks
example noetl.yaml storage section:
storage:
event_store:
type: nats_jetstream
stream_template: "NOETL_EVT_{region}_{shard}"
retention:
min_hot_age: 7d
require_archive_safe: true
command_queue:
type: nats_jetstream
stream_template: "NOETL_CMD_{region}_{shard}"
projection_store:
type: postgres
dsn_ref: "secret://noetl/dev/postgres"
schema_template: "noetl_{region}_{shard}"
archive_store:
type: s3_compatible
bucket_template: "noetl-{env}-{region}-{shard}"
prefix_template: "tenant={tenant}/project={project}/date={date}/execution={execution_id}/"
resource_catalog:
type: postgres
resolve_by: noetl_uri
19. Risks and Mitigations
| Risk | Why it matters | Mitigation |
|---|---|---|
| NATS treated as forever archive | Stream storage is operationally expensive and retention-limited by configuration. | Use NATS as hot active log; archive immutable history to object storage; gate eviction by archive-safe checkpoints. |
| Object archive not queryable enough | Long-term history can become hard to inspect. | Maintain Postgres archive_manifest_index and optionally export event segments to analytics tables. |
| Cross-region sharing becomes uncontrolled copying | Cost, security, and lineage problems. | Use resource locator, grants, replication policies, TTL caches, and audit events. |
| Shard mapping changes break references | Old executions must remain addressable. | Resource locator resolves through topology/catalog history; physical locations are versioned descriptors. |
| Projection lag hides recent state | Users may see stale execution status. | Expose projection lag; allow direct event read for recent execution state when necessary. |
| Tenant isolation conflicts with shared computation | Universities/labs/companies want both isolation and collaboration. | Use explicit grants, shared datasets, clean-room style execution, and policy-aware scheduling. |
| Cloud-specific features leak into core model | NoETL must stay portable. | Keep cloud adapters below Resource Location and Archive Store interfaces. |
20. References
These references informed the external storage and messaging assumptions. The NoETL-specific architecture decisions in this document are proposed design decisions, not claims from these external documents.
- NATS JetStream overview: https://docs.nats.io/nats-concepts/jetstream
- NATS JetStream streams: https://docs.nats.io/nats-concepts/jetstream/streams
- NATS JetStream model deep dive: https://docs.nats.io/using-nats/developer/develop\_jetstream/model\_deep\_dive
- NATS JetStream Object Store: https://docs.nats.io/nats-concepts/jetstream/obj\_store
- Google Cloud Pub/Sub subscription properties: https://docs.cloud.google.com/pubsub/docs/subscription-properties
- Google Cloud Pub/Sub quotas and limits: https://docs.cloud.google.com/pubsub/quotas
- Amazon S3 Multi-Region Access Points: https://docs.aws.amazon.com/AmazonS3/latest/userguide/MultiRegionAccessPoints.html
- Azure Blob Storage object replication: https://learn.microsoft.com/en-us/azure/storage/blobs/object-replication-overview
Additional references for global runtime fabric
- KEDA NATS JetStream scaler documentation - scale Kubernetes workloads based on NATS JetStream stream/consumer lag.
- NATS Super-cluster with Gateways documentation - connect regional clusters through gateways and propagate interest across clusters.
- Cloudflare Load Balancing traffic steering documentation - route requests to healthy pools and endpoints using global and local steering policies.
- Cloudflare Workers placement documentation - Smart Placement and Placement Hints for edge logic close to distributed or explicit backend infrastructure.
- Apache Arrow Columnar Format and IPC documentation - language-agnostic columnar memory layout, metadata serialization, and IPC/file transport for frame-shaped data.
Additional references for distributed runtime, serverless compute, and analytics options
- NoETL Distributed Runtime + Event-Sourced Shared Memory Spec: https://noetl.dev/docs/features/noetl\_distributed\_runtime\_spec
- NoETL docs repository source for distributed runtime spec: https://github.com/noetl/docs/blob/main/docs/features/noetl\_distributed\_runtime\_spec.md
- Google Cloud Run services and jobs: https://cloud.google.com/run/docs
- AWS serverless containers and batch options: AWS Lambda, ECS Fargate, AWS Batch, and App Runner documentation.
- Azure serverless containers and batch options: Azure Container Apps, Azure Functions, and Azure Batch documentation.
- Apache Kafka, Redis Streams, Google Pub/Sub, AWS SQS/SNS/EventBridge, and Azure Event Hubs/Service Bus/Event Grid are adapter targets, not mandatory dependencies.
- ClickHouse can be used as an optional analytics/projection plane for high-volume runtime facts and tenant accounting.
Appendix A: Suggested Repository Work Items
| Area | Initial GitHub issues / tasks |
|---|---|
| Schemas | Add JSON Schema or protobuf definitions for event, command, result manifest, location descriptor, archive manifest. |
| NATS adapter | Implement EventStore append/read_from/read_execution on JetStream. |
| Archive service | Implement consumer that writes compressed event segments and manifest.json to object store. |
| Projection service | Implement idempotent projection consumers and checkpoint tables. |
| Resource catalog | Implement noetl:// parser, resolver, resource_index table, and signed access-plan abstraction. |
| Topology service | Implement region/cell/shard registry with deterministic shard assignment. |
| Policy engine | Start with simple RBAC/ABAC checks; design for OPA/Cedar-style policy backends later. |
| CLI | Add noetl resources resolve, noetl archive inspect, noetl replay, noetl topology inspect. |
| Tests | Projection rebuild test, archive restore test, idempotent event processing test, cross-shard resource resolution test. |
Appendix B: First 10 Implementation Steps
- Freeze terminology: active event log, command stream, projection store, archive store, resource catalog, location descriptor.
- Create event and command schema packages in the core NoETL repository.
- Implement JetStream stream bootstrap for one region and one shard.
- Implement Postgres checkpoint and projection tables.
- Implement EventGateway.append_event with JetStream acknowledgement and event_id idempotency.
- Implement ProjectionService for execution and task projections.
- Implement ArchiveService that writes events to object-store archive bundles.
- Implement NoETL URI parser and ResourceCatalog.resolve for local resources.
- Modify workers to consume commands and publish events/result manifests instead of large inline payloads.
- Add replay command that rebuilds projections from active JetStream or object-store archive.
Appendix C: Validation Against Ephemeral Blueprints and Hybrid Cloud Requirements
This appendix validates the global cloud blueprint against the uploaded Ephemeral Blueprints and Compute-Data Boundary document and turns the remaining hybrid-cloud gaps into explicit architecture requirements. The result is a stricter next-generation NoETL design for AWS, GCP, Azure, and private datacenters using shared sharded regional cells, serverless-first compute, and replaceable queue/storage adapters.
C.1 Ephemeral blueprint validation
The uploaded Ephemeral Blueprints document establishes the architectural contract that NoETL playbooks are ephemeral blueprints, workers execute atomic compute blocks, the gateway acts only as auth/routing/callback gatekeeper, shared cache carries in-flight state, and the event log is the system of record. This blueprint validates against that contract as follows:
- Gateway boundary: Cloudflare and the NoETL gateway terminate sessions, route requests, manage SSE/callbacks, and select a regional cell. They must not read or mutate domain data directly.
- Playbook boundary: Every data touch, query, mutation, external API call, and credentialed action occurs inside a playbook step governed by policy, retry, idempotency, and audit rules.
- Worker boundary: Workers are stateless atomic block executors. They may run as Kubernetes/KEDA pools, Cloud Run-style jobs/services, Azure Container Apps, AWS ECS/Fargate/Lambda containers, or datacenter workers, but they do not own durable process state.
- Shared cache boundary: Arrow IPC, shared-memory, Redis, local NVMe, or object-store cache layers are performance vehicles. They are rebuildable and are never the sole replay authority.
- Event log boundary: The event log remains the immutable system of record for execution state. Hot event streams are active in the regional cell; completed executions are archived as object-store event bundles and manifests.
- Credential boundary: Business credentials live in the NoETL keychain and are referenced by aliases in playbook steps. Runtime credentials for gateway, worker, NATS, and service-to-service trust stay at the platform layer.
C.2 Coverage checklist for next-generation hybrid global cloud
The following checklist is required for the blueprint to be complete enough for a next-generation hybrid global cloud spanning AWS, GCP, Azure, and private datacenters.
| Requirement | Design rule | Implementation artifact / validation |
|---|---|---|
| Catalog-driven query routing | All query, result, and dataset requests route through the Resource Catalog / Registry. The catalog resolves noetl:// URNs to region, shard, storage adapter, policy, and compute placement. | Implement QueryRouter + RoutePlan + CatalogResolver. Validate routing to Postgres, ClickHouse, BigQuery, Snowflake, Databricks, DuckDB, Athena/Trino, and object-store datasets. |
| Cluster-aware NATS client routing | NATSCommandPublisher must pick the NATS endpoint from URN locality and catalog metadata, not static config alone. | Implement NatsEndpointResolver using tenant/org/region/zone/cluster. Emit command.route.selected and command.route.failed events for replay and audit. |
| Per-tenant NATS accounts | The NATS supercluster generator must create account-level isolation for tenants and optionally projects/labs. | Generate NOETL platform account plus TENANT_* accounts, subject exports/imports, quotas, JWT/operator config, and tenant-scoped JetStream limits. |
| Cross-cluster stream mirror/source | Use JetStream mirror/source for regional replica and aggregation patterns. Commands should remain owner-routed; events and projection notifications may mirror/source across cells. | Generate mirror/source manifests for NOETL_EVENTS, NOETL_PROJECTIONS, catalog-change streams, and archive notification streams. Validate failover and replay semantics. |
| Registered PayloadStore spill path | Every storage-tier spill goes through PayloadStoreRegistry and returns a typed PayloadReference. Workers must not write raw provider URIs without registration. | Add spill policy to TempStore/Arrow cache/outbox path. Validate filesystem, S3, GCS, Azure Blob, SeaweedFS, Ceph RGW (S3-compatible) stores. |
| Replay URI adapter resolution | ReplayPayloadResolver must resolve s3://, gs://, azure://, abfs://, file://, and noetl:// references through the correct adapter and return bounded checksum/shape summaries. | Add adapter registry tests for all supported schemes. Replay must fail closed if a payload reference is unresolved unless an explicit degraded replay mode is requested. |
| Process-emulator compliance fixture | Cloud adapters must join the same parametrized compliance suite using local emulators where possible. | Add CI fixture using Azurite for Azure Blob, fake-gcs-server for GCS, moto or LocalStack for S3, and SeaweedFS/MinIO for S3-compatible object stores. |
Appendix D: Catalog-Driven Query Routing and Resource Resolution
Catalog-driven query routing is the core mechanism that makes the shared sharded model usable across global regions. Clients and workers must not infer where data lives from bucket names, database hostnames, or NATS subjects alone. They resolve a logical resource identifier through the Resource Catalog and receive a signed, policy-filtered route plan.
D.1 Query routing flow
client/gateway/worker
-> ResourceCatalog.resolve(noetl://tenant/project/ns/resource@version)
-> PolicyEngine.authorize(subject, action, resource, context)
-> QueryRouter.plan(resource, locality, cost, freshness, policy)
-> StoreAdapter.execute(plan) or ComputePlanner.dispatch(plan)
The query router chooses the closest safe execution point, not always the physically closest database. For regulated data, the selected route may force in-region compute. For cost-sensitive jobs, the selected route may prefer a cheaper region or precomputed projection. For high-throughput analytics, the selected route may use ClickHouse, BigQuery, Redshift, Synapse/Fabric, Snowflake, Databricks, DuckDB, Trino/Athena, or a tenant-owned database adapter.
D.2 RoutePlan contract
{
"resource_uri": "noetl://tenant/project/lab/dataset/sample@v3",
"selected_region": "us-west-2",
"selected_cell": "aws-usw2-cell-03",
"store_kind": "clickhouse | postgres | bigquery | s3 | gcs | azure_blob | duckdb | qdrant | external",
"storage_uri": "s3://... | gs://... | azure://... | noetl://...",
"compute_kind": "cloud_run | kubernetes | fargate | azure_container_apps | datacenter_worker | gpu_pool | qpu_adapter",
"credential_ref": "keychain://tenant/project/credential/alias",
"policy_decision_id": "pol_dec_...",
"freshness": {"mode": "strong | bounded | eventual", "max_lag_seconds": 30},
"movement": {"mode": "compute_to_data | data_to_compute | replica | deny"}
}
D.3 Query routing rules
- Prefer compute-to-data for large datasets, regulated datasets, and object-store partitions larger than the configured movement threshold.
- Prefer serverless container execution for short-lived, stateless, tenant-isolated workloads when the required payloads are available through registered PayloadStore adapters.
- Prefer Kubernetes/KEDA workers for durable consumers, projector StatefulSets, local shared-memory cache, custom networking, GPU/TPU/QPU adjacency, and datacenter execution.
- Prefer ClickHouse or cloud analytical warehouses for cross-tenant aggregate dashboards only after policy filtering and tenant-level aggregation rules have been applied.
- Record every route decision as an event so replay, billing, policy audit, and incident review can reconstruct why a region, store, and compute substrate were chosen.
Appendix E: Cluster-Aware Event Fabric Routing
E.1 NATSCommandPublisher locality routing
NATSCommandPublisher must become cluster-aware. Instead of publishing all commands to one configured endpoint, it asks the catalog and topology resolver for the best endpoint for the target URN/resource locality.
publish(command):
locator = NoetlResourceLocator.parse(command.target_urn)
locality = locator.locality() # tenant/org/region/zone/cluster/node/worker when present
endpoints = NatsEndpointResolver.resolve(locality, tenant, purpose="command")
selected = EndpointPolicy.choose(endpoints, health, cost, affinity, failover_policy)
publish_to(selected, subject=locator.to_nats_subject(), payload=command)
emit command.route.selected with endpoint, account, region, cluster, and fallback chain
For Cloud Run-style execution, the selected endpoint may be a regional NATS leaf/gateway endpoint or an HTTP-to-NATS command ingress. For Kubernetes, the selected endpoint may be the in-cluster NATS service. For private datacenters, the selected endpoint may be a private gateway, tunnel, or customer-managed NATS/Kafka adapter registered in the catalog.
E.2 Replaceable event and queue fabric
NATS JetStream remains the reference fabric, but NoETL must keep the event/queue fabric behind ports so customers can use the most cost-effective provider-native service when appropriate.
- NATS JetStream / supercluster: reference active event log, durable consumers, low-latency worker commands, projectors, KEDA scaling, and cross-datacenter gateway mesh.
- Kafka / Redpanda / cloud Kafka: high-throughput event streaming where customers already operate Kafka or need ecosystem compatibility.
- Redis Streams: lightweight local or edge command/event fabric for small deployments and cache-adjacent workloads.
- GCP Pub/Sub: GCP-native command and event delivery for serverless workers; canonical long-term replay still requires object archive or database/event-store backing.
- AWS SQS/SNS/EventBridge: AWS-native command queues, fanout, event routing, and serverless invocation paths.
- Azure Service Bus/Event Hubs/Event Grid: Azure-native queues, streams, and event delivery for enterprise/Azure Container Apps patterns.
E.3 Per-tenant NATS accounts in the supercluster generator
The NATS topology generator should emit account-level tenancy by default. Account boundaries enforce subject visibility, JetStream limits, quotas, and operator visibility while still allowing controlled subject exports/imports for shared resources and cross-tenant computation.
accounts:
NOETL_PLATFORM:
subjects: "$SYS.>", "noetl.catalog.>", "noetl.ops.>"
TENANT_<tenant_id>:
exports:
- "noetl.event.<tenant>.>"
- "noetl.result.<tenant>.shared.>"
imports:
- from: NOETL_PLATFORM
subject: "noetl.catalog.resolve"
limits:
jetstream_storage: tenant_quota
jetstream_streams: tenant_stream_limit
max_connections: tenant_connection_limit
E.4 Cross-cluster JetStream mirror/source
Cross-cluster JetStream mirror/source must be generated explicitly. It should not be an accidental side effect of subject naming. Use mirror for a read-only regional replica of a stream. Use source for fan-in, aggregate, or filtered stream composition. Command ownership must remain clear: commands route to the owning cell; events, catalog changes, projection notifications, and archive notifications may mirror/source across cells.
- Mirror NOETL_EVENTS from home cell to warm standby cells when fast regional failover or local replay is required.
- Source selected tenant/project event streams into global audit, billing, cost, or observability streams.
- Keep NOETL_COMMANDS owner-routed to avoid duplicate execution, unless the command is explicitly designed as idempotent fanout.
- Persist mirror/source topology in the Resource Catalog so replay can tell whether an event came from a home stream, mirror, or source aggregation.
Appendix F: PayloadStore, Replay, and Compliance Fixtures
F.1 Storage-tier spill through registered PayloadStore
All storage-tier spill paths must route through a registered PayloadStore. This applies to Arrow IPC frame spill, outbox payload bytes, worker temporary outputs, result bundles, cache eviction, and archive generation. The store returns a canonical PayloadReference that can be persisted in events and replay manifests.
payload_ref = PayloadStoreRegistry
.for_policy(tenant, project, region, payload_shape, size_bytes)
.put(bytes_or_stream, metadata={
"content_type": "application/vnd.apache.arrow.file | application/json | application/octet-stream",
"checksum": "sha256:...",
"producer_execution_id": "exec_...",
"locality": "aws/us-west-2/cell-03"
})
- Do not let tools write raw s3://, gs://, or azure:// URIs directly into replay-critical event payloads.
- Store the durable locator, checksum, content type, schema reference, size, region, storage class, and producing execution ID.
- Allow direct provider URIs only as physical locations behind a typed PayloadReference or ResourceLocation record.
- Treat shared-memory and local-NVMe Arrow cache as Tier 1.5 optimization; a durable PayloadReference must exist before cache garbage collection can make replay impossible.
F.2 Replay adapter resolution
Replay must resolve durable payload references through the same adapter registry used by runtime writes. It must support logical NoETL URIs and provider-native URIs while preserving tenant policy and bounded replay behavior.
scheme -> adapter
noetl:// ResourceCatalogResolver -> PayloadStoreAdapter
s3:// S3PayloadStore or S3-compatible adapter
gs:// GCSPayloadStore
azure:// AzureBlobPayloadStore
abfs:// Azure Data Lake / Blob adapter
file:// FilesystemPayloadStore
seaweed:// SeaweedFS adapter or S3-compatible adapter
minio:// S3-compatible adapter (test/compliance-fixture only — see § 7 policy)
ReplayPayloadResolver should return deterministic summaries by default: checksum, row count, Arrow schema fingerprint, JSON shape summary, byte size, and resolution status. Full payload bodies should be opt-in and bounded by policy.
F.3 Process-emulator compliance fixture
Cloud payload adapters need a process-emulator compliance fixture so they can join the same parametrized test suite without requiring live AWS, GCP, or Azure accounts.
- Azurite for Azure Blob and ADLS-compatible test paths.
- fake-gcs-server for GCS object-store behavior.
- moto or LocalStack for S3 behavior and AWS SDK compatibility.
- SeaweedFS (recommended) or MinIO (test/compliance only — see § 7) for S3-compatible in-house object storage.
- A single pytest parametrized compliance suite for put/get/head/delete/list, metadata sidecars, checksum validation, atomic write expectations, region/locality metadata, replay resolution, and failure injection.
- CI profiles: fast unit mocks, process-emulator integration tests, and optional live-cloud certification tests guarded by credentials and cost controls.
Appendix G: Cross-Region Supercomputer Setup Mode
The cross-region supercomputer mode combines all supported components into a single operating model for scientific, AI, enterprise, and quantum-cloud workloads. This mode is not a single cluster. It is a federation of regional cells whose resources are cataloged, policy-controlled, and addressable through NoETL resource locators.
G.1 Combined topology
Cloudflare global entrypoints
-> NoETL Gateway / API / callback ingress
-> Resource Catalog + Policy Engine + QueryRouter
-> Event/Queue Fabric Port
- NATS supercluster reference
- Kafka / Redis / cloud-native queue adapters where required
-> Compute Plane
- Cloud Run / serverless containers first
- Kubernetes + KEDA for durable workers/projectors/cache/GPU/TPU/QPU/datacenter
- AWS ECS/Fargate/Lambda containers, Azure Container Apps, GCP Cloud Run Jobs
-> Payload and Data Plane
- S3 / GCS / Azure Blob / Ceph RGW / SeaweedFS (MinIO excluded — see § 7)
- Postgres / AlloyDB / Cloud SQL / Aurora / Azure PostgreSQL
- ClickHouse / BigQuery / Redshift / Synapse-Fabric / Snowflake / Databricks / DuckDB
- Qdrant / OpenSearch / vector and search adapters
-> Replay, archive, lineage, billing, and provenance services
G.2 Quantum-cloud positioning
For quantum-cloud use, NoETL should treat QPU access as one compute adapter inside a hybrid classical/quantum operating system. NoETL should not assume every workload benefits from quantum execution. The scheduler should classify tasks by tool capability, data movement cost, queue wait, provider availability, policy, and expected value.
- Classical preprocessing and postprocessing should run near the data using serverless containers or Kubernetes pools.
- GPU/TPU workloads should route to specialized regional pools when acceleration is justified and policy allows data movement.
- QPU tasks should route through provider adapters that expose queue time, shot count, circuit limits, cost, region, and result provenance.
- Every quantum result should be stored as a cataloged resource with payload references, execution lineage, provider metadata, calibration/context references, and replay evidence.
- Cross-region sharing should use resource grants and policy-controlled replicas, not direct bucket credentials or unmanaged copied files.
G.3 Repository work items added by this validation
- Implement QueryRouter, RoutePlan, and catalog-driven query execution adapters.
- Implement NATSCommandPublisher endpoint selection by URN/locality and catalog route metadata.
- Extend NATS supercluster generator with per-tenant accounts, quotas, and subject export/import templates.
- Add JetStream mirror/source manifest generator and validation smoke tests.
- Force Arrow/outbox/cache spill through PayloadStoreRegistry with durable PayloadReference output.
- Extend ReplayPayloadResolver to handle s3://, gs://, azure://, abfs://, file://, noetl://, and S3-compatible object stores.
- Add process-emulator compliance fixtures: Azurite, fake-gcs-server, moto/LocalStack, MinIO/SeaweedFS.
- Add serverless compute adapters for Cloud Run, Cloud Run Jobs, AWS ECS/Fargate/Lambda container paths, and Azure Container Apps where tasks are stateless and bounded.
- Add cost-aware placement metrics covering queue wait, egress, compute cost, storage class, serverless cold starts, and GPU/TPU/QPU provider queues.
Appendix H: Rust Migration Path and Unified Executor Roadmap
This appendix sets the technical-direction decision for moving the runtime hot path to Rust over the long term while protecting the production Python platform (noetl/noetl v2.103.x) that just shipped all seven phases of the v2 distributed-runtime spec. It is the binding plan for the work the next several quarters will execute against.
H.1 Current Rust footprint
| Repository | Crate | LoC | Role | Current state |
|---|---|---|---|---|
noetl/cli | noetl | ~10.7k | Production CLI; ships the noetl binary; includes playbook_runner.rs for partial local-mode playbook execution. | v2.17.1, mature. |
noetl/gateway | noetl-gateway | mature | Production gatekeeper-only edge service. Auth, SSE, subscription routing. | v2.12.0, mature. |
noetl/tools | noetl-tools | ~7.3k | Shared tool library — 14 tool kinds: http, postgres, duckdb, ducklake, python (subprocess), rhai, shell, snowflake, transfer, playbook, secrets, script, noop, plus the registry, context, auth, template, and error modules. | v1.0.0, decent breadth; no Arrow yet. |
noetl/server | noetl-server | ~13.8k (engine 1,967 LoC) | Control-plane skeleton split out of the monorepo. | v1.0.3, early — ~15-20% feature parity with the Python engine. |
noetl/worker | noetl-worker | ~1.9k | Worker skeleton — NATS pull consumer that dispatches to noetl-tools. | v1.0.0, skeleton. |
The full production engine, projector, frame-shaped cursor loops, replay services, frame/stage projections, ArrowIpcSharedMemoryCache, MCP catalog architecture, DSL parser/planner/render layer, and credential/keychain resolution still live in noetl/noetl (Python, ~89k LoC). All seven v2-spec phases ship from there today.
H.2 Decision: hybrid execution, Rust-first on the hot path
The runtime should move toward Rust where Rust earns its keep and stay Python where Python's productivity is unmatched. The decision is hybrid by intent, not by accident.
| Dimension | Python today | Rust target | Why this shape |
|---|---|---|---|
handle_event tail latency | p90 = 140 ms after five rounds of profiling and elision (see noetl/ai-meta#29) | p90 ~10–30 ms with no tuning effort | 5–10× headroom that compounds across worker count. |
| Worker memory footprint | ~250 MB resident per pod | ~30–60 MB | 4–8× more workers per node; cheaper KEDA scale-out. |
| Arrow IPC consumer | pyarrow ABI binding — copy on every read | arrow-rs mmap — true zero-copy | The compounding win that justifies a Rust worker existing at all. |
Type safety on credential / _ref / keychain envelopes | Runtime assertions plus redaction carve-outs | Compile-time enum exhaustiveness | Catches the specific bug class hit several times during the keychain rollout. |
| Deployment artifact | Python interpreter plus virtualenv plus ~200 wheels | Single statically linked binary | Faster pod cold-start; smaller container images. |
| DSL parser / planner / replay analytics / MCP catalog | High productivity; rarely a hot path | Not a Rust priority | Rewrite cost not justified by Rust gains in low-frequency, high-complexity code. |
Engine state machine (_handle_event_inner) | Mature, well-tested, recently optimised | Likely Rust phase R-4 if load data supports it | Decision deferred until measured production need exists. |
The blueprint does not endorse a full Python-to-Rust rewrite of the production platform. A rewrite restarts the regression-cost clock for code that already absorbed the v2-spec phases and the post-spec performance work.
H.3 The architectural hinge: shared utilities, distinct control loops
Note (2026-05-30): the original wording of this section claimed CLI and worker were "95% identical execution logic" unified behind one
CommandSourcetrait. After R-1.1 PR-1 + PR-2a landed and the post-extractionplaybook_runner.rswas surveyed in depth, that claim was found wrong in one important way: the CLI is a tree walker and the worker is a pull-model consumer. These control loops are fundamentally different shapes; only the utilities under them are shared. § H.10 documents the finding and the resulting plan revision in full; the rest of this section has been rewritten to match.
repos/cli/src/playbook_runner.rs executes playbooks locally as a recursive tree walker — it loads the YAML, walks the workflow, evaluates next arcs / case conditions / then blocks in place, and dispatches each step to a tool implementation inline. repos/worker executes one command at a time pulled from a NATS durable consumer, with no tree to walk and no recursive routing decisions — the engine on the server side made those already.
These are fundamentally different control loops. An attempt to flatten the CLI's tree walker into a pull-model iterator loses local-debug clarity (the tree shape is gone) and produces awkward state management around case / then recursion. The unified-executor abstraction in the original Appendix H draft is the wrong abstraction for the CLI.
What CLI and worker actually share — and what noetl-executor should host — are utilities and types, not the control loop:
Revised target layout:
repos/cli/executor/ (noetl-executor workspace crate)
Cargo.toml
src/
lib.rs
playbook.rs // YAML types (R-1.1 PR-2a, MERGED)
template.rs // Jinja2 + Rhai rendering (R-1.1 PR-2b)
condition.rs // when / case condition evaluation (R-1.1 PR-2b)
credentials.rs // $noetl_ref resolution + redaction (R-1.1 PR-2b)
runtime.rs // ExecutionContext, EventEmitter,
// CredentialResolver (R-1.1 PR-1, MERGED)
events.rs // EventSink trait + ExecutorEvent shape
// (R-1.1 PR-1, MERGED)
worker/ // worker-specific control loop pieces
source.rs // trait CommandSource — WORKER-ONLY
// abstraction; CLI does not use it
sources/
nats.rs // NATS pull consumer (R-1.3)
dispatch.rs // route Command -> tool kind via
// noetl-tools registry (R-1.3)
repos/cli/ <-- depends on noetl-executor; uses utilities only;
keeps its own tree walker in playbook_runner.rs
repos/worker/ <-- depends on noetl-executor; uses utilities + CommandSource
+ NatsCommandSource impl
Both binaries depend on noetl-executor for the same set of utilities (template rendering, condition evaluation, credential resolution, event emission shape, YAML types). The CLI keeps its recursive tree walker (lines 213-770 of playbook_runner.rs — run, execute_step, execute_next_steps, execute_router_arcs, etc.) because that shape is the natural fit for local YAML execution. The worker keeps its NATS pull loop because that shape is the natural fit for distributed-runtime semantics.
What this gives up vs the original vision:
- The CLI does NOT get distributed-runtime semantics in local mode for free. Local mode stays the lightweight tree-walker shape. If a CLI user wants distributed-runtime semantics, they invoke a remote server via the
noetl-sdk(R-4 / § H.9). - The worker does NOT get the CLI's ergonomic single-binary debug surface. The worker is a daemon by nature.
What this still gains:
- Type-safe parsed playbook shape (R-1.1 PR-2a, shipped).
- Shared Jinja2 + Rhai template rendering (R-1.1 PR-2b).
- Shared
$noetl_refcredential resolution + response-boundary redaction (R-1.1 PR-2b). - Shared event-emission shape matching the Python
noetl.runtime.events.report_eventenvelope (R-1.1 PR-1, shipped). - Shared Arrow-IPC codec + cache (R-2).
- Shared tool registry via
noetl-tools(R-3).
Existing CLI features that move into noetl-executor (R-1.1 PR-2b's revised scope):
- Template rendering against the merged step context (
render_template,render_template_with_result,get_json_path,json_to_rhai,rhai_to_json_string). - Condition evaluation (
evaluate_condition,evaluate_rhai_condition). - Capability validation (
validate_capabilities).
What does NOT move into noetl-executor:
- The tree-walker control loop (
run,execute_step,execute_next_steps,execute_router_arcs) — stays inplaybook_runner.rs. - The inline tool execution logic (
execute_tool,execute_shell_command,execute_http_request,execute_duckdb_query, etc.) — R-1.1 PR-2c replaces these with calls into thenoetl-toolsregistry.
H.4 Apache Arrow-native data plane in Rust
Adopt the arrow-rs crate family (arrow.apache.org/rust/arrow) as the columnar substrate for the Rust worker. Three immediate use cases:
- Zero-copy IPC consumer. Mirror the Python
ArrowIpcSharedMemoryCacheAPI in a newnoetl-arrow-cachecrate. The Rust consumer mmap-reads a Python-produced segment without serialization or copy. Bidirectional: Rust producers emitIpcHintrecords that Python consumers read through the existingTempStore.get_ipc_bytesfast path. - Apache Arrow Flight endpoint. Add a Flight gRPC server on the worker for cross-pod / cross-node IPC retrieval. Closes the cross-language zero-copy gap noted in section 14 (Apache Arrow IPC Tier 1.5 payload plane).
- Frame projection over
RecordBatch. The cursor-loop frame model maps naturally to ArrowRecordBatch. The Rust frame projector ingests Feather V2 segments produced by Python or Rust workers, applies the planner's reduce, and emits the sameframe.committedenvelope shape.
The arrow-rs ecosystem also brings arrow-flight, arrow-ipc, parquet, and datafusion. Each becomes available as a tool kind without re-engineering: a future kind: datafusion step runs SQL over Arrow RecordBatch inputs without leaving the worker process.
Constraints to honour:
- The
IpcHintenvelope shape is fixed by the Python implementation. The Rust side conforms; do not invent a parallel shape. noetl-arrow-cachereleases segments by lease, not by reference count. Same eviction policy as Python, same TTL defaults, same Prometheus counter names so dashboards keep working.- Durable PayloadStore reference remains mandatory before any cache GC can run. Arrow IPC is a Tier 1.5 accelerator, never the system of record.
H.5 Phased migration plan
The plan is structured so each phase delivers measurable value and has a clear stop criterion. Sunk-cost protection is explicit.
Phase R-0 — blueprint alignment (this appendix)
- This document update fixes the stale line-871 paragraph on Arrow IPC Tier 1.5 implementation status and adds the migration roadmap.
- No code change.
- Exit: this appendix lands.
Phase R-1 — noetl-executor shared crate (~1–2 months)
- REVISED 2026-05-30 — see § H.10 for the architectural finding that forced this revision.
- Bootstrap
noetl-executoras a workspace crate underrepos/cli/(R-1.1 PR-1 — MERGED as noetl/cli#20). - Extract the YAML playbook types from
playbook_runner.rsintonoetl-executor::playbook(R-1.1 PR-2a — MERGED as noetl/cli#21). - Extract template rendering + condition evaluation + capability validation + credential resolution helpers into
noetl-executor::{template, condition, credentials}(R-1.1 PR-2b — revised scope, ~430 LoC). - Replace the CLI's inline tool execution (
execute_tooland friends) with calls into thenoetl-toolsregistry (R-1.1 PR-2c, ~870 LoC). - The CLI's tree walker (
run,execute_step,execute_next_steps,execute_router_arcs) stays inplaybook_runner.rs. Closing R-1.1 (PR-2d):noetl/cli#19closes when the CLI's tree walker only contains control-flow logic and delegates every other concern tonoetl-executor+noetl-tools. - The worker (R-1.3) depends on
noetl-executorfor utilities and for its ownCommandSource/NatsCommandSourceimpls.CommandSourceis a worker-only abstraction; the CLI does not implement or consume it. - Ship criterion: the CLI's local-mode runner consumes the same
noetl-executorutility surface that the worker does; the two binaries share types, template rendering, condition evaluation, credential resolution, and tool dispatch vianoetl-tools, but each owns its own control loop. - Stop criterion: if utility extraction creates more abstraction than it removes (e.g. template rendering needs a per-binary ExecutionContext that's too divergent), keep that utility in
playbook_runner.rsfor now and revisit after R-2 lands.
Phase R-2 — Apache Arrow Rust integration (~1 month, parallel to R-1)
- Add
arrow,arrow-ipc,arrow-flightto thenoetl-toolsworkspace. - Implement
noetl-arrow-cachecrate as the Rust mirror ofArrowIpcSharedMemoryCache. - Rust worker consumes Python-produced
IpcHintsegments with measurable zero-copy. - Arrow Flight gRPC endpoint on the worker.
- Benchmark target: ≥5× over
pyarrowround-trip on a 1M-row Parquet through the IPC fast path. - Stop criterion: if
pyarrow's ABI binding is already meeting workload needs (verify against production timing fields inbatch.completed.context), deprioritise Arrow Flight to backlog and keep Rust-only zero-copy for the IPC cache.
Phase R-3 — Rust worker tool parity (~2–3 months)
- Extend
noetl-toolsto match the Python worker's tool-kind surface for the production playbook mix. - Add keychain
$noetl_refresolution, response-boundary credential redaction, MCP-protocol agent dispatch, full event-emission semantics. - KEDA scales Rust and Python workers off the same NATS subjects; the orchestrator picks the worker class by tool kind.
- Ship criterion: the travel and glut-probe playbooks run end-to-end on a Rust worker with the same
batch.completed.contextshape as the Python worker. - Stop criterion: if ≥30% of the production tool-kind surface requires Python subprocess fallback (long-tail Python tools without a Rust port), declare the worker hybrid (Rust shell handling the common cases, Python worker handling the tail) and stop here.
Phase R-4 — Rust server hot path + Polars-pattern Python wrapper (~6 months in)
R-4 is no longer a deferred decision. The endpoint of the migration is the Polars pattern: Rust is the runtime, Python is a wrapper that ships in the same pip install noetl distribution. See § H.9 for the full shape; the deliverables for R-4 itself are:
- Port
_handle_event_inner, the cascade event-log batching, the save_state coalescing, and the projector hot path fromnoetl/noetl(Python) intonoetl/server(Rust). Keep Python for replay analytics, MCP catalog tooling, DSL parser, and the long-tail of frame-projector logic — code that is low-frequency or high-complexity and does not benefit from Rust. - Ship the Python PyO3 wrapper (
pip install noetl) that embeds the Rust binaries and exposes the existingnoetl/noetlAPI surface so current Python users migrate without touching their playbooks. - Ship
noetl-sdk(pure Python, no Rust binaries) for notebook users and remote clients who don't want the embedded runtime. - Ship
noetl-tools-pythonso Python callables registered as@noetl.toolkeep working — dispatched across the PyO3 FFI boundary with zero-copy ArrowRecordBatchinterop.
Ship criterion: pip install noetl installs the Rust runtime + Python wrapper from one wheel and runs the existing Travel and glut-probe playbooks unchanged. Stop criterion: if PyO3 binding overhead measurably regresses the worker tail latency past round-5 baselines, ship Rust-only and provide noetl-sdk as the Python user surface (no embedded runtime).
H.6 What does not change
- The Python platform stays the production control plane through all of R-1 / R-2 / R-3. R-4 introduces the Polars-pattern Python wrapper (§ H.9) so Python users keep their existing programmatic surface even after the runtime moves to Rust.
- The v2-spec phase audit (table on line 968) remains the source of truth for distributed-runtime spec status. Rust migration phases are tracked separately under this appendix.
- The wire contracts —
noetl.eventshape, NATS subject patterns,batch.completed.contextfield catalogue,IpcHintenvelope — do not change. Rust adoption is invisible to playbook authors and to the gateway. - Tool authors continue to add tool kinds in
noetl-tools(Rust) andnoetl/tools/(Python). The orchestrator picks the worker class by tool kind; both shells coexist.
H.7 First implementation steps
The first set of pull requests that turn this appendix into code, in dependency order:
noetl/executorrepository bootstrap. Newrepos/executor/Rust crate withlib.rs,CommandSourcetrait,LocalPlaybookSourceextracted fromrepos/cli/src/playbook_runner.rs,dispatch.rswired tonoetl-toolsregistry. CI passing.noetl/clidepends onnoetl-executor. Replace the inline local-mode runner with the shared crate. CLI integration tests pass unchanged.noetl/workerdepends onnoetl-executor. Replace the worker's NATS-dispatch path with aNatsCommandSourceplus the shared executor. End-to-end test against a local NATS shows the same event-emission shape as the Python worker.noetl-arrow-cachecrate. Implements the PythonArrowIpcSharedMemoryCacheAPI in Rust. Unit tests round-trip anIpcHint. Benchmark againstpyarrowbaseline.- Arrow Flight endpoint on
noetl/worker. Cross-pod IPC retrieval test green; metric counters published alongside the existing Python projector'ssummary["ipc"]block.
Each PR ships with the same engineering discipline used through noetl/ai-meta#29: per-phase timing instrumentation, before/after measurement, an ai-task issue tracking the round, and lockstep wiki + docs updates.
H.8 Open questions to decide before R-1 starts
- Should
noetl-executorlive as a fourth top-level repository, or as a workspace crate insidenoetl/cli? Workspace crate is faster to bootstrap; separate repository is cleaner for the eventualnoetl-workercargo dependency. Recommendation: workspace insidenoetl/clifor R-1, promote to standalone whennoetl-workeris ready to depend on it. - Should the Rust worker speak the same
batch.accepted → batch.processing → batch.completedlifecycle as the Python worker, or define its own status taxonomy? Recommendation: same lifecycle. Cross-stack observability is more valuable than schema purity. - Should
noetl-arrow-cacheandnoetl-executorbe published to crates.io or stay internal? Recommendation: stay internal until R-3 ships and the API has stabilised.
H.9 The endpoint: Polars-pattern Python wrapper around the Rust runtime
The long-term shape of NoETL — once the Rust migration concludes — is the same shape that Polars uses to replace pandas in the Python data ecosystem: a Rust core with a Python wrapper that ships in the same distribution. Python users keep their ergonomic surface (import noetl as nt, @noetl.tool, notebook-friendly types). The runtime they call into is Rust. Both halves ship in one pip install noetl wheel built with PyO3 + maturin, the same toolchain Polars, Pydantic V2, Ruff, and Ty use today.
This is what makes the migration a one-way change instead of a tower of optional steps: R-4 isn't a deferred "should we rewrite the server" question — it's the ship date of the Python wrapper that lets the Rust takeover happen without breaking any Python user.
H.9.1 Three Python surfaces
The Python distribution splits into three packages, each with a distinct audience:
| Package | Ships | Audience | Use when |
|---|---|---|---|
pip install noetl | PyO3 wrapper + embedded Rust binaries (noetl-server, noetl-worker, noetl-cli, noetl-executor, noetl-tools, noetl-arrow-cache) | Today's noetl/noetl Python users and Jupyter notebook authors who want a single-install runtime. | Local execution, embedded runtime in tests, prototyping, single-host deployments. |
pip install noetl-sdk | Pure Python types + thin client to a remote NoETL server | Data scientists, notebook users, partner integrations who don't want the embedded runtime. | Talking to a remote noetl/server over HTTP/gRPC, no local Rust binaries needed. |
pip install noetl-tools-python | PyO3 hooks that register Python callables as noetl-tools kinds at runtime | Anyone authoring tools as Python functions (@noetl.tool def my_transform(...)) | The Python tool surface that stays Python after R-3 — long-tail tools without a Rust port. |
This mirrors Polars's split (polars → Rust core + Python wrapper; polars-arrow / polars-core available separately on crates.io for Rust consumers). The pattern is mature and the tooling is proven.
H.9.2 FFI shape and Arrow zero-copy across the boundary
The PyO3 boundary is not free — but it doesn't have to be expensive either, because the data plane is Arrow.
- Control plane (
noetl.run(playbook="x.yaml", workload={...})) crosses PyO3 once per playbook invocation. Marshalling cost is negligible relative to playbook duration. - Tool dispatch for a Rust tool kind never crosses PyO3 —
noetl-executorcallsnoetl-toolsdirectly. Python is not in the loop. - Tool dispatch for a
noetl-tools-python(Python-callable) tool kind crosses PyO3 once per command, with the input + output as ArrowRecordBatchreferences. PyO3'spyarrowinterop hands the same Arrow buffer to Python without copy. This is the same path Polars uses to expose Rust DataFrames to Python in zero-copy form. - Arrow IPC cache (
noetl-arrow-cache) is bidirectional: Rust producers emitIpcHintsegments that the Python wrapper mmap-reads; Python tool outputs land in the same shared-memory region for downstream Rust consumers.
The Python side never sees a serialization round-trip on hot-path data. Where Python touches the data is exactly where Polars touches it — through pyarrow-compatible buffers.
H.9.3 What the migration looks like to a Python user
- Before (today,
noetl/noetlv2.103.x):pip install noetl-python(or whatever the Python distribution is named); Python interpreter runs the engine; tools are Python functions; control plane is Python. - After (R-4 ships):
pip install noetl; Rust binaries are embedded; the existing Python API surface (noetl.run,noetl.execute_playbook, decorators) is preserved by a PyO3 wrapper that delegates to the Rust binaries; existing Python tools registered via@noetl.toolkeep working throughnoetl-tools-python. - Their playbooks are unchanged. YAML doesn't care which language the runtime is written in.
- Their performance improves — the cumulative gains of R-1 through R-4 land for free at upgrade time.
This is the same migration path pandas users take to Polars: install the new package, change the import, and ship. Most playbook authors won't even need to change the import — the noetl package name stays the same; only the wheel contents change.
H.9.4 Tradeoffs to honour
- Wheel size grows. PyO3 wheels with embedded Rust binaries are larger than pure-Python wheels. Polars ships ~30 MB wheels; NoETL's runtime will likely be similar. Acceptable for a distribution that includes a database client (
duckdb), HTTP client, Arrow, and a NATS client. - Multi-platform wheel building gets harder. Need
cibuildwheel(or equivalent) producingmanylinux_2_28_x86_64,manylinux_2_28_aarch64,macosx_*_universal2, andwin_amd64wheels. The toolchain is well-trodden — Polars's CI is a usable template. - GIL coordination. Python tools registered via
noetl-tools-pythonrun under the GIL by default. Truly parallel Python tools (rare, but used in thepythontool kind for CPU-bound transforms) need the tool author to release the GIL explicitly (#[pyfunction(text_signature = "...", pass_module = false)]+py.allow_threads(...)). Document this in the@noetl.tooldecorator docs; provide an opt-ingil_release=Trueflag. - PyO3 ABI compatibility. Pin the PyO3 version per release and validate against the supported Python versions (3.10+ recommended). Use the
abi3feature so one wheel works across Python minor versions.
H.9.5 What this changes in the migration plan
- R-1, R-2, R-3 are unchanged. The Polars pattern is the endpoint; the path to get there is still the phased migration.
- R-4's ship criterion sharpens. Instead of "decide whether to port the server hot path," it's "ship
pip install noetlwith the embedded Rust runtime and the PyO3 wrapper, validated against the existing Travel and glut-probe playbooks." - R-5 (out of scope of this appendix, planning placeholder). After R-4 lands and Python users migrate, the Python control-plane code paths in
noetl/noetlcan be deleted once telemetry shows no consumers remain. Thenoetl/noetlrepo becomes the PyO3 wrapper repo; the Rust runtime lives innoetl/server,noetl/worker,noetl/executor,noetl/tools,noetl/cli, andnoetl/arrow-cache.
H.9.6 Reference projects
- Polars — the proof point.
pl.DataFrameis a RustDataFrameexposed via PyO3 with Arrow-native interop. Pandas-compatible enough for migration; 5–50× faster in production benchmarks. - Pydantic V2 (
pydantic-core) — Pydantic moved its hot path from Python to Rust through PyO3 without breaking its public Python API. Same pattern, smaller surface. - Ruff — Rust replacement for
flake8/isort/etc with a Python interface. Distribution shape (wheel + binary) is a template. - Ty — Astral's new Rust type checker with Python wrapper. Active project worth tracking for the latest PyO3 patterns.
maturin— the build tool that produces PyO3 wheels. Required toolchain forpip install noetl.
H.9.7 Naming
Recommended package names at R-4 ship:
- PyPI:
noetl(renames over the current Python distribution — coordinate with PyPI maintainers). - PyPI:
noetl-sdk(new, pure-Python remote client). - PyPI:
noetl-tools-python(new, registers Python callables as tool kinds). - Crates.io (R-3 / R-4):
noetl-server,noetl-worker,noetl-cli,noetl-executor,noetl-tools,noetl-arrow-cache,noetl-gateway(the gateway already lives here). - Crates.io publish flag flips from
publish = falsetopublish = trueper crate as its API stabilises.
The user-visible Python import stays import noetl as nt. Code that runs today against noetl/noetl v2.103.x runs unchanged against pip install noetl after R-4 lands.
H.10 Architectural finding (2026-05-30): tree walker vs pull-model
This section records an architectural finding that surfaced after R-1.1 PR-1 (executor skeleton) and PR-2a (YAML types extraction) landed and the post-extraction playbook_runner.rs was surveyed in depth. The finding revises § H.3 and § H.5 R-1.1; § H.4 (Arrow) and § H.9 (Polars endpoint) are unaffected.
H.10.1 The finding
The original Appendix H plan assumed the CLI and worker shared roughly 95% of their execution logic and that a unified CommandSource trait — LocalPlaybookSource for the CLI, NatsCommandSource for the worker — would be the integration surface. Reality is different:
- CLI (
repos/cli/src/playbook_runner.rs, ~2,277 LoC after PR-2a): the runner is a recursive tree walker. It loads the YAML playbook, walks the workflow steps recursively, evaluatesnextarcs /caseconditions /thenblocks in place, and dispatches each step to a tool implementation inline. Control flow is the call stack. - Worker (
repos/worker/, ~1.9k LoC skeleton today): the runner is a pull-model consumer. It subscribes to a NATS durable consumer, pulls one command at a time, executes it, emits events, and repeats. No tree. No recursion. No routing decisions — the server's engine made those already and emitted typed commands.
Flattening the CLI's tree walker into a pull-model iterator is doable in principle but has real costs:
- Local-debug clarity drops. A user running
noetl run --runtime local foo.yamlbenefits from the call stack mirroring the YAML structure. A flattened iterator loses that mapping. - The implementation needs explicit state management for
case/thenrecursion. The recursive tree walker gets this for free from the language. - The integration tests written against the tree shape (assertions like "step A's next arc ran after step A's tool call returned") would need rewriting.
The cost-benefit doesn't justify the change.
H.10.2 Revised target shape
noetl-executor becomes a utilities-and-types crate, not a control-loop crate. Both CLI and worker depend on it for:
- YAML types (
playbook::*) — shipped in PR-2a. - Template rendering (Jinja2 + Rhai) — moves in PR-2b.
- Condition evaluation (
when,case) — moves in PR-2b. - Capability validation — moves in PR-2b.
- Credential resolution (
$noetl_ref) + response-boundary redaction — moves in PR-2b. - Event-emission shape (
ExecutorEvent,EventSinktrait) — shipped in PR-1. - Arrow-IPC codec helpers — shipped in R-2.1 PR.
- Tool dispatch via
noetl-toolsregistry — wired in PR-2c.
Each binary owns its control loop:
- The CLI owns its recursive tree walker (
run,execute_step,execute_next_steps,execute_router_arcs). It stays inplaybook_runner.rs. - The worker owns its NATS pull loop.
CommandSourceandNatsCommandSourcelive undernoetl-executor::worker::*and are worker-only.
H.10.3 What R-1.1 PR-2b looks like under the revised plan
R-1.1 PR-2b shifts from "lift the parser + command-generation logic (~1,200 LoC) into LocalPlaybookSource" to:
- Extract shared utility methods from
playbook_runner.rsimpl into newnoetl-executormodules:noetl-executor::template—render_template,render_template_with_result,get_json_path,json_to_rhai,rhai_to_json_string.noetl-executor::condition—evaluate_condition,evaluate_rhai_condition.noetl-executor::capabilities—validate_capabilitiesand theRuntimeCapabilitiesfactory methods.
- Replace the inline method definitions in
playbook_runner.rswith calls into the new modules. - ~430 LoC of utility extraction; control loop untouched.
The placeholder LocalPlaybookSource introduced in PR-1 (queue-backed source for testing the trait machinery) is removed as part of PR-2b since the abstraction it sketched is not how the CLI works. The CommandSource trait moves under noetl-executor::worker::source to make its scope explicit.
H.10.4 What R-1.1 PR-2c and PR-2d look like under the revised plan
- PR-2c (~870 LoC): replace the CLI's inline tool execution (
execute_tool,execute_shell_command,execute_rhai_script,execute_http_request,execute_duckdb_query, etc.) with calls into thenoetl-toolsregistry. The CLI addsnoetl-toolsas a regular dependency. After this lands, the CLI and worker run the same tool implementations. - PR-2d (small): close
noetl/cli#19with a final pass that documents the new shape in the CLI README + wiki, and confirms via integration tests that the CLI's tree walker only contains control-flow logic.
H.10.5 What R-1.2 and R-1.3 look like under the revised plan
- R-1.2 is folded into PR-2b/2c — there is no separate "CLI depends on noetl-executor" PR; the CLI gains the dependency in PR-2b and uses more of the executor surface in PR-2c.
- R-1.3 (worker depends on noetl-executor) becomes the moment the worker grows beyond skeleton size. It depends on
noetl-executor::{playbook, template, condition, credentials, events}ANDnoetl-executor::worker::{source, sources::nats}. The CLI continues to depend on the utility modules only.
H.10.6 Why this is the right call
The original Appendix H draft was written before R-1.1 PR-1 / PR-2a landed and before the post-PR-2a state was surveyed in depth. Three rounds of code-level inspection (the PR-1 skeleton design, the PR-2a types extraction, and the post-PR-2a survey that produced this finding) revealed that the "95% identical execution logic" framing was wrong. The honest answer is closer to "60% identical utility logic, 0% identical control loop logic."
Recording this in writing — instead of quietly proceeding with the wrong abstraction — protects future sessions from re-running the same investigation and protects R-1.3 (worker) from inheriting a shape it doesn't want.
H.10.7 What does not change
- The Polars-pattern endpoint (§ H.9) is unchanged. Whether
pip install noetlships a Rust runtime + PyO3 wrapper is independent of whether that runtime's local mode uses a tree walker or a pull loop. - The phased R-1 / R-2 / R-3 / R-4 cadence is unchanged. Phases ship in the same order; only R-1's internal scope was wrong.
- The Apache Arrow data plane (§ H.4) is unchanged. Arrow is a payload format; it doesn't care which control loop produces or consumes the IPC bytes.
- The compatibility contracts (Python
noetl.runtime.events.report_eventenvelope shape,noetl.commandrow shape,IpcHintenvelope) are unchanged.
H.10.8 PRs affected
- R-1.1 PR-1 (#20, merged) — unchanged in retrospect; the
runtime/events/source/dispatchmodules it shipped are still the right scaffolding. TheLocalPlaybookSourceplaceholder insources/local_playbook.rsgets removed in PR-2b. - R-1.1 PR-2a (#21, merged) — unchanged; the YAML types extraction is correct under both the old and new plan.
- R-1.1 PR-2b (next) — rescoped per § H.10.3. ~430 LoC of utility extraction instead of ~1,200 LoC of parser/command-generation logic.
- R-1.1 PR-2c (after 2b) — rescoped per § H.10.4. Replaces inline tool execution with
noetl-toolscalls. - R-1.1 PR-2d (after 2c) — rescoped per § H.10.4. Documentation + integration-test pass; closes
noetl/cli#19.
H.10.9 Process note
Architectural findings that surface mid-implementation are normal. The recovery path is the one this section demonstrates: stop coding, write the finding into the binding plan (this doc), and re-enter implementation with the revised scope. Coding through an architectural mismatch is how plans become wrong on paper while the code drifts in the right direction silently — leaving future readers of the plan confused about what was decided and why.
A similar process note in the noetl/ai-meta memory bank (2026-05-22 v2-spec Phase 3 audit refreshed) demonstrates the same recovery for the v2-spec Phase 3 audit-table drift. Both follow the same pattern: write down the finding, revise the source of truth, then continue.
H.11 Local-mode Arrow Feather batch buffering for events + commands
H.11.1 The optimization
When a NoETL playbook runs in local mode (the CLI's recursive tree walker, not the worker's NATS pull loop), every event and command the runner emits today goes through one of two paths:
- Connected mode: HTTP POSTs to a remote
noetl/server(one round-trip per event). - Disconnected mode: written to a local SQLite/JSON event log under
~/.noetl/.
Both shapes are wasteful for the common case where the user runs noetl run --runtime local foo.yaml against a fast local-disk environment. The connected mode burns network round-trips on events that will all land in the same execution archive anyway; the disconnected mode burns disk syscalls and gives up the columnar shape that the rest of the pipeline expects.
The optimization: buffer the entire event + command stream of one local execution as in-memory Arrow RecordBatch values, dump them to a single Feather V2 file on local disk during the run, and push the full batch to the remote event log atomically at execution exit. When the local execution completes successfully, the runner uploads the Feather file as one batch.completed envelope. When it fails, the runner uploads what was buffered up to the failure point, then the failure-recovery event.
H.11.2 Why Arrow Feather
The Rust arrow-rs crate family (already added to noetl-tools in R-2.1) ships:
arrow::array::RecordBatch— typed columnar batch with schema.arrow-ipc::writer::FileWriter— Feather V2 / IPC file format writer.arrow-ipc::reader::FileReader— random-access reader for replay and partial restore.
This makes the local buffer:
- Compact — Feather V2 dictionary-encoded columns are 5-10× smaller than the equivalent JSON event log on disk.
- Replay-friendly — the same
IpcHintenvelope the worker uses to consume Python-produced Arrow segments (§ H.4) works for replay of a local-mode execution against a remote server. - Crash-safe — Feather V2 files are append-only with chunk-boundary footers; an interrupted local run keeps the events up to the last footer.
H.11.3 Shape
Per-execution layout under ~/.noetl/local/<execution_id>/:
~/.noetl/local/<execution_id>/
events.feather // append-only Arrow RecordBatch chunks
commands.feather // append-only Arrow RecordBatch chunks
metadata.json // execution_id, started_at, playbook_path, status
payloads/ // out-of-line large payloads (>4 MiB threshold)
<event_id>.arrow // referenced from events.feather via PayloadReference
Schema for events.feather:
| column | type | source field |
|---|---|---|
event_id | Int64 | snowflake id allocated locally |
execution_id | Int64 | parent execution |
event_type | Utf8 | batch.accepted, step.exit, call.done, etc. |
step | Utf8 | step name |
status | Utf8 | COMPLETED, FAILED, RUNNING |
created_at | Timestamp(us, UTC) | wall-clock |
context | Binary (JSONB) | the event's context map serialized as JSON bytes |
result | Binary (JSONB) | the event's result envelope serialized as JSON bytes |
meta | Binary (JSONB) | meta map |
parent_event_id | Int64 (nullable) | parent event id |
payload_ref | Utf8 (nullable) | local://payloads/<event_id>.arrow when large payload exists |
The commands.feather schema mirrors the Python-side noetl.command row shape from § 13's Command envelope.
H.11.4 Flush policy
Three triggers for flushing the in-memory RecordBatch builder to disk:
- Buffer-size threshold (default 4 MiB): flush a batch and start a new one. Keeps each Feather chunk small enough to memory-map cleanly.
- Execution exit (success or failure): flush remaining buffer, write the metadata.json's
statusfield, and close the file. - Idle timeout (default 30 s) — for long-running local executions, flush periodically so partial recovery has bounded loss.
H.11.5 Push-to-server contract
At execution exit (success or failure), the CLI POSTs the Feather files as one batched event envelope:
POST /api/events/batch
Content-Type: application/vnd.apache.arrow.file
X-NoETL-Execution-Id: <execution_id>
X-NoETL-Event-Count: <count>
<Feather V2 file bytes>
The server (R-4 Rust runtime + PyO3 wrapper) reads the Feather file, decodes the RecordBatch chunks, and applies them to the event log atomically inside one transaction. Existing batch.accepted / batch.processing / batch.completed semantics from § 13 still apply; the only difference is that one HTTP call delivers ~thousands of events instead of one.
When the user runs noetl run --runtime local --offline foo.yaml, the push step is skipped and the Feather files stay under ~/.noetl/local/<execution_id>/ indefinitely. A separate noetl push <execution_id> command flushes them later.
H.11.6 What this gains
- Network round-trips: ~thousands per execution → 1 per execution for connected local mode.
- On-disk footprint: 5-10× smaller than the current JSON event log.
- Replay parity: the same Feather +
IpcHintmachinery the worker uses for shared-memory cache reads (§ H.4) consumes local Feather buffers. No new replay format. - Offline-first ergonomics:
--offlineruns produce reviewable Feather files; users can audit a local run before pushing it.
H.11.7 What changes in the migration plan
- R-2.1 follow-up (
noetl-arrow-cachecrate per § H.4): add anoetl-local-buffermodule that wraps the same Feather IPC writer for local-mode use. Same code paths as the shared-memory cache; different sink (file vstmpfs). - R-1.1 PR-2b (revised per § H.10): keep the CLI's recursive tree walker as the local-mode control loop. The
EventSinkimpl swaps from "POST to server" to "append to Feather buffer". No change to the tree walker's call graph. - R-3 (worker tool parity): the worker continues to emit events via NATS as today. No worker-side change.
H.11.8 Implementation deferred to
- A new
R-1.4phase under § H.5 once R-1.1 PR-2b lands. Concrete scope: ~300 LoC for theLocalEventBufferwriter + ~150 LoC for thenoetl pushCLI command + Feather-aware batch ingestion on the server side (Rust, R-4). Until R-4 the server-side ingestion stays Python; the Pythonbatch.acceptedpath gains a content-type sniff to decode Arrow file bytes when it seesapplication/vnd.apache.arrow.file.
H.12 Dynamically-generated playbook execution units
H.12.1 The scenario
NoETL's agent-orchestration architecture doc establishes that agents are themselves playbooks, dispatched via tool: kind: agent with framework: adk|langchain|custom. The natural extension is dynamically-generated playbooks — playbook execution units produced at runtime by an upstream agent (typically an LLM via the ADK / LangChain / MCP bridges) rather than read from a static YAML file in the catalog.
Example: a user submits a high-level intent ("rebuild the climate-research dataset from the upstream feed and notify the lab"). An LLM-backed agent decomposes the intent into a sequence of noetl-tools calls (data fetch → transformation → notification), emits the result as a Playbook YAML structure, and hands that structure to the runtime for execution. The generated playbook is not in the catalog; it exists only for this one execution.
This shape is referenced in the ai-meta overview doc as part of the "shell, catalog, scheduler, execution fabric" pattern that makes NoETL a distributed business operating system.
H.12.2 What this means for the runtime
The runtime must accept playbook execution units from three sources:
| Source | Today | Tomorrow (with H.12) |
|---|---|---|
| Static YAML in the catalog | ✓ | ✓ |
| YAML file on disk (CLI local mode) | ✓ | ✓ |
| Playbook object built in-memory by an upstream agent | ✗ | ✓ |
The third path requires:
- A typed builder API for constructing a
Playbookvalue programmatically. Thenoetl-executor::playbookmodule (shipped in R-1.1 PR-2a) already has the types; what's missing is an ergonomic builder. - Identity for ephemeral playbooks — the runtime needs to assign a stable
playbook_idto a generated execution unit so its events, lineage, and replay records have a consistent reference. The catalog records akind: ephemeralrow with the generating-agent's execution_id asparent_execution_idand the generated YAML serialised as the row'sdefinitionfield. The ephemeral row is deletable after archival; the events that reference it keep theirplaybook_idforeign key. - A trust boundary — generated playbooks must be policy-checked the same way catalog playbooks are. An LLM-generated playbook MUST NOT inherit the generating agent's credential scope without an explicit grant; the policy engine evaluates the generated playbook against the tenant's grant table the same way it evaluates a catalog playbook.
- A persistence contract — the generated YAML is stored alongside the execution archive (§ 7) so replay reproduces the exact playbook the agent generated. No "the agent ran something we can't see" gaps.
H.12.3 Shape under the Rust + Polars-wrapper endpoint (§ H.9)
After R-4 ships and pip install noetl exposes the runtime through a PyO3 wrapper, the generation surface becomes:
import noetl as nt
# An agent (LangChain, ADK, OpenAI Assistants, or a custom planner)
# decomposes a user intent into a typed Playbook.
playbook = nt.Playbook.builder()
.workload({"trip_id": trip_id, "user_uid": uid})
.step("fetch_calendar")
.tool(kind="agent", framework="noetl",
entrypoint="travel/playbooks/catalog/calendar/list")
.next("emit_widget")
.step("emit_widget")
.tool(kind="python", code="def main(events): ...")
.build()
# Hand the generated playbook to the runtime.
outcome = nt.run(playbook, parent_execution_id=agent_exec_id)
The generated Playbook is a typed Rust value that crosses the PyO3 boundary once. The runtime treats it identically to a catalog-loaded playbook from that point forward. Events, lineage, archive bundle, replay all behave the same.
H.12.4 Shape under the Rust-only worker (R-3)
The worker doesn't generate playbooks itself — generation is an agent-side concern. But the worker must accept dispatched commands whose playbook_ref points at an ephemeral catalog row instead of a versioned catalog row. This is one new condition in the LoadPlaybook step of the worker's NATS pull loop:
// pseudocode
match cmd.playbook_ref.kind() {
PlaybookKind::Catalog => catalog.load(cmd.playbook_ref.id, cmd.playbook_ref.version),
PlaybookKind::Ephemeral => ephemeral_store.load(cmd.playbook_ref.id),
}
ephemeral_store reads the agent-generated playbook from the catalog's kind: ephemeral rows. The rest of the worker's dispatch is unchanged.
H.12.5 Trust + audit
Three guardrails for generated playbooks:
- Grant inheritance is explicit. The agent's execution context defines a
delegable_credentialsset listing the credential aliases the agent is permitted to pass into a generated playbook. Generated playbooks reference credentials by alias only; resolution happens at step time against the delegated set, not the agent's full credential scope. - Tool-kind allowlist. The agent declares which
tool.kindvalues its generated playbooks are permitted to dispatch. An agent allowed to read data (kindshttp,postgres,duckdb) cannot generate a playbook that runsshellorpythonunless explicitly granted. - Persistence audit. Every generated playbook is persisted in the execution archive (§ 7) alongside the events it produced. Replay reproduces the exact playbook the agent generated; auditors can answer "what code did this agent decide to run?" deterministically.
H.12.6 What changes in the migration plan
- Phase R-5 (new in this section): add ephemeral-playbook support to the Rust runtime after R-4 ships. Concrete scope: typed
Playbookbuilder API onnoetl-executor::playbook(~200 LoC); ephemeral catalog row + persistence (~300 LoC; needs Python catalog change too); grant-inheritance enforcement in the policy engine (~150 LoC). - No R-1 / R-2 / R-3 change required. The current YAML loading paths continue to work; the ephemeral path is additive.
- Open questions added to § H.8:
- Should the ephemeral catalog row hold the YAML as text or as a typed Playbook serialization? Trade-off: YAML round-trips through the parser (slow, but human-readable in audit); typed serialization is faster but binary.
- Should generated playbooks be cached for re-use if the agent re-generates the same shape? Cache key would be a structural hash of the typed Playbook. Initial answer: no — agents that need stable playbooks should write them to the catalog explicitly.
- How does an LLM-generated playbook satisfy the Catalog-Driven MCP Architecture? The catalog's MCP-discovery surface assumes registered playbooks; ephemeral playbooks would need explicit opt-in flagging if discoverable, defaulting to private.
H.12.7 Relationship to the ai-meta agent-orchestration doc
The ai-meta agent-orchestration page frames NoETL as the "distributed business operating system" for AI agents — catalog, scheduler, executor, observability. Dynamically-generated playbooks are the natural endpoint of that framing: when an agent IS a playbook, an agent that plans is an agent that generates more playbooks. Cataloging every generated playbook would explode the catalog; treating them as ephemeral execution units lets the agent compose freely while keeping the execution archive complete and auditable.
The pattern aligns with the ai-meta overview doc's "shell + catalog + scheduler + execution fabric + event log + observability" composition. Generation becomes part of the shell layer; execution lands on the existing fabric without runtime-level changes beyond § H.12.4.