Server Dissolution and the Global Grid
Status: North-star blueprint — the destination the in-flight umbrellas (CQRS #103, Event-WAL #104, WASM plug-ins #105, references-in-state #101, orchestrator throughput #102) are already walking toward. This doc names the end-state, the runtime split it requires, and the sequenced path — so the umbrellas read as one road with a destination instead of parallel tracks.
It is the synthesis of Ephemeral Blueprints (the compute-data boundary), the Event WAL (NATS-as-WAL + derivable storage), the System Pool + WASM Plug-ins (system logic as hot-replaceable compiled playbooks), the CQRS Write-Path Cutover (moving the write path off the server), and the Global Hybrid-Cloud Grid (the topology). Read those first; this ties them together.
Named plainly: NoETL is a distributed multitenant operating system, and this blueprint is its kernel design. The grid is where it runs; the quantum-cloud-hybrid horizon is what it will ultimately schedule. The sections below are written as an OS design even where they don't use the word.
One-paragraph statement
The server stops being a control plane and becomes a stateless edge. All durable state is the NATS JetStream write-ahead log plus the object store — there is no Postgres source of truth. All processing is event-driven system and data playbooks running on sharded worker pools, dispatched off NATS subscriptions, not server request handlers. The edge does exactly three things: terminate the public API (gateways and clients), address and federate across the shard grid, and serve projections on demand from the WAL. Internal transport is Apache Arrow (Flight across nodes, shared memory when colocated) carrying reference URNs instead of payloads, over NATS subjects keyed by the resource locator. The system is sharded by locator key across regions, clouds, and datacenters — a global NoETL network.
NoETL as a distributed multitenant operating system
Naming it an OS is design discipline, not branding. Every primitive a kernel provides has a NoETL realization that already exists or is specified in the umbrellas:
| OS primitive | NoETL realization |
|---|---|
| Process | an ephemeral playbook block — worker = atomic compute block |
| Process table / PCB | the event log per execution_id — any process's state is replayable from it |
| Scheduler | the kernel pump dispatching off JetStream consumer lag (KEDA-shaped) |
| Syscall interface | the WASM capability ring — deny-by-default host functions (#105) |
| IPC | Arrow shm (colocated) + Arrow Flight (cross-node) + NATS subjects (control) |
| Address space / pointers | the shared cache scoped to execution_id+step; locator URNs are the pointers (#101/#104) |
| VFS / namespace | the object store + the noetl://… locator namespace — a global, content-addressed file system |
| Journaling / WAL | the JetStream write-ahead log (#104) |
| Device drivers | tool kinds — the registry. http, postgres, snowflake… and the empty slots: gpu, qpu. |
| Multitenant isolation | shard_key tenant pinning + WASM sandbox + capability ring + keychain + no-default-connection |
Two of these are load-bearing and already built:
- The syscall boundary is the WASM capability ring. A system playbook reaches the platform only through deny-by-default host functions (`object_put`, `result_put`, `event_publish`); an ungranted import fails instantiation. That is a capability-based syscall interface, and it is the privilege boundary between kernel services (system playbooks) and userspace (user playbooks).
- The driver model is the tool registry. A `tool: { kind }` is a device driver; adding a backend is a registry row, not a deployment. The empty slots, `gpu` and `qpu`, are why the quantum horizon below needs almost no new abstraction.
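The deny-by-default rule in the first bullet can be sketched in a few lines. This is a minimal illustration, not the #105 implementation: the three host-function names come from the text above, while `Module`, `instantiate`, `InstantiationError`, and the grant sets are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Host functions named in the text; everything else here is hypothetical.
HOST_FUNCTIONS = frozenset({"object_put", "result_put", "event_publish"})


class InstantiationError(Exception):
    """Raised when a module imports a host function it was not granted."""


@dataclass(frozen=True)
class Module:
    name: str
    imports: frozenset  # host functions the compiled module asks for


def instantiate(module: Module, granted: frozenset) -> dict:
    """Link a module against its granted host functions, or refuse entirely."""
    # Deny by default: an import is allowed only if it is a known host
    # function AND it appears in this module's grant set.
    allowed = HOST_FUNCTIONS & granted
    denied = module.imports - allowed
    if denied:
        raise InstantiationError(
            f"{module.name}: ungranted imports {sorted(denied)}"
        )
    # The link table stands in for the real host-function bindings.
    return {name: f"host::{name}" for name in sorted(module.imports)}
```

The point of failing at instantiation rather than at call time is that a module with an ungranted import never runs at all, so there is no partially privileged state to reason about.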
Naming it an OS also says what not to build. An OS does not keep a resident process per user "in case they return" — it dispatches on demand and reclaims; that is the ephemeral-execution rule (no persistent per-tenant agent processes). And multitenancy is enforced at the kernel, not bolted on: shard-key pinning, the sandbox, the capability ring, the keychain, and no-default-connection are the isolation model, mandated by the rules rather than left optional.
1. The inversion
| Concern | Today (control-plane server) | Destination (edge) |
|---|---|---|
| Event ingest | POST /api/events → synchronous Postgres INSERT | producer publishes to the shard's JetStream WAL (publish-ack = durability) |
| Orchestration | in-server Rust drive loop | system/orchestrate plug-in on the system pool, subscribed to the shard's event stream |
| Read model / projection | server writes projection_snapshot | system/projector folds the WAL; reads served from cached projections / object-store Feather |
| Durable log | noetl.event (Postgres, source of truth) | JetStream stream (source of truth); Postgres demoted, then dropped |
| Large results | inline in events, or result_store | object-store Feather, addressed by locator URN; events carry the reference, never the payload |
| Server role | everything | public API · addressing/federation · projection-on-demand |
The destination is not a rewrite — it is the existing umbrellas reaching their limit. The CQRS cutover already moves the write path off the server; this doc just says where that road ends.
2. Kernel / userspace runtime split
If the orchestrator becomes a playbook, something must run the orchestrator. The answer is an OS-shaped split — small resident kernel, everything else a playbook:
- Kernel (resident, minimal, can't be a playbook): the NATS connection + JetStream consumers, the system-pool scheduler (dispatches off consumer lag), the edge's addressing/federation table, and the response-boundary credential scrub. This is the one component that stays compiled-in and must be correct by construction. Think microkernel.
- System playbooks (kernel services): orchestrate, project, materialize, outbox-publish, cleanup, auth, credential-rotate, compiled to WASM (#105), hot-replaceable, on `noetl-worker-system-pool`. These are the logic that used to live in the server.
- User playbooks (userspace): domain logic on the data worker pools.
The bootstrap question — "what drives the orchestrator-playbook?" — is answered
by the kernel scheduler: it dispatches the system/orchestrate block when its
consumer sees new events for an execution. No in-process orchestration logic
survives; the scheduler is a dumb pump, the intelligence is the (replaceable)
plug-in.
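The "dumb pump" can be sketched as follows. This is an in-memory stand-in under stated assumptions: the `Pump` class, its method names, and the `dispatch` callback signature are hypothetical, and a real kernel would read pending counts from JetStream consumers rather than from a local dictionary.

```python
from collections import defaultdict, deque


class Pump:
    """A scheduler with no orchestration logic: it only counts and dispatches."""

    def __init__(self, dispatch):
        # execution_id -> events seen but not yet handed to a block
        self._pending = defaultdict(deque)
        # dispatch(block_name, execution_id, events) is supplied by the caller
        self._dispatch = dispatch

    def on_event(self, execution_id, event):
        """Record a new event for an execution (the consumer callback)."""
        self._pending[execution_id].append(event)

    def lag(self):
        """Total undelivered events, the signal the pump schedules on."""
        return sum(len(queue) for queue in self._pending.values())

    def tick(self):
        """Dispatch one orchestrate block per execution with pending events."""
        dispatched = 0
        for execution_id in list(self._pending):
            batch = list(self._pending.pop(execution_id))
            self._dispatch("system/orchestrate", execution_id, batch)
            dispatched += 1
        return dispatched
```

Nothing in `tick` inspects an event's content; deciding what an event means is left to the plug-in that receives the batch.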
3. The pure-edge end-state
No Postgres source of truth. JetStream is the WAL and the system of record; the object store holds the derived Feather tier (large results and materialized projection snapshots). The edge owns no database.
The edge surface collapses to three roles:
- Public API — terminate gateway/client requests, session + auth at the boundary, SSE/callback delivery. (This is today's gateway role; the edge absorbs the thin server API on top of it.)
- Addressing + federation — resolve a locator URN to a cell/shard, route to the owning NATS subject space, bridge cross-shard subjects over the supercluster. The routing table is inherently edge state.
- Projection-on-demand: answer `GET /api/executions/{id}` from a cached projection or an object-store Feather snapshot; on a miss, ask the `system/projector` to rebuild that execution's projection from the WAL (bounded replay, the block-b mechanism from #103).
Write path: producer (worker) publishes the event to its shard's JetStream
WAL; the publish-ack is the durability boundary; system/materializer and
system/projector fold it asynchronously. This is exactly
CQRS cutover Option B.
Read path: edge → projection cache → object-store Feather → (miss) projector rebuild from the WAL. Reads are eventually-consistent by default with a read-your-writes path that consults the local buffer ahead of the durable offset (the Event WAL read cache).
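The read path's fallback order can be sketched with plain dictionaries. This is an illustration only: `ReadPath`, `fold`, the event shape (a dict with an optional `status` key), and the `max_events` bound are assumptions made for the example, and the snapshot dictionary stands in for object-store Feather files.

```python
def fold(events):
    """Fold an ordered event list into a projection (hypothetical shape)."""
    state = {"status": "unknown", "events": 0}
    for event in events:
        state["events"] += 1
        state["status"] = event.get("status", state["status"])
    return state


class ReadPath:
    def __init__(self, wal, snapshots=None):
        self.wal = wal                    # execution_id -> ordered events
        self.snapshots = snapshots or {}  # stand-in for Feather snapshots
        self.cache = {}                   # projection cache at the edge

    def get(self, execution_id, max_events=1000):
        """Return (projection, source), trying the cheapest tier first."""
        if execution_id in self.cache:
            return self.cache[execution_id], "cache"
        if execution_id in self.snapshots:
            projection, source = self.snapshots[execution_id], "snapshot"
        else:
            events = self.wal.get(execution_id, [])
            # Bounded replay: refuse an unbounded scan on the request path.
            if len(events) > max_events:
                raise ValueError("replay bound exceeded")
            projection, source = fold(events), "rebuild"
        self.cache[execution_id] = projection
        return projection, source
```

A second read of the same execution is served from the cache, which is the behavior the "always-warm projections" mitigation in §8 relies on.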
4. How this evolves the data-access boundary
The data-access boundary rule today says "only the server has direct DB access; workers go through the API." The pure-edge end-state moves the boundary rather than breaking it — each of the rule's three reasons relocates:
- Connection-pool isolation → moot. There is no Postgres pool to exhaust; NATS fan-out to thousands of subscribers is native. The reason the rule existed (workers scaling 1→50 starving the server's pool) disappears with the pool.
- Sharding readiness → realized. The locator `shard_key` model is the sharding; the rule's "future shard routing" is the present.
- Single point of consistency (schema, audit, RBAC, scrub) → splits: schema/migration becomes the event-envelope contract + projection logic; audit becomes the WAL itself (it is, by construction, the immutable audit log); RBAC moves to the gateway/edge; the response-boundary scrub stays in the kernel, so credentials never leave a response unmasked, edge or not.
So the boundary becomes: the WAL is the system of record; the edge gatekeeps responses and federation, not a database. The rule will need a revision when the path reaches step 4 below — not before.
5. Transport substrate
- Control + events: NATS subjects, namespaced by locator (`<cell>.<shard>.<tenant>…`). The subscription runtime (#90, Modes A/B/C) is the worker-side mechanism.
- Bulk data: Arrow Flight (gRPC) across nodes; shared memory when producer and consumer are colocated (already the worker's `ResultRef.ipc` shm hint). Events carry reference URNs (#101 + #104 locator), never the payload; the 5 MB `command.issued` problem is the canary this already fixed.
- Geo federation: NATS superclusters + leaf nodes connect clusters across regions, clouds, and on-prem datacenters. The locator URN resolves to a cell, which maps to a NATS account/subject space, so cross-shard routing is subject routing, which NATS already does at planet scale.
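Two of the rules above, locator-namespaced subjects and reference-only events, can be sketched together. The function names, the event fields, and the 1 KiB size guard are assumptions for illustration; only the first three subject tokens are built, because the source leaves the remainder of the subject and the layout after `noetl://` unspecified. The URN in the usage note is made up.

```python
import json

# Characters that cannot appear inside a single NATS subject token:
# "." separates tokens, "*" and ">" are wildcards, whitespace is not allowed.
FORBIDDEN = set(" .*>\t\r\n")


def subject_prefix(cell, shard, tenant):
    """Build the <cell>.<shard>.<tenant> prefix from validated tokens."""
    tokens = [cell, shard, tenant]
    for token in tokens:
        if not token or any(ch in FORBIDDEN for ch in token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


def reference_event(kind, ref_urn, max_bytes=1024):
    """Encode an event that carries a reference URN instead of a payload."""
    event = {"kind": kind, "ref": ref_urn}
    encoded = json.dumps(event).encode("utf-8")
    # Hypothetical guard: an oversized event usually means a payload leaked in.
    if len(encoded) > max_bytes:
        raise ValueError("event too large; store the payload and send a reference")
    return encoded
```

For example, `subject_prefix("cell-a", "s07", "acme")` yields `cell-a.s07.acme`, and a subscriber could use the NATS wildcard form `cell-a.s07.acme.>` to receive everything under that tenant. `reference_event("command.issued", "noetl://example/ref")` produces a few dozen bytes regardless of how large the referenced result is.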
6. The shard grid
- `shard_key = FNV-1a(tenant + project + affinity) % shard_count → region / cell / shard`: the stable, tested `noetl_tools::locator` function.
- A shard = `{ NATS cluster (WAL) · system + data worker pools · object store · an edge }`. No database.
- Cross-shard executions resolve locator URNs globally; the edge routes; the supercluster carries the subjects. Coordination across shards is a saga / eventual model keyed by the locator (see §8).
- Global network: shards everywhere — a region close to every gateway, every tenant pinned to its affinity cell, the grid addressed uniformly by URN.
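The shard-key formula can be made concrete with a standard FNV-1a hash. This is a sketch, not a port of `noetl_tools::locator`: the 64-bit width, UTF-8 encoding, and plain concatenation without a separator are assumptions, so the values below will not necessarily match what the real function produces.

```python
# Standard 64-bit FNV-1a parameters.
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def shard_key(tenant: str, project: str, affinity: str, shard_count: int) -> int:
    """Map a (tenant, project, affinity) triple to a shard index."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Assumption: plain concatenation, UTF-8 encoded, as the formula reads.
    material = (tenant + project + affinity).encode("utf-8")
    return fnv1a_64(material) % shard_count
```

The property the grid depends on is determinism: the same triple always lands on the same shard for a given `shard_count`. Note that a plain modulo remaps most keys when `shard_count` changes, so resizing the grid is a separate design question this sketch does not address.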
7. The sequenced path
Each step is independently shippable and reversible; the system runs in a hybrid state throughout, never a big-bang.
1. CQRS cutover, Option B (#103): producer publishes to JetStream; the materializer becomes the sole writer of the (still-Postgres) read model. Shadow gate is green as of 2026-06-17; this step is unblocked.
2. Orchestrator → system plug-in (#105): extract the drive loop into a WASM `system/orchestrate` playbook subscribed to the shard's event stream; the server keeps only the kernel scheduler.
3. Per-shard NATS-as-WAL (#104): JetStream becomes the durable system of record; Postgres demoted to a derived projection.
4. Drop Postgres-as-source-of-truth → projection-on-demand: reads served from object-store Feather / cache; the edge no longer owns a DB. (This is where the data-access-boundary rule gets its revision.)
5. Cross-shard federation: NATS superclusters + locator routing; the global grid lights up.
8. The hard parts (named honestly)
- Orchestration latency. In-process drive is microseconds; a subscribed playbook is milliseconds + a dispatch. Mitigations: the local read-buffer ahead of the durable offset, WASM-compiled orchestrator, colocated system workers, and event batching — the orchestrator-throughput work (#102) is already paying this down. The bet: the per-shard slice is small enough that per-shard latency stays flat as the grid grows.
- Projection-on-demand cost. A cold read replays the WAL. Mitigate with always-warm projections for active executions, Feather snapshots for cold ones, and bounded replay (block-b). Never an unbounded full-history scan on the request path.
- Cross-shard consistency. Per-shard is linearizable via JetStream; cross-shard is a saga / eventual model, with the locator URN as the coordination key. This is the genuinely hard distributed-systems work and the least specified — it deserves its own design note before step 5.
- Operability without a database. No Postgres to `psql` into. Observability must come entirely from the WAL + traces, which observability.md already mandates (metrics + spans + `execution_id` everywhere). The WAL is the debugger.
- The kernel must stay minimal and correct. It is the one thing that can't be a hot-replaceable playbook. Keep it boring: consumers, a scheduler, a routing table, a scrub. Resist putting logic there.
9. What stays in the edge forever
Even fully dissolved, the edge cannot give up:
- Public-API termination + session/auth at the gateway boundary.
- The response-boundary credential scrub — secrets never leave unmasked.
- The addressing/federation routing table — inherently edge.
- The kernel scheduler + NATS consumers — the microkernel.
Everything else is a playbook.
The quantum-cloud-hybrid horizon
This is positioning, not a roadmap commitment. Quantum advantage is narrow and NISQ-era noisy; the realistic near-term is hybrid — quantum as an accelerator for specific sub-blocks, classical playbooks doing the orchestration, GPU pools alongside. What matters here is that the OS substrate is the right shape to host it, which shows up as a tell: a QPU needs almost no new abstraction.
- A QPU is a device driver: `tool: { kind: qpu, backend: … }` targeting a quantum cell behind a locator. Vendor heterogeneity (IBM / IonQ / Quantinuum / Rigetti) is exactly what the tool-registry driver layer already absorbs for classical backends.
- A circuit execution is an atomic compute block, and no-cloning makes this the only honest model: a quantum state can't be checkpointed mid-circuit, so a circuit must be claim → run → release-or-restart, which is the worker contract verbatim. Restart-on-failure is safe because the classical inputs (the parametrized circuit) are in the WAL; the quantum state was never meant to be durable.
- Hybrid algorithms are playbooks with loops. A variational loop (VQE / QAOA: classical optimizer ↔ quantum expectation estimation) is the cursor/loop control structure with quantum blocks in the loop body — a classical block computes parameters, a quantum block runs shots, a data block reduces the counts, the optimizer updates, repeat.
- QPU queue latency is the callback rule, verbatim. Real hardware queues jobs for minutes to hours. The block submits the job plus a callback subject, releases the worker slot, and resumes on the result event — time in the external system is free.
- The WAL records the classical boundary, never the quantum state. A measurement result is a counts histogram (columnar → Arrow); the parametrized circuit in and the counts out are durable; the wavefunction in between is ephemeral compute — exactly what the event-log-records-boundary-events model already assumes.
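The variational loop described in the list above can be shown as a toy. Everything here is illustrative: no QPU or vendor SDK is involved, the "quantum block" is a deterministic stand-in that computes the counts for a single-qubit RY(θ) circuit analytically, and all function names are hypothetical. The optimizer uses the parameter-shift rule for the gradient.

```python
import math


def quantum_block(theta, shots):
    """Stand-in for a QPU run: RY(theta)|0> measured in Z, as a counts histogram."""
    ones = round(shots * math.sin(theta / 2) ** 2)
    return {"0": shots - ones, "1": ones}


def reduce_counts(counts):
    """Data block: reduce a counts histogram to the expectation value <Z>."""
    shots = counts["0"] + counts["1"]
    return (counts["0"] - counts["1"]) / shots


def expectation(theta, shots):
    """One quantum block followed by one data block."""
    return reduce_counts(quantum_block(theta, shots))


def variational_loop(theta=0.5, shots=1000, steps=40, lr=0.4):
    """Classical optimizer wrapped around quantum blocks; minimizes <Z>."""
    for _ in range(steps):
        # Parameter-shift rule: two shifted circuit runs give the gradient.
        plus = expectation(theta + math.pi / 2, shots)
        minus = expectation(theta - math.pi / 2, shots)
        theta -= lr * (plus - minus) / 2
    return theta, expectation(theta, shots)
```

Only the inputs (`theta`, `shots`) and outputs (the counts) of `quantum_block` would ever be durable; the state in between never appears in the loop's data, which mirrors the classical-boundary rule in the last bullet. In this toy the loop drives θ toward π, where the measured expectation reaches its minimum.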
The genuinely new work, when the time comes, lives in the cells, not the
kernel: error mitigation / correction as workload, shot-budget scheduling, and
a circuit IR the qpu driver compiles per backend. The scheduler, the locator,
the WAL, the capability ring, and the callback pattern don't change — which is
the whole point of having named it an operating system. The OS schedules
classical, GPU, and quantum resources uniformly, and a playbook composes them;
that is "quantum-cloud-hybrid platform" stated precisely.
Related
- Ephemeral Blueprints — the compute-data boundary this completes.
- Event WAL and Derivable Result Storage — the WAL + Feather tier (#104).
- System Worker Pool and WASM Plug-ins — system logic as compiled playbooks (#105).
- CQRS Write-Path Cutover — step 1 of the path (#103).
- Global Hybrid-Cloud Grid — the topology this addresses.