Result Storage — Implementation Guide
This document describes the implementation details for ResultRef and manifests while keeping the system event-sourced.
See also:
- `docs/runtime/results.md` (model)
- `docs/runtime/events.md` (envelope)
- `docs/spec.md` (normative semantics)
A. Worker-side changes (data plane)
A1. Add a result storage helper
Create `noetl/results.py` (or similar) with:
- `serialize_result(obj) -> bytes` (stable JSON encoding; safe for decimals/datetimes)
- `truncate_preview(obj, max_bytes) -> object` (UI/debug)
- `store_result_bytes(body: bytes, policy, ctx) -> ResultRef` (artifact drivers)
- `select_fields(obj, selectors) -> dict` (for routing/templating)
Artifact drivers (choose any subset initially):
- `localfs`
- `s3`/`minio`
- `gcs`
- `postgres_lo` (optional)
A2. Centralize payload shaping in the event callback
Tool executors already call `log_event_callback('task_complete', ..., data=result_data, meta=...)`.
Keep executors simple. Implement storage logic in the callback that constructs the JSON event posted to the server:
- Always include `payload.inputs`.
- If the output size is at or below the inline cap:
  - set `payload.output_inline = result_data`
- If the output size exceeds the inline cap:
  - upload the body to the artifact store
  - set `payload.output_ref = ResultRef`
  - set `payload.output_select = selected-fields`, if configured
  - set `payload.preview = preview`, if configured
This yields a consistent event envelope regardless of tool type (http, python, postgres, duckdb).
A3. Correlation keys for loop/pagination/retry
When a tool call is executed under loop/pagination/retry, attach these fields to the event envelope:
- `iteration`, `iteration_id` (loop)
- `page` (pagination)
- `attempt` (retry)
These should be set by the runner/dispatcher that knows the current scope, not by the executor.
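A minimal sketch of how the dispatcher might stamp these keys; the function name and envelope shape are hypothetical:

```python
def with_correlation(envelope: dict, *, iteration=None, iteration_id=None,
                     page=None, attempt=None) -> dict:
    # The runner/dispatcher, which owns the loop/pagination/retry scope,
    # stamps correlation keys onto the event envelope; executors stay unaware.
    keys = {"iteration": iteration, "iteration_id": iteration_id,
            "page": page, "attempt": attempt}
    return {**envelope, **{k: v for k, v in keys.items() if v is not None}}
```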
B. Server-side changes (control plane)
B1. Accept ResultRef fields in event ingestion
Update `/api/events` ingestion to persist:
- `payload.output_inline`
- `payload.output_ref`
- `payload.output_select`
- `payload.preview`
The server MUST persist events even when the artifact upload fails; in that case it SHOULD:
- mark `status = error`, and
- include error details in `payload.error`.
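A sketch of that ingestion rule, with a hypothetical `persist` callback standing in for the server's event store:

```python
RESULT_FIELDS = ("output_inline", "output_ref", "output_select", "preview")


def ingest_event(event: dict, persist) -> None:
    # Persist the event unconditionally; a failed artifact upload is
    # recorded as status=error with details in payload.error, never dropped.
    payload = event.get("payload", {})
    row = {"inputs": payload.get("inputs")}
    for field in RESULT_FIELDS:
        if field in payload:
            row[field] = payload[field]
    if "error" in payload:
        row["status"] = "error"
        row["error"] = payload["error"]
    persist(row)
```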
B2. Projection tables (rebuildable from events)
Add projections to avoid scanning full event streams for common reads.
`noetl.result_index`

Keyed by:
- `execution_id`
- `step_name`
- `step_run_id`
- `tool_run_id`
- `iteration`, `page`, `attempt`

Stores:
- `result_ref` (json)
- `created_at`
`noetl.step_state`

Keyed by:
- `execution_id`
- `step_name`

Stores:
- `status`
- `last_result_ref` (json)
- `aggregate_result_ref` (json; manifest/materialized)
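The two projections could be sketched as DDL; SQLite is used here purely for illustration, and the real store, schema names, and column types are server implementation details:

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS result_index (
    execution_id TEXT NOT NULL,
    step_name    TEXT NOT NULL,
    step_run_id  TEXT NOT NULL,
    tool_run_id  TEXT NOT NULL,
    iteration    INTEGER,
    page         INTEGER,
    attempt      INTEGER,
    result_ref   TEXT,      -- json
    created_at   TEXT,
    PRIMARY KEY (execution_id, step_name, step_run_id, tool_run_id,
                 iteration, page, attempt)
);
CREATE TABLE IF NOT EXISTS step_state (
    execution_id         TEXT NOT NULL,
    step_name            TEXT NOT NULL,
    status               TEXT,
    last_result_ref      TEXT,  -- json
    aggregate_result_ref TEXT,  -- json; manifest/materialized
    PRIMARY KEY (execution_id, step_name)
);
"""


def init_projections(conn: sqlite3.Connection) -> None:
    # Projections are rebuildable, so creation is idempotent.
    conn.executescript(DDL)
```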
B3. Context binding rule (preserve template ergonomics)
When building the context for the next step:
- If `output_inline` exists: inject it as `task_name`.
- If `output_ref` exists:
  - inject `output_select` (or `{}`) as `task_name`
  - inject `task_name.__ref__ = output_ref`
  - optionally inject `task_name.__preview__ = preview`
This preserves patterns like `{{ evaluate_weather_directly.alert }}` while allowing full-body access via `{{ evaluate_weather_directly.__ref__ }}`.
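The binding rule above can be sketched as a small function; the name and payload shape are illustrative:

```python
def bind_step_output(context: dict, task_name: str, payload: dict) -> dict:
    # Inline output wins; otherwise expose the selected fields plus
    # __ref__ (and optionally __preview__) under the task name.
    if "output_inline" in payload:
        context[task_name] = payload["output_inline"]
        return context
    if "output_ref" in payload:
        bound = dict(payload.get("output_select") or {})
        bound["__ref__"] = payload["output_ref"]
        if "preview" in payload:
            bound["__preview__"] = payload["preview"]
        context[task_name] = bound
    return context
```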
B4. Manifest support
When `aggregate.mode = manifest`:
- emit a manifest object representing the step's logical output
- store the manifest inline if small, else as an artifact
- expose it as `aggregate_result_ref`
Manifests enable streaming and avoid materializing huge merged arrays.
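A possible manifest shape, assuming the per-piece ResultRefs carry a `size` field (the exact schema is not fixed by this document):

```python
def build_manifest(piece_refs: list) -> dict:
    # A manifest lists the per-piece ResultRefs instead of merging bodies,
    # so consumers can stream pieces one at a time.
    return {
        "kind": "manifest",
        "count": len(piece_refs),
        "total_size": sum(r.get("size", 0) for r in piece_refs),
        "pieces": piece_refs,
    }
```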
C. Minimal new tool: artifact.get (recommended)
Add a tool kind `artifact` with operation `get`:
- input: `ref` (a ResultRef or a `noetl://` logical URI)
- output: the decoded JSON/body
Downstream steps can then load externalized results without special-casing.
D. Exact fields added to event payload (summary)
When a tool finishes (canonical `tool.processed` or legacy `task_complete`), events SHOULD include:
- `payload.inputs`
- `payload.output_inline` OR `payload.output_ref`
- `payload.output_select` (recommended when `output_ref` is used)
- `payload.preview` (optional)
And SHOULD include correlation keys:
- `iteration`, `iteration_id`
- `page`
- `attempt`
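Putting the fields together, an externalized-output event might look like this; every value is hypothetical:

```python
# Illustrative task_complete event with an externalized output.
event = {
    "event": "task_complete",
    "iteration": 3,
    "iteration_id": "it-3",
    "page": 2,
    "attempt": 1,
    "payload": {
        "inputs": {"city": "Oslo"},
        "output_ref": {"driver": "s3",
                       "uri": "s3://bucket/key.json",
                       "size": 1048576},
        "output_select": {"alert": True},
        "preview": '{"alert": true}',
    },
}
```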
This is sufficient to support efficient retrieval of:
- per-iteration, per-page, per-attempt pieces
- aggregated manifests
- lazy loading for next steps