Artifact Tool
The artifact tool provides a uniform interface for reading and (optionally) writing externally stored results referenced by ResultRef.
This tool is the standard way for downstream steps to load large outputs that were externalized from the event log.
1. When to use
Use artifact.get when:
- a prior step produced payload.output_ref (an externalized result)
- the next step needs the full body (not just output_select)
- the step is processing a manifest (pagination/loop aggregation)
2. Tool kind
tool:
  kind: artifact
  op: get
- kind MUST be artifact
- op MUST be one of the supported operations below
3. Operations
3.1 get
Fetch and return the artifact body for a given reference.
Inputs
- ref (required)
  - may be a ResultRef object
  - or a logical URI (noetl://execution/...)
Optional:
- resolve_manifest (bool, default: true)
  - If ref is a manifest, resolve and return the resolved parts.
- mode (string, default: json)
  - json → parse JSON and return an object
  - text → return decoded text
  - bytes → return raw bytes (base64 or stream handle, depending on runtime)
- max_bytes (int)
  - safety cap on fetched body
- range (object)
  - request a byte range (for large files)
  - { start: 0, end: 1048575 }
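The mode, max_bytes, and range inputs above can be illustrated with a minimal Python sketch. The helper names `apply_range` and `decode_body` are hypothetical and not part of the tool. The sketch assumes UTF-8 bodies, treats `end` as inclusive (as in HTTP byte ranges, which the 1048575 example suggests), and returns base64 text for `bytes` mode; the actual runtime may behave differently on each point.

```python
import base64
import json
from typing import Optional


def apply_range(raw: bytes, start: int, end: int) -> bytes:
    """Return bytes start..end, treating `end` as inclusive (an assumption)."""
    return raw[start:end + 1]


def decode_body(raw: bytes, mode: str = "json", max_bytes: Optional[int] = None):
    """Decode a fetched body according to `mode` (hypothetical helper)."""
    # Safety cap: refuse bodies larger than max_bytes instead of truncating.
    if max_bytes is not None and len(raw) > max_bytes:
        raise ValueError(f"body is {len(raw)} bytes, exceeds max_bytes={max_bytes}")
    if mode == "json":
        return json.loads(raw.decode("utf-8"))
    if mode == "text":
        return raw.decode("utf-8")
    if mode == "bytes":
        # One possible representation of raw bytes: base64-encoded ASCII text.
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unsupported mode: {mode!r}")
```

Raising on an oversized body, rather than silently truncating it, keeps a downstream JSON parse from failing on a cut-off document; whether the real tool raises or truncates is not specified here.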
Outputs
- status (success|error)
- data (object|string)
- meta
  - content_type, compression, bytes, sha256, and storage info
Example
- step: load_users_manifest
  tool:
    kind: artifact
    op: get
    ref: "{{ fetch_users.users_manifest.__ref__ }}"
    resolve_manifest: true
    mode: json
3.2 head (recommended)
Return only artifact metadata (no body). Useful for planning and debugging.
Inputs:
- ref (required)
Outputs:
- status
- meta
3.3 put (optional)
Store a body as an artifact and return a ResultRef.
Use when:
- a python task produces a large output that should be externalized
- a step wants to persist intermediate datasets
Inputs:
- data (required; json/text/bytes)
- content_type (default: application/json)
- compression (optional)
- store (driver + uri template)
Outputs:
- status
- ref (ResultRef)
4. Security & access model
4.1 Worker access
Workers typically access artifacts directly via the storage driver credentials (S3/GCS/etc.).
4.2 Server access
The server MAY:
- stream artifacts to clients, or
- generate signed URLs for clients (recommended)
In either model, the event log stores only a ResultRef and metadata.
5. Manifest handling
When resolve_manifest=true and ref is a manifest:
artifact.get returns either:
- data = { manifest, parts: [ {data, meta}, ... ] } (default)
- OR streaming iterator handle (future extension)
For very large manifests, prefer:
- resolve parts incrementally (loop over parts refs)
- materialize into DuckDB/Postgres
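The difference between the default eager shape and incremental resolution can be sketched in Python as follows. The function names are hypothetical, the manifest is assumed to be a dict with a `parts` list of refs, and `fetch` is a caller-supplied function (for example, a wrapper around artifact.get) that returns `{data, meta}` for one part ref.

```python
from typing import Callable, Iterator


def iter_parts(manifest: dict, fetch: Callable[[object], dict]) -> Iterator[dict]:
    """Yield one resolved part at a time so only one body is held in memory."""
    for part_ref in manifest["parts"]:
        yield fetch(part_ref)


def resolve_manifest(manifest: dict, fetch: Callable[[object], dict]) -> dict:
    """Eagerly resolve every part into the default { manifest, parts } shape."""
    return {"manifest": manifest, "parts": list(iter_parts(manifest, fetch))}
```

For a very large manifest, a caller would iterate `iter_parts` and write each part into DuckDB or Postgres as it arrives, rather than building the full list.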
6. DSL pattern: passing refs between steps
If the prior step externalized its output, the server binds:
- task_name.__ref__ (ResultRef)
- task_name.__preview__ (optional)
- task_name.<selected fields> (from output_select)
So you can:
- use selected fields in routing:
  {{ fetch_users.next_cursor }}
- load the full body when needed:
  ref: "{{ fetch_users.__ref__ }}"
7. Recommended implementation notes
- Prefer a single resolver in the server:
  - resolve noetl://... to (driver, uri, artifact_id)
- Maintain a noetl.artifact metadata table in Postgres.
- Implement artifact.get as:
  - worker-side direct download (for internal pipelines)
  - server-side signed URL/stream (for UI and external clients)
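One way to picture the single-resolver idea is a lookup keyed by logical URI that returns the (driver, uri, artifact_id) triple. The sketch below is only an illustration: the `ArtifactResolver` and `Location` names are hypothetical, and the in-memory dict stands in for a query against the noetl.artifact metadata table. It deliberately does not parse the path of the noetl:// URI, since that layout is not defined here.

```python
from typing import Dict, NamedTuple


class Location(NamedTuple):
    """Physical location of an artifact (illustrative field names)."""
    driver: str
    uri: str
    artifact_id: str


class ArtifactResolver:
    """Maps logical noetl:// URIs to physical locations (hypothetical sketch)."""

    def __init__(self) -> None:
        # Stand-in for the noetl.artifact metadata table.
        self._by_logical_uri: Dict[str, Location] = {}

    def register(self, logical_uri: str, location: Location) -> None:
        self._by_logical_uri[logical_uri] = location

    def resolve(self, logical_uri: str) -> Location:
        if not logical_uri.startswith("noetl://"):
            raise ValueError(f"not a logical artifact URI: {logical_uri!r}")
        try:
            return self._by_logical_uri[logical_uri]
        except KeyError:
            raise LookupError(f"unknown artifact: {logical_uri!r}") from None
```

Keeping resolution in one place means the worker-side download path and the server-side signed URL path both start from the same (driver, uri, artifact_id) answer.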
10. Postgres DDL (recommended)
A concrete PostgreSQL schema for event log + artifact metadata + projections is provided in:
docs/runtime/schema/postgres.sql