
Internet to Postgres to GCS

This tutorial builds one complete NoETL data movement flow:

  1. Download JSON from a public internet API.
  2. Store the response as jsonb in Postgres.
  3. Read the saved Postgres row back.
  4. Write a CSV export to Google Cloud Storage.
  5. Register credentials, register the playbook, execute it, and check status with the NoETL CLI.
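
The data movement itself can be sketched in plain Python. The helper below is illustrative, not NoETL code: it shows the row shape the tutorial exports (the columns match the CSV shown later in this page) and how the target GCS object URI is composed. The GitHub API field names `full_name` and `html_url` are assumptions about which payload fields map to `repo_name` and `repo_url`.

```python
import json

# Illustrative only: mirrors the columns the tutorial playbook exports
# (execution_id, source_url, repo_name, repo_url, downloaded_at).
def build_csv_row(execution_id, payload, source_url, downloaded_at):
    return {
        "execution_id": execution_id,
        "source_url": source_url,
        "repo_name": payload["full_name"],   # assumed GitHub API field
        "repo_url": payload["html_url"],     # assumed GitHub API field
        "downloaded_at": downloaded_at,
    }

def gcs_object_uri(bucket, prefix, execution_id):
    # The playbook names its exports github_repo_<execution_id>.csv.
    return f"gs://{bucket}/{prefix}/github_repo_{execution_id}.csv"

# Example with a trimmed GitHub API response:
payload = json.loads(
    '{"full_name": "noetl/noetl", "html_url": "https://github.com/noetl/noetl"}'
)
row = build_csv_row("627592523359191239", payload,
                    "https://api.github.com/repos/noetl/noetl",
                    "2026-05-15T02:18:56Z")
uri = gcs_object_uri("noetl-demo-output",
                     "noetl/tutorial/demo-workload-identity",
                     "627592523359191239")
```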

The runnable examples live in:

  • repos/e2e/fixtures/playbooks/tutorials/internet_postgres_gcs/internet_postgres_gcs_hmac.yaml
  • repos/e2e/fixtures/playbooks/tutorials/internet_postgres_gcs/internet_postgres_gcs_workload_identity.yaml

Use the HMAC playbook for local kind or any cluster where you want explicit GCS credentials. Use the Workload Identity playbook on GKE when the NoETL worker Kubernetes service account is already bound to a Google service account with bucket write access.

Prerequisites

  • NoETL CLI installed and pointing at a running NoETL cluster.
  • jq for shell examples.
  • A Postgres database reachable from NoETL workers.
  • A GCS bucket where the worker identity or registered GCS credential can write objects.

Never commit the credential files you create below. Keep them in /tmp or another local-only secret location.

Demo Path

Use this section for the current GKE demo cluster when the standard demo credentials are already registered:

  • NoETL API: http://127.0.0.1:18082
  • Postgres credential: pg_k8s
  • HMAC credential: gcs_hmac_local
  • Workload Identity bucket: noetl-demo-output
  • HMAC bucket: noetl-demo-19700101

Register both playbooks:

export NOETL_URL=http://127.0.0.1:18082

noetl --server-url "$NOETL_URL" register playbook \
  --file repos/e2e/fixtures/playbooks/tutorials/internet_postgres_gcs/internet_postgres_gcs_workload_identity.yaml

noetl --server-url "$NOETL_URL" register playbook \
  --file repos/e2e/fixtures/playbooks/tutorials/internet_postgres_gcs/internet_postgres_gcs_hmac.yaml

Expected registration output from the latest demo validation:

Resource 'fixtures/playbooks/tutorials/internet_postgres_gcs/workload_identity' version '7' registered.
Resource 'fixtures/playbooks/tutorials/internet_postgres_gcs/hmac' version '6' registered.

Run the Workload Identity path:

WI_EXEC=$(noetl --server-url "$NOETL_URL" exec \
  fixtures/playbooks/tutorials/internet_postgres_gcs/workload_identity \
  --runtime distributed \
  --payload '{
    "pg_auth": "pg_k8s",
    "gcs_bucket": "noetl-demo-output",
    "gcs_prefix": "noetl/tutorial/demo-workload-identity"
  }' \
  --json | jq -r '.execution_id')

noetl --server-url "$NOETL_URL" status "$WI_EXEC" --json | jq '{
  execution_id,
  completed,
  failed,
  current_step,
  duration_human
}'

Validated output:

{
  "execution_id": "627592523359191239",
  "completed": true,
  "failed": false,
  "current_step": "end",
  "duration_human": "6s"
}

Validated object:

gs://noetl-demo-output/noetl/tutorial/demo-workload-identity/github_repo_627592523359191239.csv

Run the HMAC path:

HMAC_EXEC=$(noetl --server-url "$NOETL_URL" exec \
  fixtures/playbooks/tutorials/internet_postgres_gcs/hmac \
  --runtime distributed \
  --payload '{
    "pg_auth": "pg_k8s",
    "gcs_credential": "gcs_hmac_local",
    "gcs_bucket": "noetl-demo-19700101",
    "gcs_prefix": "noetl/tutorial/demo-hmac"
  }' \
  --json | jq -r '.execution_id')

noetl --server-url "$NOETL_URL" status "$HMAC_EXEC" --json | jq '{
  execution_id,
  completed,
  failed,
  current_step,
  duration_human
}'

Validated output:

{
  "execution_id": "627592528442687692",
  "completed": true,
  "failed": false,
  "current_step": "end",
  "duration_human": "7s"
}

Validated object:

gs://noetl-demo-19700101/noetl/tutorial/demo-hmac/github_repo_627592528442687692.csv

Verify all workflow commands completed:

noetl --server-url "$NOETL_URL" query \
  "SELECT execution_id, step_name, tool_kind, status, error::text AS error
   FROM noetl.command
   WHERE execution_id IN ('$WI_EXEC', '$HMAC_EXEC')
   ORDER BY execution_id, created_at" \
  --schema noetl \
  --format json

Expected result: every row has status = "COMPLETED" and error = null. The latest validation returned completed rows for insert_payload, upload_csv_to_gcs, and export_postgres_to_gcs.
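
The same check can be scripted. A small stdlib-Python sketch that takes the JSON rows returned by `noetl query` and raises if any command did not complete cleanly (the row shape follows the SELECT above):

```python
import json

def assert_all_completed(rows):
    """Raise if any command row is not COMPLETED or carries an error."""
    bad = [r for r in rows
           if r.get("status") != "COMPLETED" or r.get("error") is not None]
    if bad:
        raise RuntimeError(f"{len(bad)} command(s) did not complete cleanly: {bad}")
    return len(rows)

# Example with rows shaped like the query output:
rows = json.loads('''[
  {"execution_id": "1", "step_name": "insert_payload",
   "tool_kind": "postgres", "status": "COMPLETED", "error": null},
  {"execution_id": "1", "step_name": "export_postgres_to_gcs",
   "tool_kind": "duckdb", "status": "COMPLETED", "error": null}
]''')
checked = assert_all_completed(rows)
```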

Expected CSV content:

execution_id,source_url,repo_name,repo_url,downloaded_at
627592523359191239,https://api.github.com/repos/noetl/noetl,noetl/noetl,https://github.com/noetl/noetl,2026-05-15T02:18:56Z

How the Playbook Works

The HMAC playbook consists of the following steps:

  Step                     Tool      Purpose
  create_table             postgres  Creates public.noetl_tutorial_downloads.
  fetch_payload            python    Downloads a small JSON record from https://api.github.com/repos/noetl/noetl.
  insert_payload           postgres  Inserts one JSONB row.
  verify_postgres          postgres  Counts rows for the current execution.
  export_postgres_to_gcs   duckdb    Attaches Postgres and writes a CSV to gs://....
  end                      python    Returns the generated GCS URI in the execution result.
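
The actual SQL lives in the playbook YAML, but the export step conceptually boils down to a DuckDB ATTACH plus a COPY. A hypothetical sketch of those statements, built here as Python strings; every connection value, table name, and option below is a placeholder, not the playbook's real SQL:

```python
# Hypothetical sketch of the DuckDB statements behind export_postgres_to_gcs.
# The real SQL lives in the playbook YAML; names and options are illustrative.
bucket = "noetl-demo-19700101"
prefix = "noetl/tutorial/demo-hmac"
execution_id = "627592528442687692"

attach_sql = (
    "ATTACH 'host=postgres.postgres.svc.cluster.local port=5432 "
    "dbname=demo_noetl user=demo password=***' AS pg (TYPE postgres);"
)
copy_sql = (
    "COPY (SELECT * FROM pg.public.noetl_tutorial_downloads "
    f"WHERE execution_id = '{execution_id}') "
    f"TO 'gs://{bucket}/{prefix}/github_repo_{execution_id}.csv' "
    "(FORMAT CSV, HEADER);"
)
```

Writing to gs:// from DuckDB is what the registered HMAC credential enables via the httpfs extension.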

The Workload Identity playbook shares the same first four steps, then reads the saved Postgres row in read_postgres_for_export and uploads the CSV in upload_csv_to_gcs using google.cloud.storage and the worker pod identity, so no GCS key ever ships into the cluster.

The default workload values are safe placeholders. Override pg_auth, gcs_bucket, and the GCS credential name at runtime.

Local NoETL Cluster

This path assumes a local kind cluster and an API port-forward on localhost:8082.

export NOETL_URL=http://localhost:8082

noetl context add local-kind --server-url "$NOETL_URL"
noetl context use local-kind
noetl context set-runtime distributed

If the API is not reachable, start a port-forward in another terminal:

kubectl -n noetl port-forward svc/noetl 8082:8082

1. Register Postgres Credentials

Create a local-only credential file. The values below match the common local demo Postgres service shape; adjust them for your cluster.

cat >/tmp/pg_tutorial_local.json <<'JSON'
{
  "name": "pg_tutorial",
  "type": "postgres",
  "description": "Tutorial Postgres credential for local NoETL workers",
  "data": {
    "db_host": "postgres.postgres.svc.cluster.local",
    "db_port": "5432",
    "db_user": "demo",
    "db_password": "REPLACE_WITH_POSTGRES_PASSWORD",
    "db_name": "demo_noetl"
  }
}
JSON

noetl --server-url "$NOETL_URL" register credential --file /tmp/pg_tutorial_local.json

Confirm it is registered:

noetl --server-url "$NOETL_URL" catalog list credential --json \
  | jq -r '(.items // .)[]?.name' \
  | grep '^pg_tutorial$'
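
Before registering, you can sanity-check the credential file's shape with a few lines of stdlib Python. The required keys below simply mirror the example file above; if your NoETL version expects a different schema, adjust accordingly:

```python
import json
import os
import tempfile

REQUIRED_TOP = {"name", "type", "data"}
REQUIRED_DATA = {"db_host", "db_port", "db_user", "db_password", "db_name"}

def check_pg_credential(path):
    """Return the credential name if the file has the expected shape."""
    with open(path) as f:
        doc = json.load(f)
    missing = (REQUIRED_TOP - doc.keys()) | (REQUIRED_DATA - doc.get("data", {}).keys())
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if "REPLACE_WITH" in doc["data"]["db_password"]:
        raise ValueError("db_password still holds the placeholder value")
    return doc["name"]

# Demonstrate with a throwaway file:
sample = {"name": "pg_tutorial", "type": "postgres",
          "data": {"db_host": "h", "db_port": "5432", "db_user": "u",
                   "db_password": "s3cret", "db_name": "d"}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
    tmp_path = f.name
name = check_pg_credential(tmp_path)
os.unlink(tmp_path)
```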

2. Register GCS HMAC Credentials

For local kind, use a GCS HMAC key because Workload Identity is not available. The HMAC key needs permission to write to your tutorial bucket.

cat >/tmp/gcs_hmac_tutorial.json <<'JSON'
{
  "name": "gcs_hmac_tutorial",
  "type": "gcs_hmac",
  "description": "Tutorial GCS HMAC credential for DuckDB httpfs",
  "data": {
    "key_id": "REPLACE_WITH_GCS_HMAC_KEY_ID",
    "secret_key": "REPLACE_WITH_GCS_HMAC_SECRET",
    "type": "gcs",
    "service": "gcs"
  }
}
JSON

noetl --server-url "$NOETL_URL" register credential --file /tmp/gcs_hmac_tutorial.json

3. Register the Playbook

From the ai-meta workspace root:

noetl --server-url "$NOETL_URL" register playbook \
  --file repos/e2e/fixtures/playbooks/tutorials/internet_postgres_gcs/internet_postgres_gcs_hmac.yaml

Expected output:

Playbook registered successfully: ... "path":"fixtures/playbooks/tutorials/internet_postgres_gcs/hmac" ...

4. Execute the Playbook

Set your bucket name first:

export GCS_BUCKET=REPLACE_WITH_BUCKET_NAME

EXEC_ID=$(noetl --server-url "$NOETL_URL" exec \
  fixtures/playbooks/tutorials/internet_postgres_gcs/hmac \
  --runtime distributed \
  --payload "{
    \"pg_auth\": \"pg_tutorial\",
    \"gcs_credential\": \"gcs_hmac_tutorial\",
    \"gcs_bucket\": \"${GCS_BUCKET}\",
    \"gcs_prefix\": \"noetl/tutorial/local-kind\"
  }" \
  --json | jq -r '.execution_id')

echo "$EXEC_ID"
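
Hand-escaping quotes inside --payload is easy to get wrong. If you drive the CLI from a script, building the payload with Python's json.dumps avoids that; the bucket name here is a placeholder:

```python
import json

# Build the --payload argument programmatically instead of hand-escaping quotes.
payload = json.dumps({
    "pg_auth": "pg_tutorial",
    "gcs_credential": "gcs_hmac_tutorial",
    "gcs_bucket": "REPLACE_WITH_BUCKET_NAME",
    "gcs_prefix": "noetl/tutorial/local-kind",
})
# payload is now safe to pass as:  noetl ... exec ... --payload "$payload"
```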

5. Check Execution Status

noetl --server-url "$NOETL_URL" status "$EXEC_ID" --json | jq '{
  execution_id,
  completed,
  failed,
  current_step,
  duration_human
}'

Wait until completed is true and failed is false.
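
If you script the wait, a small polling helper works. In this sketch the status fetch is injected as a callable so the loop is not tied to any particular API call; fetch_status is a stand-in for however you obtain the status JSON, for example by shelling out to noetl status:

```python
import time

def wait_for_completion(fetch_status, timeout_s=300, poll_s=5):
    """Poll until the execution reports completed, failed, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("failed"):
            raise RuntimeError(f"execution failed at step {status.get('current_step')}")
        if status.get("completed"):
            return status
        time.sleep(poll_s)
    raise TimeoutError("execution did not complete in time")

# Example with a canned sequence of status snapshots:
snapshots = iter([
    {"completed": False, "failed": False, "current_step": "insert_payload"},
    {"completed": True, "failed": False, "current_step": "end"},
])
final = wait_for_completion(lambda: next(snapshots), timeout_s=10, poll_s=0)
```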

6. Check Command Evidence with NoETL CLI

The noetl query command reads the NoETL control database. Use it to confirm the Postgres insert and GCS export steps completed:

noetl --server-url "$NOETL_URL" query \
  "SELECT step_name, tool_kind, status, error::text AS error
   FROM noetl.command
   WHERE execution_id = '$EXEC_ID'
   ORDER BY created_at" \
  --schema noetl \
  --format json

Expected rows include insert_payload, verify_postgres, and export_postgres_to_gcs, all with status = "COMPLETED" and error = null. If the execution completed, the GCS write succeeded.

GKE NoETL Cluster

For GKE, create a separate CLI context that points at the GKE NoETL API. The example below uses a local port-forward on 18082 so it does not collide with local kind.

kubectl config use-context REPLACE_WITH_GKE_CONTEXT
kubectl -n noetl port-forward svc/noetl 18082:8082

In another terminal:

export NOETL_URL=http://localhost:18082

noetl context add gke-noetl --server-url "$NOETL_URL"
noetl context use gke-noetl
noetl context set-runtime distributed

Option A: GKE with Workload Identity

Use this option when the NoETL worker Kubernetes service account can already write to the target bucket through Workload Identity. No GCS credential is registered in NoETL for this path.

Register the GKE Postgres credential. Use the in-cluster host that your GKE NoETL deployment uses, for example PgBouncer or Cloud SQL proxy service DNS.

cat >/tmp/pg_tutorial_gke.json <<'JSON'
{
  "name": "pg_tutorial",
  "type": "postgres",
  "description": "Tutorial Postgres credential for GKE NoETL workers",
  "data": {
    "db_host": "REPLACE_WITH_GKE_POSTGRES_SERVICE",
    "db_port": "5432",
    "db_user": "REPLACE_WITH_POSTGRES_USER",
    "db_password": "REPLACE_WITH_POSTGRES_PASSWORD",
    "db_name": "REPLACE_WITH_POSTGRES_DATABASE"
  }
}
JSON

noetl --server-url "$NOETL_URL" register credential --file /tmp/pg_tutorial_gke.json

Register and execute the Workload Identity playbook:

noetl --server-url "$NOETL_URL" register playbook \
  --file repos/e2e/fixtures/playbooks/tutorials/internet_postgres_gcs/internet_postgres_gcs_workload_identity.yaml

export GCS_BUCKET=REPLACE_WITH_BUCKET_NAME

EXEC_ID=$(noetl --server-url "$NOETL_URL" exec \
  fixtures/playbooks/tutorials/internet_postgres_gcs/workload_identity \
  --runtime distributed \
  --payload "{
    \"pg_auth\": \"pg_tutorial\",
    \"gcs_bucket\": \"${GCS_BUCKET}\",
    \"gcs_prefix\": \"noetl/tutorial/gke-workload-identity\"
  }" \
  --json | jq -r '.execution_id')

noetl --server-url "$NOETL_URL" status "$EXEC_ID" --json

Expected successful status shape:

{
  "execution_id": "627592523359191239",
  "completed": true,
  "failed": false,
  "current_step": "end",
  "duration_human": "6s"
}

The GKE validation run used:

noetl --server-url http://127.0.0.1:18082 exec \
  fixtures/playbooks/tutorials/internet_postgres_gcs/workload_identity \
  --runtime distributed \
  --payload '{
    "pg_auth": "pg_k8s",
    "gcs_bucket": "noetl-demo-output",
    "gcs_prefix": "noetl/tutorial/gke-workload-identity"
  }'

It created:

gs://noetl-demo-output/noetl/tutorial/gke-workload-identity/github_repo_627592523359191239.csv

If you see a 403 error such as "... does not have storage.objects.create access", the playbook is healthy but the worker's Google service account lacks write permission on that bucket.

Option B: GKE with Explicit GCS Credentials

Use this option when Workload Identity is not configured for the worker. The credential registration is the same HMAC shape used by local kind:

cat >/tmp/gcs_hmac_tutorial.json <<'JSON'
{
  "name": "gcs_hmac_tutorial",
  "type": "gcs_hmac",
  "description": "Tutorial GCS HMAC credential for GKE NoETL workers",
  "data": {
    "key_id": "REPLACE_WITH_GCS_HMAC_KEY_ID",
    "secret_key": "REPLACE_WITH_GCS_HMAC_SECRET",
    "type": "gcs",
    "service": "gcs"
  }
}
JSON

noetl --server-url "$NOETL_URL" register credential --file /tmp/gcs_hmac_tutorial.json

Then register and execute the HMAC playbook against GKE:

noetl --server-url "$NOETL_URL" register playbook \
  --file repos/e2e/fixtures/playbooks/tutorials/internet_postgres_gcs/internet_postgres_gcs_hmac.yaml

EXEC_ID=$(noetl --server-url "$NOETL_URL" exec \
  fixtures/playbooks/tutorials/internet_postgres_gcs/hmac \
  --runtime distributed \
  --payload "{
    \"pg_auth\": \"pg_tutorial\",
    \"gcs_credential\": \"gcs_hmac_tutorial\",
    \"gcs_bucket\": \"${GCS_BUCKET}\",
    \"gcs_prefix\": \"noetl/tutorial/gke-hmac\"
  }" \
  --json | jq -r '.execution_id')

noetl --server-url "$NOETL_URL" status "$EXEC_ID" --json

Expected successful status shape:

{
  "execution_id": "627592528442687692",
  "completed": true,
  "failed": false,
  "current_step": "end",
  "duration_human": "7s"
}

The GKE validation run used:

noetl --server-url http://127.0.0.1:18082 exec \
  fixtures/playbooks/tutorials/internet_postgres_gcs/hmac \
  --runtime distributed \
  --payload '{
    "pg_auth": "pg_k8s",
    "gcs_credential": "gcs_hmac_local",
    "gcs_bucket": "noetl-demo-19700101",
    "gcs_prefix": "noetl/tutorial/gke-hmac"
  }'

It created:

gs://noetl-demo-19700101/noetl/tutorial/gke-hmac/github_repo_627592528442687692.csv

The command-table check for the successful HMAC run returned all tutorial steps as COMPLETED, including insert_payload and export_postgres_to_gcs.

Troubleshooting

Credential Not Found

noetl --server-url "$NOETL_URL" catalog list credential --json

The playbook defaults to pg_tutorial and gcs_hmac_tutorial. If you use different names, pass them in the execution payload.

Postgres Connection Fails

The Postgres host must be reachable from NoETL worker pods, not from your laptop. Use Kubernetes service DNS such as postgres.postgres.svc.cluster.local, pgbouncer.postgres.svc.cluster.local, or the service name used by your GKE deployment.
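
To separate connectivity problems from credential problems, a quick TCP probe helps. Run something like this from a debug pod inside the cluster, not from your laptop; the host below is only an example service name:

```python
import socket

def tcp_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and connection refusals alike.
        return False

# Example: a local port with nothing listening is refused, so this returns False.
ok = tcp_reachable("127.0.0.1", 1, timeout_s=1.0)
```

In a cluster debug pod you would call tcp_reachable("postgres.postgres.svc.cluster.local", 5432) and expect True.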

GCS Upload Fails

For HMAC mode, verify the registered HMAC key can write to the bucket. For Workload Identity mode, verify the worker Kubernetes service account is bound to a Google service account with storage.objects.create permission on the bucket.