GKE Runbook Notes (Cloud SQL + PgBouncer)
This note captures the exact flow used to run distributed playbooks against:
- Project:
noetl-demo-19700101 - Cluster:
us-central1/noetl-cluster - Gateway:
https://gateway.mestumre.dev - Database used by stress tests:
demo_noetl(credentialpg_k8s)
1) Build and install local CLI binary
cd /Volumes/X10/projects/noetl/noetl
cargo build -p noetl
cp ./target/debug/noetl bin/
cp ./target/debug/noetl .venv/bin/
which noetl
noetl --version
2) Deploy updated NoETL image to GKE (local runtime)
Use local runtime for this automation playbook (executor.profile: local).
noetl run automation/gcp_gke/noetl_gke_fresh_stack.yaml --runtime local \
--set action=deploy \
--set project_id=noetl-demo-19700101 \
--set region=us-central1 \
--set cluster_name=noetl-cluster \
--set build_images=true \
--set build_noetl_image=true \
--set build_gateway_image=false \
--set build_gui_image=false \
--set deploy_ingress=false \
--set gateway_service_type=LoadBalancer \
--set gateway_load_balancer_ip=34.46.180.136 \
--set gui_service_type=LoadBalancer \
--set gui_load_balancer_ip=35.226.162.30 \
--set pgbouncer_default_pool_size=4 \
--set pgbouncer_min_pool_size=1 \
--set pgbouncer_reserve_pool_size=1 \
--set pgbouncer_max_db_connections=8 \
--set pgbouncer_server_idle_timeout=120
Verify public services after deploy:
kubectl get svc -n gateway gateway
kubectl get svc -n gui gui
Both services must have EXTERNAL-IP (not <pending> / <none>).
3) Bootstrap gateway auth (if required)
If auth resources are missing/stale, run:
noetl run automation/gcp_gke/noetl_gke_fresh_stack.yaml --runtime local \
--set action=deploy \
--set project_id=noetl-demo-19700101 \
--set region=us-central1 \
--set cluster_name=noetl-cluster \
--set build_images=false \
--set bootstrap_gateway_auth=true
4) Login and export session token
noetl auth login --context gke-prod --auth0 [email protected]
# paste full Auth0 callback URL when prompted (or valid JWT token)
export NOETL_SESSION_TOKEN='<session-token-from-success-output>'
Notes:
- A short opaque token is not a JWT and will fail with
Invalid JWT format. - Use the full callback URL (
...#id_token=...) or valid JWT form.
5) Verify and use context
noetl context list
noetl context use gke-prod
6) Register playbooks
Register only target playbooks (faster, lower failure surface):
noetl register playbook --file tests/fixtures/playbooks/batch_execution/server_oom_stress_chunk_worker_v2/server_oom_stress_chunk_worker_v2.yaml
noetl register playbook --file tests/fixtures/playbooks/batch_execution/server_oom_stress_test_v2/server_oom_stress_test_v2.yaml
Bulk register (optional):
find tests/fixtures/playbooks -name '*.yaml' | sort > /tmp/playbooks.list
split -l 20 /tmp/playbooks.list /tmp/pb_chunk_
for f in /tmp/pb_chunk_*; do
while IFS= read -r pb; do
noetl register playbook --file "$pb" || break 2
done < "$f"
done
7) Execute stress test
noetl exec catalog://tests/fixtures/playbooks/batch_execution/server_oom_stress_test_v2 \
--runtime distributed \
--payload '{"total_items":3000,"batch_size":50,"concurrent_batches":1,"items_max_in_flight":1}' \
--json
Capture execution_id from output.
8) Check status and duration
Single check:
id="<execution_id>"
noetl execute status "$id" --json | jq '{
completed,
failed,
current_step,
duration_human,
duration_seconds,
batch:.variables.batch,
summary:.variables.summarize
}'
Live watch:
watch -n 10 "noetl execute status $id --json | jq '{completed,failed,current_step,duration_human,duration_seconds,batch_number:.variables.batch.batch_number,batch_count:.variables.build_batch_plan.batch_count}'"
9) Validate DB results in demo_noetl
Use gateway PostgreSQL API with explicit credential and database:
curl -sS "https://gateway.mestumre.dev/noetl/postgres/execute" \
-H "Authorization: Bearer $NOETL_SESSION_TOKEN" \
-H "x-session-token: $NOETL_SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query":"SELECT current_database() AS db, COUNT(*) AS cnt FROM public.stress_test_results;",
"credential":"pg_k8s",
"database":"demo_noetl",
"schema":"public"
}' | jq
Expected for the 3000-item run: cnt = 3000.
10) Payload correctness checks (JSONB)
SELECT pg_typeof(result_data) AS result_data_type
FROM public.stress_test_results
LIMIT 1;
SELECT
item_id,
(result_data ? 'payload') AS has_payload,
left(result_data->>'payload', 60) AS payload_preview,
((result_data->'data'->>'field_000') IS NOT NULL) AS has_field_000,
((result_data->'data'->>'field_599') IS NOT NULL) AS has_field_599,
(SELECT COUNT(*) FROM json_object_keys((result_data->'data')::json)) AS data_fields
FROM public.stress_test_results
ORDER BY item_id DESC
LIMIT 5;
Notes:
- Use
json_object_keys((... )::json)ifjsonb_object_length(jsonb)is unavailable. noetl querymay use a different default DB; for this test use/noetl/postgres/executewith explicitdatabase.
11) Known failure patterns and fixes
401 Unauthorized - Invalid JWT format- Use full Auth0 callback URL or proper JWT token during
noetl auth login.
- Use full Auth0 callback URL or proper JWT token during
404 Execution not foundon status- CLI now falls back to
/executions/{id}event-log path; use latest binary.
- CLI now falls back to
405 Method Not Allowedduring auth bootstrap- Run automation with
--runtime local; auth bootstrap must call NoETL API directly.
- Run automation with
- Cloud SQL saturation / instability
- Keep conservative PgBouncer settings (pool size 4/min 1/reserve 1/max db conns 8).