# Agent Orchestration in NoETL
How `tool: kind: agent` lets a NoETL playbook dispatch external agent runtimes — and how `framework: noetl` lets a playbook dispatch another playbook as the agent runtime, without writing any Python glue.
This page is the reference for the agent contract. For the bigger picture (how playbooks and MCP servers compose into an AI operating system), see NoETL Catalog-Driven MCP Architecture and Playbook-as-MCP-Server.
## The agent envelope
Every `tool: kind: agent` step returns the same shape regardless of the framework underneath:
```jsonc
{
  "status": "ok" | "error",
  "framework": "adk" | "langchain" | "custom" | "noetl",
  "entrypoint": "<framework-specific identifier>",
  "data": <agent-produced output>,
  "execution_id": "<for noetl framework: sub-playbook execution_id>",
  "duration": <seconds>,
  "error": {                        // only on failure
    "kind": "agent.execution" | "agent.configuration",
    "code": "<symbolic>",
    "message": "<human-readable>",
    "retryable": true | false,
    "diagnosis": { ... }            // optional, see Auto-troubleshoot
  }
}
```
This single envelope is what makes "agents" compose: the caller doesn't need to know whether the agent was a Python ADK runtime, a LangChain chain, or a peer NoETL playbook. The shape is the contract.
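A caller-side sketch of what "the shape is the contract" buys: one handler for every framework, branching on `status` alone. The `handle_agent_result` helper is hypothetical, not part of NoETL's API:

```python
def handle_agent_result(envelope: dict):
    """Consume an agent envelope: the caller branches on status alone."""
    if envelope["status"] == "ok":
        return envelope["data"]
    err = envelope["error"]
    detail = f"{err['kind']}/{err['code']}: {err['message']}"
    if err.get("retryable"):
        raise TimeoutError(detail)   # safe to schedule a retry
    raise RuntimeError(detail)       # terminal failure
```

The same handler works whether `framework` was `adk`, `langchain`, `custom`, or `noetl`; that indifference is the point of the envelope.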
## Frameworks
| `framework` | Entrypoint shape | What runs |
|---|---|---|
| `adk` | `pkg.module:factory_func` | Google ADK runtime, instantiated via the factory |
| `langchain` | `pkg.module:chain_or_agent` | LangChain chain or agent, invoked via `.ainvoke` |
| `custom` | `pkg.module:callable` | Any callable; signature is inspected and dispatched |
| `noetl` | `catalog/path/to/playbook` | A peer NoETL playbook, dispatched as a sub-flow |
Python-loaded frameworks (`adk`, `langchain`, `custom`) require the target module to be importable from the worker. They're great for calling out to existing Python agent code without rewriting it as a playbook. `noetl` is for the inverse: wrapping any registered playbook so it can be called as if it were an agent.
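For the Python-loaded frameworks, the `pkg.module:attr` resolution and the signature inspection the `custom` row describes can be sketched as follows (the helper names `load_entrypoint` and `dispatch` are illustrative, not the executor's actual API):

```python
import importlib
import inspect

def load_entrypoint(entrypoint: str):
    """Resolve a 'pkg.module:attr' string to the object it names."""
    module_path, _, attr = entrypoint.partition(":")
    return getattr(importlib.import_module(module_path), attr)

def dispatch(fn, payload: dict):
    """Inspect the callable's signature and pass only the kwargs it accepts."""
    accepted = inspect.signature(fn).parameters
    return fn(**{k: v for k, v in payload.items() if k in accepted})
```

For example, `dispatch(load_entrypoint("json:dumps"), {"obj": [1, 2], "stray": 3})` returns `"[1, 2]"`: the stray key is dropped instead of raising a `TypeError`.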
## `framework: noetl` — playbook ≡ agent
The simplest worked example: a "search flights" playbook already exists in the catalog at `api_integration/amadeus_ai_api`. Any other playbook can call it as an agent:
```yaml
- step: ask_amadeus
  tool:
    kind: agent
    framework: noetl
    entrypoint: api_integration/amadeus_ai_api
    invoke_kwargs:
      version: 2
    payload:
      query: "{{ user_query }}"
  next:
    arcs:
      - step: render_results
```
Under the hood, the agent executor:

- Treats `entrypoint` as a catalog path (no Python import).
- Merges `payload` and `invoke_kwargs` into the sub-playbook's workload.
- Dispatches via `execute_playbook_task` — the same plugin `tool: kind: playbook` uses for fire-and-forget sub-execution.
- Normalises the plugin's `success`/`error` status into the agent envelope's `ok`/`error`.
- Wires the sub-execution's `execution_id`, `data`, and `duration` into the envelope so callers can stitch it back into the event log.
This is what makes "any playbook is an MCP tool" work end-to-end: the playbook-as-MCP-server endpoint (reference) takes an MCP `tools/call` and dispatches it via the same path.
## Auto-troubleshoot on failure
When a `framework: noetl` sub-playbook fails, the executor can optionally dispatch the self-troubleshoot agent and attach the diagnosis directly to the error envelope. Three opt-in levers, in precedence order:

- Per-task — `task_config.on_failure.troubleshoot: true|false`
- Env-level — `NOETL_AGENT_AUTO_TROUBLESHOOT=1`
- Default — off
Per-task always wins, so operators can disable auto-diagnosis on inner-loop calls where the diagnostic call's ~3 s of wall-clock time would dominate.
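The precedence reduces to a single lookup; `troubleshoot_enabled` is a hypothetical helper illustrating the order, not NoETL's actual code:

```python
import os

def troubleshoot_enabled(task_config: dict) -> bool:
    """Per-task on_failure.troubleshoot wins; env var is the fallback; default off."""
    per_task = task_config.get("on_failure", {}).get("troubleshoot")
    if per_task is not None:
        return bool(per_task)
    return os.environ.get("NOETL_AGENT_AUTO_TROUBLESHOOT") == "1"
```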
```yaml
- step: ask_amadeus
  tool:
    kind: agent
    framework: noetl
    entrypoint: api_integration/amadeus_ai_api
    payload:
      query: "{{ user_query }}"
    on_failure:
      troubleshoot: true
      ollama_model: gemma2:2b
      confidence_threshold: 0.85
      escalate_to: openai
```
When this step fails, the response carries:
```json
{
  "status": "error",
  "framework": "noetl",
  "entrypoint": "api_integration/amadeus_ai_api",
  "execution_id": "exec-failed-1",
  "error": {
    "kind": "agent.execution",
    "code": "PLAYBOOK_FAILED",
    "message": "...",
    "retryable": false,
    "diagnosis": {
      "category": "transient_5xx",
      "confidence": 0.82,
      "root_cause": "Amadeus sandbox returned HTTP 502",
      "suggested_action": "Retry; if persistent, check api.amadeus.com status",
      "source": "ollama",
      "escalated": false
    }
  }
}
```
A recursion guard prevents the troubleshoot agent from auto-diagnosing its own failures. If the troubleshoot agent itself fails, the original error envelope is returned unchanged — diagnostics augment failures, they never replace them.
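The guard and the best-effort attach can be sketched as follows (`attach_diagnosis` is a hypothetical helper; `run_diagnosis` stands in for the troubleshoot dispatch):

```python
TROUBLESHOOT_PATH = "automation/agents/troubleshoot/diagnose_execution"

def attach_diagnosis(envelope: dict, entrypoint: str, run_diagnosis) -> dict:
    """Augment a failed envelope with a diagnosis; never replace the failure."""
    if entrypoint == TROUBLESHOOT_PATH:
        return envelope  # recursion guard: don't diagnose our own failures
    try:
        envelope["error"]["diagnosis"] = run_diagnosis(envelope)
    except Exception:
        pass  # diagnostics are best-effort; keep the original envelope
    return envelope
```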
### Workload pass-through to the troubleshoot agent
`on_failure` accepts the troubleshoot agent's workload knobs:
| Key | Default | What it controls |
|---|---|---|
| `troubleshoot` | `false` | Per-task opt-in (overrides env) |
| `troubleshoot_path` | `automation/agents/troubleshoot/diagnose_execution` | Catalog path of the diagnostic agent |
| `ollama_model` | `gemma2:2b` | Local model for first-pass triage |
| `ollama_mcp_server` | `mcp/ollama` | Catalog path of the Ollama MCP bridge |
| `confidence_threshold` | `0.7` | Escalate when local confidence < this |
| `escalate_to` | `openai` | `openai` / `claude` / `none` |
| `openai_credential` | `openai_token` | Keychain entry for the API key |
| `openai_model` | `gpt-4o-mini` | OpenAI model for escalation |
| `noetl_url` | `http://noetl-server.noetl.svc.cluster.local:8080` | NoETL API base for fetching events |
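The threshold knob drives a comparison like this sketch (`should_escalate` is illustrative; the real decision lives inside the troubleshoot agent):

```python
def should_escalate(diagnosis: dict, workload: dict) -> bool:
    """Escalate to the paid model when local confidence is below threshold."""
    if workload.get("escalate_to", "openai") == "none":
        return False
    threshold = workload.get("confidence_threshold", 0.7)
    return diagnosis.get("confidence", 0.0) < threshold
```

With the defaults above, the earlier example diagnosis (`confidence: 0.82`) stays local, but a step configured with `confidence_threshold: 0.85` would escalate it.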
Unknown `on_failure` keys are ignored at troubleshoot dispatch: they're filtered down to the known set so an arbitrary key can't silently leak into the workload.
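A sketch of that filter, using the known-key set from the table above (the helper name is hypothetical):

```python
TROUBLESHOOT_KEYS = {
    "troubleshoot", "troubleshoot_path", "ollama_model", "ollama_mcp_server",
    "confidence_threshold", "escalate_to", "openai_credential",
    "openai_model", "noetl_url",
}

def filter_on_failure(on_failure: dict) -> dict:
    """Keep only the recognised knobs; unknown keys never reach the workload."""
    return {k: v for k, v in on_failure.items() if k in TROUBLESHOOT_KEYS}
```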
## Optional-dependency contract
AI features in NoETL are optional. A deployment can run the worker and server without ever touching `tool: kind: agent` with `framework: noetl`, the playbook-as-MCP-server endpoint, the Ollama bridge, or the self-troubleshoot agent. Core workflow execution must keep working when those subsystems are missing.
The contract this enforces:
- **No worker / server crashes when an AI subsystem is missing.** Module-level imports for AI-only paths are stdlib-only; optional packages (`aiohttp`, `fastapi`, `uvicorn`) are lazy-imported inside the functions that need them. A deployment without those packages still loads the noetl modules cleanly.
- **Playbook steps surface clean error envelopes, not tracebacks.** When `framework: noetl` is invoked but `noetl.core.workflow.playbook` can't be imported, the agent executor returns a structured error with `error.kind = "agent.dependency"` and `error.code = "WORKFLOW_PLUGIN_UNAVAILABLE"`. The worker keeps running; the playbook step fails with a clear "this feature is not available" message; non-AI playbooks are unaffected.
- **Auto-troubleshoot is best-effort.** When the troubleshoot agent itself can't be reached (Ollama down, agent not registered, the workflow plugin failed to import), the original error envelope is returned unchanged. Diagnostics augment failures, never replace them.
- **Other agent frameworks are unaffected.** `framework: adk`, `langchain`, and `custom` go through a separate dispatch path that doesn't touch the workflow plugin. A deployment without the AI subsystems can still use Python-loaded agent runtimes.
The smoke test `scripts/optional_ai_smoke.py` exercises the contract: it loads the executor with `noetl.core.workflow.playbook` deliberately missing and verifies the structured error envelope; it asserts `execute_playbook_task` references are confined to the `framework=noetl` helpers; and it loads `noetl.tools.ollama_bridge` and asserts no optional packages leaked into `sys.modules`.
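The `sys.modules` leak check is easy to reproduce for any module with a generic helper (this sketch is not the smoke test itself):

```python
import importlib
import sys

def assert_no_leaked_imports(module_path: str, forbidden: set) -> None:
    """Import a module, then verify none of the optional packages appeared."""
    importlib.import_module(module_path)
    leaked = forbidden & set(sys.modules)
    assert not leaked, f"optional packages leaked at import time: {leaked}"
```

On a deployment with NoETL but without the optional packages, `assert_no_leaked_imports("noetl.tools.ollama_bridge", {"aiohttp", "fastapi", "uvicorn"})` should pass.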
## Configuration reference
```yaml
tool:
  kind: agent
  # One of: adk | langchain | custom | noetl
  framework: noetl
  # For framework=noetl: catalog path. Otherwise: 'pkg.module:attr'.
  entrypoint: api_integration/amadeus_ai_api
  # Catalog version pin (framework=noetl only). Default: latest.
  version: 2
  # Workload-equivalent payload merged into the sub-flow's input.
  payload:
    query: "{{ user_query }}"
  # Extra kwargs merged on top of payload (caller-side overrides).
  invoke_kwargs:
    timeout_s: 30
  # framework=adk|langchain|custom only:
  entrypoint_mode: factory   # 'factory' (default) or 'callable'
  entrypoint_args: {}        # kwargs passed to the factory
  invoke_method: run_async   # explicit method override
  # Auto-troubleshoot hook (framework=noetl only).
  on_failure:
    troubleshoot: true
    troubleshoot_path: automation/agents/troubleshoot/diagnose_execution
    ollama_model: gemma2:2b
    confidence_threshold: 0.7
    escalate_to: openai
```
## See also
- Playbook-as-MCP-Server — the inverse: expose any playbook as an MCP tool to external clients.
- Self-Troubleshoot Agent — what `on_failure.troubleshoot: true` actually invokes.
- Ollama Bridge — deploying the cheap-first inference layer the troubleshoot agent uses.