
Agent Orchestration in NoETL

How tool: kind: agent lets a NoETL playbook dispatch external agent runtimes — and how framework: noetl lets a playbook dispatch another playbook as the agent runtime, without writing any Python glue.

This page is the reference for the agent contract. For the bigger picture (how playbooks and MCP servers compose into an AI operating system), see NoETL Catalog-Driven MCP Architecture and Playbook-as-MCP-Server.

The agent envelope

Every tool: kind: agent step returns the same shape regardless of the framework underneath:

{
  "status": "ok" | "error",
  "framework": "adk" | "langchain" | "custom" | "noetl",
  "entrypoint": "<framework-specific identifier>",
  "data": <agent-produced output>,
  "execution_id": "<for noetl framework: sub-playbook execution_id>",
  "duration": <seconds>,
  "error": {                              // only on failure
    "kind": "agent.execution" | "agent.configuration",
    "code": "<symbolic>",
    "message": "<human-readable>",
    "retryable": true | false,
    "diagnosis": { ... }                  // optional, see Auto-troubleshoot
  }
}

This single envelope is what makes "agents" compose: the caller doesn't need to know whether the agent was a Python ADK runtime, a LangChain chain, or a peer NoETL playbook. The shape is the contract.
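To make the contract concrete, here is a minimal caller-side sketch. The helper name `unwrap_agent_result` is illustrative, not part of NoETL's API; it only relies on the envelope fields documented above.

```python
# Hypothetical caller-side helper: consume the agent envelope without
# knowing which framework produced it.

def unwrap_agent_result(envelope: dict):
    """Return the agent's data on success; raise with error details on failure."""
    if envelope.get("status") == "ok":
        return envelope["data"]
    err = envelope.get("error", {})
    raise RuntimeError(
        f"{err.get('kind', 'agent.execution')}/{err.get('code', 'UNKNOWN')}: "
        f"{err.get('message', 'agent step failed')} "
        f"(retryable={err.get('retryable', False)})"
    )

ok = {"status": "ok", "framework": "noetl", "data": {"flights": 3}}
print(unwrap_agent_result(ok))  # {'flights': 3}
```

Because the shape is the same for every framework, this one helper works for `adk`, `langchain`, `custom`, and `noetl` results alike.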

Frameworks

| framework | Entrypoint shape | What runs |
| --- | --- | --- |
| adk | pkg.module:factory_func | Google ADK runtime, instantiated via the factory |
| langchain | pkg.module:chain_or_agent | LangChain chain or agent, invoked via .ainvoke |
| custom | pkg.module:callable | Any callable; signature is inspected and dispatched |
| noetl | catalog/path/to/playbook | A peer NoETL playbook, dispatched as a sub-flow |

Python-loaded frameworks (adk, langchain, custom) require the target module to be importable from the worker. They're great for calling out to existing Python agent code without rewriting it as a playbook. noetl is for the inverse: wrapping any registered playbook so it can be called as if it were an agent.

framework: noetl — playbook ≡ agent

The simplest worked example. A "search flights" playbook already exists in the catalog at api_integration/amadeus_ai_api. Any other playbook can call it as an agent:

- step: ask_amadeus
  tool:
    kind: agent
    framework: noetl
    entrypoint: api_integration/amadeus_ai_api
    invoke_kwargs:
      version: 2
    payload:
      query: "{{ user_query }}"
  next:
    arcs:
      - step: render_results

Under the hood, the agent executor:

  1. Treats entrypoint as a catalog path (no Python import).
  2. Merges payload and invoke_kwargs into the sub-playbook's workload.
  3. Dispatches via execute_playbook_task — the same plugin tool: kind: playbook uses for fire-and-forget sub-execution.
  4. Normalises the plugin's success / error status into the agent envelope's ok / error.
  5. Wires the sub-execution's execution_id, data, duration into the envelope so callers can stitch it back into the event log.
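The steps above can be sketched as follows. Function and parameter names here are illustrative (the real executor lives in the worker and calls execute_playbook_task; the `execute` parameter stands in for that plugin call):

```python
# Sketch of steps 2-5 above, with an injectable `execute` standing in for
# execute_playbook_task. Names are illustrative, not the real executor's.

def dispatch_noetl_agent(entrypoint, payload=None, invoke_kwargs=None,
                         execute=None):
    # Step 2: merge payload and invoke_kwargs into the sub-playbook's workload
    # (invoke_kwargs are caller-side overrides, so they win on key clashes).
    workload = {**(payload or {}), **(invoke_kwargs or {})}

    # Step 3: dispatch via the same plugin path `tool: kind: playbook` uses.
    result = execute(entrypoint, workload)

    # Step 4: normalise the plugin's success/error status into ok/error.
    status = "ok" if result.get("status") == "success" else "error"

    # Step 5: wire execution_id, data, duration through for event-log stitching.
    return {
        "status": status,
        "framework": "noetl",
        "entrypoint": entrypoint,
        "execution_id": result.get("execution_id"),
        "data": result.get("data"),
        "duration": result.get("duration"),
    }
```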

This is what makes "any playbook is an MCP tool" work end-to-end: the playbook-as-MCP-server endpoint (reference) takes an MCP tools/call and dispatches it via the same path.

Auto-troubleshoot on failure

When a framework: noetl sub-playbook fails, the executor can optionally dispatch the self-troubleshoot agent and attach the diagnosis directly to the error envelope. Three opt-in levers, in precedence order:

  1. Per-task: task_config.on_failure.troubleshoot: true|false
  2. Env-level: NOETL_AGENT_AUTO_TROUBLESHOOT=1
  3. Default: off

Per-task always wins, so operators can disable auto-diagnosis on inner-loop calls where the ~3s diagnostic call's wall-clock time would dominate.
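The precedence logic reduces to a few lines. A sketch, assuming the documented env variable and task_config shape (the function name is hypothetical):

```python
import os

# Resolve the auto-troubleshoot opt-in: per-task wins, then env, then off.
def troubleshoot_enabled(task_config: dict) -> bool:
    per_task = (task_config.get("on_failure") or {}).get("troubleshoot")
    if per_task is not None:                                # 1. per-task always wins
        return bool(per_task)
    if os.environ.get("NOETL_AGENT_AUTO_TROUBLESHOOT") == "1":
        return True                                         # 2. env-level opt-in
    return False                                            # 3. default: off
```

Note that an explicit `troubleshoot: false` disables diagnosis even when the env variable is set, which is exactly the inner-loop override described above.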

- step: ask_amadeus
  tool:
    kind: agent
    framework: noetl
    entrypoint: api_integration/amadeus_ai_api
    payload:
      query: "{{ user_query }}"
    on_failure:
      troubleshoot: true
      ollama_model: gemma2:2b
      confidence_threshold: 0.85
      escalate_to: openai

When this step fails, the response carries:

{
  "status": "error",
  "framework": "noetl",
  "entrypoint": "api_integration/amadeus_ai_api",
  "execution_id": "exec-failed-1",
  "error": {
    "kind": "agent.execution",
    "code": "PLAYBOOK_FAILED",
    "message": "...",
    "retryable": false,
    "diagnosis": {
      "category": "transient_5xx",
      "confidence": 0.82,
      "root_cause": "Amadeus sandbox returned HTTP 502",
      "suggested_action": "Retry; if persistent, check api.amadeus.com status",
      "source": "ollama",
      "escalated": false
    }
  }
}

A recursion guard prevents the troubleshoot agent from auto-diagnosing its own failures. If the troubleshoot agent itself fails, the original error envelope is returned unchanged — diagnostics augment failures, they never replace them.
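The "augment, never replace" rule can be sketched as a wrapper around the troubleshoot dispatch (names here are illustrative, not the executor's actual internals):

```python
# Attach a diagnosis to an error envelope only if the troubleshoot call
# succeeds; any troubleshoot failure leaves the original envelope untouched.
def attach_diagnosis(error_envelope: dict, run_troubleshoot) -> dict:
    try:
        diagnosis = run_troubleshoot(error_envelope)
    except Exception:
        return error_envelope       # troubleshoot failed: original unchanged
    if diagnosis:
        error_envelope["error"]["diagnosis"] = diagnosis
    return error_envelope
```

The broad `except Exception` is deliberate: a diagnostic subsystem being down (Ollama unreachable, agent not registered) must never turn one failure into two.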

Workload pass-through to the troubleshoot agent

on_failure accepts the troubleshoot agent's workload knobs:

| Key | Default | What it controls |
| --- | --- | --- |
| troubleshoot | false | per-task opt-in (overrides env) |
| troubleshoot_path | automation/agents/troubleshoot/diagnose_execution | catalog path of the diagnostic agent |
| ollama_model | gemma2:2b | local model for first-pass triage |
| ollama_mcp_server | mcp/ollama | catalog path of the Ollama MCP bridge |
| confidence_threshold | 0.7 | escalate when local confidence < this |
| escalate_to | openai | openai / claude / none |
| openai_credential | openai_token | keychain entry for the API key |
| openai_model | gpt-4o-mini | OpenAI model for escalation |
| noetl_url | http://noetl-server.noetl.svc.cluster.local:8080 | NoETL API base for fetching events |
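The escalation decision implied by `confidence_threshold` and `escalate_to` is simple enough to state as code. A sketch with a hypothetical helper name, using the defaults from the table:

```python
# Escalate to the configured cloud model when the local model's confidence
# falls below the threshold (and escalation isn't disabled outright).
def should_escalate(local_confidence: float, threshold: float = 0.7,
                    escalate_to: str = "openai") -> bool:
    if escalate_to == "none":
        return False
    return local_confidence < threshold
```

So the example envelope above (confidence 0.82 against the default 0.7) stays local, which is why it shows "escalated": false.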

Unknown on_failure keys are ignored at the troubleshoot dispatch — they're filtered to the known set so an arbitrary key doesn't leak into the workload silently.
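That filtering amounts to an allow-list over the keys in the table above. A sketch (the constant and function names are illustrative):

```python
# Allow-list of on_failure keys forwarded to the troubleshoot agent's
# workload; anything else is dropped rather than leaking in silently.
KNOWN_ON_FAILURE_KEYS = {
    "troubleshoot", "troubleshoot_path", "ollama_model", "ollama_mcp_server",
    "confidence_threshold", "escalate_to", "openai_credential",
    "openai_model", "noetl_url",
}

def troubleshoot_workload(on_failure: dict) -> dict:
    return {k: v for k, v in on_failure.items() if k in KNOWN_ON_FAILURE_KEYS}
```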

Optional-dependency contract

AI features in NoETL are optional. A deployment can run the worker + server without ever touching tool: kind: agent framework=noetl, the playbook-as-MCP-server endpoint, the Ollama bridge, or the self-troubleshoot agent. Core workflow execution must keep working when those subsystems are missing.

The contract this enforces:

  • No worker / server crashes when an AI subsystem is missing. Module-level imports for AI-only paths are stdlib-only; optional packages (aiohttp, fastapi, uvicorn) are lazy-imported inside the functions that need them. A deployment without those packages still loads the noetl modules cleanly.
  • Playbook steps surface clean error envelopes, not tracebacks. When framework: noetl is invoked but noetl.core.workflow.playbook can't be imported, the agent executor returns a structured error with error.kind = "agent.dependency" and error.code = "WORKFLOW_PLUGIN_UNAVAILABLE". The worker keeps running; the playbook step fails with a clear "this feature is not available" message; non-AI playbooks are unaffected.
  • Auto-troubleshoot best-effort. When the troubleshoot agent itself can't be reached (Ollama down, agent not registered, the workflow plugin failed to import), the original error envelope is returned unchanged. Diagnostics augment failures, never replace them.
  • Other agent frameworks unaffected. framework: adk, langchain, custom go through a separate dispatch path that doesn't touch the workflow plugin. A deployment without the AI subsystems can still use Python-loaded agent runtimes.
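The lazy-import pattern behind the second bullet can be sketched as follows. This is not the executor's actual code; the `module` parameter is added here purely so the missing-dependency path can be demonstrated, and the success branch is stubbed:

```python
import importlib

# Optional dependencies are imported inside the function; a missing module
# becomes a structured error envelope instead of a worker crash.
def run_noetl_agent(entrypoint: str,
                    module: str = "noetl.core.workflow.playbook") -> dict:
    try:
        importlib.import_module(module)   # AI-only path; may be absent
    except ImportError:
        return {
            "status": "error",
            "framework": "noetl",
            "entrypoint": entrypoint,
            "error": {
                "kind": "agent.dependency",
                "code": "WORKFLOW_PLUGIN_UNAVAILABLE",
                "message": "framework=noetl requires the workflow plugin, "
                           "which is not available in this deployment",
                "retryable": False,
            },
        }
    # ...normal dispatch would continue here (stubbed in this sketch)...
    return {"status": "ok", "framework": "noetl", "entrypoint": entrypoint}
```

Because the import lives inside the function, merely loading the module never fails, and only the steps that actually use the feature see the error.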

The smoke test scripts/optional_ai_smoke.py exercises the contract: it loads the executor with noetl.core.workflow.playbook deliberately missing and verifies the structured error envelope; it asserts execute_playbook_task references are confined to the framework=noetl helpers; it loads noetl.tools.ollama_bridge and asserts no optional packages leaked into sys.modules.

Configuration reference

tool:
  kind: agent

  # One of: adk | langchain | custom | noetl
  framework: noetl

  # For framework=noetl: catalog path. Otherwise: 'pkg.module:attr'.
  entrypoint: api_integration/amadeus_ai_api

  # Catalog version pin (framework=noetl only). Default: latest.
  version: 2

  # Workload-equivalent payload merged into the sub-flow's input.
  payload:
    query: "{{ user_query }}"

  # Extra kwargs merged on top of payload (caller-side overrides).
  invoke_kwargs:
    timeout_s: 30

  # framework=adk|langchain|custom only:
  entrypoint_mode: factory      # 'factory' (default) or 'callable'
  entrypoint_args: {}           # kwargs passed to factory
  invoke_method: run_async      # explicit method override

  # Auto-troubleshoot hook (framework=noetl only).
  on_failure:
    troubleshoot: true
    troubleshoot_path: automation/agents/troubleshoot/diagnose_execution
    ollama_model: gemma2:2b
    confidence_threshold: 0.7
    escalate_to: openai

See also