Script Tool

Execute scripts as Kubernetes Jobs with support for custom containers, dependencies, and resource management.

Overview

The Script tool runs scripts in isolated Kubernetes Job pods, providing:

Custom container images (Python, Node.js, etc.)
Automatic dependency installation
Resource limits and requests
Environment variable injection via Secrets
Script content via ConfigMaps
Pod log capture

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   NoETL Worker  │────▶│  Kubernetes API │────▶│   Script Job    │
│                 │     │                 │     │   (Container)   │
│  Creates:       │     │  Manages:       │     │                 │
│  - ConfigMap    │     │  - Job lifecycle│     │  Runs script    │
│  - Secret       │     │  - Pod creation │     │  with args      │
│  - Job          │     │  - Log capture  │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Configuration

Basic Structure

tool:
  kind: script
  script:
    type: inline
    content: |
      print("Hello from Kubernetes!")
  job:
    image: python:3.11-slim
    namespace: noetl
    timeout: 300

Script Sources

Type	Description
`inline`	Script content provided directly
`local`	Script content from local variable
`http`	Script fetched from HTTP URL
`gcs`	Script from Google Cloud Storage (requires auth)
`s3`	Script from AWS S3 (requires auth)

Examples

Basic Python Script

workflow:
  - step: run_analysis
    desc: Run Python analysis script
    tool:
      kind: script
      script:
        type: inline
        content: |
          import json
          import sys

          # Parse arguments
          args = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}

          # Process data
          result = {
              "input": args,
              "processed": True,
              "count": len(args.get("items", []))
          }

          print(json.dumps(result))
      args:
        items: [1, 2, 3, 4, 5]
      job:
        image: python:3.11-slim
        timeout: 60

With Dependencies

workflow:
  - step: data_processing
    desc: Process data with pandas
    tool:
      kind: script
      script:
        type: inline
        content: |
          import pandas as pd
          import json
          import sys

          args = json.loads(sys.argv[1])

          # Create DataFrame
          df = pd.DataFrame(args.get("data", []))

          # Process
          result = {
              "rows": len(df),
              "columns": list(df.columns),
              "summary": df.describe().to_dict()
          }

          print(json.dumps(result))
      args:
        data:
          - name: "Alice"
            age: 30
          - name: "Bob"
            age: 25
      job:
        image: python:3.11-slim
        install_dependencies:
          - pandas
          - numpy
        timeout: 300
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "1Gi"
            cpu: "500m"

With Environment Variables

workflow:
  - step: api_integration
    desc: Call external API with credentials
    tool:
      kind: script
      script:
        type: inline
        content: |
          import os
          import requests
          import json

          api_key = os.environ.get("API_KEY")
          api_url = os.environ.get("API_URL")

          response = requests.get(
              f"{api_url}/data",
              headers={"Authorization": f"Bearer {api_key}"}
          )

          print(json.dumps(response.json()))
      job:
        image: python:3.11-slim
        install_dependencies:
          - requests
        env:
          API_KEY: "{{ secrets.EXTERNAL_API_KEY }}"
          API_URL: "https://api.example.com"
        timeout: 120

From HTTP URL

workflow:
  - step: run_remote_script
    desc: Execute script from URL
    tool:
      kind: script
      script:
        type: http
        uri: "https://raw.githubusercontent.com/org/repo/main/scripts/process.py"
      args:
        input_file: "/data/input.json"
        output_file: "/data/output.json"
      job:
        image: python:3.11-slim
        timeout: 600

Custom Container Image

workflow:
  - step: node_processing
    desc: Run Node.js script
    tool:
      kind: script
      script:
        type: inline
        content: |
          const args = JSON.parse(process.argv[2] || '{}');

          const result = {
            timestamp: new Date().toISOString(),
            input: args,
            processed: true
          };

          console.log(JSON.stringify(result));
      args:
        message: "Hello from Node.js"
      job:
        image: node:20-slim
        timeout: 60

With Resource Limits

workflow:
  - step: heavy_computation
    desc: Resource-intensive computation
    tool:
      kind: script
      script:
        type: inline
        content: |
          import numpy as np

          # Heavy computation
          data = np.random.randn(10000, 10000)
          result = np.linalg.svd(data, full_matrices=False)

          print(f"Computation complete: {result[1].shape}")
      job:
        image: python:3.11-slim
        install_dependencies:
          - numpy
        timeout: 1800
        backoff_limit: 1
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"

Configuration Reference

Script Configuration

Field	Type	Required	Description
`type`	string	Yes	Script source type
`content`	string	No*	Inline script content
`uri`	string	No*	URL for remote scripts

*Either content or uri is required depending on type.

Job Configuration

Field	Type	Required	Default	Description
`image`	string	No	`python:3.11-slim`	Container image
`namespace`	string	No	`noetl`	Kubernetes namespace
`timeout`	integer	No	`600`	Job timeout in seconds
`backoff_limit`	integer	No	`3`	Number of retries
`ttl_seconds_after_finished`	integer	No	`300`	TTL for completed jobs
`install_dependencies`	array	No	`[]`	Python packages to install
`image_pull_policy`	string	No	`IfNotPresent`	Image pull policy

Resource Configuration

Field	Type	Description
`resources.requests.cpu`	string	CPU request (e.g., `100m`, `1`)
`resources.requests.memory`	string	Memory request (e.g., `128Mi`, `1Gi`)
`resources.limits.cpu`	string	CPU limit
`resources.limits.memory`	string	Memory limit

Environment Variables

Field	Type	Description
`env`	map	Key-value pairs for environment variables

Response Format

Successful Execution

{
  "status": "success",
  "data": {
    "status": "completed",
    "job_name": "noetl-script-abc123-xyz789",
    "namespace": "noetl",
    "pod_name": "noetl-script-abc123-xyz789-abcde",
    "execution_time": 45.23,
    "output": "{\"result\": \"success\", \"count\": 100}",
    "succeeded": 1,
    "failed": 0,
    "exit_code": 0
  },
  "duration_ms": 47532
}

Failed Execution

{
  "status": "error",
  "error": "Job failed with 1 failures",
  "data": {
    "status": "failed",
    "job_name": "noetl-script-abc123-xyz789",
    "namespace": "noetl",
    "pod_name": "noetl-script-abc123-xyz789-abcde",
    "execution_time": 12.5,
    "output": "Traceback (most recent call last):\n  File \"script.py\", line 5...",
    "succeeded": 0,
    "failed": 1,
    "exit_code": 1
  },
  "duration_ms": 15234
}

Timeout

{
  "status": "error",
  "error": "Job failed with 1 failures",
  "data": {
    "status": "timeout",
    "job_name": "noetl-script-abc123-xyz789",
    "namespace": "noetl",
    "execution_time": 600.0,
    "succeeded": 0,
    "failed": 1
  },
  "duration_ms": 600234
}

Kubernetes Resources Created

For each script execution, the tool creates:

ConfigMap: Contains the script content
- Name: {job-name}-script
- Mounted at /scripts/script.py
Secret (if env vars provided): Contains environment variables
- Name: {job-name}-env
- Injected as environment variables
Job: Runs the script container
- Name: noetl-script-{execution-id}-{random}
- Auto-deleted after TTL

Best Practices

Set Appropriate Timeouts:
- Short scripts: 60-300 seconds
- Data processing: 600-1800 seconds
- Long-running: 3600+ seconds
Use Resource Limits:
- Always set memory limits to prevent OOM
- Set CPU limits for fair scheduling
Handle Script Output:
- Print results as JSON for easy parsing
- Use stdout for results, stderr for logs
Secure Secrets:
- Use env with secrets for credentials
- Never hardcode sensitive data in scripts
Manage Dependencies:
- Pin dependency versions for reproducibility
- Consider custom images for complex dependencies

Limitations

Script must output results to stdout
Maximum script size limited by ConfigMap (1MB)
GCS/S3 sources require additional authentication setup
Pod logs may be truncated for very long outputs

Troubleshooting

Job Stuck in Pending

kubectl describe pod -n noetl <pod-name>
# Check for resource constraints or image pull issues

Script Errors

# View job logs
kubectl logs -n noetl job/<job-name>

# Check events
kubectl get events -n noetl --sort-by='.lastTimestamp'

Resource Issues

# Check resource quotas
kubectl describe resourcequota -n noetl

# Check node capacity
kubectl describe nodes

Overview​

Architecture​

Configuration​

Basic Structure​

Script Sources​

Examples​

Basic Python Script​

With Dependencies​

With Environment Variables​

From HTTP URL​

Custom Container Image​

With Resource Limits​

Configuration Reference​

Script Configuration​

Job Configuration​

Resource Configuration​

Environment Variables​

Response Format​

Successful Execution​

Failed Execution​

Timeout​

Kubernetes Resources Created​

Best Practices​

Limitations​

Troubleshooting​

Job Stuck in Pending​

Script Errors​

Resource Issues​

See Also​