HTTP Pagination Implementation Summary
Overview
Implemented automatic HTTP pagination support for NoETL, allowing declarative configuration of paginated API calls with result merging and retry capabilities.
Implementation Date
January 2025 (v1.5.0)
Feature Description
Adds loop.pagination block to HTTP actions, enabling automatic continuation based on API response inspection. Supports multiple pagination patterns (page number, offset, cursor) with configurable merge strategies and built-in retry mechanism.
User-Facing Changes
New Syntax
- step: fetch_all_data
  tool: http
  url: "{{ api_url }}/data"
  params:
    page: 1
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.paging.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.paging.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.data
      max_iterations: 100
      retry:
        max_attempts: 3
        backoff: exponential
Note: HTTP responses are wrapped as {id, status, data: <api_response>}, so:
- Use response.data.* to access API fields (not response.*)
- Use merge_path: data.data for nested data (the first data is the wrapper, the second is the API field)
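To make the wrapper note concrete, here is a minimal sketch of how a dot-notation path such as data.data could be resolved against a wrapped response. The helper name resolve_path and the sample payload are hypothetical and are not taken from the NoETL code base.

```python
from typing import Any


def resolve_path(obj: Any, path: str) -> Any:
    """Walk a dot-separated path through nested dicts.

    Raises KeyError when a segment is missing, mirroring the documented
    "merge_path not found" failure mode.
    """
    current = obj
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            raise KeyError(f"segment {segment!r} of path {path!r} not found")
        current = current[segment]
    return current


# Illustrative wrapped response: the outer "data" is the wrapper,
# the inner "data" is the API's own field.
wrapped = {
    "id": "req-1",
    "status": "success",
    "data": {
        "data": [{"n": 1}, {"n": 2}],
        "paging": {"page": 1, "hasMore": True},
    },
}

items = resolve_path(wrapped, "data.data")
has_more = resolve_path(wrapped, "data.paging.hasMore")
```

With this payload, a path of plain data would return the whole API body rather than the item list, which is why the nested form is needed.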
Configuration Attributes
Required
- type - Pagination type (currently only response_based)
- continue_while - Jinja2 boolean expression for continuation
- next_page - Dict with params, body, or headers to update
- merge_strategy - How to combine results: append, extend, replace, collect
Optional
- merge_path - Path to the data array, in dot notation (for example data.data)
- max_iterations - Safety limit (default: 1000)
- retry - Retry configuration block
  - max_attempts - Number of retries (default: 1)
  - backoff - fixed or exponential (default: fixed)
  - initial_delay - Seconds before first retry (default: 1)
  - max_delay - Maximum backoff seconds (default: 60)
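The required and optional attributes above can be checked with a small validator. The following is a sketch only: the function name validate_pagination and its error messages are hypothetical, and the real extract_pagination_config() may behave differently. The defaults are the ones listed in this section.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("type", "continue_while", "next_page", "merge_strategy")
MERGE_STRATEGIES = {"append", "extend", "replace", "collect"}
RETRY_DEFAULTS = {
    "max_attempts": 1,
    "backoff": "fixed",
    "initial_delay": 1,
    "max_delay": 60,
}


def validate_pagination(block: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the pagination block with defaults filled in."""
    missing = [name for name in REQUIRED_FIELDS if name not in block]
    if missing:
        raise ValueError(f"pagination block is missing: {missing}")
    if block["type"] != "response_based":
        raise ValueError(f"unsupported pagination type: {block['type']!r}")
    if block["merge_strategy"] not in MERGE_STRATEGIES:
        raise ValueError(f"unknown merge_strategy: {block['merge_strategy']!r}")
    next_page = block["next_page"]
    if not isinstance(next_page, dict) or not any(
        key in next_page for key in ("params", "body", "headers")
    ):
        raise ValueError("next_page must set params, body, or headers")

    config = dict(block)
    config.setdefault("max_iterations", 1000)
    # User-supplied retry values override the documented defaults.
    config["retry"] = {**RETRY_DEFAULTS, **(block.get("retry") or {})}
    if config["retry"]["backoff"] not in ("fixed", "exponential"):
        raise ValueError(f"unknown backoff: {config['retry']['backoff']!r}")
    return config


example = validate_pagination(
    {
        "type": "response_based",
        "continue_while": "{{ response.data.paging.hasMore }}",
        "next_page": {"params": {"page": "{{ response.data.paging.page + 1 }}"}},
        "merge_strategy": "append",
        "retry": {"max_attempts": 3, "backoff": "exponential"},
    }
)
```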
Implementation Details
Modified Files
- noetl/tools/controller/iterator/config.py
  - Added extract_pagination_config() function
  - Extracts and validates pagination block from loop config
  - Returns pagination_config in config dict
  - Validates required fields and merge strategy
- noetl/tools/controller/iterator/executor.py
  - Added pagination detection in execute_loop_task()
  - Delegates to pagination executor when pagination_config present
  - Returns paginated results as task result
- noetl/tools/controller/iterator/pagination.py (NEW)
  - Main pagination orchestrator
  - execute_paginated_http() - Main execution function
  - execute_with_retry() - Retry logic per request
  - merge_response() - Result accumulation strategies
  - render_dict() - Recursive template rendering
Key Functions
execute_paginated_http()
Main orchestrator that:
- Extracts pagination config
- Initializes accumulator
- Loops while continuation condition true:
  - Renders HTTP config with current context
  - Executes HTTP request with retry
  - Merges response into accumulator
  - Evaluates continue_while expression
  - Updates request parameters for next page
- Returns accumulated results
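The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual execute_paginated_http(): the function paginate and its callable parameters are hypothetical stand-ins for the Jinja2 expressions, and fake_fetch imitates the mock server's 35 items at 10 per page instead of issuing HTTP requests.

```python
from typing import Any, Callable, Dict, List

Response = Dict[str, Any]


def paginate(
    fetch: Callable[[Dict[str, Any]], Response],
    first_params: Dict[str, Any],
    continue_while: Callable[[Response], bool],
    next_params: Callable[[Response], Dict[str, Any]],
    extract: Callable[[Response], List[Any]],
    max_iterations: int = 1000,
) -> List[Any]:
    """Fetch pages sequentially and append their items to one list."""
    accumulated: List[Any] = []
    params = dict(first_params)
    # The range bound plays the role of the max_iterations safety limit;
    # a real implementation would also log a warning when it is hit.
    for _ in range(max_iterations):
        response = fetch(params)
        accumulated.extend(extract(response))
        if not continue_while(response):
            break
        params = {**params, **next_params(response)}
    return accumulated


ITEMS = list(range(35))


def fake_fetch(params: Dict[str, Any]) -> Response:
    """Stand-in for an HTTP call: returns a wrapped page of ITEMS."""
    page = int(params["page"])
    start = (page - 1) * 10
    body = {
        "data": ITEMS[start:start + 10],
        "paging": {"page": page, "hasMore": start + 10 < len(ITEMS)},
    }
    return {"id": f"req-{page}", "status": "success", "data": body}


all_items = paginate(
    fetch=fake_fetch,
    first_params={"page": 1},
    continue_while=lambda r: r["data"]["paging"]["hasMore"],
    next_params=lambda r: {"page": r["data"]["paging"]["page"] + 1},
    extract=lambda r: r["data"]["data"],
)
```

Passing max_iterations=2 to the same call stops after two pages, which is the behaviour the max-iterations test playbook checks.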
execute_with_retry()
Handles retry logic:
- Calls async execute_http_task() in sync context
- Implements exponential or fixed backoff
- Returns response or raises last error
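A minimal sketch of this retry behaviour follows. It is not the actual execute_with_retry(): the names call_with_retry and backoff_delay are hypothetical, the doubling factor for exponential backoff is an assumption, and asyncio.run() is used as the simplest way to drive a coroutine from synchronous code (it cannot be called from inside an already running event loop, which the real implementation is said to handle separately).

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


def backoff_delay(
    attempt: int,
    backoff: str = "fixed",
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if backoff == "exponential":
        # Assumed growth: the delay doubles each attempt, capped at max_delay.
        return min(initial_delay * 2 ** (attempt - 1), max_delay)
    return min(initial_delay, max_delay)


def call_with_retry(
    make_request: Callable[[], Awaitable[Any]],
    max_attempts: int = 1,
    backoff: str = "fixed",
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run an async request synchronously, retrying on any exception."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return asyncio.run(make_request())
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(backoff_delay(attempt, backoff, initial_delay, max_delay))
    assert last_error is not None
    raise last_error
```

The sleep parameter is injectable so that tests can record the delays instead of actually waiting.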
merge_response()
Implements merge strategies:
- append: accumulated.extend(data_to_merge)
- extend: Flattens nested arrays
- replace: Returns last response
- collect: Appends each response to array
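The four strategies can be illustrated with a short function. This is a sketch of one plausible reading of the list above, not the actual merge_response(): the name merge is hypothetical, and "flattens nested arrays" is interpreted here as flattening a single level.

```python
from typing import Any, List


def merge(accumulated: List[Any], data: Any, strategy: str) -> Any:
    """Combine one page of data with the results gathered so far."""
    if strategy == "append":
        # Add the page's items to the running list.
        accumulated.extend(data if isinstance(data, list) else [data])
        return accumulated
    if strategy == "extend":
        # Assumed meaning: flatten one level of nesting while adding.
        for item in data:
            if isinstance(item, list):
                accumulated.extend(item)
            else:
                accumulated.append(item)
        return accumulated
    if strategy == "replace":
        # Keep only the most recent page.
        return data
    if strategy == "collect":
        # Keep each page as its own element.
        accumulated.append(data)
        return accumulated
    raise ValueError(f"unknown merge strategy: {strategy!r}")


appended = merge([1], [2, 3], "append")
flattened = merge([], [[1, 2], [3]], "extend")
replaced = merge([1, 2], [9], "replace")
collected = merge([{"page": 1}], {"page": 2}, "collect")
```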
Available Context Variables
In continue_while and next_page expressions:
- {{ response }} - Current HTTP response (parsed JSON, wrapped as {id, status, data}; API fields are under response.data)
- {{ iteration }} - Current iteration number (0-based)
- {{ accumulated }} - Merged results so far
- {{ workload.* }} - Global workflow variables
- {{ vars.* }} - Execution-scoped variables
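To show how these variables reach a nested next_page block, the sketch below renders every string in a dict against a context. It is not the actual render_dict(): render_tree and render_str are hypothetical names, and render_str is a deliberately tiny stand-in for Jinja2 that understands only plain dotted lookups such as {{ response.data.next_cursor }} (no filters, tests, or arithmetic).

```python
import re
from typing import Any, Dict

_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context: Dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted name such as 'response.data.next_cursor'."""
    value: Any = context
    for part in dotted.split("."):
        value = value[part]
    return value


def render_str(template: str, context: Dict[str, Any]) -> Any:
    """Stand-in renderer: substitutes {{ dotted.name }} placeholders."""
    whole = _PLACEHOLDER.fullmatch(template.strip())
    if whole:
        # A template that is a single placeholder keeps the native type.
        return lookup(context, whole.group(1))
    return _PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


def render_tree(value: Any, context: Dict[str, Any]) -> Any:
    """Recursively render strings inside dicts and lists."""
    if isinstance(value, dict):
        return {key: render_tree(item, context) for key, item in value.items()}
    if isinstance(value, list):
        return [render_tree(item, context) for item in value]
    if isinstance(value, str):
        return render_str(value, context)
    return value


context = {
    "response": {"id": "req-1", "status": "success", "data": {"next_cursor": "abc"}},
    "iteration": 0,
    "accumulated": [],
    "workload": {"api_url": "http://localhost:5555"},
    "vars": {},
}

rendered = render_tree(
    {
        "url": "{{ workload.api_url }}/api/v1/events",
        "params": {"cursor": "{{ response.data.next_cursor }}", "limit": 10},
    },
    context,
)
```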
Test Infrastructure
Mock Server
File: tests/fixtures/servers/paginated_api.py
Technology: FastAPI with uvicorn
Endpoints:
- /api/v1/assessments - Page number pagination
- /api/v1/users - Offset pagination
- /api/v1/events - Cursor pagination
- /api/v1/flaky - Failure injection for retry testing
- /health - Health check
Configuration:
- 35 total items
- 10 items per page
- Realistic pagination metadata
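With 35 items at 10 per page, the mock data spans four pages, the last holding five items. The sketch below shows the slicing arithmetic; the function page_of and the metadata fields other than page and hasMore are hypothetical and may not match what the mock server actually returns.

```python
import math
from typing import Any, Dict, List


def page_of(items: List[Any], page: int, per_page: int = 10) -> Dict[str, Any]:
    """Return one 1-based page of items plus pagination metadata."""
    start = (page - 1) * per_page
    total_pages = math.ceil(len(items) / per_page)
    return {
        "data": items[start:start + per_page],
        "paging": {
            "page": page,
            "pageSize": per_page,   # hypothetical field name
            "total": len(items),    # hypothetical field name
            "hasMore": page < total_pages,
        },
    }


items = [{"id": n} for n in range(1, 36)]  # 35 items, as in the mock server
pages = [page_of(items, number) for number in range(1, 5)]
sizes = [len(p["data"]) for p in pages]
flags = [p["paging"]["hasMore"] for p in pages]
```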
Test Playbooks
Directory: tests/fixtures/playbooks/pagination/
- test_pagination_basic.yaml
  - Page number pagination
  - Validates all 35 items fetched
  - Tests hasMore flag
- test_pagination_offset.yaml
  - Offset-based pagination
  - Tests offset + limit calculation
  - Validates user fetching
- test_pagination_cursor.yaml
  - Cursor-based pagination
  - Tests opaque token handling
  - Validates event fetching
- test_pagination_retry.yaml
  - Tests retry mechanism
  - Page 2 configured to fail initially
  - Validates exponential backoff
- test_pagination_max_iterations.yaml
  - Tests safety limit
  - max_iterations: 2 stops at 2 pages
  - Validates only 20 items returned
Test Script
File: tests/scripts/test_pagination.sh
Features:
- Checks mock server and NoETL API health
- Registers playbooks via /api/catalog/register
- Executes via /api/run/playbook
- Polls execution status until completion
- Reports pass/fail for each test
- Summary with total/passed/failed counts
Usage:
# Start mock server
python tests/fixtures/servers/paginated_api.py 5555
# Run all tests
./tests/scripts/test_pagination.sh
Documentation
User Documentation
File: documentation/docs/features/pagination.md
Sections:
- Overview and quick start
- Pagination patterns (page, offset, cursor)
- Configuration reference (all attributes)
- Complete example playbook
- Best practices
- Troubleshooting guide
- See also links
Quick Reference
File: documentation/docs/reference/http_pagination_quick_reference.md
Contains:
- Minimal examples
- Common patterns
- All merge strategies
- Available variables
- File locations
Design Document
File: documentation/docs/features/pagination_design.md
Contains:
- Use cases
- Complete attribute reference
- Implementation phases
- Merge strategy details
- Example playbooks
- Error handling
Supported Pagination Patterns
1. Page Number
continue_while: "{{ response.data.paging.hasMore }}"
next_page:
  params:
    page: "{{ (response.data.paging.page | int) + 1 }}"
2. Offset-Based
continue_while: "{{ response.data.has_more }}"
next_page:
  params:
    offset: "{{ response.data.offset + response.data.limit }}"
3. Cursor-Based
continue_while: "{{ response.data.next_cursor is not none }}"
next_page:
  params:
    cursor: "{{ response.data.next_cursor }}"
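The three patterns differ only in how the next request's parameters are derived from the current response. The sketch below expresses each as a plain Python function that returns the parameter update, or None when pagination should stop. The function names are hypothetical, and the code assumes the {id, status, data} wrapper described earlier, so API fields are read from response["data"].

```python
from typing import Any, Dict, Optional

Response = Dict[str, Any]
Params = Optional[Dict[str, Any]]


def next_page_number(response: Response) -> Params:
    """Page number pattern: increment page while hasMore is true."""
    paging = response["data"]["paging"]
    if not paging["hasMore"]:
        return None
    return {"page": int(paging["page"]) + 1}


def next_offset(response: Response) -> Params:
    """Offset pattern: advance offset by limit while has_more is true."""
    body = response["data"]
    if not body["has_more"]:
        return None
    return {"offset": body["offset"] + body["limit"]}


def next_cursor(response: Response) -> Params:
    """Cursor pattern: pass the opaque token back until it is absent."""
    cursor = response["data"].get("next_cursor")
    if cursor is None:
        return None
    return {"cursor": cursor}


page_update = next_page_number(
    {"id": "a", "status": "success", "data": {"paging": {"page": "2", "hasMore": True}}}
)
offset_update = next_offset(
    {"id": "b", "status": "success", "data": {"offset": 10, "limit": 10, "has_more": True}}
)
cursor_update = next_cursor(
    {"id": "c", "status": "success", "data": {"next_cursor": None}}
)
```

The int() call mirrors the | int filter in the syntax example, which guards against APIs that return the page number as a string.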
Error Handling
- HTTP Errors: Retried based on retry configuration
- Max Iterations: Stops with warning, returns accumulated data
- Invalid Response: Stops with error if continue_while evaluation fails
- Merge Errors: Stops with error if merge_path not found
- Async Context: Handles both sync and async event loop contexts
Backward Compatibility
- Fully backward compatible
- Pagination only active when loop.pagination block present
Performance Considerations
- Sequential Execution: Requests are sequential (not parallel)
- Memory: Accumulates all responses in memory
- Safety: max_iterations prevents runaway loops
- Retry: Adds latency but improves reliability
Future Enhancements
Potential improvements:
- Parallel page fetching (when order doesn't matter)
- Streaming merge to database (avoid memory limits)
- Cursor-based pagination type with automatic token extraction
- Link header parsing (RFC 8288, which obsoletes RFC 5988)
- Rate limiting support (X-RateLimit headers)
- Progress reporting (X out of Y pages)
Known Limitations
- Only works with HTTP tool (not postgres, duckdb, etc.)
- Sequential execution only (no parallel page fetching)
- All results accumulated in memory
- Requires JSON response (no XML, CSV, etc.)
Testing Checklist
- Config extraction and validation
- Pagination detection and delegation
- Page number pagination
- Offset pagination
- Cursor pagination
- Result merging (all 4 strategies)
- Retry mechanism with backoff
- Max iterations safety limit
- Mock server with realistic data
- Comprehensive test script
- User documentation
- Quick reference guide
- Integration with live NoETL deployment
- Performance benchmarking
- Error handling edge cases
Next Steps
- Build and Deploy
    task docker-build-noetl
    task kind-load-image image=local/noetl:latest
    task deploy-noetl
- Start Mock Server
    python tests/fixtures/servers/paginated_api.py 5555
- Run Tests
    ./tests/scripts/test_pagination.sh
- Validate Results
  - All 5 tests should pass
  - Check logs for pagination events
  - Verify merged result counts
- Update Copilot Instructions
  - Add pagination pattern to examples
  - Document in .github/copilot-instructions.md
Related Issues/PRs
- Phase 2 Task 3: Variable Management API (completed)
- Pagination feature request (user provided example with hasMore flag)
- HTTP action improvements roadmap
Contributors
- Implementation: AI Agent (GitHub Copilot)
- Review: User (akuksin)
- Testing: Pending
Version History
- v1.5.0 (2025-01): Initial pagination implementation
  - Added loop.pagination block
  - Support for response-based continuation
  - 4 merge strategies
  - Retry integration
  - Comprehensive test suite