# Pagination

## Overview

Pagination is essential for efficiently retrieving large datasets from APIs. Pullbay offers two pagination models: **Standard pagination**, which gives you granular control, and **Managed pagination**, which favors simplicity. This guide explains both approaches, when to use each, and best practices for production workloads.

***

## Why Pagination Matters

Large datasets present three critical challenges:

1. **Performance**: Retrieving millions of records in a single request strains both client and server resources, causing timeouts and memory exhaustion.
2. **Network Efficiency**: Breaking data into smaller pages reduces bandwidth usage and allows incremental processing.
3. **Credit Control**: Every page you fetch consumes credits. Pagination lets you fetch only what you need, so you avoid spending credits on data you never use.

Without pagination, a single request for 100,000 items would consume a large number of credits and likely fail. Pagination lets you process data incrementally and stop as soon as you have enough results.

***

## Standard Pagination (You Control the Flow)

Standard pagination gives you complete control: you request pages one at a time, process each, and decide whether to continue. This model is ideal when you need only a subset of data or want to implement custom logic.

### How It Works

1. Make an initial request to the endpoint without a cursor
2. Receive the first page of results plus a `next_cursor` token
3. Request the next page using the `cursor` parameter with the returned token
4. Repeat until `has_more` is `false` (no more pages exist)

### Standard Pagination Example with cURL

**First Request (No Cursor):**

```bash
curl -X GET "https://api.pullbay.com/v1/app-store/reviews" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: application/json" \
  -G \
  --data-urlencode "business_id=12345" \
  --data-urlencode "limit=50"
```

**Response:**

```json
{
  "data": [
    {
      "id": "rev_001",
      "text": "Great service!",
      "rating": 5,
      "created_at": "2024-01-15T10:30:00Z"
    },
    {
      "id": "rev_002",
      "text": "Good experience",
      "rating": 4,
      "created_at": "2024-01-14T09:20:00Z"
    }
  ],
  "pagination": {
    "next_cursor": "eyJpZCI6ICJyZXZfMDAyIiwgIm9mZnNldCI6IDUwfQ==",
    "has_more": true,
    "limit": 50,
    "count": 50
  }
}
```

**Second Request (With Cursor):**

```bash
curl -X GET "https://api.pullbay.com/v1/app-store/reviews" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: application/json" \
  -G \
  --data-urlencode "business_id=12345" \
  --data-urlencode "limit=50" \
  --data-urlencode "cursor=eyJpZCI6ICJyZXZfMDAyIiwgIm9mZnNldCI6IDUwfQ=="
```

**Subsequent Response:**

```json
{
  "data": [
    {
      "id": "rev_051",
      "text": "Excellent work",
      "rating": 5,
      "created_at": "2024-01-13T14:45:00Z"
    }
  ],
  "pagination": {
    "next_cursor": "eyJpZCI6ICJyZXZfMDUxIiwgIm9mZnNldCI6IDEwMH0=",
    "has_more": false,
    "limit": 50,
    "count": 1
  }
}
```

When `has_more` is `false`, no more pages exist, so stop paginating.

### Complete Python Implementation

```python
import requests
import time

class PullbayPaginator:
    def __init__(self, api_key, base_url="https://api.pullbay.com/v1"):
        self.api_key = api_key
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json"
        })

    def paginate(self, endpoint, params=None, limit=50, max_results=None, cursor=None):
        """
        Generator that yields individual items across all pages.

        Args:
            endpoint: API endpoint (e.g., 'reviews')
            params: Query parameters dict
            limit: Items per page (default 50)
            max_results: Stop after this many results (None = all)
            cursor: Cursor to resume from (None = start at the first page)

        Yields:
            Individual items from all pages
        """
        # Copy so the caller's dict is never mutated
        params = dict(params or {})
        params['limit'] = limit
        items_fetched = 0

        while True:
            if cursor:
                params['cursor'] = cursor

            try:
                response = self.session.get(
                    f"{self.base_url}/{endpoint}",
                    params=params,
                    timeout=30
                )
                response.raise_for_status()
            except requests.exceptions.RequestException as e:
                print(f"Request failed: {e}")
                raise

            data = response.json()
            items = data.get('data', [])
            pagination = data.get('pagination', {})

            # Yield each item
            for item in items:
                yield item
                items_fetched += 1

                # Stop if max_results reached
                if max_results and items_fetched >= max_results:
                    return

            # Check if more pages exist
            if not pagination.get('has_more', False):
                break

            cursor = pagination.get('next_cursor')
            if not cursor:
                break

            # Small delay to respect rate limits
            time.sleep(0.1)

# Usage Example
paginator = PullbayPaginator(api_key="your_api_key")

# Fetch all reviews for a business
all_reviews = []
for review in paginator.paginate(
    endpoint="reviews",
    params={"business_id": "12345"},
    limit=50,
    max_results=100  # Stop after 100 reviews for cost control
):
    all_reviews.append(review)
    print(f"Fetched review: {review['id']}")

print(f"Total reviews fetched: {len(all_reviews)}")
```

### Tips for Large Datasets with Standard Pagination

{% stepper %}
{% step %}

### Stream Results

Don't wait for all data before processing. Use generators (as shown above) to process results incrementally.

```python
for review in paginator.paginate("reviews", params={"business_id": "12345"}):
    process_review(review)  # Handle immediately
```

{% endstep %}

{% step %}

### Checkpoint and Resume

For long-running jobs, save your progress. If interrupted, resume from the last cursor instead of restarting.

```python
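import json
import os

checkpoint_file = "pagination_checkpoint.json"

# Load the last saved cursor, if a checkpoint exists
cursor = None
if os.path.exists(checkpoint_file):
    with open(checkpoint_file) as f:
        cursor = json.load(f).get('cursor')

params = {"business_id": "12345", "limit": 50}

# Request pages directly so the cursor for each page is available to save
while True:
    if cursor:
        params['cursor'] = cursor

    response = paginator.session.get(
        f"{paginator.base_url}/reviews", params=params, timeout=30
    )
    response.raise_for_status()
    page = response.json()

    for review in page.get('data', []):
        process_review(review)

    pagination = page.get('pagination', {})
    cursor = pagination.get('next_cursor')
    if not pagination.get('has_more', False) or not cursor:
        break

    # Save the next cursor once the current page is fully processed
    with open(checkpoint_file, 'w') as f:
        json.dump({'cursor': cursor}, f)

# Finished: remove the checkpoint so the next run starts fresh
if os.path.exists(checkpoint_file):
    os.remove(checkpoint_file)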
```

{% endstep %}

{% step %}

### Parallel Fetching

If you have multiple business IDs, fetch them concurrently to reduce total time.

```python
from concurrent.futures import ThreadPoolExecutor

business_ids = ["id1", "id2", "id3", "id4"]

def fetch_reviews(business_id):
    reviews = []
    for review in paginator.paginate("reviews", params={"business_id": business_id}):
        reviews.append(review)
    return reviews

with ThreadPoolExecutor(max_workers=4) as executor:
    # Materialize the lazy iterator: one list of reviews per business ID, in input order
    all_reviews = list(executor.map(fetch_reviews, business_ids))
```

{% endstep %}
{% endstepper %}

### When to Use Standard Pagination

* **Fine-grained control**: You want to process data as it arrives
* **Subset of data**: You need only the first N results, not everything
* **Cost efficiency**: You want to minimize credit usage by stopping early
* **Custom applications**: You're building a custom app with specific requirements
* **Real-time processing**: You're streaming results to another system
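
As a minimal sketch of the "first N results" case: because a paginator written as a generator is lazy, `itertools.islice` can cap how many items are consumed, and pages beyond the cap are never requested. The `fake_paginate` generator below is a hypothetical stand-in for `PullbayPaginator.paginate`, used only so the example runs offline.

```python
from itertools import islice

def fake_paginate(total_items, page_size, request_log):
    """Hypothetical stand-in for a paginating generator; logs each simulated page request."""
    for start in range(0, total_items, page_size):
        request_log.append(start)  # one simulated API request per page
        for i in range(start, min(start + page_size, total_items)):
            yield {"id": f"rev_{i + 1:03d}"}

request_log = []

# Take only the first 100 items out of a simulated 5,000-item dataset
first_100 = list(islice(fake_paginate(5000, 50, request_log), 100))

print(len(first_100), "items from", len(request_log), "simulated requests")
```

Only the pages needed to fill the cap are requested, which is the behavior that makes early stopping cheaper than fetching everything.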

***

## Managed Pagination (Pullbay Controls the Flow)

Managed pagination needs only one request: call the `/all` endpoint, and Pullbay fetches all pages internally and returns everything in a single response. It suits automation tools and batch jobs.

### How It Works

1. Make a single request to the `/all` endpoint
2. Pullbay internally fetches all pages with your parameters
3. Receive a single response containing all matching data
4. No cursor handling or loop logic needed
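
Conceptually, the steps above amount to running the standard cursor loop on the server and concatenating the pages. The sketch below illustrates that idea only; it is not Pullbay's actual implementation, and `fetch_page`, `collect_all`, and the in-memory `PAGES` data are hypothetical stand-ins for real page requests.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory pages keyed by cursor (None = first page)
PAGES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"data": [{"id": "rev_001"}, {"id": "rev_002"}],
           "pagination": {"next_cursor": "c1", "has_more": True}},
    "c1": {"data": [{"id": "rev_003"}],
           "pagination": {"next_cursor": None, "has_more": False}},
}

def fetch_page(cursor: Optional[str]) -> Dict[str, Any]:
    """Stand-in for one standard-pagination request."""
    return PAGES[cursor]

def collect_all(fetch: Callable[[Optional[str]], Dict[str, Any]]) -> Dict[str, Any]:
    """Follow cursors until has_more is false and merge every page into one response."""
    items: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    pages_fetched = 0
    while True:
        page = fetch(cursor)
        pages_fetched += 1
        items.extend(page["data"])
        pagination = page["pagination"]
        cursor = pagination.get("next_cursor")
        if not pagination.get("has_more") or not cursor:
            break
    return {"data": items,
            "meta": {"total_results": len(items), "pages_fetched": pages_fetched}}

result = collect_all(fetch_page)
print(result["meta"])
```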

### Managed Pagination Example with cURL

```bash
curl -X GET "https://api.pullbay.com/v1/app-store/reviews/all" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: application/json" \
  -G \
  --data-urlencode "business_id=12345" \
  --data-urlencode "timeout=120"
```

**Response (All Data in One Response):**

```json
{
  "data": [
    {
      "id": "rev_001",
      "author": "user_a",
      "rating": 5,
      "title": "Great service!",
      "content": "Absolutely love this app. Works flawlessly.",
      "date": "2024-01-15T10:30:00Z",
      "version": "3.1.0",
      "helpful_count": 12
    },
    {
      "id": "rev_002",
      "author": "user_b",
      "rating": 4,
      "title": "Good experience",
      "content": "Really solid app overall, minor issues with loading.",
      "date": "2024-01-14T09:20:00Z",
      "version": "3.0.9",
      "helpful_count": 5
    }
  ],
  "meta": {
    "total_results": 3,
    "pages_fetched": 1,
    "credits_used": 1,
    "request_id": "req_abc123"
  }
}
```

All matching reviews are returned in a single response (the `data` array above is abbreviated to two items), so no cursor handling is needed.

### Complete Python Implementation

```python
import requests
import json
from typing import List, Dict, Any

class PullbayManagedPaginator:
    def __init__(self, api_key, base_url="https://api.pullbay.com/v1"):
        self.api_key = api_key
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json"
        })

    def fetch_all(self, endpoint, params=None, timeout=120) -> List[Dict[str, Any]]:
        """
        Fetch all results in a single request using managed pagination.

        Args:
            endpoint: API endpoint (e.g., 'reviews')
            params: Query parameters dict
            timeout: Request timeout in seconds (120s recommended for managed)

        Returns:
            List of all items from the endpoint
        """
        # Copy so the caller's dict is never mutated
        params = dict(params or {})
        params['timeout'] = timeout

        try:
            response = self.session.get(
                f"{self.base_url}/{endpoint}/all",
                params=params,
                timeout=timeout + 10  # Add buffer to HTTP timeout
            )
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            print(f"Failed to fetch all results: {e}")
            raise

        data = response.json()
        return data.get('data', [])

# Usage Example
managed_paginator = PullbayManagedPaginator(api_key="your_api_key")

# Fetch all reviews in one request
all_reviews = managed_paginator.fetch_all(
    endpoint="reviews",
    params={"business_id": "12345"},
    timeout=120
)

print(f"Total reviews: {len(all_reviews)}")
for review in all_reviews:
    # The managed response shown above exposes 'title' and 'content', not 'text'
    print(f"Review {review['id']}: {review['title']}")
```

### When to Use Managed Pagination

* **Automation tools**: Using n8n, Make.com, Zapier, or similar platforms
* **Complete datasets**: You need all matching records, not a subset
* **Simplicity**: You prefer one request over pagination loops
* **Batch jobs**: Running scheduled exports or data synchronization
* **Small to medium datasets**: Under 10,000 items (see warnings below)

### Important: Timeout Configuration

Managed pagination fetches all pages internally, which takes longer. Always set an appropriate timeout:

* **Standard pagination**: 30 seconds (default)
* **Managed pagination**: 120 seconds (required)

Without sufficient timeout, the request may fail before Pullbay finishes fetching all pages.

```python
# Correct: 120s timeout for managed
response = managed_paginator.fetch_all(
    endpoint="reviews",
    params={"business_id": "12345"},
    timeout=120
)

# Risky: a 30s timeout may expire before the fetch completes
response = managed_paginator.fetch_all(
    endpoint="reviews",
    params={"business_id": "12345"},
    timeout=30  # Too short!
)
```
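
One way to handle a managed request that still times out is to fall back to standard pagination, where each page is a short request. The sketch below is a hedged illustration: `fetch_managed` and `fetch_standard_pages` are hypothetical callables you would wire to your own client code, and the built-in `TimeoutError` stands in for whatever timeout exception your HTTP library raises (for `requests`, that is `requests.exceptions.Timeout`).

```python
from typing import Any, Callable, Dict, Iterable, List

def fetch_with_fallback(
    fetch_managed: Callable[[], List[Dict[str, Any]]],
    fetch_standard_pages: Callable[[], Iterable[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Try the single managed request; on timeout, read the standard pages instead."""
    try:
        return fetch_managed()
    except TimeoutError:
        # The managed request did not finish in time, so fetch page by page
        return list(fetch_standard_pages())

# Hypothetical stand-ins so the sketch runs offline
def slow_managed() -> List[Dict[str, Any]]:
    raise TimeoutError("managed request exceeded its timeout")

def standard_pages() -> Iterable[Dict[str, Any]]:
    yield {"id": "rev_001"}
    yield {"id": "rev_002"}

reviews = fetch_with_fallback(slow_managed, standard_pages)
print([review["id"] for review in reviews])
```

Whether a timed-out managed request still consumes credits is not stated on this page, so confirm that before relying on a fallback like this in production.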

***

## Choosing the Right Pagination Model

Use this decision table to quickly choose between Standard and Managed pagination:

| Scenario                     | Best Choice  | Reason                                   |
| ---------------------------- | ------------ | ---------------------------------------- |
| Building custom application  | **Standard** | Fine-grained control over data flow      |
| Using n8n, Make.com, Zapier  | **Managed**  | These platforms prefer single requests   |
| Need first 100 results only  | **Standard** | Avoid fetching unnecessary data          |
| Exporting all data monthly   | **Managed**  | Simplicity; no pagination loop needed    |
| Real-time data processing    | **Standard** | Process results as they arrive           |
| Reducing credit usage        | **Standard** | Fetch only what you need, stop early     |
| Large dataset (10k+ items)   | **Standard** | Managed may time out or exhaust memory   |
| Want simplest implementation | **Managed**  | Single API call vs pagination loop       |
| API integration in workflow  | **Standard** | Handle pagination as part of integration |
| Scheduled batch job          | **Managed**  | Fire-and-forget simplicity               |

***

## Performance Comparison

This table compares both models across key dimensions:

| Metric                      | Standard Pagination             | Managed Pagination             |
| --------------------------- | ------------------------------- | ------------------------------ |
| **Time to first result**    | 1-2 seconds                     | 30-120 seconds                 |
| **Total time for all data** | Gradual (per page)              | All at once                    |
| **Memory usage**            | Low (process incrementally)     | High (store entire result)     |
| **Code complexity**         | High (loops, cursor handling)   | Low (single request)           |
| **Partial fetching**        | Yes (stop when you have enough) | No (fetches everything)        |
| **Best for**                | Custom apps, cost optimization  | Automation tools, exports      |
| **Failure recovery**        | Easy (resume from cursor)       | Difficult (retry entire fetch) |

**Time to first result**: Standard returns your first items in 1-2 seconds. Managed takes 30-120 seconds because Pullbay fetches all pages server-side before responding.

**Memory usage**: Standard processes items one at a time (you can process and discard each item). Managed stores the entire result in memory.

**Code complexity**: Standard requires pagination loops and cursor handling. Managed is a single function call.
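
To make the memory difference concrete, the sketch below computes an average rating two ways: from a generator (the Standard style, holding one item at a time) and from a fully materialized list (the Managed style). The `stream_reviews` generator and its ratings are made-up sample data, not API output.

```python
from typing import Dict, Iterable, Iterator

def stream_reviews() -> Iterator[Dict[str, int]]:
    """Hypothetical stand-in for a paginating generator; yields one review at a time."""
    for rating in (5, 4, 5, 3, 4):
        yield {"rating": rating}

def average_rating(reviews: Iterable[Dict[str, int]]) -> float:
    """Running average that only ever looks at the current item."""
    total = 0
    count = 0
    for review in reviews:
        total += review["rating"]
        count += 1
    return total / count if count else 0.0

# Standard style: each item can be discarded as soon as it is processed
streamed = average_rating(stream_reviews())

# Managed style: the whole dataset sits in a list before processing starts
materialized = average_rating(list(stream_reviews()))

print(streamed, materialized)
```

Both calls produce the same average; the difference is how much data is held in memory while computing it.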

***

## Credit Costs

**The credit cost is identical for both models when fetching the same data.**

If you fetch 5,000 reviews for a business, the total credit cost is the same whether you use:

* Standard pagination: 100 requests × 50 results = 5,000 items
* Managed pagination: 1 request that fetches 5,000 items server-side

The difference is **control**, not cost:

* **Standard**: You control how many requests (and how many items) to fetch
* **Managed**: Pullbay controls the requests; you pay for all items
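
The arithmetic behind this comparison can be sketched as follows, assuming (as this section states) that cost tracks the number of items retrieved. `requests_needed` is a hypothetical helper, not part of any Pullbay SDK.

```python
import math

def requests_needed(total_items: int, page_size: int) -> int:
    """Number of standard-pagination requests required to read total_items."""
    return math.ceil(total_items / page_size)

total_items = 5000
page_size = 50

# Fetching everything: same item count either way
standard_requests = requests_needed(total_items, page_size)
items_via_standard = min(standard_requests * page_size, total_items)
items_via_managed = total_items  # one request, all items fetched server-side

# Stopping early is what changes the total: two pages instead of all of them
items_if_stopped_after_two_pages = 2 * page_size

print(standard_requests, items_via_standard, items_via_managed,
      items_if_stopped_after_two_pages)
```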

To minimize costs with Managed pagination, use query filters to reduce the total dataset size before requesting:

```python
# Expensive: fetch ALL reviews, get all in one request
all_reviews = managed_paginator.fetch_all("reviews", params={"business_id": "12345"})

# Cheaper: fetch only recent reviews with high ratings
recent_reviews = managed_paginator.fetch_all(
    "reviews",
    params={
        "business_id": "12345",
        "rating_min": 4,
        "created_after": "2024-01-01"
    }
)
```

***

## Caching Strategies

Combine pagination with caching to maximize efficiency and reduce credit usage.

### Standard Pagination Caching

Cache results by **cursor key** to avoid re-fetching pages:

```python
import hashlib
import json
from datetime import datetime, timedelta

class CachedPaginator:
    def __init__(self, api_key, cache_ttl=3600):
        self.paginator = PullbayPaginator(api_key)
        self.cache = {}
        self.cache_ttl = cache_ttl

    def paginate_cached(self, endpoint, params):
        """Paginate with caching."""
        cursor = None

        while True:
            # Build cache key from endpoint, params, and cursor
            cache_key = self._build_cache_key(endpoint, params, cursor)
            cached = self.cache.get(cache_key)

            if cached and datetime.now() < cached['expires']:
                print(f"Cache hit: {cache_key}")
            else:
                # Missing or expired entry: fetch from the API and overwrite it
                print(f"Cache miss: {cache_key}")
                response = self._fetch_page(endpoint, params, cursor)
                pagination = response.get('pagination', {})
                cached = {
                    'items': response.get('data', []),
                    # Keep a cursor only while more pages exist
                    'next_cursor': pagination.get('next_cursor') if pagination.get('has_more') else None,
                    'expires': datetime.now() + timedelta(seconds=self.cache_ttl)
                }
                self.cache[cache_key] = cached

            for item in cached['items']:
                yield item

            if not cached['next_cursor']:
                break

            cursor = cached['next_cursor']

    def _build_cache_key(self, endpoint, params, cursor):
        key_data = json.dumps({
            'endpoint': endpoint,
            'params': params,
            'cursor': cursor
        }, sort_keys=True)
        return hashlib.md5(key_data.encode()).hexdigest()

    def _fetch_page(self, endpoint, params, cursor):
        if cursor:
            params = params.copy()
            params['cursor'] = cursor

        response = self.paginator.session.get(
            f"{self.paginator.base_url}/{endpoint}",
            params=params
        )
        response.raise_for_status()
        return response.json()
```

### Managed Pagination Caching

Cache the entire result as a complete dataset:

```python
import hashlib
import json
from datetime import datetime, timedelta

class CachedManagedPaginator:
    def __init__(self, api_key, cache_ttl=3600):
        self.managed_paginator = PullbayManagedPaginator(api_key)
        self.cache = {}
        self.cache_ttl = cache_ttl

    def fetch_all_cached(self, endpoint, params):
        """Fetch all results with caching."""
        cache_key = self._build_cache_key(endpoint, params)

        # Check cache
        if cache_key in self.cache:
            cached_data = self.cache[cache_key]
            if datetime.now() < cached_data['expires']:
                print(f"Cache hit: {endpoint}")
                return cached_data['data']
            else:
                del self.cache[cache_key]

        # Fetch from API
        print(f"Cache miss: {endpoint}")
        data = self.managed_paginator.fetch_all(endpoint, params, timeout=120)

        # Cache result
        self.cache[cache_key] = {
            'data': data,
            'expires': datetime.now() + timedelta(seconds=self.cache_ttl)
        }

        return data

    def _build_cache_key(self, endpoint, params):
        key_data = json.dumps({
            'endpoint': endpoint,
            'params': params
        }, sort_keys=True)
        return hashlib.md5(key_data.encode()).hexdigest()

# Usage
cached_managed = CachedManagedPaginator(api_key="your_api_key", cache_ttl=3600)

# First call: fetches from API
reviews = cached_managed.fetch_all_cached("reviews", {"business_id": "12345"})

# Second call within 1 hour: serves from cache
reviews = cached_managed.fetch_all_cached("reviews", {"business_id": "12345"})
```

### Invalidation Strategy

Decide cache TTL based on data freshness requirements:

| Data Type             | Freshness | TTL       | Strategy                   |
| --------------------- | --------- | --------- | -------------------------- |
| Real-time metrics     | Minutes   | 5 minutes | No cache or very short TTL |
| Daily reports         | Hours     | 6 hours   | Hourly refresh             |
| Historical data       | Days      | 1 day     | Daily refresh              |
| Static reference data | Weeks     | 7 days    | Manual invalidation        |
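
The table can be expressed as a small lookup plus a freshness check. This is a sketch: the `TTL_BY_DATA_TYPE` keys and the `is_fresh` helper are illustrative names, and the durations simply mirror the TTL column above.

```python
from datetime import datetime, timedelta
from typing import Optional

# TTLs copied from the table; the key names are illustrative
TTL_BY_DATA_TYPE = {
    "realtime_metrics": timedelta(minutes=5),
    "daily_reports": timedelta(hours=6),
    "historical": timedelta(days=1),
    "static_reference": timedelta(days=7),
}

def is_fresh(cached_at: datetime, data_type: str, now: Optional[datetime] = None) -> bool:
    """Return True while the cached entry is younger than the TTL for its data type."""
    now = now or datetime.now()
    return now - cached_at < TTL_BY_DATA_TYPE[data_type]

cached_at = datetime(2024, 1, 15, 10, 0, 0)
check_time = datetime(2024, 1, 15, 10, 10, 0)  # ten minutes later

metrics_fresh = is_fresh(cached_at, "realtime_metrics", check_time)
reports_fresh = is_fresh(cached_at, "daily_reports", check_time)
print(metrics_fresh, reports_fresh)
```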

***

## FAQ

<details>

<summary>Are cursor strings safe to store?</summary>

**Yes, but treat them as temporary tokens:**

* Cursors are **opaque** (you can't read or modify them)
* They may **expire** after a certain period (typically 24-48 hours)
* They're tied to your **specific query** (don't reuse across different endpoints or params)
* You can safely store them in your database for **resuming interrupted jobs**

```python
import json
from datetime import datetime

# Safe: store cursor for resuming
checkpoint = {
    'endpoint': 'reviews',
    'params': {'business_id': '12345'},
    'cursor': 'eyJpZCI6ICJyZXZfMDAyIiwgIm9mZnNldCI6IDUwfQ==',
    'timestamp': datetime.now().isoformat()
}

with open('checkpoint.json', 'w') as f:
    json.dump(checkpoint, f)

# Later: resume from checkpoint
with open('checkpoint.json') as f:
    checkpoint = json.load(f)
    # Use checkpoint['cursor'] to resume
```
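
Before reusing a stored cursor, it can help to confirm that the checkpoint matches the current query and is not too old. The sketch below assumes the checkpoint layout shown above; `usable_cursor` is a hypothetical helper, and the 24-hour default is an assumption based on the "typically 24-48 hours" guidance, not a documented limit.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional

def usable_cursor(
    checkpoint: Dict[str, Any],
    endpoint: str,
    params: Dict[str, Any],
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(hours=24),  # assumed conservative lifetime
) -> Optional[str]:
    """Return the stored cursor only if it belongs to this query and is recent enough."""
    now = now or datetime.now()
    # Cursors are tied to a specific endpoint and parameter set
    if checkpoint.get("endpoint") != endpoint or checkpoint.get("params") != params:
        return None
    saved_at = datetime.fromisoformat(checkpoint["timestamp"])
    if now - saved_at > max_age:
        return None
    return checkpoint.get("cursor")

saved = {
    "endpoint": "reviews",
    "params": {"business_id": "12345"},
    "cursor": "eyJpZCI6ICJyZXZfMDAyIiwgIm9mZnNldCI6IDUwfQ==",
    "timestamp": datetime(2024, 1, 15, 10, 0, 0).isoformat(),
}

one_hour_later = datetime(2024, 1, 15, 11, 0, 0)
three_days_later = datetime(2024, 1, 18, 10, 0, 0)

same_query = usable_cursor(saved, "reviews", {"business_id": "12345"}, now=one_hour_later)
other_query = usable_cursor(saved, "reviews", {"business_id": "999"}, now=one_hour_later)
too_old = usable_cursor(saved, "reviews", {"business_id": "12345"}, now=three_days_later)
print(same_query is not None, other_query, too_old)
```

When the helper returns `None`, start again from the first page rather than sending the stale cursor.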

</details>

<details>

<summary>Do cursors expire?</summary>

**Yes, cursors are temporary.** They typically remain valid for 24-48 hours. If you resume from an expired cursor, you'll get an error. Handle cursor expiration gracefully:

```python
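def resume_pagination(checkpoint):
    # Assumes paginate() accepts a `cursor` argument that sets the starting page
    try:
        results = list(paginator.paginate(
            endpoint=checkpoint['endpoint'],
            params=checkpoint['params'],
            cursor=checkpoint['cursor']
        ))
        return results
    except Exception as e:
        # Assumption: the error message contains "cursor expired";
        # adjust this check to the error format you actually receive
        if 'cursor expired' in str(e).lower():
            print("Cursor expired, restarting from beginning")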
            return list(paginator.paginate(
                endpoint=checkpoint['endpoint'],
                params=checkpoint['params']
            ))
        raise
```

</details>

<details>

<summary>What's the credit cost difference between standard and managed?</summary>

**There is no difference in credit cost for the same data.**

Both charge per item retrieved. If you fetch 5,000 reviews:

* Standard: You make 100 requests (at 50 items each) = 5,000 items charged
* Managed: You make 1 request = 5,000 items charged

**The cost difference comes from controlling what you fetch:**

```
Standard: Fetch 100 reviews (cost: 100 credits) ✓ Efficient
Managed: Fetch 5,000 reviews (cost: 5,000 credits) ✗ Wasteful
```

Use Standard to control cost; use Managed for simplicity (knowing it fetches everything).

</details>

<details>

<summary>Can I use managed pagination in n8n?</summary>

**Yes, absolutely. Managed pagination is ideal for n8n:**

```json
{
  "node": "Pullbay API",
  "operation": "Fetch all reviews",
  "endpoint": "reviews/all",
  "parameters": {
    "business_id": "12345",
    "timeout": 120
  }
}
```

n8n will execute the managed request, wait for all results (up to the 120-second timeout), and return the complete dataset. No pagination loop needed in your workflow.

</details>

***

## Summary

**Choose Standard pagination for:**

* Custom control over data flow
* Fetching subsets (first N results)
* Cost optimization (stop early)
* Processing results incrementally

**Choose Managed pagination for:**

* Automation tools (n8n, Make.com)
* Complete dataset exports
* Simplicity (one request)
* Batch jobs with sufficient timeout

Both models charge the same per item retrieved. The difference is flexibility and control. Combine either with caching for maximum efficiency and cost savings.


