Rate Limits

The Claro API implements rate limiting to ensure fair usage and maintain service quality for all users. This guide explains how rate limits work and how to handle them in your application.

Rate Limit Tiers

Rate limits vary based on authentication status:

Tier	Requests per Minute	Requests per Hour
Unauthenticated	100	1,000
Authenticated	1,000	10,000

Enterprise customers can request higher rate limits. Contact [email protected] for custom limits.
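
The per-hour limit is lower than sixty times the per-minute limit, so it appears to cap the sustained rate well below the burst rate. The sketch below works through that arithmetic with the values from the table. It assumes both windows are enforced at the same time, which the table suggests but does not state; the function names are illustrative only.

```python
def sustained_rate_per_minute(per_minute: int, per_hour: int) -> float:
    """Highest steady request rate (per minute) that stays under both limits."""
    return min(per_minute, per_hour / 60)


def minutes_until_hourly_exhausted(per_minute: int, per_hour: int) -> float:
    """Minutes of full-speed bursting before the hourly allowance is spent."""
    return per_hour / per_minute


# Values copied from the tier table above
TIERS = {
    "unauthenticated": (100, 1_000),
    "authenticated": (1_000, 10_000),
}

for name, (per_minute, per_hour) in TIERS.items():
    steady = sustained_rate_per_minute(per_minute, per_hour)
    burst = minutes_until_hourly_exhausted(per_minute, per_hour)
    print(f"{name}: ~{steady:.1f} req/min sustained, "
          f"hourly limit reached after {burst:.0f} min at full speed")
```

Under that assumption, a long-running job should be sized against the hourly figure rather than the per-minute one.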

Rate Limit Headers

Every API response includes headers indicating your current rate limit status:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1705329600

Header	Description
`X-RateLimit-Limit`	Maximum requests allowed in the current window
`X-RateLimit-Remaining`	Number of requests remaining in the current window
`X-RateLimit-Reset`	Unix timestamp when the rate limit window resets
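
As an illustration of reading these headers, the following sketch parses the three values into a small record and converts the reset timestamp to a UTC datetime. The `RateLimitStatus` and `parse_rate_limit_headers` names are hypothetical helpers, not part of the SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int
    remaining: int
    reset_at: datetime  # timezone-aware, UTC


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the parsed status, or None if a header is missing or malformed."""
    try:
        limit = int(headers["X-RateLimit-Limit"])
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_ts = int(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return None
    reset_at = datetime.fromtimestamp(reset_ts, tz=timezone.utc)
    return RateLimitStatus(limit, remaining, reset_at)


# Header values taken from the example above
status = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "847",
    "X-RateLimit-Reset": "1705329600",
})
print(status)
```

A plain dict is case-sensitive, so the keys must match exactly; the `headers` mapping on a `requests` response is case-insensitive and can be passed in directly.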

429 Too Many Requests

When you exceed the rate limit, the API returns a 429 Too Many Requests response:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "retry_after": 60
  }
}

The response includes:

Retry-After header: Seconds until you can retry
retry_after field: Same value in the JSON response body
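
Since the delay is available in two places, a client can read whichever one is present. The sketch below prefers the header, falls back to the `retry_after` body field, and finally to a default. The helper name and the 60-second default are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional


def retry_delay_seconds(
    headers: Mapping[str, str],
    body: Optional[Mapping[str, Any]] = None,
    default: float = 60.0,
) -> float:
    """Pick a wait time from a 429 response: header, then body, then default."""
    candidates = [headers.get("Retry-After")]
    if body:
        candidates.append(body.get("error", {}).get("retry_after"))
    for value in candidates:
        try:
            seconds = float(value)
        except (TypeError, ValueError):
            continue  # missing, or not a plain number of seconds
        if seconds >= 0:
            return seconds
    return default


print(retry_delay_seconds({"Retry-After": "60"}))
print(retry_delay_seconds({}, {"error": {"retry_after": 30}}))
```

HTTP also allows `Retry-After` to carry a date instead of a number of seconds. This page only describes the seconds form, so the sketch treats any non-numeric value as missing.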

Handling Rate Limits

Automatic Retry with SDK

The Python SDK automatically handles rate limits with exponential backoff:

from baytos.claro import BaytClient

client = BaytClient(
    api_key="your_api_key",
    max_retries=3  # Automatically retry up to 3 times
)

# SDK handles retries automatically
prompt = client.get_prompt("@workspace/my-prompt:v1")

The SDK will:

Wait for the duration specified in the Retry-After header
Retry with exponential backoff (1s, 2s, 4s, …)
Raise BaytRateLimitError if all retries are exhausted

Manual Retry Logic

If calling the API directly, implement retry logic:

Python Example

import time
import requests

def get_prompt_with_retry(package_name, api_key, max_retries=3):
    url = f"https://api.baytos.ai/v1/prompts/{package_name}"
    headers = {"Authorization": f"Bearer {api_key}"}

    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            return response.json()

        if response.status_code == 429:
            # Get retry delay from header
            retry_after = int(response.headers.get('Retry-After', 60))

            if attempt < max_retries - 1:
                print(f"Rate limited. Retrying after {retry_after}s...")
                time.sleep(retry_after)
                continue
            else:
                raise Exception("Rate limit exceeded after all retries")

        response.raise_for_status()

    raise Exception("Max retries exceeded")

JavaScript Example

async function getPromptWithRetry(packageName, apiKey, maxRetries = 3) {
  const url = `https://api.baytos.ai/v1/prompts/${packageName}`;
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  };

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers });

    if (response.ok) {
      return await response.json();
    }

    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);

      if (attempt < maxRetries - 1) {
        console.log(`Rate limited. Retrying after ${retryAfter}s...`);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        continue;
      } else {
        throw new Error('Rate limit exceeded after all retries');
      }
    }

    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  throw new Error('Max retries exceeded');
}

Best Practices

Monitor rate limit headers

Always check X-RateLimit-Remaining to know when you’re approaching the limit:

response = requests.get(url, headers=headers)
remaining = response.headers.get('X-RateLimit-Remaining')

# Only warn when the header is present; a missing header is not a low count
if remaining is not None and int(remaining) < 10:
    print("Warning: Approaching rate limit")
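
Beyond printing a warning, a client can slow itself down before the limit is reached. The following sketch spreads the remaining requests evenly over the time left in the window once the count drops below a threshold. The function name, the threshold of 10, and the even-spreading policy are illustrative choices, not documented API behavior.

```python
import time
from typing import Optional


def pause_before_next_request(
    remaining: int,
    reset_ts: int,
    threshold: int = 10,
    now: Optional[float] = None,
) -> float:
    """Seconds to wait before the next request; 0.0 if there is headroom."""
    if remaining >= threshold:
        return 0.0
    current = time.time() if now is None else now
    time_left = max(0.0, reset_ts - current)
    if remaining <= 0:
        return time_left  # nothing left: wait for the window to reset
    return time_left / remaining  # spread what is left across the window


# 5 requests left and 30 seconds until reset: wait 6 seconds between requests
print(pause_before_next_request(5, 1705329600, now=1705329570.0))
```

In a request loop, the result can be passed straight to `time.sleep()` using the `X-RateLimit-Remaining` and `X-RateLimit-Reset` values from the previous response.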

Implement exponential backoff

Don’t retry immediately. Use exponential backoff to avoid hammering the API:

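import time

class RateLimitError(Exception):
    """Placeholder for whatever your client raises on a 429 response."""

def exponential_backoff(attempt, base_delay=1):
    # 1s, 2s, 4s, ... for attempt 0, 1, 2, ...
    return base_delay * (2 ** attempt)

def request_with_backoff(make_request, max_retries=3):
    # make_request is any zero-argument callable that performs the API call
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries: let the caller see the error
            time.sleep(exponential_backoff(attempt))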
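
When many clients are limited at the same moment, plain doubling makes them all retry at the same instants. A common refinement is to cap the delay and randomize it ("full jitter"). The sketch below shows one way to do that while still honoring a server-supplied delay; the function name and all parameters are hypothetical, and this is not a description of how the SDK computes its delays.

```python
import random
from typing import Optional


def backoff_with_jitter(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_after: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    rng = rng or random
    # Exponential growth, capped so late retries do not wait indefinitely
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    # Full jitter: pick anywhere in [0, ceiling] to desynchronize clients
    delay = rng.uniform(0, ceiling)
    if retry_after is not None:
        # Never retry earlier than the server asked
        delay = max(delay, retry_after)
    return delay


for attempt in range(4):
    print(f"attempt {attempt}: sleep {backoff_with_jitter(attempt):.2f}s")
```

Passing a seeded `random.Random` as `rng` makes the delays reproducible in tests.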

Batch requests when possible

Instead of fetching prompts one at a time, use the list endpoint:

# Less efficient - multiple requests
for package_name in package_names:
    prompt = client.get_prompt(package_name)

# More efficient - single request
result = client.list_prompts(limit=50)
prompts = result['prompts']

Cache responses locally

Cache prompt data to reduce API calls:

import time
from functools import lru_cache

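@lru_cache(maxsize=100)
def get_cached_prompt(package_name, cache_time):
    # cache_time is not used in the body; it is part of the cache key,
    # so a new value forces a fresh fetch
    return client.get_prompt(package_name)

# The key changes every 300 seconds, so entries are reused for at most 5 minutes
cache_key = int(time.time() // 300)
prompt = get_cached_prompt("@workspace/my-prompt:v1", cache_key)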
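
The time-bucket key above expires every entry at the same fixed 300-second boundaries, not 5 minutes after each fetch. If per-entry expiry matters, a small cache with explicit timestamps is one alternative. The `TTLCache` class below is a hypothetical helper written for this guide, with an injectable clock so it can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache values for `ttl` seconds, measured from when each value was fetched."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no API call
        value = self._fetch(key)  # missing or expired: fetch and store
        self._entries[key] = (now, value)
        return value


# Stand-in fetch function; in practice this could be client.get_prompt
cache = TTLCache(fetch=lambda name: f"prompt body for {name}")
print(cache.get("@workspace/my-prompt:v1"))
```

This sketch never evicts old keys, so it suits a bounded set of prompt names; a long-lived process with many distinct keys would also need a size limit.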

Spread requests over time

If processing many prompts, add delays between requests:

import time

for package_name in package_names:
    prompt = client.get_prompt(package_name)
    process_prompt(prompt)

    # Small delay to avoid hitting rate limits
    time.sleep(0.1)  # 100ms delay
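
A fixed 100 ms sleep allows up to about 600 requests per minute, which fits the authenticated per-minute limit but would exceed 10,000 requests per hour if sustained. One option is to derive the interval from a target rate instead. The `Pacer` class below is a hypothetical sketch of that idea, with the clock and sleep function injectable for testing.

```python
import time
from typing import Callable


class Pacer:
    """Enforce a minimum interval between calls to `wait()`."""

    def __init__(
        self,
        max_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule from the later of "now" and the previous slot,
        # so idle time is not saved up and spent as a burst later
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay


# 160 requests per minute stays under 10,000 per hour (about 166 per minute)
pacer = Pacer(max_per_minute=160)
for name in ["first", "second", "third"]:
    pacer.wait()
    print(f"would request {name} now")
```

Calling `pacer.wait()` before each `client.get_prompt(...)` call replaces the fixed `time.sleep(0.1)` in the loop above.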

Use server-side caching

For production applications, implement server-side caching:

import redis
import json

cache = redis.Redis(host='localhost', port=6379, db=0)

def get_prompt_cached(package_name):
    # Check cache first
    cached = cache.get(f"prompt:{package_name}")
    if cached:
        return json.loads(cached)

    # Fetch from API if not cached
    prompt = client.get_prompt(package_name)
    data = prompt.to_dict()

    # Cache for 5 minutes
    cache.setex(
        f"prompt:{package_name}",
        300,
        json.dumps(data)
    )

    # Return a dict on both paths so callers always get the same type
    return data

Rate Limit Response Example

Checking Headers Before Hitting Limit

curl -i https://api.baytos.ai/v1/prompts \
  -H "Authorization: Bearer YOUR_API_KEY"

Response:

HTTP/2 200
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1705329600

{
  "data": {
    "prompts": [...]
  }
}

When Rate Limited

HTTP/2 429
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1705329600

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "retry_after": 60
  }
}

Monitoring Rate Limit Usage

Track your rate limit usage in application logs:

import logging
import requests

logger = logging.getLogger(__name__)

def log_rate_limit_status(response):
    limit = response.headers.get('X-RateLimit-Limit')
    remaining = response.headers.get('X-RateLimit-Remaining')
    reset = response.headers.get('X-RateLimit-Reset')

    logger.info(
        f"Rate limit: {remaining}/{limit} remaining. "
        f"Resets at: {reset}"
    )

response = requests.get(url, headers=headers)
log_rate_limit_status(response)

Enterprise Rate Limits

Enterprise customers receive:

Higher default rate limits
Dedicated rate limit pools
Priority support for limit adjustments
Custom rate limit configurations per API key

Contact [email protected] to discuss enterprise options.

Next Steps

Error Handling

Learn how to handle all API errors

Python SDK

Use the SDK with built-in retry logic

Best Practices

Optimize your API usage

Authentication

Learn about API authentication
