The Claro API implements rate limiting to ensure fair usage and maintain service quality for all users. This guide explains how rate limits work and how to handle them in your application.

Rate Limit Tiers

Rate limits vary based on authentication status:
Tier             Requests per Minute   Requests per Hour
Unauthenticated  100                   1,000
Authenticated    1,000                 10,000
Enterprise customers can request higher rate limits. Contact [email protected] for custom limits.
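The two windows in the table interact: a client that sends at the full per-minute rate will exhaust the hourly allowance long before the hour ends. The arithmetic below is a small sketch of that, assuming both limits are enforced at the same time (the table suggests this but does not say how the windows are combined). The function name is illustrative.

```python
# Limits copied from the tier table: (requests per minute, requests per hour).
TIERS = {
    "unauthenticated": (100, 1_000),
    "authenticated": (1_000, 10_000),
}

def sustained_per_minute(per_minute: int, per_hour: int) -> float:
    """Highest steady rate (requests/minute) that stays under both limits."""
    return min(per_minute, per_hour / 60)

for tier, (per_minute, per_hour) in TIERS.items():
    rate = sustained_per_minute(per_minute, per_hour)
    print(f"{tier}: burst up to {per_minute}/min, sustain about {rate:.1f}/min")
```

Under that assumption, an authenticated client can burst to 1,000 requests in a minute but can only sustain roughly 167 per minute over a full hour.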

Rate Limit Headers

Every API response includes headers indicating your current rate limit status:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1705329600
Header                 Description
X-RateLimit-Limit      Maximum requests allowed in the current window
X-RateLimit-Remaining  Number of requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the rate limit window resets
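As an illustration of reading these three headers, the sketch below parses them into a small structure and computes the time left in the window. The names RateLimitStatus and parse_rate_limit are hypothetical helpers, not part of the SDK, and the header values are the sample values shown above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass
class RateLimitStatus:
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp in seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window (never negative)."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)

def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the parsed status, or None if a header is missing or malformed."""
    try:
        return RateLimitStatus(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None

status = parse_rate_limit({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "847",
    "X-RateLimit-Reset": "1705329600",
})
print(status)
```

A plain dict is case-sensitive, so the keys must match exactly; the headers object returned by the requests library is case-insensitive and can be passed in directly.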

429 Too Many Requests

When you exceed the rate limit, the API returns a 429 Too Many Requests response:
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "retry_after": 60
  }
}
The response includes:
  • Retry-After header: Seconds until you can retry
  • retry_after field: Same value in the JSON response body
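Since the delay is available in two places, a client can prefer the header and fall back to the body. The sketch below shows one way to do that; retry_delay_seconds is a hypothetical helper. This guide only shows Retry-After as a number of seconds, but the HTTP specification also allows a date in that header, so the sketch handles both forms as a precaution.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Any, Mapping, Optional

def retry_delay_seconds(
    headers: Mapping[str, str],
    body: Any,
    default: float = 60.0,
    now: Optional[datetime] = None,
) -> float:
    """Pick a retry delay: Retry-After header, then body field, then default."""
    value = headers.get("Retry-After")
    if value is not None:
        value = value.strip()
        if value.isdigit():
            return float(value)  # delay given in seconds
        try:
            when = parsedate_to_datetime(value)  # delay given as an HTTP date
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    try:
        # Field location taken from the sample 429 body above.
        return float(body["error"]["retry_after"])
    except (KeyError, TypeError, ValueError):
        return default

print(retry_delay_seconds({"Retry-After": "60"}, None))
print(retry_delay_seconds({}, {"error": {"retry_after": 45}}))
```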

Handling Rate Limits

Automatic Retry with SDK

The Python SDK automatically handles rate limits with exponential backoff:
from baytos.claro import BaytClient

client = BaytClient(
    api_key="your_api_key",
    max_retries=3  # Automatically retry up to 3 times
)

# SDK handles retries automatically
prompt = client.get_prompt("@workspace/my-prompt:v1")
The SDK will:
  1. Wait for the duration specified in Retry-After header
  2. Retry with exponential backoff (1s, 2s, 4s, …)
  3. Raise BaytRateLimitError if all retries are exhausted
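The list above does not say how the Retry-After value and the backoff schedule are combined, and the SDK's internals are not shown here. One plausible reading, sketched below purely as an assumption, is to wait for whichever of the two is longer. The function retry_delay is hypothetical and is not the SDK's implementation.

```python
from typing import Optional

def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
) -> float:
    """Delay before retry number `attempt` (0-based).

    Assumption: honor Retry-After when the server sends it, but never wait
    less than the exponential schedule (1s, 2s, 4s, ...).
    """
    backoff = base * (2 ** attempt)
    if retry_after is None:
        return backoff
    return max(float(retry_after), backoff)

# Schedule without a server hint, then with Retry-After set to 3 seconds.
print([retry_delay(a) for a in range(3)])
print([retry_delay(a, retry_after=3) for a in range(3)])
```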

Manual Retry Logic

If calling the API directly, implement retry logic:

Python Example

import time
import requests

def get_prompt_with_retry(package_name, api_key, max_retries=3):
    url = f"https://api.baytos.ai/v1/prompts/{package_name}"
    headers = {"Authorization": f"Bearer {api_key}"}

    for attempt in range(max_retries):
        # A timeout keeps a stalled connection from blocking the retry loop
        response = requests.get(url, headers=headers, timeout=10)

        if response.status_code == 200:
            return response.json()

        if response.status_code == 429:
            # Get retry delay from header
            retry_after = int(response.headers.get('Retry-After', 60))

            if attempt < max_retries - 1:
                print(f"Rate limited. Retrying after {retry_after}s...")
                time.sleep(retry_after)
                continue
            else:
                raise Exception("Rate limit exceeded after all retries")

        response.raise_for_status()

    raise Exception("Max retries exceeded")

JavaScript Example

async function getPromptWithRetry(packageName, apiKey, maxRetries = 3) {
  const url = `https://api.baytos.ai/v1/prompts/${packageName}`;
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  };

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers });

    if (response.ok) {
      return await response.json();
    }

    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);

      if (attempt < maxRetries - 1) {
        console.log(`Rate limited. Retrying after ${retryAfter}s...`);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        continue;
      } else {
        throw new Error('Rate limit exceeded after all retries');
      }
    }

    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  throw new Error('Max retries exceeded');
}

Best Practices

Always check X-RateLimit-Remaining to know when you’re approaching the limit:
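response = requests.get(url, headers=headers)
remaining = response.headers.get('X-RateLimit-Remaining')

# Warn only when the header is present; a missing header is not a low budget
if remaining is not None and int(remaining) < 10:
    print("Warning: Approaching rate limit")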
Don’t retry immediately. Use exponential backoff to avoid hammering the API:
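import time

class RateLimitError(Exception):
    """Placeholder for whatever exception your client raises on HTTP 429."""

def exponential_backoff(attempt, base_delay=1):
    # Delay doubles with each attempt: 1s, 2s, 4s, ...
    return base_delay * (2 ** attempt)

def request_with_backoff(make_request, max_retries=3):
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error to the caller
            time.sleep(exponential_backoff(attempt))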
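When many clients are rate limited at the same moment, a fixed doubling schedule makes them all retry at the same moment too. A common refinement, which this guide does not prescribe, is to randomize each delay and cap it. The sketch below uses the "full jitter" variant; backoff_with_jitter and its parameters are illustrative names.

```python
import random

def backoff_with_jitter(attempt, base_delay=1.0, max_delay=60.0, rng=random):
    """Random delay between 0 and the capped exponential ceiling."""
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return rng.uniform(0, ceiling)

# A seeded generator makes the schedule reproducible for testing.
rng = random.Random(42)
for attempt in range(5):
    delay = backoff_with_jitter(attempt, rng=rng)
    print(f"attempt {attempt}: wait {delay:.2f}s")
```

The cap keeps late attempts from waiting for minutes, and the randomization spreads retries from different clients across the window.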
Instead of fetching prompts one at a time, use the list endpoint:
# Less efficient - multiple requests
for package_name in package_names:
    prompt = client.get_prompt(package_name)

# More efficient - single request
result = client.list_prompts(limit=50)
prompts = result['prompts']
Cache prompt data to reduce API calls:
import time
from functools import lru_cache

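@lru_cache(maxsize=100)
def get_cached_prompt(package_name, cache_time):
    # cache_time is unused in the body; it only forms part of the cache key,
    # so a new value forces a fresh fetch
    return client.get_prompt(package_name)

# The key changes every 300 seconds, so entries live for at most 5 minutes
cache_key = int(time.time() / 300)
prompt = get_cached_prompt("@workspace/my-prompt:v1", cache_key)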
If processing many prompts, add delays between requests:
import time

for package_name in package_names:
    prompt = client.get_prompt(package_name)
    process_prompt(prompt)

    # Small delay to avoid hitting rate limits
    time.sleep(0.1)  # 100ms delay
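A fixed 100 ms delay allows up to 600 requests per minute, which fits under the authenticated per-minute limit but would exceed 10,000 requests well within an hour. As an alternative sketch, the delay can be derived from the hourly limit instead. Throttle is a hypothetical helper, not part of the SDK; the clock and sleep parameters exist only so the class can be tested without real waiting.

```python
import time

class Throttle:
    """Space calls at least `interval` seconds apart."""

    def __init__(self, max_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval

throttle = Throttle(max_per_hour=10_000)
print(f"{throttle.interval:.2f}s between requests")
```

In the loop above, calling throttle.wait() before each client.get_prompt(package_name) call would replace the fixed time.sleep(0.1).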
For production applications, implement server-side caching:
import redis
import json

cache = redis.Redis(host='localhost', port=6379, db=0)

def get_prompt_cached(package_name):
    # Check cache first
    cached = cache.get(f"prompt:{package_name}")
    if cached:
        return json.loads(cached)

    # Fetch from API if not cached
    prompt = client.get_prompt(package_name)
    data = prompt.to_dict()

    # Cache for 5 minutes
    cache.setex(f"prompt:{package_name}", 300, json.dumps(data))

    # Return a dict on both paths so callers see a consistent type
    return data

Rate Limit Response Example

Checking Headers Before Hitting Limit

curl -i https://api.baytos.ai/v1/prompts \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
HTTP/2 200
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1705329600

{
  "data": {
    "prompts": [...]
  }
}

When Rate Limited

HTTP/2 429
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1705329600

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "retry_after": 60
  }
}

Monitoring Rate Limit Usage

Track your rate limit usage in application logs:
import logging
import requests

logger = logging.getLogger(__name__)

def log_rate_limit_status(response):
    limit = response.headers.get('X-RateLimit-Limit')
    remaining = response.headers.get('X-RateLimit-Remaining')
    reset = response.headers.get('X-RateLimit-Reset')

    logger.info(
        f"Rate limit: {remaining}/{limit} remaining. "
        f"Resets at: {reset}"
    )

response = requests.get(url, headers=headers)
log_rate_limit_status(response)
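The raw X-RateLimit-Reset value is a Unix timestamp, which is hard to read in a log line. A small sketch of converting it to an ISO 8601 UTC time before logging follows; format_reset is a hypothetical helper name.

```python
from datetime import datetime, timezone

def format_reset(reset_header: str) -> str:
    """Convert a Unix-timestamp header value to an ISO 8601 UTC string."""
    reset_at = datetime.fromtimestamp(int(reset_header), tz=timezone.utc)
    return reset_at.isoformat()

# Sample value from the response examples in this guide.
print(format_reset("1705329600"))  # 2024-01-15T14:40:00+00:00
```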

Enterprise Rate Limits

Enterprise customers receive:
  • Higher default rate limits
  • Dedicated rate limit pools
  • Priority support for limit adjustments
  • Custom rate limit configurations per API key
Contact [email protected] to discuss enterprise options.
