The Claro API implements rate limiting to ensure fair usage and maintain service quality for all users. This guide explains how rate limits work and how to handle them in your application.
Rate Limit Tiers
Rate limits vary based on authentication status:
| Tier | Requests per Minute | Requests per Hour |
|------|---------------------|-------------------|
| Unauthenticated | 100 | 1,000 |
| Authenticated | 1,000 | 10,000 |
Enterprise customers can request higher rate limits. Contact support@baytos.ai for custom limits.
Rate Limit Headers
Every API response includes headers indicating your current rate limit status:
```
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1705329600
```
| Header | Description |
|--------|-------------|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Number of requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the rate limit window resets |
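To turn `X-RateLimit-Reset` into something actionable, compare it against the current Unix time. This is a minimal sketch; `seconds_until_reset` is an illustrative helper, not part of any SDK:

```python
from datetime import datetime, timezone

def seconds_until_reset(headers, now=None):
    """Return how many seconds remain until the rate limit window resets."""
    reset = int(headers.get("X-RateLimit-Reset", 0))
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return max(0, reset - int(now))

headers = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "847",
    "X-RateLimit-Reset": "1705329600",
}
# With a hypothetical "now" of 1705329540, the window resets in 60 seconds.
print(seconds_until_reset(headers, now=1705329540))  # 60
```

Passing `now` explicitly makes the helper easy to unit-test; in production you would omit it and let the current time be used.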
429 Too Many Requests
When you exceed the rate limit, the API returns a 429 Too Many Requests response:
```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "retry_after": 60
  }
}
```
The response includes:

- `Retry-After` header: seconds until you can retry
- `retry_after` field: the same value in the JSON response body
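Since the delay appears in both places, a small helper can prefer the header and fall back to the body. This is a sketch; `retry_delay` and its default of 60 seconds are illustrative assumptions:

```python
import json

def retry_delay(headers, body, default=60):
    """Pick the retry delay, preferring the Retry-After header and
    falling back to the retry_after field in the JSON body."""
    if "Retry-After" in headers:
        return int(headers["Retry-After"])
    try:
        return int(json.loads(body)["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        return default

body = '{"error": {"code": "rate_limit_exceeded", "retry_after": 60}}'
print(retry_delay({}, body))                     # 60 (from the JSON body)
print(retry_delay({"Retry-After": "30"}, body))  # 30 (header wins)
```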
Handling Rate Limits
Automatic Retry with SDK
The Python SDK automatically handles rate limits with exponential backoff:
```python
from baytos.claro import BaytClient

client = BaytClient(
    api_key="your_api_key",
    max_retries=3  # Automatically retry up to 3 times
)

# SDK handles retries automatically
prompt = client.get_prompt("@workspace/my-prompt:v1")
```
The SDK will:

- Wait for the duration specified in the `Retry-After` header
- Retry with exponential backoff (1s, 2s, 4s, …)
- Throw `BaytRateLimitError` if all retries are exhausted
Manual Retry Logic
If calling the API directly, implement retry logic:
Python Example
```python
import time

import requests

def get_prompt_with_retry(package_name, api_key, max_retries=3):
    url = f"https://api.baytos.ai/v1/prompts/{package_name}"
    headers = {"Authorization": f"Bearer {api_key}"}

    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            return response.json()

        if response.status_code == 429:
            # Get retry delay from header
            retry_after = int(response.headers.get('Retry-After', 60))
            if attempt < max_retries - 1:
                print(f"Rate limited. Retrying after {retry_after}s...")
                time.sleep(retry_after)
                continue
            else:
                raise Exception("Rate limit exceeded after all retries")

        response.raise_for_status()

    raise Exception("Max retries exceeded")
```
JavaScript Example
```javascript
async function getPromptWithRetry(packageName, apiKey, maxRetries = 3) {
  const url = `https://api.baytos.ai/v1/prompts/${packageName}`;
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  };

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers });

    if (response.ok) {
      return await response.json();
    }

    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
      if (attempt < maxRetries - 1) {
        console.log(`Rate limited. Retrying after ${retryAfter}s...`);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        continue;
      } else {
        throw new Error('Rate limit exceeded after all retries');
      }
    }

    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  throw new Error('Max retries exceeded');
}
```
Best Practices
Monitor rate limit headers
Implement exponential backoff
Don't retry immediately. Use exponential backoff to avoid hammering the API:

```python
import time

def exponential_backoff(attempt, base_delay=1):
    return base_delay * (2 ** attempt)

def make_request_with_backoff(max_retries=3):
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt < max_retries - 1:
                delay = exponential_backoff(attempt)
                time.sleep(delay)
            else:
                raise
```
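A common refinement is to add random jitter on top of the exponential delay, so many clients rate-limited at the same moment don't all retry in lockstep. This is a sketch of the "full jitter" approach; the function name and the 60-second cap are illustrative choices:

```python
import random

def backoff_with_jitter(attempt, base_delay=1, max_delay=60):
    """Pick a random delay between 0 and the capped exponential
    backoff, so concurrent clients spread their retries."""
    capped = min(max_delay, base_delay * (2 ** attempt))
    return random.uniform(0, capped)

# Successive attempts draw from wider (but capped) ranges:
for attempt in range(4):
    print(round(backoff_with_jitter(attempt), 2))
```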
Batch requests when possible
Instead of fetching prompts one at a time, use the list endpoint:

```python
# Less efficient - multiple requests
for package_name in package_names:
    prompt = client.get_prompt(package_name)

# More efficient - single request
result = client.list_prompts(limit=50)
prompts = result['prompts']
```
Cache prompt data to reduce API calls:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=100)
def get_cached_prompt(package_name, cache_time):
    return client.get_prompt(package_name)

# Cache for 5 minutes: the key changes every 300 seconds,
# invalidating older lru_cache entries
cache_key = int(time.time() / 300)
prompt = get_cached_prompt("@workspace/my-prompt:v1", cache_key)
```
Spread requests over time
If processing many prompts, add delays between requests:

```python
import time

for package_name in package_names:
    prompt = client.get_prompt(package_name)
    process_prompt(prompt)
    # Small delay to avoid hitting rate limits
    time.sleep(0.1)  # 100ms delay
```
For production applications, implement server-side caching:

```python
import json

import redis

cache = redis.Redis(host='localhost', port=6379, db=0)

def get_prompt_cached(package_name):
    # Check cache first
    cached = cache.get(f"prompt:{package_name}")
    if cached:
        return json.loads(cached)

    # Fetch from API if not cached
    prompt = client.get_prompt(package_name)

    # Cache for 5 minutes
    cache.setex(
        f"prompt:{package_name}",
        300,
        json.dumps(prompt.to_dict())
    )
    return prompt
```
Rate Limit Response Example
```bash
curl -i https://api.baytos.ai/v1/prompts \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response:
```
HTTP/2 200
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1705329600

{
  "data": {
    "prompts": [...]
  }
}
```
When Rate Limited
```
HTTP/2 429
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1705329600

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "retry_after": 60
  }
}
```
Monitoring Rate Limit Usage
Track your rate limit usage in application logs:
```python
import logging

import requests

logger = logging.getLogger(__name__)

def log_rate_limit_status(response):
    limit = response.headers.get('X-RateLimit-Limit')
    remaining = response.headers.get('X-RateLimit-Remaining')
    reset = response.headers.get('X-RateLimit-Reset')
    logger.info(
        f"Rate limit: {remaining}/{limit} remaining. "
        f"Resets at: {reset}"
    )

response = requests.get(url, headers=headers)
log_rate_limit_status(response)
```
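Beyond logging, you can act on the headers before you run out of quota, for example by slowing down once only a small fraction of the window remains. A minimal sketch; `should_throttle` and the 10% threshold are illustrative assumptions:

```python
def should_throttle(headers, threshold=0.1):
    """Return True when fewer than `threshold` (10% by default)
    of the window's requests remain."""
    limit = int(headers.get("X-RateLimit-Limit", 1))
    remaining = int(headers.get("X-RateLimit-Remaining", limit))
    return remaining < limit * threshold

print(should_throttle({"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "847"}))  # False
print(should_throttle({"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "50"}))   # True
```

When `should_throttle` returns `True`, a caller might insert a delay or defer non-urgent work until the window resets.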
Enterprise Rate Limits
Enterprise customers receive:

- Higher default rate limits
- Dedicated rate limit pools
- Priority support for limit adjustments
- Custom rate limit configurations per API key
Contact support@baytos.ai to discuss enterprise options.
Next Steps
- Error Handling: learn how to handle all API errors
- Python SDK: use the SDK with built-in retry logic
- Best Practices: optimize your API usage
- Authentication: learn about API authentication