Rate Limits

Rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time.

Why do we have rate limits?

Rate limits are a common practice for APIs, and they're put in place for a few different reasons:

  • They help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, RockAPI can prevent this kind of activity.

  • Rate limits help ensure that everyone has fair access to the API. If one person or organization makes an excessive number of requests, it could bog down the API for everyone else. By throttling the number of requests that a single user can make, RockAPI ensures that the greatest number of people have an opportunity to use the API without experiencing slowdowns.

  • Rate limits can help RockAPI manage the aggregate load on its infrastructure. If requests to the API increase dramatically, it could tax the servers and cause performance issues. By setting rate limits, RockAPI can help maintain a smooth and consistent experience for all users.

How do these rate limits work?

Our API rate limits are primarily measured in RPM (Requests Per Minute). The current limit is set at 100 RPM for standard accounts. If you exceed 100 requests per minute, you'll receive a 429 response code.
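
When the limit is exceeded, the official Python client used later in this guide raises an openai.RateLimitError for the 429 response, so you can detect it explicitly. A minimal sketch (the model name and prompt are placeholders):

import openai
from openai import OpenAI

client = OpenAI(
    api_key='$ROCKAPI_API_KEY',
    base_url='https://api.rockapi.ru/openai/v1'
)

try:
    client.completions.create(model="gpt-4o-mini", prompt="Once upon a time,")
except openai.RateLimitError as e:
    # The server returned HTTP 429: back off before retrying
    print(f"Rate limit exceeded: {e}")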

We understand that some users may require higher limits to meet their specific needs. If you find that your usage consistently approaches or exceeds the standard limit, we encourage you to reach out to our support team. We're here to help you scale your integration and ensure you have the capacity you need.

For inquiries about increasing your rate limits or to discuss custom solutions, please contact our support team at:

https://rockapi.ru/contact.html

Our team is committed to working with you to find the best solution for your use case while maintaining the overall health and performance of our API infrastructure.

Error Mitigation

What are some steps I can take to mitigate rate limit errors?

The OpenAI Cookbook has a Python notebook that explains how to avoid rate limit errors, as well as an example Python script for staying under rate limits while batch processing API requests.
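
One simple way to stay under rate limits while batch processing is to pace requests so you never exceed the per-minute cap. A rough sketch, assuming the 100 RPM standard limit described above (the prompts and model are illustrative):

import time

from openai import OpenAI

client = OpenAI(
    api_key='$ROCKAPI_API_KEY',
    base_url='https://api.rockapi.ru/openai/v1'
)

requests_per_minute = 100            # standard limit described above
delay = 60.0 / requests_per_minute   # seconds to wait between requests

prompts = ["Once upon a time,", "In a galaxy far, far away,"]  # illustrative batch

for prompt in prompts:
    client.completions.create(model="gpt-4o-mini", prompt=prompt)
    time.sleep(delay)  # throttle so the batch stays under the per-minute limit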

You should also exercise caution when providing programmatic access, bulk processing features, and automated social media posting - consider only enabling these for trusted customers.

To protect against automated and high-volume misuse, set a usage limit for individual users within a specified time frame (daily, weekly, or monthly). Consider implementing a hard cap or a manual review process for users who exceed the limit.
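
As a rough sketch of that idea (the limit value and function names are hypothetical), a per-user daily counter with a hard cap could look like this:

from collections import defaultdict
from datetime import date

DAILY_LIMIT = 500  # hypothetical hard cap per user per day

# (user_id, day) -> number of requests served so far
usage = defaultdict(int)

def allow_request(user_id: str) -> bool:
    """Record a request and return True only if the user is still under today's cap."""
    key = (user_id, date.today())
    if usage[key] >= DAILY_LIMIT:
        return False  # over the cap: reject, or flag for manual review
    usage[key] += 1
    return True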

Retrying with exponential backoff

One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached. This approach has many benefits:

  • Automatic retries mean you can recover from rate limit errors without crashes or missing data.
  • Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail.
  • Adding random jitter to the delay helps prevent retries from all hitting at the same time.

Note that unsuccessful requests contribute to your per-minute limit, so continuously resending a request won’t work.

Below are a few example solutions for Python that use exponential backoff.

Example 1: Using the Tenacity library

Tenacity is an Apache 2.0 licensed general-purpose retrying library, written in Python, that simplifies the task of adding retry behavior to just about anything. To add exponential backoff to your requests, you can use the tenacity.retry decorator. The example below uses the tenacity.wait_random_exponential function to add random exponential backoff to a request.

from openai import OpenAI

client = OpenAI(
    api_key='$ROCKAPI_API_KEY',
    base_url='https://api.rockapi.ru/openai/v1'
)

from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff


@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return client.completions.create(**kwargs)


completion_with_backoff(model="gpt-4o-mini", prompt="Once upon a time,")

Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

Example 2: Using the backoff library

Another Python library that provides function decorators for backoff and retry is backoff:

import backoff
import openai
from openai import OpenAI

client = OpenAI(
    api_key='$ROCKAPI_API_KEY',
    base_url='https://api.rockapi.ru/openai/v1'
)


@backoff.on_exception(backoff.expo, openai.RateLimitError)
def completions_with_backoff(**kwargs):
    return client.completions.create(**kwargs)


completions_with_backoff(model="gpt-4o-mini", prompt="Once upon a time,")

Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

Example 3: Manual backoff implementation

If you don't want to use third-party libraries, you can implement your own backoff logic following this example:

# imports
import random
import time

import openai
from openai import OpenAI

client = OpenAI(
    api_key='$ROCKAPI_API_KEY',
    base_url='https://api.rockapi.ru/openai/v1'
)


# define a retry decorator
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 10,
    errors: tuple = (openai.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay

        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return func(*args, **kwargs)

            # Retry on specific errors
            except errors as e:
                # Increment retries
                num_retries += 1

                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(
                        f"Maximum number of retries ({max_retries}) exceeded."
                    )

                # Increment the delay
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay
                time.sleep(delay)

            # Raise exceptions for any errors not specified
            except Exception as e:
                raise e

    return wrapper


@retry_with_exponential_backoff
def completions_with_backoff(**kwargs):
    return client.completions.create(**kwargs)

Again, OpenAI makes no guarantees about the security or efficiency of this solution, but it can be a good starting point for your own implementation.

Reduce the max_tokens to match the size of your completions

Your rate limit is calculated as the maximum of max_tokens and the estimated number of tokens based on the character count of your request. Try to set the max_tokens value as close to your expected response size as possible.
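
For example, if you only expect a short completion, cap max_tokens near that size rather than leaving it at a large value. A minimal sketch (the value of 50 is only illustrative):

from openai import OpenAI

client = OpenAI(
    api_key='$ROCKAPI_API_KEY',
    base_url='https://api.rockapi.ru/openai/v1'
)

client.completions.create(
    model="gpt-4o-mini",
    prompt="Once upon a time,",
    max_tokens=50,  # illustrative: set close to the expected response size
)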