Rate Limits

Wictz API implements rate limits to ensure fair usage and maintain system stability for all users. Understanding and respecting these limits is essential for building reliable applications. Your specific rate limits are determined by your subscription plan and can be viewed in your Wictz API dashboard.

Types of Limits

Wictz API employs several types of rate limits, which may apply at different levels (per API key, per user, per organization, per model, or globally):

Requests Per Minute (RPM): The maximum number of API requests you can make in a 60-second window.
Tokens Per Minute (TPM): The maximum number of tokens (sum of prompt and completion tokens) you can process in a 60-second window. This is often specific to certain models or providers.
Daily/Monthly Request Quotas: Some plans may have overall limits on the total number of requests allowed per day or per billing cycle.
Daily/Monthly Token Quotas: Similar to request quotas, but for the total number of tokens processed.
Concurrent Request Limits: The maximum number of API requests that can be active simultaneously.
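
To make the 60-second window concrete, here is a minimal client-side sketch of a sliding-window counter for an RPM limit. It assumes the limit is counted over a rolling window; the server may count differently (for example, in fixed windows). The class and method names are illustrative and are not part of Wictz API.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (client side)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of calls still inside the window

    def _prune(self, now):
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def try_acquire(self):
        """Record the call and return True if it fits in the window, else False."""
        now = self.clock()
        self._prune(now)
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False

    def wait_time(self):
        """Seconds until the next call would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._prune(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])
```

A caller would check try_acquire() before each request and, when it returns False, sleep for wait_time() seconds. The limit value should come from your dashboard rather than being hard-coded.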

Customizable Limits

Organization administrators may be able to customize rate limits for individual users or API keys, and to set limits on access to particular models. Check your Wictz API dashboard for the most accurate, up-to-date information on the rate limits that apply to you.

Handling Rate Limit Errors (429)

When you exceed a rate limit, the API will respond with an HTTP 429 Too Many Requests status code. The response body will typically include a JSON object with more details about the specific limit that was hit.

Example 429 Error Response

{
  "error": {
    "message": "You have exceeded your RPM limit for model gpt-4. Limit: 20 RPM. Please try again after some time.",
    "type": "rate_limit_exceeded",
    "param": null,
    "code": "rpm_exceeded",
    "details": {
      "model": "gpt-4",
      "limit_type": "requests_per_minute",
      "limit_value": 20,
      "retry_after_seconds": 3
    }
  }
}

The retry_after_seconds field, when present, suggests how many seconds to wait before retrying the request. Because the field is optional, clients should fall back to their own delay when it is missing.
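
As a sketch of how a client might read that hint, the function below extracts retry_after_seconds from a 429 response body shaped like the example above. The function name and the 1-second fallback are assumptions made for illustration. Since the body only "typically" includes these details, every level of the structure is treated as optional.

```python
import json


def retry_delay_from_body(body_text, default=1.0):
    """Return the suggested wait in seconds from a 429 body, or `default`."""
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        # Body was missing or not valid JSON.
        return default

    # Walk error -> details, tolerating missing or unexpected shapes.
    node = payload
    for key in ("error", "details"):
        if not isinstance(node, dict):
            return default
        node = node.get(key)
    if not isinstance(node, dict):
        return default

    value = node.get("retry_after_seconds")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return default
    if value < 0:
        return default
    return float(value)
```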

Best Practices for Managing Rate Limits

Implement Exponential Backoff: When you receive a 429 error, wait before retrying. Increase the wait time exponentially for subsequent retries (e.g., 1s, 2s, 4s, 8s). Add some random jitter to the delay to avoid thundering herd problems. See the Error Handling page for an example.
Monitor Your Usage: Regularly check your usage and rate limit status in the Wictz API dashboard.
Optimize Token Usage: For TPM-limited models, make your prompts concise and, if possible, limit the max_tokens parameter for completions to reduce overall token consumption per request.
Distribute Requests: If possible, spread out your API calls over time rather than sending them in large bursts.
Client-Side Rate Limiting: Consider implementing rate limiting within your own application to prevent it from overwhelming the API.
Caching: Cache responses for frequently requested, non-dynamic content to reduce redundant API calls.
Plan Upgrades: If your application consistently requires higher limits, consider upgrading your subscription plan.
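
The exponential backoff practice above can be sketched as follows. This is one possible shape, not the example from the Error Handling page: RateLimited is a hypothetical exception that your own request code would raise on an HTTP 429, and the base delay, cap, and retry count are assumed values to tune for your plan.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception raised by the caller's code on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # server hint in seconds, if any


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Delay before retry `attempt` (0-based): base * 2**attempt, capped, jittered."""
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter: pick a delay between half the ceiling and the full ceiling,
    # so that many clients do not retry at the same instant.
    return ceiling * (0.5 + 0.5 * rng())


def call_with_retries(send, max_retries=5, sleep=time.sleep):
    """Call send(); on RateLimited, wait and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # out of retries; let the caller handle it
            delay = backoff_delay(attempt)
            if exc.retry_after is not None:
                # Never retry sooner than the server suggested.
                delay = max(delay, exc.retry_after)
            sleep(delay)
```

Here send is any zero-argument callable that performs one request and raises RateLimited when the response status is 429. Passing sleep as a parameter keeps the retry loop testable without real waiting.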

Rate Limit Headers

Wictz API may include the following headers in API responses to help you track your rate limit status. Support for these headers can vary.

X-RateLimit-Limit-Requests: The request limit for the current time window.
X-RateLimit-Remaining-Requests: The number of requests remaining in the current time window.
X-RateLimit-Reset-Requests: The time at which the request limit resets, given either as a UTC epoch timestamp in seconds or as a number of seconds from now.
(Similar headers may exist for token limits, e.g., X-RateLimit-Limit-Tokens, etc.)

Programmatically checking these headers can help your application adapt to rate limits proactively.
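
A minimal sketch of such a check is shown below. It takes the response headers as a plain mapping, so it is independent of any particular HTTP library. Because the reset header may be either an epoch timestamp or a relative number of seconds, the code uses a heuristic that is purely an assumption: values above one billion are treated as epoch seconds. The function names are illustrative.

```python
import time


def parse_request_limit_headers(headers, now=None):
    """Return (limit, remaining, seconds_until_reset); missing values are None."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {str(k).lower(): v for k, v in headers.items()}

    def as_number(name):
        try:
            return float(lowered[name])
        except (KeyError, TypeError, ValueError):
            return None  # header absent or not numeric

    limit = as_number("x-ratelimit-limit-requests")
    remaining = as_number("x-ratelimit-remaining-requests")
    reset = as_number("x-ratelimit-reset-requests")
    # Assumed heuristic: large values look like epoch seconds, small ones
    # like a relative delay. Convert epoch values to seconds from now.
    if reset is not None and reset > 1_000_000_000:
        reset = max(0.0, reset - now)
    return limit, remaining, reset


def seconds_to_pause(headers, now=None):
    """Seconds to wait before the next request, or 0.0 if none is needed."""
    _, remaining, reset = parse_request_limit_headers(headers, now)
    if remaining is not None and remaining <= 0 and reset is not None:
        return reset
    return 0.0
```

Since support for these headers can vary, the sketch returns None for anything it cannot read and never pauses on incomplete information.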

Global Rate Limits

In addition to plan-specific limits, Wictz API may enforce global rate limits to protect the platform from abuse and to ensure overall stability. These limits are generally high enough that typical usage does not reach them. They might include IP-based limits or limits on specific high-load endpoints.