
Timeout Handling

Strategies for managing and mitigating API request timeouts with A4F.

Understanding Timeouts

API request timeouts can occur for various reasons when interacting with LLMs. Understanding these can help you build more resilient applications. Common causes include:

Network Latency: Slow or unstable network connections between your client, A4F, and the underlying model provider.
Provider Overload: The specific LLM provider (e.g., OpenAI, Anthropic) might be experiencing high traffic or temporary issues, leading to slower response times.
Complex Prompts: Very long or computationally intensive prompts can take longer for the model to process.
Cold Starts: Less frequently used models might experience a "cold start" delay as the provider provisions resources.
A4F Processing: Routing and other internal processing add some overhead, though A4F aims to keep it minimal (typically ~30ms).

Set explicit timeout values on your client-side HTTP requests so that your application never waits indefinitely for a response. Some libraries, such as Python's requests, apply no timeout at all unless you pass one.

Common Timeout Errors

When a timeout occurs, you might receive one of the following HTTP status codes. Because A4F primarily proxies responses, a code may originate from A4F itself or from the upstream provider:

408 Request Timeout: This indicates that the server (either A4F or an upstream provider) did not receive a complete request from the client within the time it was prepared to wait. This is less common for API calls if the request itself is sent quickly.
504 Gateway Timeout: This is more common. It means A4F, acting as a gateway, did not receive a timely response from the upstream LLM provider it was trying to reach on your behalf.

Additionally, your HTTP client library might raise its own timeout exceptions if a configured client-side timeout is exceeded before the server responds. For more general error handling, see our Errors documentation.
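When debugging, it helps to know which side gave up: a 408 or 504 status means a server reported the timeout, while an exception means your own client stopped waiting. The sketch below assumes Python's `requests` library; `describe_timeout` is a hypothetical helper, and its 408/504 mapping simply mirrors the list above.

```python
import requests

# Status codes this guide lists as timeout-related.
SERVER_TIMEOUT_STATUSES = {408: "Request Timeout", 504: "Gateway Timeout"}


def describe_timeout(outcome):
    """Return a short label if `outcome` is a timeout, otherwise None.

    `outcome` is either a requests.Response or an exception raised by requests.
    """
    if isinstance(outcome, requests.Response):
        # A status code means the server (A4F or the provider) reported it.
        label = SERVER_TIMEOUT_STATUSES.get(outcome.status_code)
        return f"server-side: {outcome.status_code} {label}" if label else None
    # An exception means the client-side timeout fired first.
    if isinstance(outcome, requests.exceptions.ConnectTimeout):
        return "client-side: connect timeout"
    if isinstance(outcome, requests.exceptions.ReadTimeout):
        return "client-side: read timeout"
    return None
```

You would call it on the result of `requests.post(...)`, or on the exception caught in an `except requests.exceptions.RequestException as exc:` block.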

Client-Side Strategies

Connection vs. Read Timeouts

Most HTTP client libraries allow you to configure two types of timeouts:

Connection Timeout: The maximum time allowed to establish a connection with the server (A4F). A shorter timeout (e.g., 5-10 seconds) is usually sufficient.
Read/Response Timeout: The maximum time allowed to wait for data from the server after the connection has been established and the request sent. This should be longer, accounting for model processing time (e.g., 30-120 seconds, depending on the model and expected response length).

Setting these appropriately in your client is the first line of defense against hanging requests.
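With Python's `requests`, passing a `(connect, read)` tuple as `timeout` sets the two limits separately, and the library raises `ConnectTimeout` or `ReadTimeout` depending on which phase ran out of time. Below is a minimal sketch; the helper name `post_with_timeouts` and the 5s/60s defaults are illustrative choices taken from the ranges above, not A4F-prescribed values. Note that in `requests` the read timeout limits how long the client waits between bytes from the server, not the total response time.

```python
import requests

CONNECT_TIMEOUT_SECONDS = 5   # time allowed to establish the connection to A4F
READ_TIMEOUT_SECONDS = 60     # max wait for response data once the request is sent


def post_with_timeouts(url, headers, payload,
                       connect_timeout=CONNECT_TIMEOUT_SECONDS,
                       read_timeout=READ_TIMEOUT_SECONDS):
    """POST `payload` as JSON and return the decoded JSON body.

    Raises TimeoutError naming the phase (connect or read) that timed out.
    """
    try:
        response = requests.post(url, headers=headers, json=payload,
                                 timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout as exc:
        raise TimeoutError(f"could not connect within {connect_timeout}s") from exc
    except requests.exceptions.ReadTimeout as exc:
        raise TimeoutError(f"no response data within {read_timeout}s") from exc
    response.raise_for_status()
    return response.json()
```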

Retry Mechanisms

Implementing a retry mechanism can significantly improve the resilience of your application. Key considerations for retries:

Idempotency: Only retry requests that are idempotent (i.e., making the same request multiple times has the same effect as making it once). Most LLM generation requests can be treated as idempotent as long as you do not rely on state changed by a previous failed attempt, though a retry may return a different completion because generation is usually non-deterministic.
Exponential Backoff: Instead of retrying immediately, wait for an increasing amount of time between retries (e.g., 1s, 2s, 4s, 8s). This helps to avoid overwhelming a temporarily struggling service.
Jitter: Add a small random delay to backoff times to prevent "thundering herd" problems where many clients retry simultaneously.
Max Retries: Limit the number of retries to avoid indefinite loops.
Retry on Specific Errors: Only retry on transient errors such as timeouts (408, 504, or a client-side timeout exception) or temporary server issues (503 Service Unavailable). Do not retry other client errors (4xx, e.g., 400 Bad Request, 401 Unauthorized) until you have fixed the request.
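The backoff, jitter, and error-selection rules in this list can be expressed as two small pure functions, which keeps them testable apart from any network code. This is a sketch: `backoff_delays`, `should_retry`, the 30-second cap, the 10% jitter, and the exact set of retryable status codes are illustrative assumptions to adapt to your workload.

```python
import random

# Assumption: transient statuses treated as retryable; adjust for your application.
RETRYABLE_STATUS_CODES = {408, 502, 503, 504}


def should_retry(status_code):
    """Retry only transient failures; other 4xx errors need the request fixed first."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delays(max_retries, initial_seconds=1.0, cap_seconds=30.0, rng=random):
    """Return one sleep duration per retry: exponential growth plus random jitter.

    The base delay doubles each time (1s, 2s, 4s, ...) up to `cap_seconds`;
    jitter adds up to 10% of the base so concurrent clients drift apart.
    """
    delays = []
    for attempt in range(max_retries):
        base = min(cap_seconds, initial_seconds * (2 ** attempt))
        delays.append(base + rng.uniform(0, base * 0.1))
    return delays
```

Passing a seeded `random.Random` as `rng` makes the schedule reproducible in tests.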

import random
import time

import requests

MAX_RETRIES = 3
INITIAL_BACKOFF_SECONDS = 1
# Transient statuses worth retrying; any other 4xx/5xx is raised immediately.
RETRYABLE_STATUS_CODES = {408, 502, 503, 504}

def make_request_with_retry(url, headers, payload):
    backoff_seconds = INITIAL_BACKOFF_SECONDS
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            # Example: 5s connect timeout, 30s read timeout
            response = requests.post(url, headers=headers, json=payload, timeout=(5, 30))
            if response.status_code not in RETRYABLE_STATUS_CODES:
                response.raise_for_status()  # Raises HTTPError for other 4xx/5xx (e.g., 400, 401); no retry
                return response.json()
            reason = f"Received HTTP {response.status_code}"
        except requests.exceptions.Timeout:
            reason = "Request timed out"
        except requests.exceptions.ConnectionError as e:
            reason = f"Connection failed: {e}"

        if attempt == MAX_RETRIES:
            break  # Don't sleep after the final attempt
        # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% random jitter
        delay = backoff_seconds * (1 + random.uniform(0, 0.1))
        print(f"{reason}. Retrying in {delay:.1f}s... (Attempt {attempt}/{MAX_RETRIES})")
        time.sleep(delay)
        backoff_seconds *= 2

    print(f"{reason}. Max retries reached. Request failed.")
    return None

# Example Usage (ensure A4F_API_ENDPOINT, headers, data_payload are defined):
# A4F_API_ENDPOINT = "https://api.a4f.co/v1/chat/completions"
# headers = {"Authorization": "Bearer YOUR_A4F_KEY", "Content-Type": "application/json"}
# data_payload = {"model": "provider-X/some-model", "messages": [{"role": "user", "content": "Hello"}]}
# result = make_request_with_retry(A4F_API_ENDPOINT, headers, data_payload)
# if result:
#    print("Success:", result)

Streaming Considerations

When using streaming, timeouts behave slightly differently:

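The initial response (headers) should arrive relatively quickly, since the server can start sending before the full completion is generated.
After that, the read timeout typically applies to the gap between chunks of data, not to the total stream duration. If the stream stalls for longer than the read timeout, a read timeout occurs.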
Some providers send keep-alive pings (e.g., SSE comments) to prevent intermediate proxies or clients from closing the connection due to inactivity. A4F passes these through.
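A minimal streaming sketch with `requests`, assuming A4F accepts the OpenAI-style `"stream": true` body field and emits `data:` lines terminated by `data: [DONE]` (both are assumptions; check the API reference). `stream_chat` is a hypothetical helper that skips SSE comment lines such as keep-alive pings. One `requests` detail matters here: a read timeout that fires while iterating over the body is raised as `requests.exceptions.ConnectionError`, not `Timeout`, so the sketch catches that; a genuine network drop surfaces the same way.

```python
import requests


def stream_chat(url, headers, payload, connect_timeout=5, read_timeout=30):
    """Yield the data field of each SSE event from a streaming chat request.

    With stream=True, the read timeout bounds the wait for the response headers
    and then each gap between received bytes, not the total stream duration.
    """
    body = dict(payload, stream=True)  # assumption: OpenAI-style streaming flag
    with requests.post(url, headers=headers, json=body, stream=True,
                       timeout=(connect_timeout, read_timeout)) as response:
        response.raise_for_status()
        try:
            for raw_line in response.iter_lines():
                line = raw_line.decode("utf-8")  # SSE streams are UTF-8
                if not line or line.startswith(":"):
                    continue  # blank event separator or keep-alive comment
                if line.startswith("data:"):
                    data = line[len("data:"):].strip()
                    if data == "[DONE]":  # assumption: OpenAI-style end marker
                        return
                    yield data
        except requests.exceptions.ConnectionError as exc:
            # requests reports a mid-stream read timeout as a ConnectionError.
            raise TimeoutError(
                f"stream stalled for over {read_timeout}s or was interrupted"
            ) from exc
```

Each yielded string is the raw JSON payload of one event, which you would parse with `json.loads`.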

Client-Side Stream Handling

Ensure your client library for handling Server-Sent Events (SSE) has its own mechanisms for detecting stalled streams or handling disconnections gracefully. Long-running streams can be susceptible to network interruptions.
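Keep-alive comments count as received data at the socket level, so a stream that sends only pings never trips the read timeout. One way to detect that kind of stall is to track the time since the last `data:` line. The generator below is a hypothetical sketch (`watch_for_stall` and `StreamStalled` are illustrative names). It accepts any iterable of decoded SSE lines and takes an injectable clock so it can be tested without a network.

```python
import time


class StreamStalled(Exception):
    """Raised when no data event has arrived within the allowed idle time."""


def watch_for_stall(lines, max_idle_seconds, clock=time.monotonic):
    """Pass SSE lines through, raising StreamStalled if data stops arriving.

    Only `data:` lines reset the idle timer; keep-alive comments do not.
    The check runs when a line arrives, so this complements (rather than
    replaces) a socket-level read timeout.
    """
    last_data = clock()
    for line in lines:
        now = clock()
        if line.startswith("data:"):
            last_data = now
        elif now - last_data > max_idle_seconds:
            raise StreamStalled(f"no data for {now - last_data:.1f}s")
        yield line
```

For example, `for line in watch_for_stall(lines, max_idle_seconds=60): ...` wraps a stream of already-decoded lines.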

A4F Internal Timeouts

A4F may have its own internal timeouts when communicating with upstream providers to ensure overall system stability and prevent requests from hanging indefinitely if a provider is unresponsive. If A4F times out waiting for a provider, it will typically return a 504 Gateway Timeout error to your client.

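The exact values for these internal timeouts are managed by A4F and tuned for performance and reliability. They are generally longer than typical client-side connection timeouts but shorter than the very long read timeouts you might set for slow models. In that case, a request that outlasts A4F's internal limit ends with a 504 Gateway Timeout from A4F before your client-side read timeout is reached.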
