Responses

POST /v1/responses

The /v1/responses endpoint is an advanced API for interacting with models that support structured, multi-step, and agentic behaviors. Unlike the standard /v1/chat/completions endpoint, which is designed for conversational interactions, /v1/responses provides a more powerful interface for tasks that may involve complex instructions, tool usage, and observable reasoning steps from the model.

This endpoint is ideal for building applications that require a model to "think" or follow a chain of thought before producing a final answer, making it suitable for complex problem-solving, data analysis, and agent-like functionalities.

Supported Models

Only models specifically designed for the "responses" API type can be used with this endpoint. Using a model of type chat/completion or any other type will result in an error.

Model ID | Provider | Description | Available Plans
provider-5/gpt-5-codex | OpenAI | Advanced model with reasoning and multimodal support, ideal for complex tasks like coding and analysis. | ultra

Headers

Authorization

string
Required
Bearer token for authentication. Your A4F API key. Example: Bearer ddc-a4f-xxxxxxxx. See Authentication.

Content-Type

string
Required
The content type of the request body.

Default: application/json
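The headers above can be assembled programmatically. The sketch below builds a request object with both required headers using only the standard library; the base URL and API key shown are placeholders, not real values.

```python
import json
import urllib.request

# Placeholders -- substitute your actual A4F base URL and API key.
BASE_URL = "https://api.example.com"  # hypothetical base URL
API_KEY = "ddc-a4f-xxxxxxxx"

def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request for /v1/responses with the required headers."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request({"model": "provider-5/gpt-5-codex", "input": "Hello"})
# urllib normalizes header names, so look the value up as "Content-type".
content_type = req.get_header("Content-type")
```

Sending the request (e.g., via `urllib.request.urlopen`) is omitted here so the sketch stays self-contained.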

Request Body

This endpoint expects a JSON object in the request body with the following fields:

model

string
Required
ID of the model to use. Must be a responses-compatible model, e.g., provider-5/gpt-5-codex. See Models.

input

string | array of objects
Required
The primary input for the model. Can be a single string for a direct prompt, or a list of ChatMessage objects to provide conversational context.

role

string
Required
The role of the message author. Must be one of 'system', 'user', 'assistant', or 'tool'.

content

string | array | null
Required
The content of the message. Can be a simple string or a list for multimodal content. Required for user, system, and tool roles. For assistant role, either content or tool_calls (or both) must be present.

name

string
An optional name for the participant in a multi-user chat.

tool_calls

array of objects
Used by assistant messages to specify tool calls the model wants to make.

tool_call_id

string
Required for tool role messages. The ID of the tool call this message is a response to.
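The ChatMessage fields above can be combined into a single `input` array covering all four roles. The sketch below assumes an OpenAI-compatible shape for `tool_calls` entries (the `id`/`type`/`function` fields are an assumption; consult the Tool Calling documentation for the authoritative schema).

```python
import json

input_messages = [
    {"role": "system", "content": "You are a helpful data analyst."},
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        # Assistant messages may omit content when making tool calls.
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",  # illustrative ID
                "type": "function",
                "function": {
                    "name": "calculator",
                    "arguments": json.dumps({"expr": "2 + 2"}),
                },
            }
        ],
    },
    # Tool messages echo the ID of the call they answer via tool_call_id.
    {"role": "tool", "tool_call_id": "call_1", "content": "4"},
]

roles = [m["role"] for m in input_messages]
```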

instructions

string
System-level guidance for the model to follow throughout the generation process.

tools

array of objects
A list of tools the model may call. See the Tool Calling documentation for structure.
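For illustration, a single `tools` entry might look like the following, assuming an OpenAI-compatible function-tool schema with JSON Schema `parameters` (the exact structure is defined in the Tool Calling documentation).

```python
# Hypothetical tool definition -- the "calculator" name and its parameter
# schema are illustrative, not part of the API.
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a simple arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expr": {"type": "string"}},
            "required": ["expr"],
        },
    },
}

tools = [calculator_tool]
```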

tool_choice

string | object
Controls which tool the model should use (e.g., "auto").

stream

boolean
If set, partial message deltas will be sent as server-sent events. Note: Streaming is not yet supported for this endpoint and will result in an error.

Default: false

store

boolean
Whether to store the response and its context for future reference.

previous_response_id

string
The ID of a previous response to continue the conversation from.
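Together, `store` and `previous_response_id` support a two-turn flow: store the first response, then reference its ID in the next request. The payloads below are a sketch; the response ID shown is illustrative.

```python
# Turn 1: ask the server to retain this response's context.
first_request = {
    "model": "provider-5/gpt-5-codex",
    "input": "Summarize our Q3 engagement metrics.",
    "store": True,
}

# Suppose the first call returned {"id": "resp_abc123def456", ...}.
# Turn 2: continue the conversation from that stored response.
follow_up = {
    "model": "provider-5/gpt-5-codex",
    "input": "Now break that down by feature.",
    "previous_response_id": "resp_abc123def456",
}
```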

temperature

number
Sampling temperature (0-2). Higher values mean more randomness.

Default: 1

top_p

number
Nucleus sampling parameter (0-1).

max_tokens

integer
Maximum number of tokens to generate in the response.

text

object
Advanced settings for controlling structured output formats.

Example Request & Response

Here's a complete example showing how the endpoint works:

Example Request:

{
  "model": "provider-5/gpt-5-codex",
  "input": [
    {
      "role": "user",
      "content": "Analyze the correlation between user engagement and feature adoption."
    }
  ],
  "instructions": "You are a data analyst. Provide step-by-step reasoning before your final answer.",
  "max_tokens": 1024,
  "temperature": 0.5
}

Example Response:

{
  "id": "resp_abc123def456",
  "object": "response",
  "created_at": 1729584251,
  "model": "provider-5/gpt-5-codex",
  "output": [
    {
      "id": "reasoning_1",
      "type": "reasoning",
      "content": [
        "Step 1: Acknowledge the user's request to find a correlation.",
        "Step 2: Formulate a plan to analyze the data.",
        "Step 3: Perform the correlation calculation.",
        "Step 4: Structure the final answer as a JSON object."
      ],
      "summary": [
        "User wants correlation analysis. I will outline steps and provide final JSON output."
      ]
    },
    {
      "id": "msg_xyz789",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "{\"correlation_factor\": 0.82, \"summary\": \"Strong positive correlation found...\"}",
          "annotations": [],
          "logprobs": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 121,
    "total_tokens": 177
  }
}

Response Body (200 OK)

A successful request returns a JSON object with a structured output array that separates the model's reasoning from its final message:

id

string
A unique identifier for the response, prefixed with resp_.

object

string
The object type, which is always "response".

created_at

integer
The Unix timestamp (in seconds) of when the response was created.

model

string
The model that generated the response.

output

array of objects
A list containing the structured output from the model, including reasoning and message items. Each item has a type field which can be either 'reasoning' or 'message'.
MessageItem Object (type: "message"):

id

string
Unique ID for the message item.

type

string
Always "message".

role

string
Always "assistant".

content

array of objects
List of OutputTextContent objects containing the actual text generated by the model.

status

string
Optional status for the message (e.g., 'completed').
ReasoningItem Object (type: "reasoning"):

id

string
Unique ID for the reasoning item.

type

string
Always "reasoning".

content

array
List of detailed reasoning steps or thoughts from the model.

summary

array
Summary of the reasoning process.
OutputTextContent Object:

type

string
Always "output_text".

text

string
The main text content of the message.

annotations

array
List of annotations related to the text (e.g., citations).

logprobs

array
Log probabilities for the generated tokens, if requested.
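A client will typically split the `output` array into the model's reasoning and its final text. The sketch below does this using the item shapes documented above, applied to a trimmed version of the example response.

```python
# Trimmed example response, following the documented item shapes.
response = {
    "output": [
        {
            "id": "reasoning_1",
            "type": "reasoning",
            "content": ["Step 1: Acknowledge the request.", "Step 2: Analyze."],
            "summary": ["User wants correlation analysis."],
        },
        {
            "id": "msg_xyz789",
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "{\"correlation_factor\": 0.82}",
                    "annotations": [],
                    "logprobs": [],
                }
            ],
            "status": "completed",
        },
    ]
}

# Collect all reasoning steps across reasoning items.
reasoning_steps = [
    step
    for item in response["output"]
    if item["type"] == "reasoning"
    for step in item["content"]
]

# Concatenate the output_text parts of all message items.
final_text = "".join(
    part["text"]
    for item in response["output"]
    if item["type"] == "message"
    for part in item["content"]
    if part["type"] == "output_text"
)
```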

usage

object
Token usage statistics for the request.

prompt_tokens

integer
Number of tokens in the input prompt.

completion_tokens

integer
Number of tokens in the generated response.

total_tokens

integer
Total number of tokens used in the request.

Rate Limiting

This endpoint is rate-limited. Your current limits are returned in the following response headers on every successful request:

  • X-RateLimit-Limit-Minute: Your RPM (Requests Per Minute) limit
  • X-RateLimit-Remaining-Minute: Requests remaining in the current minute
  • X-RateLimit-Reset-Minute: The UTC timestamp when your minute limit will reset
  • X-RateLimit-Limit-Day: Your RPD (Requests Per Day) limit
  • X-RateLimit-Remaining-Day: Requests remaining in the current day
  • X-RateLimit-Reset-Day: The UTC timestamp when your daily limit will reset
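These headers can drive a simple client-side backoff. The sketch below assumes the reset headers carry Unix timestamps in seconds; verify this against the actual header values you receive.

```python
def seconds_until_reset(headers: dict, now: float) -> float:
    """Return how long to wait before the next request, per the minute headers.

    Assumes X-RateLimit-Reset-Minute is a Unix timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining-Minute", "1"))
    if remaining > 0:
        return 0.0  # still within the per-minute budget
    reset_at = float(headers["X-RateLimit-Reset-Minute"])
    return max(0.0, reset_at - now)

# Example with fabricated header values: 0 requests left, reset 60s away.
wait = seconds_until_reset(
    {"X-RateLimit-Remaining-Minute": "0", "X-RateLimit-Reset-Minute": "1000"},
    now=940.0,
)
```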

Error Handling

The endpoint uses standard HTTP status codes to indicate the success or failure of a request:

  • 400 Bad Request: The request payload is invalid. This can be due to malformed JSON, missing required fields like model or input, or using a model that is not of type responses.
  • 401 Unauthorized: Your API key is missing, invalid, or expired.
  • 403 Forbidden: Your API key is valid but does not have permission to perform the request. This can occur if the model is disabled by an administrator, not available for your current subscription plan, on your API key's blacklist, or your API key has a whitelist and the model is not on it.
  • 404 Not Found: The requested model ID does not exist.
  • 429 Too Many Requests: You have exceeded your rate limit (RPM or RPD).
  • 500 Internal Server Error: An unexpected error occurred on the server.
  • 501 Not Implemented: The provider for the requested model does not support the /v1/responses endpoint.
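Of the codes above, only 429 and 500 describe conditions that may clear on their own; the rest indicate a problem with the request, key, or model choice. A minimal retry classifier might look like this:

```python
# Status codes the docs describe as transient (worth retrying with backoff).
RETRYABLE = {429, 500}

# Codes that require fixing the request, key, plan, or model instead.
FATAL = {400, 401, 403, 404, 501}

def should_retry(status_code: int) -> bool:
    """Return True only for errors that may succeed on a later attempt."""
    return status_code in RETRYABLE
```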

Key Differences from /v1/chat/completions

While both endpoints generate text, they are designed for different purposes:

Feature | /v1/chat/completions | /v1/responses
Primary Input | messages | input
System Prompt | ChatMessage with role: "system" | Dedicated instructions field
Primary Output | choices | output
Output Structure | Single message object | Sequence of reasoning + message items
Primary Use Case | Conversational AI, chatbots | Agentic workflows, chain-of-thought
