Responses

POST /v1/responses

The /v1/responses endpoint is an advanced API for interacting with models that support structured, multi-step, and agentic behaviors. Unlike the standard /v1/chat/completions endpoint, which is designed for conversational interactions, /v1/responses provides a more powerful interface for tasks that may involve complex instructions, tool usage, and observable reasoning steps from the model.

This endpoint is ideal for building applications that require a model to "think" or follow a chain of thought before producing a final answer, making it suitable for complex problem-solving, data analysis, and agent-like functionalities.

Supported Models

Only models specifically designed for the "responses" API type can be used with this endpoint. Using a model of type chat/completion or any other type will result in an error.

Model ID | Provider | Description | Available Plans
provider-5/gpt-5-codex | OpenAI | Advanced model with reasoning and multimodal support, ideal for complex tasks like coding and analysis. | ultra

Headers

Authorization

string
Required
Bearer token for authentication. Your A4F API key. Example: Bearer ddc-a4f-xxxxxxxx. See Authentication.

Content-Type

string
Required
The content type of the request body.

Default: application/json
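The headers above can be assembled programmatically. The sketch below builds a request object with both required headers using only the standard library; the base URL and API key shown are placeholders, not real values.

```python
import json
import urllib.request

# Placeholders -- substitute your actual A4F base URL and API key.
BASE_URL = "https://api.example.com"  # hypothetical base URL
API_KEY = "ddc-a4f-xxxxxxxx"

def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request for /v1/responses with the required headers."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request({"model": "provider-5/gpt-5-codex", "input": "Hello"})
# urllib normalizes header names, so look the value up as "Content-type".
content_type = req.get_header("Content-type")
```

Sending the request (e.g., via `urllib.request.urlopen`) is omitted here so the sketch stays self-contained.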

Request Body

This endpoint expects a JSON object in the request body with the following fields:

model

string
Required
ID of the model to use. Must be a responses-compatible model, e.g., provider-5/gpt-5-codex. See Models.

input

string | array of objects
Required
The primary input for the model. Can be a single string for a direct prompt, or a list of ChatMessage objects to provide conversational context.

role

string
Required
The role of the message author. Must be one of 'system', 'user', 'assistant', or 'tool'.

content

string | array | null
Required
The content of the message. Can be a simple string or a list for multimodal content. Required for user, system, and tool roles. For assistant role, either content or tool_calls (or both) must be present.

name

string
An optional name for the participant in a multi-user chat.

tool_calls

array of objects
Used by assistant messages to specify tool calls the model wants to make.

tool_call_id

string
Required for tool role messages. The ID of the tool call this message is a response to.
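The ChatMessage fields above can be combined into a single `input` array covering all four roles. The sketch below assumes an OpenAI-compatible shape for `tool_calls` entries (the `id`/`type`/`function` fields are an assumption; consult the Tool Calling documentation for the authoritative schema).

```python
import json

input_messages = [
    {"role": "system", "content": "You are a helpful data analyst."},
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        # Assistant messages may omit content when making tool calls.
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",  # illustrative ID
                "type": "function",
                "function": {
                    "name": "calculator",
                    "arguments": json.dumps({"expr": "2 + 2"}),
                },
            }
        ],
    },
    # Tool messages echo the ID of the call they answer via tool_call_id.
    {"role": "tool", "tool_call_id": "call_1", "content": "4"},
]

roles = [m["role"] for m in input_messages]
```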

instructions

string
System-level guidance for the model to follow throughout the generation process.

tools

array of objects
A list of tools the model may call. See the Tool Calling documentation for structure.
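For illustration, a single `tools` entry might look like the following, assuming an OpenAI-compatible function-tool schema with JSON Schema `parameters` (the exact structure is defined in the Tool Calling documentation).

```python
# Hypothetical tool definition -- the "calculator" name and its parameter
# schema are illustrative, not part of the API.
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a simple arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expr": {"type": "string"}},
            "required": ["expr"],
        },
    },
}

tools = [calculator_tool]
```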

tool_choice

string | object
Controls which tool the model should use (e.g., "auto").

stream

boolean
If set, partial message deltas will be sent as server-sent events. Note: Streaming is not yet supported for this endpoint and will result in an error.

Default: false

store

boolean
Whether to store the response and its context for future reference.

previous_response_id

string
The ID of a previous response to continue the conversation from.
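Together, `store` and `previous_response_id` support a two-turn flow: store the first response, then reference its ID in the next request. The payloads below are a sketch; the response ID shown is illustrative.

```python
# Turn 1: ask the server to retain this response's context.
first_request = {
    "model": "provider-5/gpt-5-codex",
    "input": "Summarize our Q3 engagement metrics.",
    "store": True,
}

# Suppose the first call returned {"id": "resp_abc123def456", ...}.
# Turn 2: continue the conversation from that stored response.
follow_up = {
    "model": "provider-5/gpt-5-codex",
    "input": "Now break that down by feature.",
    "previous_response_id": "resp_abc123def456",
}
```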

temperature

number
Sampling temperature (0-2). Higher values mean more randomness.

Default: 1

top_p

number
Nucleus sampling parameter (0-1).

max_tokens

integer
Maximum number of tokens to generate in the response.

text

object
Advanced settings for controlling structured output formats.

Example Request & Response

Here's a complete example showing how the endpoint works:

Example Request:

{
  "model": "provider-5/gpt-5-codex",
  "input": [
    {
      "role": "user",
      "content": "Analyze the correlation between user engagement and feature adoption."
    }
  ],
  "instructions": "You are a data analyst. Provide step-by-step reasoning before your final answer.",
  "max_tokens": 1024,
  "temperature": 0.5
}

Example Response:

{
  "id": "resp_abc123def456",
  "object": "response",
  "created_at": 1729584251,
  "model": "provider-5/gpt-5-codex",
  "output": [
    {
      "id": "reasoning_1",
      "type": "reasoning",
      "content": [
        "Step 1: Acknowledge the user's request to find a correlation.",
        "Step 2: Formulate a plan to analyze the data.",
        "Step 3: Perform the correlation calculation.",
        "Step 4: Structure the final answer as a JSON object."
      ],
      "summary": [
        "User wants correlation analysis. I will outline steps and provide final JSON output."
      ]
    },
    {
      "id": "msg_xyz789",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "{\"correlation_factor\": 0.82, \"summary\": \"Strong positive correlation found...\"}",
          "annotations": [],
          "logprobs": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 121,
    "total_tokens": 177
  }
}

Response Body (200 OK)

A successful request returns a JSON object with a structured output array that separates the model's reasoning from its final message:

id

string
A unique identifier for the response, prefixed with resp_.

object

string
The object type, which is always "response".

created_at

integer
The Unix timestamp (in seconds) of when the response was created.

model

string
The model that generated the response.

output

array of objects
A list containing the structured output from the model, including reasoning and message items. Each item has a type field which can be either 'reasoning' or 'message'.
MessageItem Object (type: "message"):

id

string
Unique ID for the message item.

type

string
Always "message".

role

string
Always "assistant".

content

array of objects
List of OutputTextContent objects containing the actual text generated by the model.

status

string
Optional status for the message (e.g., 'completed').
ReasoningItem Object (type: "reasoning"):

id

string
Unique ID for the reasoning item.

type

string
Always "reasoning".

content

array
List of detailed reasoning steps or thoughts from the model.

summary

array
Summary of the reasoning process.
OutputTextContent Object:

type

string
Always "output_text".

text

string
The main text content of the message.

annotations

array
List of annotations related to the text (e.g., citations).

logprobs

array
Log probabilities for the generated tokens, if requested.
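A client will typically split the `output` array into the model's reasoning and its final text. The sketch below does this using the item shapes documented above, applied to a trimmed version of the example response.

```python
# Trimmed example response, following the documented item shapes.
response = {
    "output": [
        {
            "id": "reasoning_1",
            "type": "reasoning",
            "content": ["Step 1: Acknowledge the request.", "Step 2: Analyze."],
            "summary": ["User wants correlation analysis."],
        },
        {
            "id": "msg_xyz789",
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "{\"correlation_factor\": 0.82}",
                    "annotations": [],
                    "logprobs": [],
                }
            ],
            "status": "completed",
        },
    ]
}

# Collect all reasoning steps across reasoning items.
reasoning_steps = [
    step
    for item in response["output"]
    if item["type"] == "reasoning"
    for step in item["content"]
]

# Concatenate the output_text parts of all message items.
final_text = "".join(
    part["text"]
    for item in response["output"]
    if item["type"] == "message"
    for part in item["content"]
    if part["type"] == "output_text"
)
```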

usage

object
Token usage statistics for the request.

prompt_tokens

integer
Number of tokens in the input prompt.

completion_tokens

integer
Number of tokens in the generated response.

total_tokens

integer
Total number of tokens used in the request.

Rate Limiting

This endpoint is rate-limited. Your current limits are returned in the following response headers on every successful request:

  • X-RateLimit-Limit-Minute: Your RPM (Requests Per Minute) limit
  • X-RateLimit-Remaining-Minute: Requests remaining in the current minute
  • X-RateLimit-Reset-Minute: The UTC timestamp when your minute limit will reset
  • X-RateLimit-Limit-Day: Your RPD (Requests Per Day) limit
  • X-RateLimit-Remaining-Day: Requests remaining in the current day
  • X-RateLimit-Reset-Day: The UTC timestamp when your daily limit will reset
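These headers can drive a simple client-side backoff. The sketch below assumes the reset headers carry Unix timestamps in seconds; verify this against the actual header values you receive.

```python
def seconds_until_reset(headers: dict, now: float) -> float:
    """Return how long to wait before the next request, per the minute headers.

    Assumes X-RateLimit-Reset-Minute is a Unix timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining-Minute", "1"))
    if remaining > 0:
        return 0.0  # still within the per-minute budget
    reset_at = float(headers["X-RateLimit-Reset-Minute"])
    return max(0.0, reset_at - now)

# Example with fabricated header values: 0 requests left, reset 60s away.
wait = seconds_until_reset(
    {"X-RateLimit-Remaining-Minute": "0", "X-RateLimit-Reset-Minute": "1000"},
    now=940.0,
)
```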

Error Handling

The endpoint uses standard HTTP status codes to indicate the success or failure of a request:

  • 400 Bad Request: The request payload is invalid. This can be due to malformed JSON, missing required fields like model or input, or using a model that is not of type responses.
  • 401 Unauthorized: Your API key is missing, invalid, or expired.
  • 403 Forbidden: Your API key is valid but does not have permission to perform the request. This can occur if the model is disabled by an administrator, not available for your current subscription plan, on your API key's blacklist, or your API key has a whitelist and the model is not on it.
  • 404 Not Found: The requested model ID does not exist.
  • 429 Too Many Requests: You have exceeded your rate limit (RPM or RPD).
  • 500 Internal Server Error: An unexpected error occurred on the server.
  • 501 Not Implemented: The provider for the requested model does not support the /v1/responses endpoint.
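Of the codes above, only 429 and 500 describe conditions that may clear on their own; the rest indicate a problem with the request, key, or model choice. A minimal retry classifier might look like this:

```python
# Status codes the docs describe as transient (worth retrying with backoff).
RETRYABLE = {429, 500}

# Codes that require fixing the request, key, plan, or model instead.
FATAL = {400, 401, 403, 404, 501}

def should_retry(status_code: int) -> bool:
    """Return True only for errors that may succeed on a later attempt."""
    return status_code in RETRYABLE
```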

Key Differences from /v1/chat/completions

While both endpoints generate text, they are designed for different purposes:

Feature | /v1/chat/completions | /v1/responses
Primary Input | messages | input
System Prompt | ChatMessage with role: "system" | Dedicated instructions field
Primary Output | choices | output
Output Structure | Single message object | Sequence of reasoning + message items
Primary Use Case | Conversational AI, chatbots | Agentic workflows, chain-of-thought
