Reasoning Models

What is reasoning?

Reasoning mode is a capability available in certain AI models that allows them to engage in explicit step-by-step reasoning before providing their final answer. When reasoning mode is enabled, the model generates internal "thoughts" that show its reasoning process, problem-solving steps, and decision-making logic.

Reasoning mode can improve results on complex tasks, but it adds cost and latency: the reasoning content is generated before the response and counts toward the tokens used. Weigh this trade-off before enabling reasoning mode.

Configuration

Each provider configures reasoning mode and returns reasoning content in its own way:

OpenAI and xAI expose a reasoning effort parameter (low, medium, high).
Anthropic and Google accept a thinking budget that limits the number of tokens used for thinking.
Fireworks does not support configuring reasoning mode.

To reconcile these differences, AnotherAI converts in both directions between a reasoning effort and a reasoning budget (also called a thinking budget).

Each reasoning effort level corresponds to a reasoning budget that allocates a specific percentage of the model's maximum output tokens.

Reasoning Effort	Maximum Token Budget
`disabled`	Disables reasoning when possible
`low`	20% of maximum output tokens
`medium`	50% of maximum output tokens
`high`	80% of maximum output tokens
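The effort-to-budget direction of this table reduces to a percentage lookup. The sketch below is a minimal illustration and not AnotherAI's implementation. The helper name `effort_to_budget` is hypothetical. Rounding down to a whole number of tokens and mapping `disabled` to a budget of 0 are both assumptions.

```python
# Share of the model's maximum output tokens per effort level, from the table above.
# Mapping "disabled" to 0 mirrors the budget-to-effort direction (an assumption).
EFFORT_TO_PERCENT = {"disabled": 0, "low": 20, "medium": 50, "high": 80}


def effort_to_budget(effort: str, max_output_tokens: int) -> int:
    """Hypothetical helper: convert a reasoning effort into a budget in tokens."""
    if effort not in EFFORT_TO_PERCENT:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    # Integer arithmetic rounds down to a whole token count (an assumption).
    return max_output_tokens * EFFORT_TO_PERCENT[effort] // 100


# For a model with 100k maximum output tokens:
for level in ("low", "medium", "high"):
    print(level, effort_to_budget(level, 100_000))  # 20000, 50000, 80000
```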

Conversely, a reasoning budget is converted to a reasoning effort:

Token Budget Range	Converted to Effort
0% (a budget of `0`)	`disabled` (disables reasoning when possible)
Up to 20% of max tokens	`low`
20% to 50% of max tokens	`medium`
Above 50% of max tokens	`high`
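The inverse direction can be sketched the same way. Treat this as an illustration only, and note that `budget_to_effort` is a hypothetical name. The table does not say which side the exact 20% and 50% boundaries fall on, so the sketch assumes upper bounds are inclusive. That choice keeps the o3 example below (50k out of 100k) at `medium`.

```python
def budget_to_effort(budget: int, max_output_tokens: int) -> str:
    """Hypothetical helper: convert a reasoning budget in tokens into an effort."""
    if budget < 0 or max_output_tokens <= 0:
        raise ValueError("budget must be >= 0 and max_output_tokens > 0")
    if budget == 0:
        return "disabled"
    # Compare budget / max against pct / 100 with integers to avoid float rounding.
    # Inclusive upper bounds (exactly 20% -> low, exactly 50% -> medium) are assumed.
    if budget * 100 <= 20 * max_output_tokens:
        return "low"
    if budget * 100 <= 50 * max_output_tokens:
        return "medium"
    return "high"


print(budget_to_effort(50_000, 100_000))  # medium
```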

Reasoning is configured with the `reasoning` request parameter, an object with the following fields:

budget: integer, the reasoning budget in tokens
effort: string, the reasoning effort, one of disabled, low, medium, high

{
  "reasoning": {
    "budget": 10000
  }
}

{
  "reasoning": {
    "effort": "medium"
  }
}

Either budget or effort can be provided, but not both.
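A client-side helper can enforce the one-field rule before a request is sent. The helper below is hypothetical. Rejecting a negative budget, and rejecting a call that sets neither field, are assumptions of this sketch rather than documented AnotherAI behavior.

```python
from typing import Optional

VALID_EFFORTS = ("disabled", "low", "medium", "high")


def build_reasoning(budget: Optional[int] = None, effort: Optional[str] = None) -> dict:
    """Hypothetical helper: build the `reasoning` request parameter."""
    # Exactly one field must be set; rejecting "neither" is an assumption.
    if (budget is None) == (effort is None):
        raise ValueError("provide exactly one of budget or effort")
    if budget is not None:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(budget, bool) or not isinstance(budget, int) or budget < 0:
            raise ValueError("budget must be a non-negative integer")
        return {"reasoning": {"budget": budget}}
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}")
    return {"reasoning": {"effort": effort}}


print(build_reasoning(budget=10_000))   # {'reasoning': {'budget': 10000}}
print(build_reasoning(effort="medium"))  # {'reasoning': {'effort': 'medium'}}
```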

Because providers configure reasoning differently, AnotherAI may translate the same value differently for each one. For example, given a reasoning budget of 50k tokens, AnotherAI sends:

a reasoning effort of `medium` to o3, since o3 has a maximum of 100k output tokens and 50k is 50% of that
a thinking budget of 50k to Claude 4 Sonnet
nothing to DeepSeek R1, since Fireworks does not support configuring reasoning
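The per-provider behavior in this example amounts to a dispatch on the provider family. This is an illustration only. The function `translate_budget` and the provider keys are hypothetical. The returned `("effort", ...)` and `("budget", ...)` pairs stand in for each provider's real request fields. Forwarding the budget unchanged to Anthropic and Google ignores any clamping AnotherAI may apply.

```python
from typing import Optional, Tuple

# Provider families, following the configuration notes above.
EFFORT_PROVIDERS = {"openai", "xai"}        # take a reasoning effort
BUDGET_PROVIDERS = {"anthropic", "google"}  # take a thinking budget
NO_CONFIG_PROVIDERS = {"fireworks"}         # reasoning cannot be configured


def budget_to_effort(budget: int, max_output_tokens: int) -> str:
    # Thresholds from the budget-to-effort table (upper bounds assumed inclusive).
    if budget == 0:
        return "disabled"
    if budget * 100 <= 20 * max_output_tokens:
        return "low"
    if budget * 100 <= 50 * max_output_tokens:
        return "medium"
    return "high"


def translate_budget(
    budget: int, provider: str, max_output_tokens: Optional[int] = None
) -> Optional[Tuple[str, object]]:
    """Hypothetical helper: what to forward to a provider for a given budget."""
    if provider in EFFORT_PROVIDERS:
        if max_output_tokens is None:
            raise ValueError("effort providers need the model's max output tokens")
        return ("effort", budget_to_effort(budget, max_output_tokens))
    if provider in BUDGET_PROVIDERS:
        return ("budget", budget)  # forwarded unchanged in this sketch
    if provider in NO_CONFIG_PROVIDERS:
        return None  # nothing is sent
    raise ValueError(f"unknown provider: {provider!r}")


print(translate_budget(50_000, "openai", 100_000))  # o3: ('effort', 'medium')
print(translate_budget(50_000, "anthropic"))        # Claude 4 Sonnet: ('budget', 50000)
print(translate_budget(50_000, "fireworks"))        # DeepSeek R1: None
```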

The OpenAI Chat Completions API exposes a `reasoning_effort` parameter (`low`, `medium`, `high`). AnotherAI supports it as well, but it cannot express a granular thinking budget or disable reasoning.
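`reasoning_effort` accepts only three levels. A caller that needs a token budget, or wants reasoning disabled, has to fall back to the `reasoning` object. The hypothetical helper below sketches that choice for the Python SDK, whose `chat.completions.create` accepts both `reasoning_effort` and `extra_body`. Refusing to send both at once is an assumption of the sketch.

```python
from typing import Optional

# Levels accepted by the OpenAI-style reasoning_effort parameter.
OPENAI_EFFORTS = ("low", "medium", "high")


def completion_kwargs(
    model: str,
    prompt: str,
    *,
    reasoning_effort: Optional[str] = None,
    reasoning: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: keyword arguments for chat.completions.create."""
    if reasoning_effort is not None and reasoning is not None:
        raise ValueError("pass either reasoning_effort or reasoning, not both")
    kwargs = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if reasoning_effort is not None:
        # No "disabled" level and no token budget are available here.
        if reasoning_effort not in OPENAI_EFFORTS:
            raise ValueError("reasoning_effort must be low, medium or high")
        kwargs["reasoning_effort"] = reasoning_effort
    elif reasoning is not None:
        # AnotherAI-specific object, sent in the request body via extra_body.
        kwargs["extra_body"] = {"reasoning": reasoning}
    return kwargs


print(completion_kwargs("claude-4-sonnet", "Hi", reasoning_effort="low"))
print(completion_kwargs("claude-4-sonnet", "Hi", reasoning={"effort": "disabled"}))
```

The returned dictionary can be unpacked into `openai.chat.completions.create(**kwargs)`.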

Usage

Completion API

As explained above, reasoning can be configured through a parameter of the completion API. The thoughts can then be read from each choice's message via the AnotherAI-specific field reasoning_content.

Because the reasoning_content field is not part of the OpenAI API response, type checkers will likely flag it when it is accessed.

For now, reasoning content is not available for OpenAI models: AnotherAI relies on the OpenAI Chat Completions API, which does not return it.

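import openai

# Assumes the SDK is configured to call AnotherAI, e.g.
# openai.base_url = "https://api.anotherai.dev/v1"
res = openai.chat.completions.create(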
  model="claude-4-sonnet",
  messages=[{"role": "user", "content": "What is the meaning of life?"}],
  extra_body={
    "reasoning": {
      "budget": 10000,
      # or "effort": "low",
    }
  }
)
# Access the reasoning content
print(res.choices[0].message.reasoning_content) # type: ignore
# Access the reasoning tokens
print(res.usage.completion_tokens_details.reasoning_tokens)

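import OpenAI from "openai";

// Assumes the client is configured to call AnotherAI
const openai = new OpenAI({ baseURL: "https://api.anotherai.dev/v1" });

const res = await openai.chat.completions.create({
  model: "claude-4-sonnet",
  messages: [{ role: "user", content: "What is the meaning of life?" }],
  // The Node SDK sends unknown body fields as-is, so pass `reasoning` directly
  // @ts-expect-error - reasoning is not part of the OpenAI API types
  reasoning: {
    budget: 10000,
    // or effort: "low",
  },
});
// Access the reasoning content
// @ts-expect-error - reasoning_content is not part of the OpenAI API
console.log(res.choices[0].message.reasoning_content);
// Access the reasoning tokens usage
console.log(res.usage?.completion_tokens_details?.reasoning_tokens);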

{
  "model": "claude-4-sonnet",
  "messages": [
    {
      "role": "user", 
      "content": "What is the meaning of life?"
    }
  ],
  "reasoning": {
    "budget": 10000,
    // or "effort": "low",
  }
}
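`reasoning_content` is untyped and is missing for OpenAI models. It can therefore be safer to read it, and the reasoning token count, with defaults rather than plain attribute access. The sketch below uses hypothetical helper names. `SimpleNamespace` objects stand in for the SDK's message and usage objects so it runs offline. Treating a missing token count as 0 is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_reasoning_content(message: Any) -> Optional[str]:
    """Return the AnotherAI-specific reasoning_content of a message, or None."""
    # getattr with a default avoids an AttributeError when the field is absent.
    return getattr(message, "reasoning_content", None)


def get_reasoning_tokens(usage: Any) -> int:
    """Return usage.completion_tokens_details.reasoning_tokens, or 0 if missing."""
    # usage or completion_tokens_details may be None; getattr(None, name, None) is None.
    details = getattr(usage, "completion_tokens_details", None)
    return getattr(details, "reasoning_tokens", None) or 0


# Offline stand-ins for res.choices[0].message and res.usage.
message = SimpleNamespace(content="42", reasoning_content="Let me think...")
usage = SimpleNamespace(completion_tokens_details=SimpleNamespace(reasoning_tokens=128))
print(get_reasoning_content(message), get_reasoning_tokens(usage))
print(get_reasoning_content(SimpleNamespace(content="42")), get_reasoning_tokens(None))
```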

When streaming, reasoning content deltas are returned alongside the content deltas, in the reasoning_content field of each chunk's delta.

for chunk in res:  # res was created with stream=True
    delta = chunk.choices[0].delta
    print(getattr(delta, "reasoning_content", None))
    print(delta.content)

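for await (const chunk of res) { // res was created with stream: true
  // @ts-expect-error - reasoning_content is not part of the OpenAI API
  console.log(chunk.choices[0].delta.reasoning_content, chunk.choices[0].delta.content);
}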
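In practice, the deltas are usually accumulated into the full reasoning and the full answer. In the sketch below, `collect_stream` is a hypothetical helper. It skips chunks without choices, such as a trailing usage chunk. The fake chunks only mimic the shape of the SDK's streamed objects so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, str]:
    """Hypothetical helper: join reasoning and answer deltas from a stream."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a final usage-only chunk
            continue
        delta = chunk.choices[0].delta
        # Either field may be missing or None on a given chunk.
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**delta: Any) -> Any:
    # Offline stand-in for a streamed chunk object.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


chunks = [
    fake_chunk(reasoning_content="Think", content=None),
    fake_chunk(reasoning_content="ing...", content=None),
    fake_chunk(content="Hello"),
    fake_chunk(content=" world"),
    SimpleNamespace(choices=[]),
]
print(collect_stream(chunks))  # ('Thinking...', 'Hello world')
```

With a real stream, `collect_stream` would receive the object returned by `openai.chat.completions.create(..., stream=True)`.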

Viewing reasoning models

The AnotherAI models endpoint reports whether each model supports reasoning in the supports.reasoning field.

{
  "data": [
    {
      "id": "claude-4-sonnet",
      ...,
      "supports": {
        "reasoning": true
      }
    },
    ...
  ]
}
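If the models response is fetched as raw JSON, the reasoning-capable models can be picked out on the client. In the sketch below, `reasoning_model_ids` is a hypothetical helper. The sample body is trimmed and hypothetical apart from the `claude-4-sonnet` entry. A missing `supports` object is treated as no reasoning support.

```python
import json
from typing import Any, Dict, List


def reasoning_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: IDs of models whose supports.reasoning flag is true."""
    return [
        model["id"]
        for model in payload.get("data", [])
        # A missing or null supports object counts as "no reasoning" (an assumption).
        if (model.get("supports") or {}).get("reasoning") is True
    ]


# Trimmed, hypothetical response body in the shape shown above.
body = json.loads("""
{"data": [
  {"id": "claude-4-sonnet", "supports": {"reasoning": true}},
  {"id": "some-other-model", "supports": {"reasoning": false}}
]}
""")
print(reasoning_model_ids(body))  # ['claude-4-sonnet']
```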

It is also possible to filter for reasoning models via the reasoning query parameter.

models = openai.models.list(extra_query={"reasoning": True})
# The supports field is ignored by the OpenAI SDK so it is not accessible
print(models.data)

curl "https://api.anotherai.dev/v1/models?reasoning=true"