Reasoning Models
Enable reasoning on capable models and retrieve the reasoning content.
What is reasoning?
Reasoning mode is a capability available in certain AI models that allows them to engage in explicit step-by-step reasoning before providing their final answer. When reasoning mode is enabled, the model generates internal "thoughts" that show its reasoning process, problem-solving steps, and decision-making logic.
Reasoning mode can unlock better inference capabilities in complex use cases. However, it adds cost and latency, since the reasoning content is generated before the response and counts toward the tokens used. Consider this trade-off before enabling reasoning mode.
Configuration
Each provider has its own way of configuring reasoning mode and returning the reasoning content:
- OpenAI and xAI expose a reasoning effort parameter (low, medium, high).
- Anthropic and Google allow providing a thinking budget, limiting the number of tokens used for thinking.
- Fireworks does not support configuring reasoning mode.
To reconcile differences between providers, AnotherAI converts back and forth between a reasoning effort and a reasoning budget (also called thinking budget).
Each reasoning effort level corresponds to a reasoning budget that allocates a specific percentage of the model's maximum output tokens.
Reasoning Effort | Maximum Token Budget |
---|---|
disabled | Disables reasoning when possible |
low | 20% of maximum output tokens |
medium | 50% of maximum output tokens |
high | 80% of maximum output tokens |
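The effort-to-budget mapping in the table can be sketched as a small function. This is only an illustration: the helper name effort_to_budget is hypothetical, treating disabled as a budget of 0 is an assumption, and so is rounding fractional budgets down.

```python
# Percentage of the model's maximum output tokens allocated to each effort
# level, taken from the table above. Treating "disabled" as 0% is an assumption.
EFFORT_TO_PERCENT = {"disabled": 0, "low": 20, "medium": 50, "high": 80}


def effort_to_budget(effort: str, max_output_tokens: int) -> int:
    """Return the reasoning budget, in tokens, for an effort level."""
    if effort not in EFFORT_TO_PERCENT:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    # Assumption: fractional budgets are rounded down (integer division).
    return max_output_tokens * EFFORT_TO_PERCENT[effort] // 100


# Example with a model whose maximum output is 100k tokens.
print(effort_to_budget("medium", 100_000))
```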
Conversely, a reasoning budget is converted to a reasoning effort:
Token Budget Range | Converted to Effort |
---|---|
0 | disabled (disables reasoning when possible) |
Up to 20% of max tokens | low |
Above 20%, up to 50% of max tokens | medium |
Above 50% of max tokens | high |
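The reverse conversion might look like the sketch below. The helper name budget_to_effort is hypothetical, and the handling of budgets that fall exactly on a threshold (20% maps to low, 50% maps to medium) is an assumption. It matches the example later in this section, where 50k tokens on a model with 100k output tokens becomes medium.

```python
def budget_to_effort(budget: int, max_output_tokens: int) -> str:
    """Map a reasoning budget in tokens to an effort level."""
    if budget < 0:
        raise ValueError("budget must not be negative")
    if budget == 0:
        return "disabled"
    # Integer comparison avoids floating-point rounding at the thresholds.
    # Assumption: a budget exactly on a threshold maps to the lower level.
    if budget * 100 <= 20 * max_output_tokens:
        return "low"
    if budget * 100 <= 50 * max_output_tokens:
        return "medium"
    return "high"


print(budget_to_effort(50_000, 100_000))
```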
Reasoning can be configured via the reasoning request parameter, which is an object with the following fields:
- budget: integer, the reasoning budget in tokens
- effort: string, the reasoning effort, one of disabled, low, medium, high
{
"reasoning": {
"budget": 10000
}
}
{
"reasoning": {
"effort": "medium"
}
}
Either budget or effort can be provided, but not both.
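A client-side check for this rule could be sketched as follows. The helper build_reasoning is hypothetical and not part of any SDK. It also rejects the case where neither field is set, which is stricter than the rule above and is a choice made for this sketch.

```python
from typing import Any, Dict, Optional

VALID_EFFORTS = ("disabled", "low", "medium", "high")


def build_reasoning(
    budget: Optional[int] = None, effort: Optional[str] = None
) -> Dict[str, Any]:
    """Build the request fragment holding the reasoning object."""
    # Exactly one of the two fields must be set (sketch-specific strictness).
    if (budget is None) == (effort is None):
        raise ValueError("provide exactly one of budget or effort")
    if budget is not None:
        if budget < 0:
            raise ValueError("budget must not be negative")
        return {"reasoning": {"budget": budget}}
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}")
    return {"reasoning": {"effort": effort}}


print(build_reasoning(budget=10000))
print(build_reasoning(effort="medium"))
```

With the OpenAI Python SDK, the returned dictionary can be passed as extra_body, as in the usage examples later on this page.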
As explained above, providers differ in how they allow reasoning to be configured, so the same value may be sent differently to each provider. For example, given a reasoning budget of 50k tokens, AnotherAI will send:
- a reasoning effort of medium if using o3, since o3 has a maximum of 100k output tokens
- a thinking budget of 50k if using claude 4 sonnet
- nothing if using deepseek r1, since Fireworks does not support configuring reasoning
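The routing described above can be illustrated with a short sketch. It is not AnotherAI's implementation: the provider keys, the helper names, and the tuple return shape are all hypothetical, and the threshold handling (a budget exactly at 20% or 50% maps to the lower level) is an assumption.

```python
from typing import Optional, Tuple, Union

# How each provider accepts reasoning settings, per the provider list above.
PROVIDER_STYLE = {
    "openai": "effort",
    "xai": "effort",
    "anthropic": "budget",
    "google": "budget",
    "fireworks": None,  # reasoning cannot be configured
}


def to_effort(budget: int, max_output_tokens: int) -> str:
    """Map a token budget to an effort level using the 20% / 50% thresholds."""
    if budget == 0:
        return "disabled"
    for effort, percent in (("low", 20), ("medium", 50)):
        if budget * 100 <= percent * max_output_tokens:
            return effort
    return "high"


def provider_setting(
    provider: str, budget: int, max_output_tokens: Optional[int] = None
) -> Optional[Tuple[str, Union[str, int]]]:
    """Return what would be sent to the provider for a given budget."""
    style = PROVIDER_STYLE[provider]
    if style is None:
        return None  # nothing is sent
    if style == "budget":
        return ("budget", budget)  # passed through unchanged
    if max_output_tokens is None:
        raise ValueError("max_output_tokens is required for effort-based providers")
    return ("effort", to_effort(budget, max_output_tokens))


# The 50k example from the text; 100k is o3's maximum output as stated above.
print(provider_setting("openai", 50_000, 100_000))
print(provider_setting("anthropic", 50_000))
print(provider_setting("fireworks", 50_000))
```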
The OpenAI completion API exposes a reasoning_effort parameter (low, medium, high). AnotherAI also supports it, but it does not allow configuring a granular thinking budget or disabling reasoning.
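As a rough sketch, the reasoning_effort parameter could be set like this. The helper completion_kwargs is hypothetical, and the model identifier "o3" is an assumption about how the model is named on AnotherAI. The actual API call is left as a comment so the snippet runs without credentials.

```python
from typing import Any, Dict, List

# Values accepted by reasoning_effort, per the text above.
OPENAI_EFFORTS = ("low", "medium", "high")


def completion_kwargs(
    model: str, messages: List[Dict[str, str]], reasoning_effort: str
) -> Dict[str, Any]:
    """Build keyword arguments for chat.completions.create."""
    if reasoning_effort not in OPENAI_EFFORTS:
        # "disabled" and token budgets require the reasoning object instead.
        raise ValueError(f"reasoning_effort must be one of {OPENAI_EFFORTS}")
    return {
        "model": model,
        "messages": messages,
        "reasoning_effort": reasoning_effort,
    }


kwargs = completion_kwargs(
    "o3",  # assumption: model identifier as exposed by AnotherAI
    [{"role": "user", "content": "What is the meaning of life?"}],
    "medium",
)
# res = openai.chat.completions.create(**kwargs)
print(kwargs["reasoning_effort"])
```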
Usage
Completion API
As explained above, the reasoning effort or budget can be passed as a parameter to the completion API. Thoughts can then be retrieved from the message of each choice via an AnotherAI-specific field, reasoning_content.
Because the reasoning_content field is not part of the OpenAI API response types, accessing it will likely raise a type-checking error.
For now, AnotherAI relies on the OpenAI completion API, which does not return the reasoning content, so reasoning content is not available for OpenAI models.
res = openai.chat.completions.create(
model="claude-4-sonnet",
messages=[{"role": "user", "content": "What is the meaning of life?"}],
extra_body={
"reasoning": {
"budget": 10000,
# or "effort": "low",
}
}
)
# Access the reasoning content
print(res.choices[0].message.reasoning_content) # type: ignore
# Access the reasoning tokens
print(res.usage.completion_tokens_details.reasoning_tokens)
const res = await openai.chat.completions.create({
model: "claude-4-sonnet",
messages: [{ role: "user", content: "What is the meaning of life?" }],
// @ts-expect-error - reasoning is not part of the OpenAI API
reasoning: {
budget: 10000,
// or effort: "low",
},
});
// Access the reasoning content
// @ts-expect-error - reasoning_content is not part of the OpenAI API
console.log(res.choices[0].message.reasoning_content);
// Access the reasoning tokens usage
console.log(res.usage.completion_tokens_details.reasoning_tokens);
{
"model": "claude-4-sonnet",
"messages": [
{
"role": "user",
"content": "What is the meaning of life?"
}
],
"reasoning": {
"budget": 10000
}
}
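To avoid scattering type-ignore comments through Python code, the field can be read through a small accessor. This is a minimal sketch: get_reasoning_content is a hypothetical helper, and the SimpleNamespace objects below only stand in for SDK message objects so the snippet runs without an API call.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_reasoning_content(message: Any) -> Optional[str]:
    """Return message.reasoning_content if present, otherwise None."""
    # getattr with a default keeps type checkers quiet and also covers
    # responses that carry no reasoning content.
    return getattr(message, "reasoning_content", None)


# Stand-ins for res.choices[0].message.
with_reasoning = SimpleNamespace(content="42", reasoning_content="Let me think.")
without_reasoning = SimpleNamespace(content="42")

print(get_reasoning_content(with_reasoning))
print(get_reasoning_content(without_reasoning))
```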
When streaming, the reasoning content deltas are also returned at the same level as the content field.
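# chunk is one item yielded by the streamed response
print(chunk.choices[0].delta.reasoning_content)
print(chunk.choices[0].delta.content)
// chunk is one item yielded by the streamed response
console.log(chunk.choices[0].delta.reasoning_content);
console.log(chunk.choices[0].delta.content);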
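When consuming a stream, the reasoning and content deltas usually need to be accumulated separately. The sketch below assumes each chunk has the choices[0].delta shape shown above. The helper collect_stream is hypothetical, and the SimpleNamespace chunks are stand-ins for a real stream.

```python
from types import SimpleNamespace as NS
from typing import Any, Iterable, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, str]:
    """Concatenate reasoning and content deltas from streamed chunks."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


# Stand-in chunks; a real stream would come from create(..., stream=True).
fake_stream = [
    NS(choices=[NS(delta=NS(reasoning_content="Let me ", content=None))]),
    NS(choices=[NS(delta=NS(reasoning_content="think.", content=None))]),
    NS(choices=[NS(delta=NS(content="42"))]),
    NS(choices=[]),
]
reasoning, content = collect_stream(fake_stream)
print(reasoning)
print(content)
```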
Viewing reasoning models
The AnotherAI models endpoint exposes a supports.reasoning field on each model.
{
"data": [
{
"id": "claude-4-sonnet",
...,
"supports": {
"reasoning": true
}
},
...
]
}
It is also possible to filter for reasoning models via the reasoning query parameter.
models = openai.models.list(extra_query={"reasoning": True})
# The supports field is ignored by the OpenAI SDK so it is not accessible
print(models.data)
curl "https://api.anotherai.dev/v1/models?reasoning=true"
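One way to read supports.reasoning without depending on SDK typing is to work from the raw JSON payload of the models endpoint. The sketch below assumes the response shape shown above. The helper reasoning_model_ids and the second model entry are hypothetical, and the payload is inlined so the snippet runs offline.

```python
import json
from typing import Any, Dict, List


def reasoning_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Return the ids of models whose supports.reasoning flag is true."""
    return [
        model["id"]
        for model in payload.get("data", [])
        if model.get("supports", {}).get("reasoning")
    ]


# Sample payload in the documented shape; in practice it would be the body
# returned by GET /v1/models, fetched with any HTTP client.
raw = """
{
  "data": [
    {"id": "claude-4-sonnet", "supports": {"reasoning": true}},
    {"id": "hypothetical-model", "supports": {"reasoning": false}}
  ]
}
"""
print(reasoning_model_ids(json.loads(raw)))
```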