Deployments
Why use deployments?
At heart, deployments are simply a way to manage some completion parameters that would traditionally be committed and deployed with the codebase. With the speed at which LLMs are evolving, re-deploying code any time a prompt needs to be adjusted or a model needs to be changed is not sustainable.
The flow is simple:
Create completions via the API or the playground MCP tool. AnotherAI separates the completion parameters into static components (the Version) and dynamic components (the Input).
Once you are happy with a set of parameters, create a deployment via the MCP tool.
Update your code to point to the deployed version and remove the hardcoded parameters.
In other words, change:
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
"role": "system",
// Using a template here instead of a string format to allow separating a static system message template and
// input variables.
"content": `You are an expert on {{ country }}. You are helping a customer traveling to {{ country }}. Answer questions in {{ language }}.`
},
{ role: "user", content: "Any customs I should be mindful about at the dinner table ?" },
],
temperature: 0.5,
// Input variables
input: {
country: "France",
language: "English",
},
agent_id: "travel-assistant"
});
to:
const completion = await openai.chat.completions.create({
// model and temperature are stored in the deployment
model: "anotherai/deployment/travel-assistant:production#1",
messages: [
// System message template is stored in the deployment
// User message is dynamic here so it is not stored in the deployment
{ role: "user", content: "Any customs I should be mindful about at the dinner table ?" },
],
input: {
country: "France",
language: "English",
}
// agent_id is stored in the deployment
});
Understanding versions and input
To understand deployments, it is important to understand how AnotherAI separates the static (Version) and dynamic (Input) portions of the completion call, and how AnotherAI re-creates the completion call from a Version and Input.
Separating version and input
The rules for separating the Version and Input are simple:
- All completion parameters (model, temperature, etc.) besides messages are part of the Version
- All messages up to and including the last message containing templated content are part of the Version (version.prompt)
- If no message contains templated content and the first message is a system message, the first system message is part of the Version
The input contains the rest:
- the input variables
- the messages that are not part of the Version
For example, in the following code:
- The version contains the first system message but not the user message since the user message does not contain a templated content.
- The input contains the user message and the input variables.
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
"role": "system",
// Using a template here instead of a string format to allow separating a static system message template and
// input variables.
"content": `You are an expert on {{ country }}. You are helping a customer traveling to {{ country }}. Answer questions in {{ language }}.`
},
{ role: "user", content: "Any customs I should be mindful about at the dinner table ?" },
],
temperature: 0.5,
// Input variables
input: {
country: "France",
language: "English",
},
agent_id: "travel-assistant"
});
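The separation rules above can be sketched as a small function. This is only an illustration under stated assumptions: the names Message, SplitResult and splitMessages are hypothetical and are not part of AnotherAI's API, and a message is treated as templated when its content contains a {{ variable }} placeholder.

```typescript
// Hypothetical types for illustration only, not AnotherAI's actual data model.
type Message = { role: "system" | "user" | "assistant"; content: string };

interface SplitResult {
  versionPrompt: Message[]; // messages stored in the Version (version.prompt)
  inputMessages: Message[]; // messages that remain part of the Input
}

// Assumption: a message is "templated" if it contains a {{ variable }} placeholder.
const TEMPLATE_PATTERN = /\{\{\s*\w+\s*\}\}/;

function splitMessages(messages: Message[]): SplitResult {
  // Find the index of the last message containing templated content.
  let lastTemplated = -1;
  messages.forEach((message, index) => {
    if (TEMPLATE_PATTERN.test(message.content)) lastTemplated = index;
  });

  // Everything up to and including the last templated message goes to the Version.
  let cut = lastTemplated + 1;

  // With no templated message, only a leading system message goes to the Version.
  if (lastTemplated === -1 && messages[0]?.role === "system") {
    cut = 1;
  }

  return {
    versionPrompt: messages.slice(0, cut),
    inputMessages: messages.slice(cut),
  };
}
```

Applied to the example above, the templated system message ends up in versionPrompt and the user message in inputMessages.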
Compiling a completion call from a version and input
Most provider APIs only accept a list of messages as input, so AnotherAI needs to compile the completion call from the Version and Input.
Building the message list is done in two steps:
- The message templates that belong to the Version are rendered using the input variables.
- The messages that belong to the Input are added to the message list.
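The two steps can be sketched as follows. This is a simplified illustration, not AnotherAI's implementation: renderTemplate and compileMessages are hypothetical names, and the renderer only handles plain {{ variable }} placeholders, whereas the real template engine may support more syntax.

```typescript
// Hypothetical type for illustration only.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Replace each {{ name }} placeholder with the matching input variable.
// Assumption: a missing variable is treated as an error.
function renderTemplate(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    const value = variables[name];
    if (value === undefined) {
      throw new Error(`Missing input variable: ${name}`);
    }
    return value;
  });
}

function compileMessages(
  versionPrompt: Message[],
  variables: Record<string, string>,
  inputMessages: Message[],
): Message[] {
  // Step 1: render the Version's message templates with the input variables.
  const rendered = versionPrompt.map((message) => ({
    ...message,
    content: renderTemplate(message.content, variables),
  }));
  // Step 2: append the messages that belong to the Input.
  return [...rendered, ...inputMessages];
}
```

The resulting list is what would be sent to the provider in place of the original messages array.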
Response format and Variables schema
A Version also refers to a specific response format and set of variables. That's because both are usually tightly linked to the prompt itself:
- the system message often refers to specific fields in the response format
- if the prompt is templated, it directly refers to a set of input variables
Deployments
A deployment is a way to refer to a Version via an alias so that the underlying prompt and parameters can be edited without having to change the codebase.
Continuing with the example above, if we deploy the associated version with the alias travel-assistant/production#1, we can use it in the codebase like this:
const completion = await openai.chat.completions.create({
model: "anotherai/deployment/travel-assistant:production#1",
messages: [
{ role: "user", content: "Any customs I should be mindful about at the dinner table ?" },
],
input: {
country: "France",
language: "English",
}
});
When AnotherAI receives the request, it retrieves the Version using the alias (travel-assistant/production#1) and follows the steps described above before sending the request to the provider.
Deployment (and Version) compatibility
As explained above, it is possible to change the Version that a deployment targets. If there were no restrictions on the replacement Version, it would be possible to break production code. For example, consider a deployment targeting a Version A that requires two variables, name and age. If we were to replace it with a Version B that requires three variables, name, age and city, existing code would not supply city and the template could not be rendered correctly. The same problem can occur with a response format (aka output schema).
To prevent this, AnotherAI refuses to update a deployment to a version that has an incompatible set of variables or response format. Incompatible means that the two object definitions (JSON schemas) have different properties or property types.
// Consider an agent that extracts the capital of a country given a city
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "user", content: "Give me the capital of the country of {{ city }} ?" },
],
input: {
city: "Toulouse"
},
response_format: {
type: "json_schema",
json_schema: {
name: "capital_response",
schema: {
type: "object",
properties: {
capital: { type: "string" }
}
}
}
}
});
// the following is compatible since it uses the same variable schema and response format
const completion = await openai.chat.completions.create({
model: "gpt-4.1", // different model
messages: [
// A new system message
{ role: "system", content: "You are a geography expert." },
{ role: "user", content: "What is the capital closest to {{ city }} ?" },
],
temperature: 0, // a different temperature
input: {
city: "Toulouse"
},
response_format: {
type: "json_schema",
json_schema: {
name: "capital_response",
schema: {
type: "object",
properties: {
capital: {
type: "string",
// Different JSON Schema metadata are not considered as incompatible
description: "<city>, <country>",
examples: ["Paris, France", "London, United Kingdom"]
}
}
}
}
}
});
// the following is not compatible since it uses a different variable schema
const completion = await openai.chat.completions.create({
model: "gpt-4.1",
messages: [
// city -> country would be a breaking change in code
{ role: "user", content: "Give me the capital of {{ country }} ?" },
],
input: {
country: "France" // breaking change
}
...
});
That does not mean that the new code cannot be deployed. It simply requires creating a new deployment instead of replacing the existing one. You can create as many deployments as you want. We suggest a naming convention like <agent-id>/<environment>#<number>.
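The compatibility rule described above (same properties and same property types, with metadata such as description or examples ignored) can be sketched as follows. The ObjectSchema type and isCompatible function are hypothetical and only cover flat object schemas; AnotherAI's actual check may handle nested schemas and other cases.

```typescript
// Minimal JSON Schema shape for this sketch; real schemas can be richer.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string; [metadata: string]: unknown }>;
}

// Two schemas are considered compatible when they declare the same property
// names with the same types. Metadata (description, examples, ...) is ignored.
function isCompatible(a: ObjectSchema, b: ObjectSchema): boolean {
  const aKeys = Object.keys(a.properties).sort();
  const bKeys = Object.keys(b.properties).sort();
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (key, index) =>
      key === bKeys[index] && a.properties[key]?.type === b.properties[key]?.type,
  );
}
```

With this sketch, adding a description to the capital property keeps the schemas compatible, while renaming the city variable to country does not.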
Reconciling code and deployments
Code can target a deployment and still provide completion parameters. For example, one could write:
const completion = await openai.chat.completions.create({
model: "anotherai/deployment/travel-assistant:production#1",
input: {
country: "France"
},
temperature: 0.5, // temperature might be different from the deployment
});
In the above example, we need to decide which temperature should be used: the one from the deployment or the one from the code.
We believe that code should be the source of truth, which means that in the above case the temperature from the code is used. The reconciliation between code and deployments follows these rules:
- any provided completion parameter overrides the corresponding deployment parameter
- if the override creates a version that is incompatible with the deployment, an error is raised.
Consider a deployment travel-assistant/production#1 created with:
- model: "gpt-4o"
- temperature: 0.5
- variables: country (string)
// Accepted since the version is compatible with the deployment
const completion = await openai.chat.completions.create({
model: "anotherai/deployment/travel-assistant:production#1",
input: {
country: "France"
},
temperature: 1, // temperature 1 is used
tools: [...] // tools are used
});
// Rejected since the version is incompatible with the deployment
const completion = await openai.chat.completions.create({
model: "anotherai/deployment/travel-assistant:production#1",
input: {
country: "France"
},
response_format: {
type: "json_schema",
json_schema: ... // response format is incompatible with the deployment
}
});
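The reconciliation rules can be sketched as a merge followed by a compatibility check. All names here (DeployedVersion, RequestOverrides, reconcile) are hypothetical, and the response format is reduced to a flat map of property names to types for brevity; this is not AnotherAI's implementation.

```typescript
// Hypothetical shapes for this sketch; AnotherAI's real data model may differ.
type PropertyTypes = Record<string, string>; // property name -> JSON type

interface DeployedVersion {
  model: string;
  temperature: number;
  responseProperties: PropertyTypes;
}

interface RequestOverrides {
  temperature?: number;
  responseProperties?: PropertyTypes;
}

// Same property names with the same types.
function sameProperties(a: PropertyTypes, b: PropertyTypes): boolean {
  const aKeys = Object.keys(a);
  return aKeys.length === Object.keys(b).length && aKeys.every((key) => a[key] === b[key]);
}

function reconcile(deployed: DeployedVersion, overrides: RequestOverrides): DeployedVersion {
  // Rule 2: an override that yields an incompatible version raises an error.
  const responseProperties = overrides.responseProperties ?? deployed.responseProperties;
  if (!sameProperties(deployed.responseProperties, responseProperties)) {
    throw new Error("Override is incompatible with the deployment");
  }
  // Rule 1: any parameter provided in code overrides the deployment's value.
  return {
    model: deployed.model,
    temperature: overrides.temperature ?? deployed.temperature,
    responseProperties,
  };
}
```

In this sketch, passing a temperature of 1 returns a version that uses 1, while passing a response format with a different property type throws.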
Creating New vs. Updating Existing Deployments
When you want to deploy and use a new version of your agent, depending on the scope of the changes, you may be able to update an existing deployment instead of creating a new one.
Benefits of updating an existing deployment
Updating an existing deployment does not require any code changes. Because no code changes are required, updating an existing deployment is generally much faster than creating and releasing a new deployment.
To prevent unwanted deployments that could negatively impact your production environment, your coding agent will require you to confirm all deployment updates using the web app.
What updates are compatible with an existing deployment?
You can update an existing deployment if the new version is considered a non-breaking change.
Creating a New Deployment
Creating a new deployment is required when the changes you are making are considered breaking changes.
When a new deployment is created, you will need to update your code to point to the new deployment.
Example: Updating code for a new deployment
// Before - using old deployment
const completion = await openai.chat.completions.create({
model: "anotherai/deployment/travel-assistant:production#1",
input: {
country: "France"
}
});
// After - using new deployment with breaking changes
const completion = await openai.chat.completions.create({
model: "anotherai/deployment/travel-assistant:production#2", // new deployment
input: {
destination: "France", // variable renamed: country -> destination
traveler_type: "business" // new required variable added
}
});