Non-Text Inputs (Images, PDFs, Audio)

Like the OpenAI API, AnotherAI accepts images and audio in requests to the completion endpoint. AnotherAI also accepts PDFs: OpenAI models do not support processing PDFs, but other models do, including Gemini, Claude, and Mistral models.

Handling Images

Images can be passed using the image_url field, which accepts both public URLs and base64-encoded data.

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image_url", image_url: { url: "https://example.com/image.png" } }
      ]
    }
  ]
});

response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
            ]
        }
    ]
)
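
For the base64 case, the content part can be built from raw image bytes. The following is a minimal sketch, assuming AnotherAI accepts the same data: URL form that the OpenAI API uses for inline images. The helper name image_part_from_bytes is hypothetical and not part of any SDK.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an image_url content part that embeds the image as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        # Assumption: the data URL goes in the same url field as a public URL.
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# In practice the bytes would come from a file, e.g. open("image.png", "rb").read().
# Here only the PNG signature bytes are used as placeholder input.
image_part = image_part_from_bytes(b"\x89PNG\r\n\x1a\n")
```

The returned dictionary can be placed in the content list of a message in the same position as the URL-based image_url part shown above.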

Handling Audio Files

Audio can be passed using the input_audio field, which accepts both public URLs and base64-encoded data. To use a URL, pass it in the data field; the format parameter is then ignored.

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "input_audio",
          input_audio: {
            data: "https://example.com/audio.mp3", // just pass a URL in the data field
            format: "mp3" // when the data field is a URL, the format is ignored
          }
        }
      ]
    }
  ]
});
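
Both input forms can also be sketched in Python. The helper names below are hypothetical. The base64 branch assumes the OpenAI convention of placing the encoded bytes in data and the file type in format.

```python
import base64


def audio_part_from_url(url: str, audio_format: str = "mp3") -> dict:
    """Build an input_audio content part that points at a public URL."""
    # Per the description above, format is ignored when data holds a URL.
    return {"type": "input_audio", "input_audio": {"data": url, "format": audio_format}}


def audio_part_from_bytes(data: bytes, audio_format: str) -> dict:
    """Build an input_audio content part that embeds base64-encoded audio."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": encoded, "format": audio_format}}


url_part = audio_part_from_url("https://example.com/audio.mp3")
```

Either part can then be used as an entry in the content list of a user message.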

Non-Text Inputs Usage in Templates

As described in the Input Variables section, it is possible to separate static instructions from dynamic data by using Jinja2 variables in the text content of messages.

For non-text inputs (images, audio, PDFs), the template variable must go in its own content object of the matching type, not inside a text string. For images, for example, the variable goes in the url of the image_url content object:

completion = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please describe this image:"},
                {"type": "image_url", "image_url": {"url": "{{image_url}}"}}
            ]
        }
    ],
    extra_body={
        "input": {
            "image_url": "https://example.com/image.png"
        }
    }
)

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Please describe this image:" },
        { type: "image_url", image_url: { url: "{{image_url}}" } }
      ]
    }
  ],
  extra_body: {
    input: {
      image_url: "https://example.com/image.png",
    },
  },
});

Important Note: Template variables for non-text modalities do not work when placed directly in text strings. For example, writing {{image_url}} inside the text of a text content object, rather than in the url of an image_url content object, is incorrect and leaves your AI agent unable to access the content properly.
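
As an illustrative sketch of the difference (constructed for this comparison, using the same image_url variable name as the example above), the two message shapes look like this:

```python
# Incorrect: the variable is embedded in a text string, so it is not
# passed as image content and the agent cannot access the image properly.
incorrect_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Please describe this image: {{image_url}}"},
        ],
    }
]

# Correct: the variable sits in the url of a separate image_url content object.
correct_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Please describe this image:"},
            {"type": "image_url", "image_url": {"url": "{{image_url}}"}},
        ],
    }
]
```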
