inspect_ai.model
Generation
get_model
Get an instance of a model.
Calls to get_model() are memoized (i.e. a call with the same arguments will return an existing instance of the model rather than creating a new one). You can disable this with memoize=False.
If you prefer to immediately close models after use (as well as prevent caching) you can employ the async context manager built in to the Model class. For example:
async with get_model("openai/gpt-4o") as model:
    response = await model.generate("Say hello")

In this case, the model client will be closed at the end of the context manager and will not be available in the get_model() cache.
def get_model(
model: str | Model | None = None,
*,
role: str | None = None,
default: str | Model | None = None,
config: GenerateConfig = GenerateConfig(),
base_url: str | None = None,
api_key: str | None = None,
memoize: bool = True,
**model_args: Any,
) -> Model

model str | Model | None
Model specification. If Model is passed it is returned unmodified; if None is passed then the model currently being evaluated is returned (or, if there is no evaluation, the model referred to by INSPECT_EVAL_MODEL).

role str | None
Optional named role for model (e.g. for roles specified at the task or eval level). Provide a default as a fallback in the case where the role hasn't been externally specified.

default str | Model | None
Optional. Fallback model in case the specified model or role is not found.

config GenerateConfig
Configuration for model.

base_url str | None
Optional. Alternate base URL for model.

api_key str | None
Optional. API key for model.

memoize bool
Use/store a cached version of the model based on the parameters to get_model().

**model_args Any
Additional args to pass to model constructor.
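For example, a minimal sketch based on the parameters above (the "grader" role and fallback model are illustrative):

from inspect_ai.model import GenerateConfig, get_model

# resolve an externally configurable "grader" role, falling back to a
# specific model if no role mapping was provided at the task/eval level
grader = get_model(
    role="grader",
    default="openai/gpt-4o",
    config=GenerateConfig(temperature=0.0),
)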
Model
Model interface.
Use get_model() to get an instance of a model. Model provides an async context manager for closing the connection to it after use. For example:
async with get_model("openai/gpt-4o") as model:
    response = await model.generate("Say hello")

class Model

Attributes
apiModelAPI-
Model API.
configGenerateConfig-
Generation config.
namestr-
Model name.
rolestr | None-
Model role.
Methods
- __init__
-
Create a model.
def __init__(
    self,
    api: ModelAPI,
    config: GenerateConfig,
    model_args: dict[str, Any] | None = None,
) -> None

api ModelAPI
Model API provider.

config GenerateConfig
Model configuration.

model_args dict[str, Any] | None
Optional model args.
- generate
-
Generate output from the model.
async def generate(
    self,
    input: str | list[ChatMessage],
    tools: Sequence[Tool | ToolDef | ToolInfo | ToolSource] | ToolSource = [],
    tool_choice: ToolChoice | None = None,
    config: GenerateConfig = GenerateConfig(),
    cache: bool | CachePolicy | NotGiven = NOT_GIVEN,
) -> ModelOutput

input str | list[ChatMessage]
Chat message input (if a str is passed it is converted to a ChatMessageUser).

tools Sequence[Tool | ToolDef | ToolInfo | ToolSource] | ToolSource
Tools available for the model to call.

tool_choice ToolChoice | None
Directives to the model as to which tools to prefer.

config GenerateConfig
Model configuration.

cache bool | CachePolicy | NotGiven
Caching behavior for generate responses (defaults to no caching).
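For example, a minimal sketch (the model name and prompt are illustrative):

from inspect_ai.model import GenerateConfig, get_model

async def ask() -> str:
    model = get_model("openai/gpt-4o")
    output = await model.generate(
        "Say hello",
        config=GenerateConfig(temperature=0.0, max_tokens=50),
        cache=True,  # cache responses using the default CachePolicy
    )
    return output.completion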
- generate_loop
-
Generate output from the model, looping as long as the model calls tools.
Similar to generate(), but runs in a loop resolving model tool calls. The loop terminates when the model stops calling tools. The final ModelOutput as well as the message list for the conversation are returned as a tuple.
async def generate_loop(
    self,
    input: str | list[ChatMessage],
    tools: Sequence[Tool | ToolDef | ToolSource] | ToolSource = [],
    config: GenerateConfig = GenerateConfig(),
    cache: bool | CachePolicy | NotGiven = NOT_GIVEN,
) -> tuple[list[ChatMessage], ModelOutput]

input str | list[ChatMessage]
Chat message input (if a str is passed it is converted to a ChatMessageUser).

tools Sequence[Tool | ToolDef | ToolSource] | ToolSource
Tools available for the model to call.

config GenerateConfig
Model configuration.

cache bool | CachePolicy | NotGiven
Caching behavior for generate responses (defaults to no caching).
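For example, a sketch that lets the model call a simple tool until it produces a final answer (the add() tool is illustrative):

from inspect_ai.model import get_model
from inspect_ai.tool import tool

@tool
def add():
    async def execute(x: int, y: int) -> int:
        """Add two numbers.

        Args:
            x: First number.
            y: Second number.
        """
        return x + y
    return execute

async def solve() -> str:
    # loops until the model stops requesting tool calls
    messages, output = await get_model("openai/gpt-4o").generate_loop(
        "Use the add tool to compute 1 + 1.", tools=[add()]
    )
    return output.completion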
GenerateConfig
Model generation options.
class GenerateConfig(BaseModel)

Attributes
max_retriesint | None-
Maximum number of times to retry request (defaults to unlimited).
timeoutint | None-
Timeout (in seconds) for an entire request (including retries).
attempt_timeoutint | None-
Timeout (in seconds) for any given attempt (if exceeded, will abandon attempt and retry according to max_retries).
max_connectionsint | None-
Maximum number of concurrent connections to Model API (default is model specific).
system_messagestr | None-
Override the default system message.
max_tokensint | None-
The maximum number of tokens that can be generated in the completion (default is model specific).
top_pfloat | None-
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
temperaturefloat | None-
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
stop_seqslist[str] | None-
Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
best_ofint | None-
Generates best_of completions server-side and returns the ‘best’ (the one with the highest log probability per token). vLLM only.
frequency_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. OpenAI, Google, Grok, Groq, vLLM, and SGLang only.
presence_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. OpenAI, Google, Grok, Groq, vLLM, and SGLang only.
logit_biasdict[int, float] | None-
Map token IDs to an associated bias value from -100 to 100 (e.g. “42=10,43=-10”). OpenAI, Grok, and vLLM only.
seedint | None-
Random seed. OpenAI, Google, Mistral, Groq, HuggingFace, and vLLM only.
top_kint | None-
Randomly sample the next word from the top_k most likely next words. Anthropic, Google, HuggingFace, vLLM, and SGLang only.
num_choicesint | None-
How many chat completion choices to generate for each input message. OpenAI, Grok, Google, TogetherAI, vLLM, and SGLang only.
logprobsbool | None-
Return log probabilities of the output tokens. OpenAI, Grok, TogetherAI, Huggingface, llama-cpp-python, vLLM, and SGLang only.
top_logprobsint | None-
Number of most likely tokens (0-20) to return at each token position, each with an associated log probability. OpenAI, Grok, Huggingface, vLLM, and SGLang only.
parallel_tool_callsbool | None-
Whether to enable parallel function calling during tool use (defaults to True). OpenAI and Groq only.
internal_toolsbool | None-
Whether to automatically map tools to model internal implementations (e.g. ‘computer’ for anthropic).
max_tool_outputint | None-
Maximum tool output (in bytes). Defaults to 16 * 1024.
cache_promptLiteral['auto'] | bool | None-
Whether to cache the prompt prefix. Defaults to “auto”, which will enable caching for requests with tools. Anthropic only.
reasoning_effortLiteral['none', 'minimal', 'low', 'medium', 'high'] | None-
Constrains effort on reasoning. Defaults vary by provider and model and not all models support all values (please consult provider documentation for details).
reasoning_tokensint | None-
Maximum number of tokens to use for reasoning. Anthropic Claude models only.
reasoning_summaryLiteral['none', 'concise', 'detailed', 'auto'] | None-
Provide summary of reasoning steps (OpenAI reasoning models only). Use ‘auto’ to access the most detailed summarizer available for the current model (defaults to ‘auto’ if your organization is verified by OpenAI).
reasoning_historyLiteral['none', 'all', 'last', 'auto'] | None-
Include reasoning in chat message history sent to generate.
response_schemaResponseSchema | None-
Request a response format as JSONSchema (output should still be validated). OpenAI, Google, Mistral, vLLM, and SGLang only.
extra_bodydict[str, Any] | None-
Extra body to be sent with requests to OpenAI compatible servers. OpenAI, vLLM, and SGLang only.
cachebool | CachePolicy | None-
Policy for caching of model generate output.
batchbool | int | BatchConfig | None-
Use batching API when available. True to enable batching with default configuration, False to disable batching, a number to enable batching of the specified batch size, or a BatchConfig object specifying the batching configuration.
Methods
- merge
-
Merge another model configuration into this one.
def merge(
    self, other: Union["GenerateConfig", GenerateConfigArgs]
) -> "GenerateConfig"

other Union[GenerateConfig, GenerateConfigArgs]
Configuration to merge.
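For example, a sketch of constructing a config and merging per-call overrides:

from inspect_ai.model import GenerateConfig, get_model

base = GenerateConfig(max_retries=5, timeout=120, max_tokens=2048)
# merge the base config with additional options specified for this use
config = base.merge(GenerateConfig(temperature=0.0))
model = get_model("openai/gpt-4o", config=config)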
GenerateConfigArgs
Type for kwargs that selectively override GenerateConfig.
class GenerateConfigArgs(TypedDict, total=False)

Attributes
max_retriesint | None-
Maximum number of times to retry request (defaults to unlimited).
timeoutint | None-
Request timeout (in seconds).
attempt_timeoutint | None-
Timeout (in seconds) for any given attempt (if exceeded, will abandon attempt and retry according to max_retries).
max_connectionsint | None-
Maximum number of concurrent connections to Model API (default is model specific).
system_messagestr | None-
Override the default system message.
max_tokensint | None-
The maximum number of tokens that can be generated in the completion (default is model specific).
top_pfloat | None-
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
temperaturefloat | None-
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
stop_seqslist[str] | None-
Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
best_ofint | None-
Generates best_of completions server-side and returns the ‘best’ (the one with the highest log probability per token). vLLM only.
frequency_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. OpenAI, Google, Grok, Groq, and vLLM only.
presence_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. OpenAI, Google, Grok, Groq, and vLLM only.
logit_biasdict[int, float] | None-
Map token Ids to an associated bias value from -100 to 100 (e.g. “42=10,43=-10”). OpenAI and Grok only.
seedint | None-
Random seed. OpenAI, Google, Mistral, Groq, HuggingFace, and vLLM only.
top_kint | None-
Randomly sample the next word from the top_k most likely next words. Anthropic, Google, and HuggingFace only.
num_choicesint | None-
How many chat completion choices to generate for each input message. OpenAI, Grok, Google, and TogetherAI only.
logprobsbool | None-
Return log probabilities of the output tokens. OpenAI, Google, Grok, TogetherAI, Huggingface, llama-cpp-python, and vLLM only.
top_logprobsint | None-
Number of most likely tokens (0-20) to return at each token position, each with an associated log probability. OpenAI, Google, Grok, and Huggingface only.
parallel_tool_callsbool | None-
Whether to enable parallel function calling during tool use (defaults to True). OpenAI and Groq only.
internal_toolsbool | None-
Whether to automatically map tools to model internal implementations (e.g. ‘computer’ for anthropic).
max_tool_outputint | None-
Maximum tool output (in bytes). Defaults to 16 * 1024.
cache_promptLiteral['auto'] | bool | None-
Whether to cache the prompt prefix. Defaults to “auto”, which will enable caching for requests with tools. Anthropic only.
reasoning_effortLiteral['none', 'minimal', 'low', 'medium', 'high'] | None-
Constrains effort on reasoning. Defaults vary by provider and model and not all models support all values (please consult provider documentation for details).
reasoning_tokensint | None-
Maximum number of tokens to use for reasoning. Anthropic Claude models only.
reasoning_summaryLiteral['none', 'concise', 'detailed', 'auto'] | None-
Provide summary of reasoning steps (OpenAI reasoning models only). Use ‘auto’ to access the most detailed summarizer available for the current model (defaults to ‘auto’ if your organization is verified by OpenAI).
reasoning_historyLiteral['none', 'all', 'last', 'auto'] | None-
Include reasoning in chat message history sent to generate.
response_schemaResponseSchema | None-
Request a response format as JSONSchema (output should still be validated). OpenAI, Google, and Mistral only.
extra_bodydict[str, Any] | None-
Extra body to be sent with requests to OpenAI compatible servers. OpenAI, vLLM, and SGLang only.
cachebool | CachePolicy | None-
Policy for caching of model generations.
batchbool | int | BatchConfig | None-
Use batching API when available. True to enable batching with default configuration, False to disable batching, a number to enable batching of the specified batch size, or a BatchConfig object specifying the batching configuration.
GenerateFilter
Filter a model generation.
A filter may substitute for the default model generation by returning a ModelOutput, modify the input parameters by returning a GenerateInput, or return None to allow default processing to continue.
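For example, a minimal conforming filter (matching the type alias shown below) that substitutes a canned response for over-long conversations and otherwise defers to default generation; the length threshold is illustrative:

from inspect_ai.model import ChatMessage, GenerateConfig, ModelOutput
from inspect_ai.tool import ToolChoice, ToolInfo

async def truncation_filter(
    model: str,
    messages: list[ChatMessage],
    tools: list[ToolInfo],
    tool_choice: ToolChoice | None,
    config: GenerateConfig,
) -> ModelOutput | None:
    # substitute a canned response for very long conversations;
    # returning None allows default processing to continue
    if sum(len(message.text) for message in messages) > 500_000:
        return ModelOutput.from_content(model, "[conversation too long]")
    return None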
GenerateFilter: TypeAlias = Callable[
[str, list[ChatMessage], list[ToolInfo], ToolChoice | None, GenerateConfig],
Awaitable[ModelOutput | GenerateInput | None],
]

BatchConfig
Batch processing configuration.
class BatchConfig(BaseModel)

Attributes
sizeint | None-
Target minimum number of requests to include in each batch. If not specified, uses default of 100. Batches may be smaller if the timeout is reached or if requests don’t fit within size limits.
max_sizeint | None-
Maximum number of requests to include in each batch. If not specified, falls back to the provider-specific maximum batch size.
send_delayfloat | None-
Maximum time (in seconds) to wait before sending a partially filled batch. If not specified, uses a default of 15 seconds. This prevents indefinite waiting when request volume is low.
tickfloat | None-
Time interval (in seconds) between checking for new batch requests and batch completion status. If not specified, uses a default of 15 seconds.
When expecting a very large number of concurrent batches, consider increasing this value to reduce overhead from continuous polling since an http request must be made for each batch on each tick.
max_batchesint | None-
Maximum number of batches to have in flight at once for a provider (defaults to 100).
max_consecutive_check_failuresint | None-
Maximum number of consecutive check failures before failing a batch (defaults to 1000).
ResponseSchema
Schema for model response when using Structured Output.
class ResponseSchema(BaseModel)

Attributes
namestr-
The name of the response schema. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
json_schemaJSONSchema-
The schema for the response format, described as a JSON Schema object.
descriptionstr | None-
A description of what the response format is for, used by the model to determine how to respond in the format.
strictbool | None-
Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. OpenAI and Mistral only.
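For example, a sketch of requesting structured output (this assumes a json_schema() helper is available, e.g. from inspect_ai.util, for deriving a JSONSchema from a Pydantic class):

from pydantic import BaseModel

from inspect_ai.model import GenerateConfig, ResponseSchema, get_model
from inspect_ai.util import json_schema  # assumed helper for building a JSONSchema

class Color(BaseModel):
    red: int
    green: int
    blue: int

model = get_model(
    "openai/gpt-4o",
    config=GenerateConfig(
        response_schema=ResponseSchema(name="color", json_schema=json_schema(Color))
    ),
)

As noted above, the returned output should still be validated (e.g. with Color.model_validate_json(output.completion)).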
ModelOutput
Output from model generation.
class ModelOutput(BaseModel)

Attributes
modelstr-
Model used for generation.
choiceslist[ChatCompletionChoice]-
Completion choices.
completionstr-
Model completion.
usageModelUsage | None-
Model token usage
timefloat | None-
Time elapsed (in seconds) for call to generate.
metadatadict[str, Any] | None-
Additional metadata associated with model output.
errorstr | None-
Error message in the case of content moderation refusals.
stop_reasonStopReason-
First message stop reason.
messageChatMessageAssistant-
First message choice.
Methods
- from_message
-
Create ModelOutput from a ChatMessageAssistant.
@staticmethod
def from_message(
    message: ChatMessage,
    stop_reason: StopReason = "stop",
) -> "ModelOutput"

message ChatMessage
Assistant message.

stop_reason StopReason
Stop reason for generation.
- from_content
-
Create ModelOutput from a str or list[Content].

@staticmethod
def from_content(
    model: str,
    content: str | list[Content],
    stop_reason: StopReason = "stop",
    error: str | None = None,
) -> "ModelOutput"

model str
Model name.

content str | list[Content]
Text content from generation.

stop_reason StopReason
Stop reason for generation.

error str | None
Error message.
- for_tool_call
-
Returns a ModelOutput for requesting a tool call.
@staticmethod
def for_tool_call(
    model: str,
    tool_name: str,
    tool_arguments: dict[str, Any],
    internal: JsonValue | None = None,
    tool_call_id: str | None = None,
    content: str | None = None,
) -> "ModelOutput"

model str
Model name.

tool_name str
The name of the tool.

tool_arguments dict[str, Any]
The arguments passed to the tool.

internal JsonValue | None
The model's internal info for the tool (if any).

tool_call_id str | None
Optional ID for the tool call. Defaults to a random UUID.

content str | None
Optional content to include in the message. Defaults to “tool call for tool {tool_name}”.
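These constructors are useful for scripting model behavior, for example with the built-in mockllm provider (a sketch; the custom_outputs model arg is specific to that provider and the "add" tool is hypothetical):

from inspect_ai.model import ModelOutput, get_model

model = get_model(
    "mockllm/model",
    custom_outputs=[
        # first request a call to a hypothetical "add" tool...
        ModelOutput.for_tool_call(
            model="mockllm/model",
            tool_name="add",
            tool_arguments={"x": 1, "y": 1},
        ),
        # ...then return a final text completion
        ModelOutput.from_content(model="mockllm/model", content="2"),
    ],
)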
ModelConfig
Model config.
class ModelConfig(BaseModel)

Attributes
modelstr-
Model name.
configGenerateConfig-
Generate config
base_urlstr | None-
Model base url.
argsdict[str, Any]-
Model specific arguments.
ModelCall
Model call (raw request/response data).
class ModelCall(BaseModel)

Attributes
requestdict[str, JsonValue]-
Raw data posted to model.
responsedict[str, JsonValue]-
Raw response data from model.
timefloat | None-
Time taken for underlying model call.
Methods
- create
-
Create a ModelCall object.
Create a ModelCall from arbitrary request and response objects (they might be dataclasses, Pydantic objects, dicts, etc.). Converts all values to be JSON serializable (excluding those that can't be).
@staticmethod
def create(
    request: Any,
    response: Any,
    filter: ModelCallFilter | None = None,
    time: float | None = None,
) -> "ModelCall"

request Any
Request object (dict, dataclass, BaseModel, etc.).

response Any
Response object (dict, dataclass, BaseModel, etc.).

filter ModelCallFilter | None
Function for filtering model call data.

time float | None
Time taken for underlying ModelCall.
ModelConversation
Model conversation.
class ModelConversation(Protocol)

Attributes
messageslist[ChatMessage]-
Conversation history.
outputModelOutput-
Model output.
ModelUsage
Token usage for completion.
class ModelUsage(BaseModel)

Attributes
input_tokensint-
Total input tokens used.
output_tokensint-
Total output tokens used.
total_tokensint-
Total tokens used.
input_tokens_cache_writeint | None-
Number of tokens written to the cache.
input_tokens_cache_readint | None-
Number of tokens retrieved from the cache.
reasoning_tokensint | None-
Number of tokens used for reasoning.
StopReason
Reason that the model stopped or failed to generate.
StopReason = Literal[
"stop",
"max_tokens",
"model_length",
"tool_calls",
"content_filter",
"unknown",
]

ChatCompletionChoice
Choice generated for completion.
class ChatCompletionChoice(BaseModel)

Attributes
messageChatMessageAssistant-
Assistant message.
stop_reasonStopReason-
Reason that the model stopped generating.
logprobsLogprobs | None-
Logprobs.
Messages
ChatMessage
Message in a chat conversation
ChatMessage = Union[
ChatMessageSystem, ChatMessageUser, ChatMessageAssistant, ChatMessageTool
]

ChatMessageBase
Base class for chat messages.
class ChatMessageBase(BaseModel)

Attributes
idstr | None-
Unique identifier for message.
contentstr | list[Content]-
Content (simple string or list of content objects)
sourceLiteral['input', 'generate'] | None-
Source of message.
metadatadict[str, Any] | None-
Additional message metadata.
textstr-
Get the text content of this message.
ChatMessage content is very general and can contain either a simple text value or a list of content parts (each of which can either be text or an image). Solvers (e.g. for prompt engineering) often need to interact with chat messages with the assumption that they are a simple string. The text property returns either the plain str content, or if the content is a list of text and images, the text items concatenated together (separated by newline)
Methods
- metadata_as
-
Metadata as a Pydantic model.
def metadata_as(self, metadata_cls: Type[MT]) -> MT

metadata_cls Type[MT]
BaseModel derived class.
ChatMessageSystem
System chat message.
class ChatMessageSystem(ChatMessageBase)

Attributes
roleLiteral['system']-
Conversation role.
ChatMessageUser
User chat message.
class ChatMessageUser(ChatMessageBase)

Attributes
roleLiteral['user']-
Conversation role.
tool_call_idlist[str] | None-
ID(s) of tool call(s) this message has the content payload for.
ChatMessageAssistant
Assistant chat message.
class ChatMessageAssistant(ChatMessageBase)

Attributes
roleLiteral['assistant']-
Conversation role.
tool_callslist[ToolCall] | None-
Tool calls made by the model.
modelstr | None-
Model used to generate assistant message.
ChatMessageTool
Tool chat message.
class ChatMessageTool(ChatMessageBase)

Attributes
roleLiteral['tool']-
Conversation role.
tool_call_idstr | None-
ID of tool call.
functionstr | None-
Name of function called.
errorToolCallError | None-
Error which occurred during tool call.
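For example, a sketch of building a conversation from these message classes and passing it to generate():

from inspect_ai.model import ChatMessageSystem, ChatMessageUser, get_model

async def chat() -> str:
    messages = [
        ChatMessageSystem(content="You are a concise assistant."),
        ChatMessageUser(content="Name three prime numbers."),
    ]
    output = await get_model("openai/gpt-4o").generate(messages)
    return output.completion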
trim_messages
Trim message list to fit within model context.
Trim the list of messages by:

- Retaining all system messages.
- Retaining the ‘input’ messages from the sample.
- Preserving a proportion of the remaining messages (preserve=0.7 by default).
- Ensuring that all assistant tool calls have corresponding tool messages.
- Ensuring that the sequence of messages doesn’t end with an assistant message.
async def trim_messages(
messages: list[ChatMessage], preserve: float = 0.7
) -> list[ChatMessage]

messages list[ChatMessage]
List of messages to trim.

preserve float
Ratio of conversation messages to preserve (defaults to 0.7).
user_prompt
Get the last “user” message within a message history.
def user_prompt(messages: list[ChatMessage]) -> ChatMessageUser

messages list[ChatMessage]
Message history.
Content
Content
Content sent to or received from a model.
Content = Union[
ContentText,
ContentReasoning,
ContentImage,
ContentAudio,
ContentVideo,
ContentData,
ContentToolUse,
ContentDocument,
]

ContentText
Text content.
class ContentText(ContentBase)

Attributes
typeLiteral['text']-
Type.
textstr-
Text content.
refusalbool | None-
Was this a refusal message?
citationsSequence[Citation] | None-
Citations supporting the text block.
ContentReasoning
Reasoning content.
See the specification for thinking blocks for Claude models.
class ContentReasoning(ContentBase)

Attributes
typeLiteral['reasoning']-
Type.
reasoningstr-
Reasoning content.
summarystr | None-
Reasoning summary.
signaturestr | None-
Signature for reasoning content (used by some models to ensure that reasoning content is not modified for replay)
redactedbool-
Indicates that the explicit content of this reasoning block has been redacted.
ContentImage
Image content.
class ContentImage(ContentBase)

Attributes
typeLiteral['image']-
Type.
imagestr-
Either a URL of the image or the base64 encoded image data.
detailLiteral['auto', 'low', 'high']-
Specifies the detail level of the image.
Currently only supported for OpenAI. Learn more in the Vision guide.
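For example, a sketch of multimodal input combining text and an image (this assumes the content classes are importable from inspect_ai.model as documented here; the image path is illustrative):

from inspect_ai.model import ChatMessageUser, ContentImage, ContentText, get_model

async def describe() -> str:
    message = ChatMessageUser(
        content=[
            ContentText(text="What is shown in this image?"),
            ContentImage(image="images/chart.png"),  # URL or base64 encoded data
        ]
    )
    output = await get_model("openai/gpt-4o").generate([message])
    return output.completion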
ContentAudio
Audio content.
class ContentAudio(ContentBase)

Attributes
typeLiteral['audio']-
Type.
audiostr-
Audio file path or base64 encoded data URL.
formatLiteral['wav', 'mp3']-
Format of audio data (‘mp3’ or ‘wav’)
ContentVideo
Video content.
class ContentVideo(ContentBase)

Attributes
typeLiteral['video']-
Type.
videostr-
Video file path or base64 encoded data URL.
formatLiteral['mp4', 'mpeg', 'mov']-
Format of video data (‘mp4’, ‘mpeg’, or ‘mov’)
ContentDocument
Document content (e.g. a PDF).
class ContentDocument(ContentBase)

Attributes
typeLiteral['document']-
Type.
documentstr-
Document file path or base64 encoded data URL.
filenamestr-
Document filename (automatically determined from ‘document’ if not specified).
mime_typestr-
Document mime type (automatically determined from ‘document’ if not specified).
ContentData
Model internal.
class ContentData(ContentBase)

Attributes
typeLiteral['data']-
Type.
datadict[str, JsonValue]-
Model provider specific payload - required for internal content.
ContentToolUse
Server side tool use.
class ContentToolUse(ContentBase)

Attributes
typeLiteral['tool_use']-
Type.
tool_typeLiteral['web_search', 'mcp_call']-
The type of the tool call.
idstr-
The unique ID of the tool call.
namestr-
Name of the tool.
contextstr | None-
Tool context (e.g. MCP Server)
argumentsstr-
Arguments passed to the tool.
resultstr-
Result from the tool call.
errorstr | None-
The error from the tool call (if any).
Citation
Citation
A citation sent to or received from a model.
Citation: TypeAlias = Annotated[
Union[
ContentCitation,
DocumentCitation,
UrlCitation,
],
Discriminator("type"),
]

CitationBase
Base class for citations.
class CitationBase(BaseModel)

Attributes
cited_textstr | tuple[int, int] | None-
The cited text
This can be the text itself or a start/end range of the text content within the container that is the cited text.
titlestr | None-
Title of the cited resource.
internaldict[str, JsonValue] | None-
Model provider specific payload - typically used to aid transformation back to model types.
UrlCitation
A citation that refers to a URL.
class UrlCitation(CitationBase)

Attributes
typeLiteral['url']-
Type.
urlstr-
URL of the cited resource.
DocumentCitation
A citation that refers to a page range in a document.
class DocumentCitation(CitationBase)

Attributes
typeLiteral['document']-
Type.
rangeDocumentRange | None-
Range of the document that is cited.
ContentCitation
A generic content citation.
class ContentCitation(CitationBase)

Attributes
typeLiteral['content']-
Type.
Tools
execute_tools
Perform tool calls in the last assistant message.
async def execute_tools(
messages: list[ChatMessage],
tools: Sequence[Tool | ToolDef | ToolSource] | ToolSource,
max_output: int | None = None,
) -> ExecuteToolsResult

messages list[ChatMessage]
Current message list.

tools Sequence[Tool | ToolDef | ToolSource] | ToolSource
Available tools.

max_output int | None
Maximum output length (in bytes). Defaults to max_tool_output from the active GenerateConfig (16 * 1024 by default).
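For example, a sketch of a manual agent step that generates and then executes any requested tool calls:

from inspect_ai.model import ChatMessage, execute_tools, get_model

async def agent_step(messages: list[ChatMessage], tools) -> list[ChatMessage]:
    model = get_model("openai/gpt-4o")
    output = await model.generate(messages, tools=tools)
    messages.append(output.message)
    # if the assistant requested tool calls, execute them and extend the history
    if output.message.tool_calls:
        result = await execute_tools(messages, tools)
        messages.extend(result.messages)
    return messages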
ExecuteToolsResult
Result from executing tools in the last assistant message.
In conventional tool calling scenarios only a list of ChatMessageTool messages will be appended and there will be no output. However, if there are handoff() tools (used in multi-agent systems) then other messages may be appended and an output may be available as well.
class ExecuteToolsResult(NamedTuple)

Attributes
messageslist[ChatMessage]-
Messages added to conversation.
outputModelOutput | None-
Model output if a generation occurred within the conversation.
Logprobs
Logprob
Log probability for a token.
class Logprob(BaseModel)

Attributes
tokenstr-
The predicted token represented as a string.
logprobfloat-
The log probability value of the model for the predicted token.
byteslist[int] | None-
The predicted token represented as a byte array (a list of integers).
top_logprobslist[TopLogprob] | None-
If the top_logprobs argument is greater than 0, this will contain an ordered list of the top K most likely tokens and their log probabilities.
Logprobs
Log probability information for a completion choice.
class Logprobs(BaseModel)

Attributes
contentlist[Logprob]-
A list of length num_generated_tokens containing the individual log probabilities for each generated token.
TopLogprob
List of the most likely tokens and their log probability, at this token position.
class TopLogprob(BaseModel)

Attributes
tokenstr-
The top-kth token represented as a string.
logprobfloat-
The log probability value of the model for the top-kth token.
byteslist[int] | None-
The top-kth token represented as a byte array (a list of integers).
Caching
CachePolicy
Caching options for model generation.
class CachePolicy(BaseModel)

Attributes
expiry str | None
The expiry time for cache entries (default “1W”). This is a string of the format “12h” for 12 hours or “1W” for a week, etc. This is how long we will keep the cache entry; if we access it after this point we’ll clear it. Setting to None will cache indefinitely.

per_epoch bool
Default True. By default we cache responses separately for different epochs. The general use case is that if there are multiple epochs, we should cache each response separately because scorers will aggregate across epochs. However, sometimes a response can be cached regardless of epoch if the call being made isn’t under test as part of the evaluation. If False, this option allows you to bypass that and cache independently of the epoch.

scopes dict[str, str]
A dictionary of additional metadata that should be included in the cache key. This allows for more fine-grained control over cache key generation.
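For example, a sketch of passing a CachePolicy to generate():

from inspect_ai.model import CachePolicy, get_model

async def cached_hello() -> str:
    output = await get_model("openai/gpt-4o").generate(
        "Say hello",
        cache=CachePolicy(expiry="1W", per_epoch=False),
    )
    return output.completion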
cache_size
Calculate the size of various cached directories and files
If neither subdirs nor files are provided, the entire cache directory will be calculated.
def cache_size(
    subdirs: list[str] = [], files: list[Path] = []
) -> list[tuple[str, int]]

subdirs list[str]
List of folders to filter by, which are generally model names. Empty directories will be ignored.

files list[Path]
List of files to filter by explicitly. Note that the return value groups these by their parent directory.
cache_clear
Clear the cache directory.
def cache_clear(model: str = "") -> bool

model str
Model to clear cache for.
cache_list_expired
Returns a list of all the cached files that have passed their expiry time.
def cache_list_expired(filter_by: list[str] = []) -> list[Path]

filter_by list[str]
Default []. List of model names to filter by. If an empty list, this will search the entire cache.
cache_prune
Delete all expired cache entries.
def cache_prune(files: list[Path] = []) -> None

files list[Path]
List of files to prune. If empty, this will search the entire cache.
cache_path
Path to cache directory.
def cache_path(model: str = "") -> Path

model str
Path to cache directory for specific model.
Conversion
messages_from_openai
Convert OpenAI Completions API messages into Inspect messages.
async def messages_from_openai(
messages: "list[ChatCompletionMessageParam]",
model: str | None = None,
) -> list[ChatMessage]

messages list[ChatCompletionMessageParam]
OpenAI Completions API messages.

model str | None
Optional model name to tag assistant messages with.
messages_from_openai_responses
Convert OpenAI Responses API messages into Inspect messages.
async def messages_from_openai_responses(
messages: "list[ResponseInputItemParam]",
model: str | None = None,
) -> list[ChatMessage]

messages list[ResponseInputItemParam]
OpenAI Responses API messages.

model str | None
Optional model name to tag assistant messages with.
messages_from_anthropic
Convert Anthropic Messages API messages into Inspect messages.
async def messages_from_anthropic(
messages: "list[MessageParam]", system_message: str | None = None
) -> list[ChatMessage]

messages list[MessageParam]
Anthropic Messages API messages.

system_message str | None
System message accompanying messages (optional).
messages_from_google
Convert Google GenAI Content list into Inspect messages.
async def messages_from_google(
contents: "Sequence[Content | ContentDict]",
system_instruction: str | None = None,
model: str | None = None,
) -> list[ChatMessage]

contents Sequence[Content | ContentDict]
Google GenAI Content objects or dicts that can be converted.

system_instruction str | None
Optional system instruction string.

model str | None
Optional model name to tag assistant messages with.
model_output_from_openai
Convert OpenAI ChatCompletion into Inspect ModelOutput
async def model_output_from_openai(
completion: Union["ChatCompletion", dict[str, Any]],
) -> ModelOutput

completion Union[ChatCompletion, dict[str, Any]]
OpenAI ChatCompletion object or dict that can be converted into one.
model_output_from_openai_responses
Convert OpenAI Response into Inspect ModelOutput
async def model_output_from_openai_responses(
response: Union["Response", dict[str, Any]],
) -> ModelOutput

response Union[Response, dict[str, Any]]
OpenAI Response object or dict that can be converted into one.
model_output_from_anthropic
Convert Anthropic Message response into Inspect ModelOutput
async def model_output_from_anthropic(
message: Union["Message", dict[str, Any]],
) -> ModelOutput

message Union[Message, dict[str, Any]]
Anthropic Message object or dict that can be converted into one.
model_output_from_google
Convert Google GenerateContentResponse into Inspect ModelOutput.
async def model_output_from_google(
response: Union["GenerateContentResponse", dict[str, Any]],
model: str | None = None,
) -> ModelOutput

response Union[GenerateContentResponse, dict[str, Any]]
Google GenerateContentResponse object or dict that can be converted.

model str | None
Optional model name override.
messages_to_openai
Convert messages to OpenAI Completions API compatible messages.
async def messages_to_openai(
messages: list[ChatMessage],
system_role: Literal["user", "system", "developer"] = "system",
) -> list[ChatCompletionMessageParam]

messages list[ChatMessage]
List of messages to convert.

system_role Literal['user', 'system', 'developer']
Role to use for system messages (newer OpenAI models use “developer” rather than “system”).
Provider
modelapi
Decorator for registering model APIs.
def modelapi(name: str) -> Callable[..., type[ModelAPI]]

name str
Name of API.
ModelAPI
Model API provider.
If you are implementing a custom ModelAPI provider your __init__() method will also receive a **model_args parameter that will carry any custom model_args (or -M arguments from the CLI) specified by the user. You can then pass these on to the appropriate place in your model initialization code (for example, here is what many of the built-in providers do with the model_args passed to them: https://inspect.aisi.org.uk/models.html#model-args)
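For example, a skeleton of a custom provider (a sketch under the assumptions above; the provider name and canned response are illustrative, and the generate() signature follows the abstract method documented below):

from typing import Any

from inspect_ai.model import (
    ChatMessage,
    GenerateConfig,
    ModelAPI,
    ModelOutput,
    modelapi,
)
from inspect_ai.tool import ToolChoice, ToolInfo

@modelapi(name="custom")
class CustomAPI(ModelAPI):
    def __init__(
        self,
        model_name: str,
        base_url: str | None = None,
        api_key: str | None = None,
        config: GenerateConfig = GenerateConfig(),
        **model_args: Any,
    ) -> None:
        super().__init__(model_name, base_url, api_key, [], config)
        self.model_args = model_args  # e.g. -M options passed on the CLI

    async def generate(
        self,
        input: list[ChatMessage],
        tools: list[ToolInfo],
        tool_choice: ToolChoice,
        config: GenerateConfig,
    ) -> ModelOutput:
        # call your backend here and convert its response to a ModelOutput
        return ModelOutput.from_content(self.model_name, "Hello from the custom provider")

The provider can then be referenced with get_model() using the registered name as a prefix (e.g. "custom/my-model", where "my-model" is illustrative).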
class ModelAPI(abc.ABC)

Methods
- __init__
-
Create a model API provider.
def __init__(
    self,
    model_name: str,
    base_url: str | None = None,
    api_key: str | None = None,
    api_key_vars: list[str] = [],
    config: GenerateConfig = GenerateConfig(),
) -> None

model_name str
Model name.

base_url str | None
Alternate base URL for model.

api_key str | None
API key for model.

api_key_vars list[str]
Environment variables that may contain keys for this provider (used for override).

config GenerateConfig
Model configuration.
- initialize
-
Reinitialize the model API client.
This can be used to reinitialize the API keys.
def initialize(self) -> None

- aclose
-
Async close method for closing any client allocated for the model.
async def aclose(self) -> None

- close
-
Sync close method for closing any client allocated for the model.
def close(self) -> None

- canonical_name
-
Canonical model name for querying results.
def canonical_name(self) -> str

- generate
-
Generate output from the model.
@abc.abstractmethod
async def generate(
    self,
    input: list[ChatMessage],
    tools: list[ToolInfo],
    tool_choice: ToolChoice,
    config: GenerateConfig,
) -> ModelOutput | tuple[ModelOutput | Exception, ModelCall]

input list[ChatMessage]
Chat message input (if a str is passed it is converted to a ChatMessageUser).

tools list[ToolInfo]
Tools available for the model to call.

tool_choice ToolChoice
Directives to the model as to which tools to prefer.

config GenerateConfig
Model configuration.
- max_tokens
-
Default max_tokens.
def max_tokens(self) -> int | None

- max_tokens_for_config
-
Default max_tokens for a given config.
def max_tokens_for_config(self, config: GenerateConfig) -> int | None

config GenerateConfig
Generation config.
- max_connections
-
Default max_connections.
def max_connections(self) -> int

- connection_key
-
Scope for enforcement of max_connections.
def connection_key(self) -> str

- should_retry
-
Should this exception be retried?
def should_retry(self, ex: Exception) -> bool

ex Exception
Exception to check for retry.
- is_auth_failure
-
Check if this exception indicates an authentication failure.
def is_auth_failure(self, ex: Exception) -> bool

ex Exception
Exception to check for authentication failure.
- collapse_user_messages
-
Collapse consecutive user messages into a single message.
def collapse_user_messages(self) -> bool

- collapse_assistant_messages
-
Collapse consecutive assistant messages into a single message.
def collapse_assistant_messages(self) -> bool

- tools_required
-
Any tool use in a message stream means that tools must be passed.
def tools_required(self) -> bool

- supports_remote_mcp
-
Does this provider support remote execution of MCP tools?
def supports_remote_mcp(self) -> bool

- tool_result_images
-
Tool results can contain images
def tool_result_images(self) -> bool

- disable_computer_screenshot_truncation
-
Some models do not support truncation of computer screenshots.
def disable_computer_screenshot_truncation(self) -> bool

- emulate_reasoning_history
-
Chat message assistant messages with reasoning should play back reasoning with emulation (e.g. <think> tags).

def emulate_reasoning_history(self) -> bool

- force_reasoning_history
-
Force a specific reasoning history behavior for this provider.
def force_reasoning_history(self) -> Literal["none", "all", "last"] | None

- auto_reasoning_history
-
Behavior to use for reasoning_history=‘auto’
def auto_reasoning_history(self) -> Literal["none", "all", "last"]