inspect_ai.model
Generation
get_model
Get an instance of a model.
Calls to get_model() are memoized (i.e. a call with the same arguments will return an existing instance of the model rather than creating a new one). You can disable this with memoize=False.
If you prefer to immediately close models after use (as well as prevent caching) you can employ the async context manager built in to the Model class. For example:
async with get_model("openai/gpt-4o") as model:
    response = await model.generate("Say hello")

In this case, the model client will be closed at the end of the context manager and will not be available in the get_model() cache.
def get_model(
model: str | Model | None = None,
*,
role: str | None = None,
default: str | Model | None = None,
config: GenerateConfig = GenerateConfig(),
base_url: str | None = None,
api_key: str | None = None,
memoize: bool = True,
**model_args: Any,
) -> Model

model str | Model | None
Model specification. If Model is passed it is returned unmodified; if None is passed then the model currently being evaluated is returned (or, if there is no evaluation, the model referred to by INSPECT_EVAL_MODEL).

role str | None
Optional named role for model (e.g. for roles specified at the task or eval level). Provide a default as a fallback in the case where the role hasn't been externally specified.

default str | Model | None
Optional. Fallback model in case the specified model or role is not found.

config GenerateConfig
Configuration for model.

base_url str | None
Optional. Alternate base URL for model.

api_key str | None
Optional. API key for model.

memoize bool
Use/store a cached version of the model based on the parameters to get_model().

**model_args Any
Additional args to pass to model constructor.
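For example, a minimal sketch based on the parameters above (the "grader" role and fallback model are illustrative):

from inspect_ai.model import GenerateConfig, get_model

# resolve an externally configurable "grader" role, falling back to a
# specific model if no role mapping was provided at the task/eval level
grader = get_model(
    role="grader",
    default="openai/gpt-4o",
    config=GenerateConfig(temperature=0.0),
)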
Model
Model interface.
Use get_model() to get an instance of a model. Model provides an async context manager for closing the connection to it after use. For example:
async with get_model("openai/gpt-4o") as model:
    response = await model.generate("Say hello")

class Model

Attributes
apiModelAPI-
Model API.
configGenerateConfig-
Generation config.
namestr-
Model name.
rolestr | None-
Model role.
Methods
- __init__
-
Create a model.
def __init__(
    self,
    api: ModelAPI,
    config: GenerateConfig,
    model_args: dict[str, Any] | None = None,
) -> None

api ModelAPI
Model API provider.

config GenerateConfig
Model configuration.

model_args dict[str, Any] | None
Optional model args.
- generate
-
Generate output from the model.
async def generate(
    self,
    input: str | list[ChatMessage],
    tools: Sequence[Tool | ToolDef | ToolInfo | ToolSource] | ToolSource = [],
    tool_choice: ToolChoice | None = None,
    config: GenerateConfig = GenerateConfig(),
    cache: bool | CachePolicy | NotGiven = NOT_GIVEN,
) -> ModelOutput

input str | list[ChatMessage]
Chat message input (if a str is passed it is converted to a ChatMessageUser).

tools Sequence[Tool | ToolDef | ToolInfo | ToolSource] | ToolSource
Tools available for the model to call.

tool_choice ToolChoice | None
Directives to the model as to which tools to prefer.

config GenerateConfig
Model configuration.

cache bool | CachePolicy | NotGiven
Caching behavior for generate responses (defaults to no caching).
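For example, a minimal sketch (the model name and prompt are illustrative):

from inspect_ai.model import GenerateConfig, get_model

async def ask() -> str:
    model = get_model("openai/gpt-4o")
    output = await model.generate(
        "Say hello",
        config=GenerateConfig(temperature=0.0, max_tokens=50),
        cache=True,  # cache responses using the default CachePolicy
    )
    return output.completion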
- generate_loop
-
Generate output from the model, looping as long as the model calls tools.
Similar to generate(), but runs in a loop resolving model tool calls. The loop terminates when the model stops calling tools. The final ModelOutput as well as the message list for the conversation are returned as a tuple.
async def generate_loop(
    self,
    input: str | list[ChatMessage],
    tools: Sequence[Tool | ToolDef | ToolSource] | ToolSource = [],
    config: GenerateConfig = GenerateConfig(),
    cache: bool | CachePolicy | NotGiven = NOT_GIVEN,
) -> tuple[list[ChatMessage], ModelOutput]

input str | list[ChatMessage]
Chat message input (if a str is passed it is converted to a ChatMessageUser).

tools Sequence[Tool | ToolDef | ToolSource] | ToolSource
Tools available for the model to call.

config GenerateConfig
Model configuration.

cache bool | CachePolicy | NotGiven
Caching behavior for generate responses (defaults to no caching).
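For example, a sketch that lets the model call a simple tool until it produces a final answer (the add() tool is illustrative):

from inspect_ai.model import get_model
from inspect_ai.tool import tool

@tool
def add():
    async def execute(x: int, y: int) -> int:
        """Add two numbers.

        Args:
            x: First number.
            y: Second number.
        """
        return x + y
    return execute

async def solve() -> str:
    # loops until the model stops requesting tool calls
    messages, output = await get_model("openai/gpt-4o").generate_loop(
        "Use the add tool to compute 1 + 1.", tools=[add()]
    )
    return output.completion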
GenerateConfig
Model generation options.
class GenerateConfig(BaseModel)

Attributes
max_retriesint | None-
Maximum number of times to retry request (defaults to unlimited).
timeoutint | None-
Timeout (in seconds) for an entire request (including retries).
attempt_timeoutint | None-
Timeout (in seconds) for any given attempt (if exceeded, will abandon attempt and retry according to max_retries).
max_connectionsint | None-
Maximum number of concurrent connections to Model API (default is model specific).
system_messagestr | None-
Override the default system message.
max_tokensint | None-
The maximum number of tokens that can be generated in the completion (default is model specific).
top_pfloat | None-
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
temperaturefloat | None-
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
stop_seqslist[str] | None-
Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
best_ofint | None-
Generates best_of completions server-side and returns the ‘best’ (the one with the highest log probability per token). vLLM only.
frequency_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. OpenAI, Google, Grok, Groq, vLLM, and SGLang only.
presence_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. OpenAI, Google, Grok, Groq, vLLM, and SGLang only.
logit_biasdict[int, float] | None-
Map token IDs to an associated bias value from -100 to 100 (e.g. “42=10,43=-10”). OpenAI, Grok, and vLLM only.
seedint | None-
Random seed. OpenAI, Google, Mistral, Groq, HuggingFace, and vLLM only.
top_kint | None-
Randomly sample the next word from the top_k most likely next words. Anthropic, Google, HuggingFace, vLLM, and SGLang only.
num_choicesint | None-
How many chat completion choices to generate for each input message. OpenAI, Grok, Google, TogetherAI, vLLM, and SGLang only.
logprobsbool | None-
Return log probabilities of the output tokens. OpenAI, Grok, TogetherAI, Huggingface, llama-cpp-python, vLLM, and SGLang only.
top_logprobsint | None-
Number of most likely tokens (0-20) to return at each token position, each with an associated log probability. OpenAI, Grok, Huggingface, vLLM, and SGLang only.
parallel_tool_callsbool | None-
Whether to enable parallel function calling during tool use (defaults to True). OpenAI and Groq only.
internal_toolsbool | None-
Whether to automatically map tools to model internal implementations (e.g. ‘computer’ for anthropic).
max_tool_outputint | None-
Maximum tool output (in bytes). Defaults to 16 * 1024.
cache_promptLiteral['auto'] | bool | None-
Whether to cache the prompt prefix. Defaults to “auto”, which will enable caching for requests with tools. Anthropic only.
reasoning_effortLiteral['none', 'minimal', 'low', 'medium', 'high'] | None-
Constrains effort on reasoning. Defaults vary by provider and model and not all models support all values (please consult provider documentation for details).
reasoning_tokensint | None-
Maximum number of tokens to use for reasoning. Anthropic Claude models only.
reasoning_summaryLiteral['none', 'concise', 'detailed', 'auto'] | None-
Provide summary of reasoning steps (OpenAI reasoning models only). Use ‘auto’ to access the most detailed summarizer available for the current model (defaults to ‘auto’ if your organization is verified by OpenAI).
reasoning_historyLiteral['none', 'all', 'last', 'auto'] | None-
Include reasoning in chat message history sent to generate.
response_schemaResponseSchema | None-
Request a response format as JSONSchema (output should still be validated). OpenAI, Google, Mistral, vLLM, and SGLang only.
extra_bodydict[str, Any] | None-
Extra body to be sent with requests to OpenAI compatible servers. OpenAI, vLLM, and SGLang only.
cachebool | CachePolicy | None-
Policy for caching of model generate output.
batchbool | int | BatchConfig | None-
Use batching API when available. True to enable batching with default configuration, False to disable batching, a number to enable batching of the specified batch size, or a BatchConfig object specifying the batching configuration.
Methods
- merge
-
Merge another model configuration into this one.
def merge(
    self, other: Union["GenerateConfig", GenerateConfigArgs]
) -> "GenerateConfig"

other Union[GenerateConfig, GenerateConfigArgs]
Configuration to merge.
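For example, a sketch of constructing a config and merging per-call overrides:

from inspect_ai.model import GenerateConfig, get_model

base = GenerateConfig(max_retries=5, timeout=120, max_tokens=2048)
# merge the base config with additional options specified for this use
config = base.merge(GenerateConfig(temperature=0.0))
model = get_model("openai/gpt-4o", config=config)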
GenerateConfigArgs
Type for kwargs that selectively override GenerateConfig.
class GenerateConfigArgs(TypedDict, total=False)

Attributes
max_retriesint | None-
Maximum number of times to retry request (defaults to unlimited).
timeoutint | None-
Request timeout (in seconds).
attempt_timeoutint | None-
Timeout (in seconds) for any given attempt (if exceeded, will abandon attempt and retry according to max_retries).
max_connectionsint | None-
Maximum number of concurrent connections to Model API (default is model specific).
system_messagestr | None-
Override the default system message.
max_tokensint | None-
The maximum number of tokens that can be generated in the completion (default is model specific).
top_pfloat | None-
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
temperaturefloat | None-
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
stop_seqslist[str] | None-
Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
best_ofint | None-
Generates best_of completions server-side and returns the ‘best’ (the one with the highest log probability per token). vLLM only.
frequency_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. OpenAI, Google, Grok, Groq, and vLLM only.
presence_penaltyfloat | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. OpenAI, Google, Grok, Groq, and vLLM only.
logit_biasdict[int, float] | None-
Map token Ids to an associated bias value from -100 to 100 (e.g. “42=10,43=-10”). OpenAI and Grok only.
seedint | None-
Random seed. OpenAI, Google, Mistral, Groq, HuggingFace, and vLLM only.
top_kint | None-
Randomly sample the next word from the top_k most likely next words. Anthropic, Google, and HuggingFace only.
num_choicesint | None-
How many chat completion choices to generate for each input message. OpenAI, Grok, Google, and TogetherAI only.
logprobsbool | None-
Return log probabilities of the output tokens. OpenAI, Google, Grok, TogetherAI, Huggingface, llama-cpp-python, and vLLM only.
top_logprobsint | None-
Number of most likely tokens (0-20) to return at each token position, each with an associated log probability. OpenAI, Google, Grok, and Huggingface only.
parallel_tool_callsbool | None-
Whether to enable parallel function calling during tool use (defaults to True). OpenAI and Groq only.
internal_toolsbool | None-
Whether to automatically map tools to model internal implementations (e.g. ‘computer’ for anthropic).
max_tool_outputint | None-
Maximum tool output (in bytes). Defaults to 16 * 1024.
cache_promptLiteral['auto'] | bool | None-
Whether to cache the prompt prefix. Defaults to “auto”, which will enable caching for requests with tools. Anthropic only.
reasoning_effortLiteral['none', 'minimal', 'low', 'medium', 'high'] | None-
Constrains effort on reasoning. Defaults vary by provider and model and not all models support all values (please consult provider documentation for details).
reasoning_tokensint | None-
Maximum number of tokens to use for reasoning. Anthropic Claude models only.
reasoning_summaryLiteral['none', 'concise', 'detailed', 'auto'] | None-
Provide summary of reasoning steps (OpenAI reasoning models only). Use ‘auto’ to access the most detailed summarizer available for the current model (defaults to ‘auto’ if your organization is verified by OpenAI).
reasoning_historyLiteral['none', 'all', 'last', 'auto'] | None-
Include reasoning in chat message history sent to generate.
response_schemaResponseSchema | None-
Request a response format as JSONSchema (output should still be validated). OpenAI, Google, and Mistral only.
extra_bodydict[str, Any] | None-
Extra body to be sent with requests to OpenAI compatible servers. OpenAI, vLLM, and SGLang only.
cachebool | CachePolicy | None-
Policy for caching of model generations.
batchbool | int | BatchConfig | None-
Use batching API when available. True to enable batching with default configuration, False to disable batching, a number to enable batching of the specified batch size, or a BatchConfig object specifying the batching configuration.
GenerateFilter
Filter a model generation.
A filter may substitute for the default model generation by returning a ModelOutput, modify the input parameters by returning a GenerateInput, or return None to allow default processing to continue.
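For example, a minimal conforming filter (matching the type alias shown below) that substitutes a canned response for over-long conversations and otherwise defers to default generation; the length threshold is illustrative:

from inspect_ai.model import ChatMessage, GenerateConfig, ModelOutput
from inspect_ai.tool import ToolChoice, ToolInfo

async def truncation_filter(
    model: str,
    messages: list[ChatMessage],
    tools: list[ToolInfo],
    tool_choice: ToolChoice | None,
    config: GenerateConfig,
) -> ModelOutput | None:
    # substitute a canned response for very long conversations;
    # returning None allows default processing to continue
    if sum(len(message.text) for message in messages) > 500_000:
        return ModelOutput.from_content(model, "[conversation too long]")
    return None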
GenerateFilter: TypeAlias = Callable[
[str, list[ChatMessage], list[ToolInfo], ToolChoice | None, GenerateConfig],
Awaitable[ModelOutput | GenerateInput | None],
]

BatchConfig
Batch processing configuration.
class BatchConfig(BaseModel)

Attributes
sizeint | None-
Target minimum number of requests to include in each batch. If not specified, uses default of 100. Batches may be smaller if the timeout is reached or if requests don’t fit within size limits.
max_sizeint | None-
Maximum number of requests to include in each batch. If not specified, falls back to the provider-specific maximum batch size.
send_delayfloat | None-
Maximum time (in seconds) to wait before sending a partially filled batch. If not specified, uses a default of 15 seconds. This prevents indefinite waiting when request volume is low.
tickfloat | None-
Time interval (in seconds) between checking for new batch requests and batch completion status. If not specified, uses a default of 15 seconds.
When expecting a very large number of concurrent batches, consider increasing this value to reduce overhead from continuous polling since an http request must be made for each batch on each tick.
max_batchesint | None-
Maximum number of batches to have in flight at once for a provider (defaults to 100).
max_consecutive_check_failuresint | None-
Maximum number of consecutive check failures before failing a batch (defaults to 1000).
ResponseSchema
Schema for model response when using Structured Output.
class ResponseSchema(BaseModel)

Attributes
namestr-
The name of the response schema. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
json_schemaJSONSchema-
The schema for the response format, described as a JSON Schema object.
descriptionstr | None-
A description of what the response format is for, used by the model to determine how to respond in the format.
strictbool | None-
Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. OpenAI and Mistral only.
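For example, a sketch of requesting structured output (this assumes a json_schema() helper is available, e.g. from inspect_ai.util, for deriving a JSONSchema from a Pydantic class):

from pydantic import BaseModel

from inspect_ai.model import GenerateConfig, ResponseSchema, get_model
from inspect_ai.util import json_schema  # assumed helper for building a JSONSchema

class Color(BaseModel):
    red: int
    green: int
    blue: int

model = get_model(
    "openai/gpt-4o",
    config=GenerateConfig(
        response_schema=ResponseSchema(name="color", json_schema=json_schema(Color))
    ),
)

As noted above, the returned output should still be validated (e.g. with Color.model_validate_json(output.completion)).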
ModelOutput
Output from model generation.
class ModelOutput(BaseModel)

Attributes
modelstr-
Model used for generation.
choiceslist[ChatCompletionChoice]-
Completion choices.
completionstr-
Model completion.
usageModelUsage | None-
Model token usage
timefloat | None-
Time elapsed (in seconds) for call to generate.
metadatadict[str, Any] | None-
Additional metadata associated with model output.
errorstr | None-
Error message in the case of content moderation refusals.
stop_reasonStopReason-
First message stop reason.
messageChatMessageAssistant-
First message choice.
Methods
- from_message
-
Create ModelOutput from a ChatMessageAssistant.
@staticmethod
def from_message(
    message: ChatMessage,
    stop_reason: StopReason = "stop",
) -> "ModelOutput"

message ChatMessage
Assistant message.

stop_reason StopReason
Stop reason for generation.
- from_content
-
Create ModelOutput from a str or list[Content].

@staticmethod
def from_content(
    model: str,
    content: str | list[Content],
    stop_reason: StopReason = "stop",
    error: str | None = None,
) -> "ModelOutput"

model str
Model name.

content str | list[Content]
Text content from generation.

stop_reason StopReason
Stop reason for generation.

error str | None
Error message.
- for_tool_call
-
Returns a ModelOutput for requesting a tool call.
@staticmethod
def for_tool_call(
    model: str,
    tool_name: str,
    tool_arguments: dict[str, Any],
    internal: JsonValue | None = None,
    tool_call_id: str | None = None,
    content: str | None = None,
) -> "ModelOutput"

model str
Model name.

tool_name str
The name of the tool.

tool_arguments dict[str, Any]
The arguments passed to the tool.

internal JsonValue | None
The model's internal info for the tool (if any).

tool_call_id str | None
Optional ID for the tool call. Defaults to a random UUID.

content str | None
Optional content to include in the message. Defaults to “tool call for tool {tool_name}”.
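These constructors are useful for scripting model behavior, for example with the built-in mockllm provider (a sketch; the custom_outputs model arg is specific to that provider and the "add" tool is hypothetical):

from inspect_ai.model import ModelOutput, get_model

model = get_model(
    "mockllm/model",
    custom_outputs=[
        # first request a call to a hypothetical "add" tool...
        ModelOutput.for_tool_call(
            model="mockllm/model",
            tool_name="add",
            tool_arguments={"x": 1, "y": 1},
        ),
        # ...then return a final text completion
        ModelOutput.from_content(model="mockllm/model", content="2"),
    ],
)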
ModelConfig
Model config.
class ModelConfig(BaseModel)

Attributes
modelstr-
Model name.
configGenerateConfig-
Generate config
base_urlstr | None-
Model base url.
argsdict[str, Any]-
Model specific arguments.
ModelCall
Model call (raw request/response data).
class ModelCall(BaseModel)

Attributes
requestdict[str, JsonValue]-
Raw data posted to model.
responsedict[str, JsonValue]-
Raw response data from model.
timefloat | None-
Time taken for underlying model call.
Methods
- create
-
Create a ModelCall object.
Create a ModelCall from arbitrary request and response objects (they might be dataclasses, Pydantic objects, dicts, etc.). Converts all values to be JSON serializable (excluding those that can't be).
@staticmethod
def create(
    request: Any,
    response: Any,
    filter: ModelCallFilter | None = None,
    time: float | None = None,
) -> "ModelCall"

request Any
Request object (dict, dataclass, BaseModel, etc.).

response Any
Response object (dict, dataclass, BaseModel, etc.).

filter ModelCallFilter | None
Function for filtering model call data.

time float | None
Time taken for underlying ModelCall.
ModelConversation
Model conversation.
class ModelConversation(Protocol)

Attributes
messageslist[ChatMessage]-
Conversation history.
outputModelOutput-
Model output.
ModelUsage
Token usage for completion.
class ModelUsage(BaseModel)

Attributes
input_tokensint-
Total input tokens used.
output_tokensint-
Total output tokens used.
total_tokensint-
Total tokens used.
input_tokens_cache_writeint | None-
Number of tokens written to the cache.
input_tokens_cache_readint | None-
Number of tokens retrieved from the cache.
reasoning_tokensint | None-
Number of tokens used for reasoning.
StopReason
Reason that the model stopped or failed to generate.
StopReason = Literal[
"stop",
"max_tokens",
"model_length",
"tool_calls",
"content_filter",
"unknown",
]

ChatCompletionChoice
Choice generated for completion.
class ChatCompletionChoice(BaseModel)

Attributes
messageChatMessageAssistant-
Assistant message.
stop_reasonStopReason-
Reason that the model stopped generating.
logprobsLogprobs | None-
Logprobs.
Messages
ChatMessage
Message in a chat conversation
ChatMessage = Union[
ChatMessageSystem, ChatMessageUser, ChatMessageAssistant, ChatMessageTool
]

ChatMessageBase
Base class for chat messages.
class ChatMessageBase(BaseModel)

Attributes
idstr | None-
Unique identifier for message.
contentstr | list[Content]-
Content (simple string or list of content objects)
sourceLiteral['input', 'generate'] | None-
Source of message.
metadatadict[str, Any] | None-
Additional message metadata.
textstr-
Get the text content of this message.
ChatMessage content is very general and can contain either a simple text value or a list of content parts (each of which can either be text or an image). Solvers (e.g. for prompt engineering) often need to interact with chat messages with the assumption that they are a simple string. The text property returns either the plain str content, or if the content is a list of text and images, the text items concatenated together (separated by newline)
Methods
- metadata_as
-
Metadata as a Pydantic model.
def metadata_as(self, metadata_cls: Type[MT]) -> MT

metadata_cls Type[MT]
BaseModel derived class.
ChatMessageSystem
System chat message.
class ChatMessageSystem(ChatMessageBase)

Attributes
roleLiteral['system']-
Conversation role.
ChatMessageUser
User chat message.
class ChatMessageUser(ChatMessageBase)

Attributes
roleLiteral['user']-
Conversation role.
tool_call_idlist[str] | None-
ID(s) of tool call(s) this message has the content payload for.
ChatMessageAssistant
Assistant chat message.
class ChatMessageAssistant(ChatMessageBase)

Attributes
roleLiteral['assistant']-
Conversation role.
tool_callslist[ToolCall] | None-
Tool calls made by the model.
modelstr | None-
Model used to generate assistant message.
ChatMessageTool
Tool chat message.
class ChatMessageTool(ChatMessageBase)

Attributes
roleLiteral['tool']-
Conversation role.
tool_call_idstr | None-
ID of tool call.
functionstr | None-
Name of function called.
errorToolCallError | None-
Error which occurred during tool call.
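For example, a sketch of building a conversation from these message classes and passing it to generate():

from inspect_ai.model import ChatMessageSystem, ChatMessageUser, get_model

async def chat() -> str:
    messages = [
        ChatMessageSystem(content="You are a concise assistant."),
        ChatMessageUser(content="Name three prime numbers."),
    ]
    output = await get_model("openai/gpt-4o").generate(messages)
    return output.completion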
trim_messages
Trim message list to fit within model context.
Trim the list of messages by:

- Retaining all system messages.
- Retaining the ‘input’ messages from the sample.
- Preserving a proportion of the remaining messages (preserve=0.7 by default).
- Ensuring that all assistant tool calls have corresponding tool messages.
- Ensuring that the sequence of messages doesn’t end with an assistant message.
async def trim_messages(
messages: list[ChatMessage], preserve: float = 0.7
) -> list[ChatMessage]

messages list[ChatMessage]
List of messages to trim.

preserve float
Ratio of conversation messages to preserve (defaults to 0.7).
user_prompt
Get the last “user” message within a message history.
def user_prompt(messages: list[ChatMessage]) -> ChatMessageUser

messages list[ChatMessage]
Message history.
Content
Content
Content sent to or received from a model.
Content = Union[
ContentText,
ContentReasoning,
ContentImage,
ContentAudio,
ContentVideo,
ContentData,
ContentToolUse,
ContentDocument,
]

ContentText
Text content.
class ContentText(ContentBase)

Attributes
typeLiteral['text']-
Type.
textstr-
Text content.
refusalbool | None-
Was this a refusal message?
citationsSequence[Citation] | None-
Citations supporting the text block.
ContentReasoning
Reasoning content.
See the specification for thinking blocks for Claude models.
class ContentReasoning(ContentBase)

Attributes
typeLiteral['reasoning']-
Type.
reasoningstr-
Reasoning content.
summarystr | None-
Reasoning summary.
signaturestr | None-
Signature for reasoning content (used by some models to ensure that reasoning content is not modified for replay)
redactedbool-
Indicates that the explicit content of this reasoning block has been redacted.
ContentImage
Image content.
class ContentImage(ContentBase)

Attributes
typeLiteral['image']-
Type.
imagestr-
Either a URL of the image or the base64 encoded image data.
detailLiteral['auto', 'low', 'high']-
Specifies the detail level of the image.
Currently only supported for OpenAI. Learn more in the Vision guide.
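For example, a sketch of multimodal input combining text and an image (this assumes the content classes are importable from inspect_ai.model as documented here; the image path is illustrative):

from inspect_ai.model import ChatMessageUser, ContentImage, ContentText, get_model

async def describe() -> str:
    message = ChatMessageUser(
        content=[
            ContentText(text="What is shown in this image?"),
            ContentImage(image="images/chart.png"),  # URL or base64 encoded data
        ]
    )
    output = await get_model("openai/gpt-4o").generate([message])
    return output.completion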
ContentAudio
Audio content.
class ContentAudio(ContentBase)

Attributes
typeLiteral['audio']-
Type.
audiostr-
Audio file path or base64 encoded data URL.
formatLiteral['wav', 'mp3']-
Format of audio data (‘mp3’ or ‘wav’)
ContentVideo
Video content.
class ContentVideo(ContentBase)

Attributes
typeLiteral['video']-
Type.
videostr-
Video file path or base64 encoded data URL.
formatLiteral['mp4', 'mpeg', 'mov']-
Format of video data (‘mp4’, ‘mpeg’, or ‘mov’)
ContentDocument
Document content (e.g. a PDF).
class ContentDocument(ContentBase)

Attributes
typeLiteral['document']-
Type.
documentstr-
Document file path or base64 encoded data URL.
filenamestr-
Document filename (automatically determined from ‘document’ if not specified).
mime_typestr-
Document mime type (automatically determined from ‘document’ if not specified).
ContentData
Model internal.
class ContentData(ContentBase)

Attributes
typeLiteral['data']-
Type.
datadict[str, JsonValue]-
Model provider specific payload - required for internal content.
ContentToolUse
Server side tool use.
class ContentToolUse(ContentBase)

Attributes
typeLiteral['tool_use']-
Type.
tool_typeLiteral['web_search', 'mcp_call']-
The type of the tool call.
idstr-
The unique ID of the tool call.
namestr-
Name of the tool.
contextstr | None-
Tool context (e.g. MCP Server)
argumentsstr-
Arguments passed to the tool.
resultstr-
Result from the tool call.
errorstr | None-
The error from the tool call (if any).
Citation
Citation
A citation sent to or received from a model.
Citation: TypeAlias = Annotated[
Union[
ContentCitation,
DocumentCitation,
UrlCitation,
],
Discriminator("type"),
]

CitationBase
Base class for citations.
class CitationBase(BaseModel)

Attributes
cited_textstr | tuple[int, int] | None-
The cited text
This can be the text itself or a start/end range of the text content within the container that is the cited text.
titlestr | None-
Title of the cited resource.
internaldict[str, JsonValue] | None-
Model provider specific payload - typically used to aid transformation back to model types.
UrlCitation
A citation that refers to a URL.
class UrlCitation(CitationBase)

Attributes
typeLiteral['url']-
Type.
urlstr-
URL of the cited resource.
DocumentCitation
A citation that refers to a page range in a document.
class DocumentCitation(CitationBase)

Attributes
typeLiteral['document']-
Type.
rangeDocumentRange | None-
Range of the document that is cited.
ContentCitation
A generic content citation.
class ContentCitation(CitationBase)

Attributes
typeLiteral['content']-
Type.
Tools
execute_tools
Perform tool calls in the last assistant message.
async def execute_tools(
messages: list[ChatMessage],
tools: Sequence[Tool | ToolDef | ToolSource] | ToolSource,
max_output: int | None = None,
) -> ExecuteToolsResult

messages list[ChatMessage]
Current message list.

tools Sequence[Tool | ToolDef | ToolSource] | ToolSource
Available tools.

max_output int | None
Maximum output length (in bytes). Defaults to max_tool_output from the active GenerateConfig (16 * 1024 by default).
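For example, a sketch of a manual agent step that generates and then executes any requested tool calls:

from inspect_ai.model import ChatMessage, execute_tools, get_model

async def agent_step(messages: list[ChatMessage], tools) -> list[ChatMessage]:
    model = get_model("openai/gpt-4o")
    output = await model.generate(messages, tools=tools)
    messages.append(output.message)
    # if the assistant requested tool calls, execute them and extend the history
    if output.message.tool_calls:
        result = await execute_tools(messages, tools)
        messages.extend(result.messages)
    return messages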
ExecuteToolsResult
Result from executing tools in the last assistant message.
In conventional tool calling scenarios only a list of ChatMessageTool messages will be appended and there will be no output. However, if there are handoff() tools (used in multi-agent systems) then other messages may be appended and an output may be available as well.
class ExecuteToolsResult(NamedTuple)

Attributes
messageslist[ChatMessage]-
Messages added to conversation.
outputModelOutput | None-
Model output if a generation occurred within the conversation.
Logprobs
Logprob
Log probability for a token.
class Logprob(BaseModel)

Attributes
tokenstr-
The predicted token represented as a string.
logprobfloat-
The log probability value of the model for the predicted token.
byteslist[int] | None-
The predicted token represented as a byte array (a list of integers).
top_logprobslist[TopLogprob] | None-
If the top_logprobs argument is greater than 0, this will contain an ordered list of the top K most likely tokens and their log probabilities.
Logprobs
Log probability information for a completion choice.
class Logprobs(BaseModel)

Attributes
contentlist[Logprob]-
A list of length num_generated_tokens containing the individual log probabilities for each generated token.
TopLogprob
List of the most likely tokens and their log probability, at this token position.
class TopLogprob(BaseModel)

Attributes
tokenstr-
The top-kth token represented as a string.
logprobfloat-
The log probability value of the model for the top-kth token.
byteslist[int] | None-
The top-kth token represented as a byte array (a list of integers).
Caching
CachePolicy
Caching options for model generation.
class CachePolicy(BaseModel)

Attributes
expiry str | None
The expiry time for cache entries (default “1W”). This is a string of the format “12h” for 12 hours or “1W” for a week, etc. This is how long we will keep the cache entry; if we access it after this point we’ll clear it. Setting to None will cache indefinitely.

per_epoch bool
Default True. By default we cache responses separately for different epochs. The general use case is that if there are multiple epochs, we should cache each response separately because scorers will aggregate across epochs. However, sometimes a response can be cached regardless of epoch if the call being made isn’t under test as part of the evaluation. If False, this option allows you to bypass that and cache independently of the epoch.

scopes dict[str, str]
A dictionary of additional metadata that should be included in the cache key. This allows for more fine-grained control over cache key generation.
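For example, a sketch of passing a CachePolicy to generate():

from inspect_ai.model import CachePolicy, get_model

async def cached_hello() -> str:
    output = await get_model("openai/gpt-4o").generate(
        "Say hello",
        cache=CachePolicy(expiry="1W", per_epoch=False),
    )
    return output.completion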
cache_size
Calculate the size of various cached directories and files
If neither subdirs nor files are provided, the entire cache directory will be calculated.
def cache_size(
    subdirs: list[str] = [], files: list[Path] = []
) -> list[tuple[str, int]]

subdirs list[str]
List of folders to filter by, which are generally model names. Empty directories will be ignored.

files list[Path]
List of files to filter by explicitly. Note that the return value groups these by their parent directory.
cache_clear
Clear the cache directory.
def cache_clear(model: str = "") -> bool

model str
Model to clear cache for.
cache_list_expired
Returns a list of all the cached files that have passed their expiry time.
def cache_list_expired(filter_by: list[str] = []) -> list[Path]

filter_by list[str]
Default []. List of model names to filter by. If an empty list, this will search the entire cache.
cache_prune
Delete all expired cache entries.
def cache_prune(files: list[Path] = []) -> None

files list[Path]
List of files to prune. If empty, this will search the entire cache.
cache_path
Path to cache directory.
def cache_path(model: str = "") -> Path

model str
Path to cache directory for specific model.
Conversion
messages_from_openai
Convert OpenAI Completions API messages into Inspect messages.
async def messages_from_openai(
messages: "list[ChatCompletionMessageParam]",
model: str | None = None,
) -> list[ChatMessage]

messages list[ChatCompletionMessageParam]
OpenAI Completions API messages.

model str | None
Optional model name to tag assistant messages with.
messages_from_openai_responses
Convert OpenAI Responses API messages into Inspect messages.
async def messages_from_openai_responses(
messages: "list[ResponseInputItemParam]",
model: str | None = None,
) -> list[ChatMessage]

messages list[ResponseInputItemParam]
OpenAI Responses API messages.

model str | None
Optional model name to tag assistant messages with.
messages_from_anthropic
Convert Anthropic Messages API messages into Inspect messages.
async def messages_from_anthropic(
messages: "list[MessageParam]", system_message: str | None = None
) -> list[ChatMessage]

messages list[MessageParam]
Anthropic Messages API messages.

system_message str | None
System message accompanying messages (optional).
messages_from_google
Convert Google GenAI Content list into Inspect messages.
async def messages_from_google(
contents: "Sequence[Content | ContentDict]",
system_instruction: str | None = None,
model: str | None = None,
) -> list[ChatMessage]

contents Sequence[Content | ContentDict]
Google GenAI Content objects or dicts that can be converted.

system_instruction str | None
Optional system instruction string.

model str | None
Optional model name to tag assistant messages with.
model_output_from_openai
Convert OpenAI ChatCompletion into Inspect ModelOutput
async def model_output_from_openai(
completion: Union["ChatCompletion", dict[str, Any]],
) -> ModelOutput

completion Union[ChatCompletion, dict[str, Any]]
OpenAI ChatCompletion object or dict that can be converted into one.
model_output_from_openai_responses
Convert OpenAI Response into Inspect ModelOutput
async def model_output_from_openai_responses(
response: Union["Response", dict[str, Any]],
) -> ModelOutput

response Union[Response, dict[str, Any]]
OpenAI Response object or dict that can be converted into one.
model_output_from_anthropic
Convert Anthropic Message response into Inspect ModelOutput
async def model_output_from_anthropic(
message: Union["Message", dict[str, Any]],
) -> ModelOutput

message Union[Message, dict[str, Any]]
Anthropic Message object or dict that can be converted into one.
model_output_from_google
Convert Google GenerateContentResponse into Inspect ModelOutput.
async def model_output_from_google(
response: Union["GenerateContentResponse", dict[str, Any]],
model: str | None = None,
) -> ModelOutput

response Union[GenerateContentResponse, dict[str, Any]]
Google GenerateContentResponse object or dict that can be converted.

model str | None
Optional model name override.
messages_to_openai
Convert messages to OpenAI Completions API compatible messages.
async def messages_to_openai(
messages: list[ChatMessage],
system_role: Literal["user", "system", "developer"] = "system",
) -> list[ChatCompletionMessageParam]

messages list[ChatMessage]
List of messages to convert.

system_role Literal['user', 'system', 'developer']
Role to use for system messages (newer OpenAI models use “developer” rather than “system”).
Provider
modelapi
Decorator for registering model APIs.
def modelapi(name: str) -> Callable[..., type[ModelAPI]]

name str
Name of API.
ModelAPI
Model API provider.
If you are implementing a custom ModelAPI provider your __init__() method will also receive a **model_args parameter that will carry any custom model_args (or -M arguments from the CLI) specified by the user. You can then pass these on to the appropriate place in your model initialization code (for example, here is what many of the built-in providers do with the model_args passed to them: https://inspect.aisi.org.uk/models.html#model-args)
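For example, a skeleton of a custom provider (a sketch under the assumptions above; the provider name and canned response are illustrative, and the generate() signature follows the abstract method documented below):

from typing import Any

from inspect_ai.model import (
    ChatMessage,
    GenerateConfig,
    ModelAPI,
    ModelOutput,
    modelapi,
)
from inspect_ai.tool import ToolChoice, ToolInfo

@modelapi(name="custom")
class CustomAPI(ModelAPI):
    def __init__(
        self,
        model_name: str,
        base_url: str | None = None,
        api_key: str | None = None,
        config: GenerateConfig = GenerateConfig(),
        **model_args: Any,
    ) -> None:
        super().__init__(model_name, base_url, api_key, [], config)
        self.model_args = model_args  # e.g. -M options passed on the CLI

    async def generate(
        self,
        input: list[ChatMessage],
        tools: list[ToolInfo],
        tool_choice: ToolChoice,
        config: GenerateConfig,
    ) -> ModelOutput:
        # call your backend here and convert its response to a ModelOutput
        return ModelOutput.from_content(self.model_name, "Hello from the custom provider")

The provider can then be referenced with get_model() using the registered name as a prefix (e.g. "custom/my-model", where "my-model" is illustrative).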
class ModelAPI(abc.ABC)

Methods
- __init__
-
Create a model API provider.
def __init__(
    self,
    model_name: str,
    base_url: str | None = None,
    api_key: str | None = None,
    api_key_vars: list[str] = [],
    config: GenerateConfig = GenerateConfig(),
) -> None

model_name str
Model name.

base_url str | None
Alternate base URL for model.

api_key str | None
API key for model.

api_key_vars list[str]
Environment variables that may contain keys for this provider (used for override).

config GenerateConfig
Model configuration.
- initialize
-
Reinitialize the model API client.
This can be used to reinitialize the API keys.
def initialize(self) -> None

- aclose
-
Async close method for closing any client allocated for the model.
async def aclose(self) -> None

- close
-
Sync close method for closing any client allocated for the model.
def close(self) -> None

- canonical_name
-
Canonical model name for querying results.
def canonical_name(self) -> str

- generate
-
Generate output from the model.
@abc.abstractmethod
async def generate(
    self,
    input: list[ChatMessage],
    tools: list[ToolInfo],
    tool_choice: ToolChoice,
    config: GenerateConfig,
) -> ModelOutput | tuple[ModelOutput | Exception, ModelCall]

input list[ChatMessage]
Chat message input (if a str is passed it is converted to a ChatMessageUser).

tools list[ToolInfo]
Tools available for the model to call.

tool_choice ToolChoice
Directives to the model as to which tools to prefer.

config GenerateConfig
Model configuration.
- max_tokens
-
Default max_tokens.
def max_tokens(self) -> int | None

- max_tokens_for_config
-
Default max_tokens for a given config.
def max_tokens_for_config(self, config: GenerateConfig) -> int | None

config GenerateConfig
Generation config.
- max_connections
-
Default max_connections.
def max_connections(self) -> int

- connection_key
-
Scope for enforcement of max_connections.
def connection_key(self) -> str

- should_retry
-
Should this exception be retried?
def should_retry(self, ex: Exception) -> bool

ex Exception
Exception to check for retry.
- is_auth_failure
-
Check if this exception indicates an authentication failure.
def is_auth_failure(self, ex: Exception) -> bool

ex Exception
Exception to check for authentication failure.
- collapse_user_messages
-
Collapse consecutive user messages into a single message.
def collapse_user_messages(self) -> bool

- collapse_assistant_messages
-
Collapse consecutive assistant messages into a single message.
def collapse_assistant_messages(self) -> bool

- tools_required
-
Any tool use in a message stream means that tools must be passed.
def tools_required(self) -> bool

- supports_remote_mcp
-
Does this provider support remote execution of MCP tools?
def supports_remote_mcp(self) -> bool

- tool_result_images
-
Tool results can contain images
def tool_result_images(self) -> bool

- disable_computer_screenshot_truncation
-
Some models do not support truncation of computer screenshots.
def disable_computer_screenshot_truncation(self) -> bool

- emulate_reasoning_history
-
Chat message assistant messages with reasoning should play back reasoning with emulation (e.g. <think> tags).

def emulate_reasoning_history(self) -> bool

- force_reasoning_history
-
Force a specific reasoning history behavior for this provider.
def force_reasoning_history(self) -> Literal["none", "all", "last"] | None

- auto_reasoning_history
-
Behavior to use for reasoning_history=‘auto’
def auto_reasoning_history(self) -> Literal["none", "all", "last"]