inspect_ai.agent
Agents
react
Extensible ReAct agent based on the paper ReAct: Synergizing Reasoning and Acting in Language Models.
Provide a name and description for the agent if you plan on using it in a multi-agent system (this is so other agents can clearly identify its name and purpose). These fields are not required when using react() as a top-level solver.
The agent runs a tool use loop until the model submits an answer using the submit() tool. Use instructions to tailor the agent’s system message (the default instructions provide a basic ReAct prompt).
Use the attempts option to enable additional submissions if the initial submission(s) are incorrect (by default, no additional attempts are permitted).
By default, the model will be urged to continue if it fails to call a tool. Customise this behavior using the on_continue option.
@agent
def react(
*,
name: str | None = None,
description: str | None = None,
prompt: str | AgentPrompt | None = AgentPrompt(),
tools: Sequence[Tool | ToolDef | ToolSource] | None = None,
model: str | Model | Agent | None = None,
attempts: int | AgentAttempts = 1,
submit: AgentSubmit | bool | None = None,
on_continue: str | AgentContinue | None = None,
truncation: Literal["auto", "disabled"] | MessageFilter = "disabled",
) -> Agent

name: str | None
Agent name (required when using with handoff() or as_tool()).
description: str | None
Agent description (required when using with handoff() or as_tool()).
prompt: str | AgentPrompt | None
Prompt for agent. Includes agent-specific contextual instructions as well as an optional assistant_prompt and handoff_prompt (for agents that use handoffs); both are provided by default but can be removed or customized. Pass a str to specify the instructions and use the defaults for the handoff and assistant prompts.
tools: Sequence[Tool | ToolDef | ToolSource] | None
Tools available for the agent.
model: str | Model | Agent | None
Model to use for the agent (defaults to the currently evaluated model).
attempts: int | AgentAttempts
Configure the agent to make multiple attempts.
submit: AgentSubmit | bool | None
Use a submit tool for reporting the final answer. Defaults to True, which uses the default submit behavior. Pass an AgentSubmit to customize the behavior, or pass False to disable the submit tool.
on_continue: str | AgentContinue | None
Message to play back to the model to urge it to continue when it stops calling tools. Use the placeholder {submit} to refer to the submit tool within the message. Alternatively, an async function called to determine whether the loop should continue and what message to play back. Note that this function is called on every iteration of the loop, so if you only want to send a message back when the model fails to call tools you need to code that behavior explicitly.
truncation: Literal["auto", "disabled"] | MessageFilter
Truncate the conversation history in the event of a context window overflow. Defaults to “disabled”, which does no truncation. Pass “auto” to use trim_messages() to reduce the context size. Pass a MessageFilter function to do custom truncation.
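As a sketch of typical usage (assuming the inspect_ai package is installed; the agent name, description, and instructions below are illustrative, and web_search is one of Inspect's built-in tools):

```python
# Sketch only: a react() agent configured for use in a multi-agent system.
from inspect_ai.agent import react, AgentPrompt
from inspect_ai.tool import web_search

researcher = react(
    name="researcher",                     # required for handoff()/as_tool()
    description="Researches questions on the web",
    prompt=AgentPrompt(
        instructions="You are a careful web researcher."
    ),
    tools=[web_search()],
    attempts=2,                            # one retry if the first answer is incorrect
)
```

When used as a top-level solver, name and description can be omitted.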
human_cli
Human CLI agent for tasks that run in a sandbox.
The Human CLI agent installs agent task tools in the default sandbox and presents the user with both task instructions and documentation for the various tools (e.g. task submit, task start, task stop, task instructions, etc.). A human agent panel is displayed with instructions for logging in to the sandbox.
If the user is running in VS Code with the Inspect extension, they will also be presented with links to log in to the sandbox using a VS Code Window or Terminal.
@agent
def human_cli(
answer: bool | str = True,
intermediate_scoring: bool = False,
record_session: bool = True,
user: str | None = None,
) -> Agent

answer: bool | str
Is an explicit answer required for this task, or is it scored based on files in the container? Pass a str with a regex to validate that the answer matches the expected format.
intermediate_scoring: bool
Allow the human agent to check their score while working.
record_session: bool
Record all user commands and outputs in the sandbox bash session.
user: str | None
User to log in as. Defaults to the sandbox environment’s default user.
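For instance (a sketch assuming a Docker sandbox is configured; the task name and sample are placeholders), human_cli() can serve directly as a task solver:

```python
# Sketch: a task where a human performs the work in a sandbox and
# must submit an explicit numeric answer (validated by regex).
from inspect_ai import Task, task
from inspect_ai.agent import human_cli
from inspect_ai.dataset import Sample

@task
def manual_baseline():
    return Task(
        dataset=[Sample(input="Count the files in /app and submit the total.")],
        solver=human_cli(answer=r"^\d+$", record_session=True),
        sandbox="docker",
    )
```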
Execution
handoff
Create a tool that enables models to handoff to agents.
def handoff(
agent: Agent,
description: str | None = None,
input_filter: MessageFilter | None = None,
output_filter: MessageFilter | None = content_only,
tool_name: str | None = None,
limits: list[Limit] = [],
**agent_kwargs: Any,
) -> Tool

agent: Agent
Agent to hand off to.
description: str | None
Handoff tool description (defaults to agent description).
input_filter: MessageFilter | None
Filter to modify the message history before calling the tool. Use the built-in remove_tools filter to remove all tool calls. Alternatively, specify another MessageFilter function or list of MessageFilter functions.
output_filter: MessageFilter | None
Filter to modify the message history after calling the tool. Defaults to content_only(), which produces a history that should be safe for other models to read (tool calls are converted to text, and both system messages and reasoning blocks are removed). Alternatively, specify another MessageFilter function or list of MessageFilter functions.
tool_name: str | None
Alternate tool name (defaults to transfer_to_{agent_name}).
limits: list[Limit]
List of limits to apply to the agent. Limits are scoped to each handoff to the agent. Should a limit be exceeded, the agent stops and a user message is appended explaining that a limit was exceeded.
**agent_kwargs: Any
Arguments to curry to the Agent function (arguments provided here will not be presented to the model as part of the tool interface).
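A minimal multi-agent sketch (assuming inspect_ai is installed; the agent names and the message cap are illustrative) where a supervisor can hand off to a sub-agent:

```python
# Sketch: supervisor agent that can hand the conversation off to a
# sub-agent, with a per-handoff message limit applied.
from inspect_ai.agent import react, handoff
from inspect_ai.util import message_limit

web_surfer = react(
    name="web_surfer",
    description="Researches questions on the web",
)

supervisor = react(
    name="supervisor",
    description="Coordinates research",
    tools=[handoff(web_surfer, limits=[message_limit(50)])],
)
```

The model sees the handoff as a tool named transfer_to_web_surfer.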
run
Run an agent.
The input message(s) will be copied prior to running, so they are not modified in place.
async def run(
agent: Agent,
input: str | list[ChatMessage] | AgentState,
limits: list[Limit] | None = None,
*,
name: str | None = None,
**agent_kwargs: Any,
) -> AgentState | tuple[AgentState, LimitExceededError | None]

agent: Agent
Agent to run.
input: str | list[ChatMessage] | AgentState
Agent input (string, list of messages, or an AgentState).
limits: list[Limit] | None
List of limits to apply to the agent. Should one of these limits be exceeded, the LimitExceededError is caught and returned.
name: str | None
Optional display name for the transcript entry. If not provided, the agent’s name as defined in the registry will be used.
**agent_kwargs: Any
Additional arguments to pass to the agent.
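A usage sketch (assuming inspect_ai is installed; the token budget and input are placeholders). Per the return annotation above, passing limits yields a (state, error) tuple:

```python
# Sketch: run an agent with a token limit and inspect the result.
from inspect_ai.agent import react, run
from inspect_ai.util import token_limit

async def main():
    agent = react(name="assistant", description="General assistant")
    # With limits provided, run() returns (state, error); error is None
    # unless one of the limits was exceeded.
    state, err = await run(agent, "What is 2 + 2?", limits=[token_limit(50_000)])
    if err is None:
        print(state.output.completion)
```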
as_tool
Convert an agent to a tool.
By default the model will see all of the agent’s arguments as tool arguments (save for state, which is converted to an input argument of type str). Provide optional agent_kwargs to mask out agent parameters with default values (these parameters will not be presented to the model as part of the tool interface).
@tool
def as_tool(
agent: Agent,
description: str | None = None,
limits: list[Limit] = [],
**agent_kwargs: Any,
) -> Tool

agent: Agent
Agent to convert.
description: str | None
Tool description (defaults to agent description).
limits: list[Limit]
List of limits to apply to the agent. Should a limit be exceeded, the tool call ends and returns an error explaining that a limit was exceeded.
**agent_kwargs: Any
Arguments to curry to the Agent function (arguments provided here will not be presented to the model as part of the tool interface).
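For example (a sketch; the agent's purpose here is illustrative), an agent can be exposed to the model as an ordinary tool taking a single str input:

```python
# Sketch: convert an agent into a plain tool. The model sees a single
# string input rather than the agent's full conversation state.
from inspect_ai.agent import react, as_tool

translator = react(
    name="translator",
    description="Translates text into French",
)

translate = as_tool(translator)
```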
as_solver
Convert an agent to a solver.
Note that agents used as solvers will only receive their first parameter (state). Any other parameters must provide appropriate defaults or be explicitly specified in agent_kwargs.
def as_solver(agent: Agent, limits: list[Limit] = [], **agent_kwargs: Any) -> Solver
Bridging
agent_bridge
Agent bridge.
Provide Inspect integration for 3rd party agents that use the OpenAI Completions API, OpenAI Responses API, or Anthropic API. The bridge patches the OpenAI and Anthropic client libraries to redirect any model named “inspect” (or prefixed with “inspect/” for non-default models) into the Inspect model API.
See the Agent Bridge documentation for additional details.
@contextlib.asynccontextmanager
async def agent_bridge(
state: AgentState | None = None,
*,
filter: GenerateFilter | None = None,
retry_refusals: int | None = None,
web_search: WebSearchProviders | None = None,
) -> AsyncGenerator[AgentBridge, None]

state: AgentState | None
Initial state for the agent bridge. Used as a basis for yielding an updated state based on traffic over the bridge.
filter: GenerateFilter | None
Filter for bridge model generation.
retry_refusals: int | None
Should refusals be retried? (pass the number of times to retry)
web_search: WebSearchProviders | None
Configuration for mapping model-internal web_search tools to Inspect. By default, maps to the internal provider of the target model (supported for OpenAI, Anthropic, Gemini, Grok, and Perplexity). Pass an alternate configuration to use an external provider like Tavily or Exa for models that don’t support internal search.
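A usage sketch (run_external_agent is a hypothetical stand-in for whatever third-party agent you are bridging; the bridge itself and its state attribute are from this reference):

```python
# Sketch: wrap a third-party agent in an Inspect agent via agent_bridge().
from inspect_ai.agent import agent, AgentState, agent_bridge

@agent
def bridged_agent():
    async def execute(state: AgentState) -> AgentState:
        async with agent_bridge(state) as bridge:
            # Run the external agent here, pointing it at the model name
            # "inspect" so its OpenAI/Anthropic calls are routed through
            # Inspect. run_external_agent() is a placeholder.
            await run_external_agent(model="inspect")
            return bridge.state
    return execute
```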
AgentBridge
Agent bridge.
class AgentBridge

Attributes

state: AgentState
State updated from messages traveling over the bridge.
filter: GenerateFilter | None
Filter for bridge model generation.
A filter may substitute for the default model generation by returning a ModelOutput or return None to allow default processing to continue.
sandbox_agent_bridge
Sandbox agent bridge.
Provide Inspect integration for agents running inside sandboxes. Runs a proxy server in the container that provides REST endpoints for the OpenAI Completions API, OpenAI Responses API, and Anthropic API. This proxy server runs on port 13131 and routes requests to the current Inspect model provider.
You should set OPENAI_BASE_URL=http://localhost:13131/v1 or ANTHROPIC_BASE_URL=http://localhost:13131 when executing the agent within the container, and ensure that your agent targets the model name “inspect” when calling OpenAI or Anthropic. Use “inspect/” prefixed model names (e.g. “inspect/openai/gpt-4o”) to target models other than the default.
@contextlib.asynccontextmanager
async def sandbox_agent_bridge(
state: AgentState | None = None,
*,
model: str | None = None,
filter: GenerateFilter | None = None,
retry_refusals: int | None = None,
sandbox: str | None = None,
port: int = 13131,
web_search: WebSearchProviders | None = None,
) -> AsyncIterator[SandboxAgentBridge]

state: AgentState | None
Initial state for the agent bridge. Used as a basis for yielding an updated state based on traffic over the bridge.
model: str | None
Force the bridge to use a specific model (e.g. “inspect” to force the default model for the task or “inspect/openai/gpt-4o” to force another specific model).
filter: GenerateFilter | None
Filter for bridge model generation.
retry_refusals: int | None
Should refusals be retried? (pass the number of times to retry)
sandbox: str | None
Sandbox to run the model proxy server within.
port: int
Port to run the proxy server on.
web_search: WebSearchProviders | None
Configuration for mapping model-internal web_search tools to Inspect. By default, maps to the internal provider of the target model (supported for OpenAI, Anthropic, Gemini, Grok, and Perplexity). Pass an alternate configuration to use an external provider like Tavily or Exa for models that don’t support internal search.
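A sketch of the container pattern described above (the "my-agent" command is a placeholder for whatever agent binary is installed in your sandbox):

```python
# Sketch: run an agent binary inside the sandbox, routing its OpenAI
# traffic through the bridge's proxy server.
from inspect_ai.agent import agent, AgentState, sandbox_agent_bridge
from inspect_ai.util import sandbox

@agent
def container_agent():
    async def execute(state: AgentState) -> AgentState:
        async with sandbox_agent_bridge(state) as bridge:
            # "my-agent" is a placeholder command installed in the container;
            # it must target the model name "inspect".
            await sandbox().exec(
                ["my-agent", "--model", "inspect"],
                env={"OPENAI_BASE_URL": f"http://localhost:{bridge.port}/v1"},
            )
        return bridge.state
    return execute
```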
SandboxAgentBridge
Sandbox agent bridge.
class SandboxAgentBridge(AgentBridge)

Attributes

port: int
Model proxy server port.
model: str | None
Specify that the bridge should use a specific model (e.g. “inspect” to use the default model for the task or “inspect/openai/gpt-4o” to use another specific model).
Filters
content_only
Remove (or convert) message history to pure content.
This is the default filter for agent handoffs and is intended to present a history that doesn’t confound the parent model with tools it doesn’t have, reasoning traces it didn’t create, etc.
- Removes system messages
- Removes reasoning traces
- Removes the internal attribute on content
- Converts tool calls to user messages
- Converts server tool calls to text
async def content_only(messages: list[ChatMessage]) -> list[ChatMessage]

messages: list[ChatMessage]
Messages to filter.
last_message
Remove all but the last message.
async def last_message(messages: list[ChatMessage]) -> list[ChatMessage]

messages: list[ChatMessage]
Target messages.
remove_tools
Remove tool calls from messages.
Removes all instances of ChatMessageTool as well as the tool_calls field from ChatMessageAssistant.
async def remove_tools(messages: list[ChatMessage]) -> list[ChatMessage]

messages: list[ChatMessage]
Messages to remove tool calls from.
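A custom MessageFilter is simply an async function over the message list. As an illustrative sketch (the name keep_recent and its truncation policy are made up; in real use the list elements are ChatMessage objects), a filter that keeps the first message plus the most recent few:

```python
# Illustrative MessageFilter: keep the first message (typically the
# system prompt) plus the most recent `keep` messages. The slicing
# logic is plain Python and works on any list.
async def keep_recent(messages, keep=3):
    if len(messages) <= keep + 1:
        return list(messages)
    return [messages[0], *messages[-keep:]]
```

Such a filter can be passed as input_filter or output_filter to handoff().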
MessageFilter
Filter messages sent to or received from agent handoffs.
MessageFilter = Callable[[list[ChatMessage]], Awaitable[list[ChatMessage]]]
Protocol
Agent
Agents perform tasks and participate in conversations.
Agents are similar to tools; however, they are participants in the conversation history and can optionally append messages and model output to the current conversation state.
You can give the model a tool that enables handoff to your agent using the handoff() function.
You can create a simple tool (that receives a string as input) from an agent using as_tool().
class Agent(Protocol):
async def __call__(
self,
state: AgentState,
*args: Any,
**kwargs: Any,
) -> AgentState

state: AgentState
Agent state (conversation history and last model output).
*args: Any
Arguments for the agent.
**kwargs: Any
Keyword arguments for the agent.
AgentState
Agent state.
class AgentState

Attributes

messages: list[ChatMessage]
Conversation history.
output: ModelOutput
Model output.
agent
Decorator for registering agents.
def agent(
func: Callable[P, Agent] | None = None,
*,
name: str | None = None,
description: str | None = None,
) -> Callable[P, Agent] | Callable[[Callable[P, Agent]], Callable[P, Agent]]

func: Callable[P, Agent] | None
Agent function.
name: str | None
Optional name for the agent. If the decorator has no name argument, the name of the agent creation function will be used as the name of the agent.
description: str | None
Description for the agent when used as an ordinary tool or handoff tool.
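A minimal sketch of a registered custom agent (the agent name and its single-turn behavior are illustrative; get_model() is Inspect's accessor for the active model):

```python
# Sketch: a custom agent implementing the Agent protocol, registered
# via the @agent decorator. It performs a single generation turn and
# appends the reply to the conversation.
from inspect_ai.agent import agent, AgentState
from inspect_ai.model import get_model

@agent
def single_turn():
    async def execute(state: AgentState) -> AgentState:
        state.output = await get_model().generate(state.messages)
        state.messages.append(state.output.message)
        return state
    return execute
```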
agent_with
Agent with modifications to name and/or description.
This function modifies the passed agent in place and returns it. If you want to create multiple variations of a single agent using agent_with() you should create the underlying agent multiple times.
def agent_with(
agent: Agent,
*,
name: str | None = None,
description: str | None = None,
) -> Agent

agent: Agent
Agent instance to modify.
name: str | None
Agent name (optional).
description: str | None
Agent description (optional).
is_agent
Check if an object is an Agent.
Determines if the provided object is registered as an Agent in the system registry. When this function returns True, type checkers will recognize ‘obj’ as an Agent type.
def is_agent(obj: Any) -> TypeGuard[Agent]

obj: Any
Object to check against the registry.
Types
AgentPrompt
Prompt for agent.
class AgentPrompt(NamedTuple)

Attributes

instructions: str | None
Agent-specific contextual instructions.
handoff_prompt: str | None
Prompt used when there are additional handoff agents active. Pass None for no additional handoff prompt.
assistant_prompt: str | None
Prompt for assistant (covers tool use, CoT, etc.). Pass None for no additional assistant prompt.
submit_prompt: str | None
Prompt to tell the model about the submit tool. Pass None for no additional submit prompt. This prompt is not used if the assistant_prompt contains a {submit} placeholder.
AgentAttempts
Configure a react agent to make multiple attempts.
Submissions are evaluated using the task’s main scorer, with a value of 1.0 indicating a correct answer. Scorer values are converted to float (e.g. “C” becomes 1.0) using the standard value_to_float() function. Provide an alternate conversion scheme as required via score_value.
class AgentAttempts(NamedTuple)

Attributes

attempts: int
Maximum number of attempts.
incorrect_message: str | Callable[[AgentState, list[Score]], Awaitable[str]]
User message reply for an incorrect submission from the model. Alternatively, an async function which returns a message.
score_value: ValueToFloat
Function used to extract a float from scores (defaults to the standard value_to_float()).
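For example (a sketch; the retry message wording is illustrative), multiple attempts with a custom incorrect-answer reply:

```python
# Sketch: allow up to 3 submissions, replying with a custom message
# whenever a submission scores as incorrect.
from inspect_ai.agent import react, AgentAttempts

solver = react(
    attempts=AgentAttempts(
        attempts=3,
        incorrect_message=(
            "That answer was scored as incorrect. Review your reasoning "
            "and submit a revised answer."
        ),
    )
)
```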
AgentContinue
Function called to determine whether the agent should continue.
Return True to continue (with no additional messages inserted), or False to stop. Return a str to continue with an additional custom user message inserted.
AgentContinue: TypeAlias = Callable[[AgentState], Awaitable[bool | str]]
AgentSubmit
Configure the submit tool of a react agent.
class AgentSubmit(NamedTuple)

Attributes

name: str | None
Name for the submit tool (defaults to ‘submit’).
description: str | None
Description of the submit tool (defaults to ‘Submit an answer for evaluation’).
tool: Tool | ToolDef | None
Alternate implementation for the submit tool. The tool can provide its name and description internally, or these values can be overridden by the name and description fields of AgentSubmit. The tool should return the answer provided to it for scoring.
answer_only: bool
Set the completion to only the answer provided by the submit tool. By default, the answer is appended (with answer_delimiter) to whatever other content the model generated along with the call to submit().
answer_delimiter: str
Delimiter used when appending the submit tool answer to other content the model generated along with the call to submit().
keep_in_messages: bool
Keep the submit tool call in the message history. Defaults to False, which results in calls to the submit() tool being removed from message history so that the model’s response looks like a standard assistant message. This is particularly important for multi-agent systems, where the presence of submit() calls in the history can cause coordinator agents to terminate early because they think they are done. You should therefore not set this to True if you are using handoff() in a multi-agent system.
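For instance (a sketch; the tool name and description are illustrative), the submit tool can be renamed and configured so the completion contains only the submitted answer:

```python
# Sketch: customize the react() submit tool.
from inspect_ai.agent import react, AgentSubmit

solver = react(
    submit=AgentSubmit(
        name="finish",
        description="Report the final answer for grading.",
        answer_only=True,  # completion contains only the submitted answer
    )
)
```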
Deprecated
bridge
Bridge an external agent into an Inspect Agent.
Note that this function is deprecated in favor of the agent_bridge() function. If you are creating a new agent bridge, we recommend you use agent_bridge() rather than bridge().
If you do choose to use the bridge() function, these examples demonstrate its basic usage.
@agent
def bridge(
agent: Callable[[dict[str, Any]], Awaitable[dict[str, Any]]],
) -> Agent

agent: Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]
Callable which takes a sample dict and returns a result dict.