# Inspect AI

> Inspect AI is a Python framework for large language model evaluations created by the [UK AI Security Institute](https://aisi.gov.uk). Inspect provides many built-in components, including facilities for prompt engineering, tool usage, multi-turn dialog, and model-graded evaluations. Extensions to Inspect (e.g. to support new elicitation and scoring techniques) can be provided by other Python packages.

## Docs

- [Tutorial](https://inspect.aisi.org.uk/tutorial.html.md): Step-by-step walkthroughs of several basic examples of Inspect evaluations.
- [Options](https://inspect.aisi.org.uk/options.html.md): Covers the various options available for evaluations as well as how to manage model credentials.
- [Log Viewer](https://inspect.aisi.org.uk/log-viewer.html.md): Goes into more depth on how to use Inspect View to develop and debug evaluations, including how to provide additional log metadata and how to integrate it with Python's standard logging module.
- [VS Code](https://inspect.aisi.org.uk/vscode.html.md): Provides documentation on using the Inspect VS Code Extension to run, tune, debug, and visualise evaluations.

- [Tasks](https://inspect.aisi.org.uk/tasks.html.md) bring together datasets, solvers, and scorers to define an evaluation. This section explores strategies for creating flexible and re-usable tasks.
- [Datasets](https://inspect.aisi.org.uk/datasets.html.md): Datasets provide samples to evaluation tasks. This section illustrates how to adapt various data sources for use with Inspect, as well as how to include multi-modal data (images, etc.) in your datasets.
- [Solvers](https://inspect.aisi.org.uk/solvers.html.md): Solvers are the heart of Inspect, and encompass prompt engineering and various other elicitation strategies. Here we cover using the built-in solvers and creating your own more sophisticated ones.
- [Scorers](https://inspect.aisi.org.uk/scorers.html.md): Scorers evaluate the work of solvers and aggregate scores into metrics. Sophisticated evals often require custom scorers that use models to evaluate output. This section covers how to create them.

- [Models](https://inspect.aisi.org.uk/models.html.md): Models provide a uniform API for both evaluating a variety of large language models and using models within evaluations (e.g. for critique or grading).
- [Providers](https://inspect.ai-safety-institute.org.uk/providers.html.md) covers usage details and available options for the various supported providers.
- [Caching](https://inspect.aisi.org.uk/caching.html.md): Caching enables you to cache model output to reduce the number of API calls made, saving both time and expense.
- [Multimodal](https://inspect.aisi.org.uk/multimodal.html.md): This article describes how to use images, audio, and video in evaluations.
- [Reasoning](https://inspect.aisi.org.uk/reasoning.html.md) documents the additional options and data available for reasoning models.
- [Batch Mode](https://inspect.aisi.org.uk/models-batch.html.md) covers using batch processing APIs for model inference.
- [JSON Output](https://inspect.aisi.org.uk/structured.html.md) explains how to constrain model output to a particular JSON schema.

- [Tool Basics](https://inspect.aisi.org.uk/tools.html.md): Tools provide a means of extending the capabilities of models by registering Python functions for them to call. This section describes how to create custom tools and use them in evaluations.
- [Standard Tools](https://inspect.aisi.org.uk/tools-standard.html.md) describes Inspect's built-in tools for code execution, text editing, computer use, web search, and web browsing.
- [MCP Tools](https://inspect.aisi.org.uk/tools-mcp.html.md) covers how to integrate tools from the growing list of [Model Context Protocol](https://modelcontextprotocol.io/introduction) providers.
- [Custom Tools](https://inspect.aisi.org.uk/tools-custom.html.md) provides details on more advanced custom tool features including sandboxing, error handling, and dynamic tool definitions.
- [Sandboxing](https://inspect.aisi.org.uk/sandboxing.html.md): Enables you to isolate code generated by models as well as set up more complex computing environments for tasks.
- [Tool Approval](https://inspect.aisi.org.uk/approval.html.md): Approvals enable you to create fine-grained policies for approving tool calls made by models.

- [Agents](https://inspect.aisi.org.uk/agents.html.md): Agents combine planning, memory, and tool usage to pursue more complex, longer-horizon tasks. This section describes how to build agent evaluations with Inspect.
- [ReAct Agent](https://inspect.aisi.org.uk/react-agent.html.md) provides details on using and customizing the built-in ReAct agent.
- [Multi Agent](https://inspect.aisi.org.uk/multi-agent.html.md) covers various ways to compose agents together in multi-agent architectures.
- [Custom Agents](https://inspect.aisi.org.uk/agent-custom.html.md): This article describes Inspect APIs available for creating custom agents.
- [Agent Bridge](https://inspect.aisi.org.uk/agent-bridge.html.md): Facility for integrating agents from third-party frameworks like AutoGen or LangChain.
- [Human Agent](https://inspect.aisi.org.uk/human-agent.html.md): This article describes the `human_cli()` agent which enables human baselining for computing tasks.

- [Eval Logs](https://inspect.aisi.org.uk/eval-logs.html.md): Explores how to get the most out of evaluation logs for developing, debugging, and analyzing evaluations.
- [Data Frames](https://inspect.aisi.org.uk/dataframe.html.md) documents the APIs available for extracting dataframes of evals, samples, messages, and events from log files.

- [Eval Sets](https://inspect.aisi.org.uk/eval-sets.html.md): Covers Inspect's features for describing, running, and analysing larger sets of evaluation tasks.
- [Errors and Limits](https://inspect.aisi.org.uk/errors-and-limits.html.md): This article covers techniques for dealing with unexpected errors and for setting limits on evaluation tasks and samples. These include retrying failed evaluations, setting a threshold for how many samples may fail with errors before the evaluation itself fails, and setting a maximum number of messages, tokens, or elapsed seconds for a sample before forcing the solver to give up.
- [Typing](https://inspect.aisi.org.uk/typing.html.md): Provides guidance on using static type checking with Inspect, including creating typed interfaces to untyped storage (i.e. sample metadata and store).
- [Tracing](https://inspect.aisi.org.uk/tracing.html.md): Describes advanced execution tracing tools used to diagnose runtime issues.
- [Parallelism](https://inspect.aisi.org.uk/parallelism.html.md): Delves into how to obtain maximum performance for evaluations. Inspect uses a highly parallel async architecture; here we cover how to tune this parallelism (e.g. to stay under API rate limits or to avoid overburdening local compute) for optimal throughput.
- [Interactivity](https://inspect.aisi.org.uk/interactivity.html.md): Covers various ways to introduce user interaction into the implementation of tasks (for example, confirming consequential actions or prompting the model dynamically based on the trajectory of the evaluation).
- [Extensions](https://inspect.aisi.org.uk/extensions.html.md) describes the various ways you can extend Inspect, including adding support for new Model APIs, tool execution environments, and storage platforms (for datasets, prompts, and logs).

## Reference: Python API

- [inspect_ai](https://inspect.aisi.org.uk/reference/inspect_ai.html.md) describes the core types used to create tasks and run evaluations.
- [inspect_ai.solver](https://inspect.aisi.org.uk/reference/inspect_ai.solver.html.md) describes built-in solvers as well as the types used to create custom solvers.
- [inspect_ai.tool](https://inspect.aisi.org.uk/reference/inspect_ai.tool.html.md) describes built-in tools as well as the types used to create custom tools.
- [inspect_ai.agent](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html.md) describes high-level agent orchestration and the agent protocol.
- [inspect_ai.scorer](https://inspect.aisi.org.uk/reference/inspect_ai.scorer.html.md) describes built-in scorers as well as the types used to create custom scorers.
- [inspect_ai.model](https://inspect.aisi.org.uk/reference/inspect_ai.model.html.md) covers using the Inspect model API for accessing various language models.
- [inspect_ai.dataset](https://inspect.aisi.org.uk/reference/inspect_ai.dataset.html.md) describes the types used to read and manipulate datasets and samples.
- [inspect_ai.approval](https://inspect.aisi.org.uk/reference/inspect_ai.approval.html.md) covers using built-in approvers as well as the types used to create custom approvers and approval policies.
- [inspect_ai.log](https://inspect.aisi.org.uk/reference/inspect_ai.log.html.md) describes the types used to list, read, write, and traverse the contents of eval log files.
- [inspect_ai.analysis](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html.md) covers the Python API for reading logs into dataframes for analysis.
- [inspect_ai.util](https://inspect.aisi.org.uk/reference/inspect_ai.util.html.md) covers various utility functions for concurrency, sandboxes, the store, and more.

## Reference: Command Line

- [inspect eval](https://inspect.aisi.org.uk/reference/inspect_eval.html.md): Evaluate one or more tasks.
- [inspect eval-retry](https://inspect.aisi.org.uk/reference/inspect_eval-retry.html.md): Retry an evaluation task.
- [inspect eval-set](https://inspect.aisi.org.uk/reference/inspect_eval-set.html.md): Evaluate a set of tasks with retries.
- [inspect score](https://inspect.aisi.org.uk/reference/inspect_score.html.md): Score a previous evaluation run.
- [inspect view](https://inspect.aisi.org.uk/reference/inspect_view.html.md): Inspect log viewer.
- [inspect log](https://inspect.aisi.org.uk/reference/inspect_log.html.md): Query, read, write, and convert logs.
- [inspect trace](https://inspect.aisi.org.uk/reference/inspect_trace.html.md): List and read execution traces.
- [inspect sandbox](https://inspect.aisi.org.uk/reference/inspect_sandbox.html.md): Manage sandbox environments.
- [inspect cache](https://inspect.aisi.org.uk/reference/inspect_cache.html.md): Manage the Inspect model cache.
- [inspect list](https://inspect.aisi.org.uk/reference/inspect_list.html.md): List tasks on the filesystem.
- [inspect info](https://inspect.aisi.org.uk/reference/inspect_info.html.md): Read version and configuration.