Running Evals

Once an evaluation is developed, Inspect provides a number of tools for running it reliably and at scale:

Eval Sets	Describe, run, and analyse larger sets of evaluation tasks with automatic retry and resumption.
Parallelism	Run multiple tasks and models in parallel and tune sandbox concurrency.
Handling Errors	Deal with runtime errors and recover from crashes during evaluation.
Setting Limits	Set time, message, token, and cost limits on tasks, samples, and agent execution.
Control Channel	Observe running evals from another process: task and sample status, errors, and transcript events.
Early Stopping	End tasks early based on the scores of previously completed samples.
Tracing	Diagnose runtime issues with advanced execution tracing tools.

If you are just getting started with running evaluations, see the inspect eval command-line interface and the eval() function, both covered in the Welcome tutorial.