Running Evals
Once an evaluation is developed, Inspect provides a number of tools for running it reliably and at scale:
| Topic | Description |
|---|---|
| Eval Sets | Describe, run, and analyse larger sets of evaluation tasks with automatic retry and resumption. |
| Parallelism | Run multiple tasks and models in parallel and tune sandbox concurrency. |
| Handling Errors | Deal with runtime errors and recover from crashes during evaluation. |
| Setting Limits | Set time, message, token, and cost limits on tasks, samples, and agent execution. |
| Early Stopping | End tasks early based on the scores of previously completed samples. |
| Tracing | Diagnose runtime issues with advanced execution tracing tools. |
If you are just getting started with running evaluations, begin with the `inspect eval` command-line interface and the `eval()` function, both covered in the Welcome tutorial.