# Early Stopping
The early stopping feature described below is available only in the development version of Inspect. To install the development version from GitHub:
```bash
pip install git+https://github.com/UKGovernmentBEIS/inspect_ai
```

## Overview
Early stopping enables you to skip samples or epochs during evaluation based on results observed so far. This is useful for implementing adaptive testing algorithms that dynamically decide which samples to run based on prior performance, potentially saving significant computation time while maintaining evaluation quality.
Common use cases include:
- Stopping a sample after consistent results: If a sample has been answered correctly (or incorrectly) across multiple epochs, skip the remaining epochs (see the sketch after this list).
- Adaptive difficulty: Focus evaluation time on samples near the model’s capability boundary.
- Resource optimization: Skip samples that are unlikely to provide additional signal.
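The first of these reduces to a simple decision rule: once a sample has completed enough epochs and every epoch has produced the same result, skip the rest. Here is a minimal sketch of that rule, assuming you have accumulated each epoch's scores for a sample. The should_skip helper and its min_epochs threshold are illustrative assumptions, not part of the Inspect API:

```python
from inspect_ai.scorer import SampleScore

# Illustrative helper (not part of Inspect): given the scores recorded
# for a sample's completed epochs so far, decide whether its remaining
# epochs can be skipped. min_epochs is an assumed threshold. Assumes
# each completed epoch recorded at least one score.
def should_skip(
    epoch_scores: list[dict[str, SampleScore]], min_epochs: int = 3
) -> bool:
    if len(epoch_scores) < min_epochs:
        return False
    # take the value of the first score recorded for each epoch
    values = [next(iter(scores.values())).score.value for scores in epoch_scores]
    # skip only if every completed epoch produced the same result
    return all(value == values[0] for value in values)
```

A complete implementation built around this kind of rule appears after the example implementation below.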
## EarlyStopping Protocol
To implement early stopping, create a class that implements the EarlyStopping protocol and pass it to the early_stopping parameter of a Task:
```python
from inspect_ai import Task, task
from inspect_ai.util import EarlyStopping, EarlyStop

@task
def my_task():
    return Task(
        dataset=my_dataset,
        solver=my_solver,
        scorer=my_scorer,
        early_stopping=MyEarlyStopping(),
        epochs=5,
    )
```

The EarlyStopping protocol defines four async methods:
| Method | Description |
|---|---|
| start_task() | Called at the beginning of an eval to register task metadata. |
| schedule_sample() | Called before each sample runs; return EarlyStop to skip it. |
| complete_sample() | Called when a sample completes with its scores. |
| complete_task() | Called when the task completes; return metadata for the log. |
## Example Implementation
Here is a simple example that randomly stops samples early (for demonstration purposes):
```python
from random import random

from pydantic import JsonValue
from typing_extensions import override

from inspect_ai.dataset import Sample
from inspect_ai.log import EvalSpec
from inspect_ai.scorer import SampleScore
from inspect_ai.util import EarlyStopping, EarlyStop

class RandomEarlyStopping(EarlyStopping):
    @override
    async def start_task(
        self, task: EvalSpec, samples: list[Sample], epochs: int
    ) -> str:
        """Task initialization."""
        # TODO: create a structure to track all of the samples/epochs
        # (this will generally be updated with scores in complete_sample())

        # return the early stopping manager name
        return "random"

    @override
    async def schedule_sample(
        self, id: str | int, epoch: int
    ) -> EarlyStop | None:
        """Return EarlyStop to skip this sample, or None to run it."""
        # TODO: determine whether the given sample should be run based
        # on the previously accumulated sample scores

        # randomly stop some samples
        if random() < 0.5:
            return EarlyStop(id=id, epoch=epoch, reason="random stop")
        return None

    @override
    async def complete_sample(
        self, id: str | int, epoch: int, scores: dict[str, SampleScore]
    ) -> None:
        """Process results from a completed sample."""
        # TODO: track scored samples and use this to determine the
        # appropriate return value for calls to schedule_sample()
        pass

    @override
    async def complete_task(self) -> dict[str, JsonValue]:
        """Return custom metadata to record in the eval log."""
        # TODO: return any custom data about the early stopping output
        # (will be written to the log and displayed in the viewer)
        return {}
```
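The TODO comments above sketch the intended pattern: accumulate scores in complete_sample() and consult them in schedule_sample(). Below is a hedged sketch of the consistency-based stopper described earlier, which skips a sample's remaining epochs once at least min_epochs epochs have completed with identical results. The min_epochs threshold and the choice to compare only the first score's value are assumptions of this sketch, not requirements of the protocol:

```python
from collections import defaultdict

from pydantic import JsonValue
from typing_extensions import override

from inspect_ai.dataset import Sample
from inspect_ai.log import EvalSpec
from inspect_ai.scorer import SampleScore
from inspect_ai.util import EarlyStopping, EarlyStop

class ConsistencyEarlyStopping(EarlyStopping):
    def __init__(self, min_epochs: int = 3) -> None:
        self.min_epochs = min_epochs
        # score values observed so far, keyed by sample id
        self.values: dict[str | int, list[JsonValue]] = defaultdict(list)
        # record of [id, epoch] pairs that were stopped early
        self.stopped: list[list[JsonValue]] = []

    @override
    async def start_task(
        self, task: EvalSpec, samples: list[Sample], epochs: int
    ) -> str:
        return "consistency"

    @override
    async def schedule_sample(
        self, id: str | int, epoch: int
    ) -> EarlyStop | None:
        values = self.values[id]
        # skip once enough epochs have completed and they all agree
        if len(values) >= self.min_epochs and all(v == values[0] for v in values):
            self.stopped.append([id, epoch])
            return EarlyStop(id=id, epoch=epoch, reason="consistent results")
        return None

    @override
    async def complete_sample(
        self, id: str | int, epoch: int, scores: dict[str, SampleScore]
    ) -> None:
        # record the value of the first score for this epoch (assumes
        # at least one scorer ran for the sample)
        if scores:
            self.values[id].append(next(iter(scores.values())).score.value)

    @override
    async def complete_task(self) -> dict[str, JsonValue]:
        return {"stopped": self.stopped}
```

With this in place, passing ConsistencyEarlyStopping(min_epochs=3) as the early_stopping parameter of a Task skips any remaining epochs for samples whose first three scored epochs all agree.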
## EarlyStop

When schedule_sample() returns an EarlyStop, the sample is skipped. The EarlyStop class includes the following fields:
| Field | Type | Description |
|---|---|---|
| id | str \| int | Sample dataset id. |
| epoch | int | Sample epoch. |
| reason | str \| None | Optional reason for the early stop. |
| metadata | dict[str, JsonValue] \| None | Optional metadata about the stop. |
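For instance, a schedule_sample() implementation might attach diagnostic metadata when stopping a sample. The id, reason, and metadata values below are purely illustrative:

```python
from inspect_ai.util import EarlyStop

# illustrative values: a stop recorded after three consistent epochs
stop = EarlyStop(
    id="sample-42",  # hypothetical sample dataset id
    epoch=4,
    reason="consistent results",
    metadata={"epochs_observed": 3, "observed_value": "C"},
)
```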
## Log Output
Early stopping information is recorded in the eval log as an EarlyStoppingSummary, which includes:
- The name of the early stopping manager
- A list of all samples that were stopped early
- Any metadata returned by complete_task()
This allows you to analyze and audit the early stopping behavior after evaluation completes.
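To analyze it programmatically, you can read the log back with read_eval_log. In the sketch below, the attribute used to reach the summary (early_stopping) and its field names (name, stopped, metadata) are assumptions inferred from the list above, not confirmed API; check the EvalLog reference in your installed development version:

```python
from inspect_ai.log import read_eval_log

log = read_eval_log("logs/my-eval.eval")  # path is illustrative

# ASSUMPTION: attribute name under which the EarlyStoppingSummary is
# stored on the log; verify against your installed version.
summary = getattr(log, "early_stopping", None)
if summary is not None:
    print(summary.name)      # name of the early stopping manager
    print(summary.stopped)   # samples that were stopped early
    print(summary.metadata)  # metadata returned by complete_task()
```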