inspect_ai.hooks

Registration

Hooks

Base class for hooks.

Every hook call is wrapped in a try/except block, so an exception raised by a hook does not affect the overall execution of the eval. If a hook fails, a warning is logged.

Source

class Hooks

Methods

enabled

Check if the hook should be enabled.

Default implementation returns True.

Hooks may wish to override this to e.g. check the presence of an environment variable or a configuration setting.

Will be called frequently, so consider caching the result if the computation is expensive.

Source

def enabled(self) -> bool
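As a minimal sketch of the caching advice above, a subclass might read an environment variable once and reuse the result. The class name `AuditHooks` and the variable name `AUDIT_HOOKS_ENABLED` are hypothetical, and the fallback `Hooks` stub is only there so the snippet runs where Inspect is not installed.

```python
import os
from typing import Optional

try:
    from inspect_ai.hooks import Hooks
except ImportError:
    # Stand-in base class (an assumption) so the sketch runs without Inspect.
    class Hooks:
        def enabled(self) -> bool:
            return True


class AuditHooks(Hooks):
    """Hypothetical hook that is active only when AUDIT_HOOKS_ENABLED=1."""

    def __init__(self) -> None:
        super().__init__()
        self._enabled: Optional[bool] = None  # None means "not computed yet"

    def enabled(self) -> bool:
        # enabled() is called frequently, so read the environment only once.
        if self._enabled is None:
            self._enabled = os.environ.get("AUDIT_HOOKS_ENABLED") == "1"
        return self._enabled
```

One consequence of caching this way is that changes to the environment variable after the first call are not picked up for the lifetime of the hook instance.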

on_run_start

On run start.

A “run” is a single invocation of eval() or eval_retry() which may contain many Tasks, each with many Samples and many epochs. Note that eval_retry() can be invoked multiple times within an eval_set().

Source

async def on_run_start(self, data: RunStart) -> None

data RunStart: Run start data.

on_run_end

On run end.

Source

async def on_run_end(self, data: RunEnd) -> None

data RunEnd: Run end data.
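To illustrate how the run-level hooks pair up, the sketch below times each run by its `run_id`. The class name `RunTimer` and its attributes are hypothetical; only `run_id` and `task_names` are taken from the hook data documented on this page, and the `Hooks` stub is a stand-in for environments without Inspect.

```python
import time

try:
    from inspect_ai.hooks import Hooks
except ImportError:
    # Stand-in base class (an assumption) so the sketch runs without Inspect.
    class Hooks:
        pass


class RunTimer(Hooks):
    """Hypothetical hook that records the wall-clock duration of each run."""

    def __init__(self) -> None:
        super().__init__()
        self.started = {}    # run_id -> monotonic start time
        self.durations = {}  # run_id -> elapsed seconds

    async def on_run_start(self, data: "RunStart") -> None:
        self.started[data.run_id] = time.monotonic()
        print(f"run {data.run_id} started: {', '.join(data.task_names)}")

    async def on_run_end(self, data: "RunEnd") -> None:
        # Ignore an end event for which no start was seen.
        start = self.started.pop(data.run_id, None)
        if start is not None:
            self.durations[data.run_id] = time.monotonic() - start
```

Keying on `run_id` keeps the measurements separate when several runs occur in one process, for example when eval_retry() is invoked more than once within an eval_set().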

on_task_start

On task start.

Source

async def on_task_start(self, data: TaskStart) -> None

data TaskStart: Task start data.

on_task_end

On task end.

Source

async def on_task_end(self, data: TaskEnd) -> None

data TaskEnd: Task end data.

on_sample_start

On sample start.

Called when a sample is about to start. If the sample errors and retries, this will not be called again.

If a sample is run for multiple epochs, this will be called once per epoch.

Source

async def on_sample_start(self, data: SampleStart) -> None

data SampleStart: Sample start data.

on_sample_end

On sample end.

Called when a sample has either completed successfully, or when a sample has errored and has no retries remaining.

If a sample is run for multiple epochs, this will be called once per epoch.

Source

async def on_sample_end(self, data: SampleEnd) -> None

data SampleEnd: Sample end data.
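Because on_sample_start and on_sample_end are each called once per sample execution (and once per epoch), they can be paired to track work in progress. The following is a sketch under that assumption; `SampleTracker` and its attributes are hypothetical names, and the `Hooks` stub is a stand-in for environments without Inspect.

```python
from collections import Counter

try:
    from inspect_ai.hooks import Hooks
except ImportError:
    # Stand-in base class (an assumption) so the sketch runs without Inspect.
    class Hooks:
        pass


class SampleTracker(Hooks):
    """Hypothetical hook that tracks in-flight and completed samples."""

    def __init__(self) -> None:
        super().__init__()
        self.in_flight = set()      # sample_ids that have started but not ended
        self.completed = Counter()  # run_id -> number of finished samples

    async def on_sample_start(self, data: "SampleStart") -> None:
        self.in_flight.add(data.sample_id)

    async def on_sample_end(self, data: "SampleEnd") -> None:
        # discard() tolerates an end event whose start was never observed.
        self.in_flight.discard(data.sample_id)
        self.completed[data.run_id] += 1
```

Note that `completed` counts samples that errored with no retries remaining as well as successful ones, since on_sample_end is called in both cases.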

on_model_usage

Called when a call to a model’s generate() method completes successfully.

This is not called on a hit in Inspect’s local cache (i.e. when no external API call is made). It is still called when the provider serves the response from its own cache.

Source

async def on_model_usage(self, data: ModelUsageData) -> None

data ModelUsageData: Model usage data.
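One plausible use of on_model_usage is to aggregate per-model statistics. The sketch below relies only on the `model_name` and `call_duration` attributes documented for ModelUsageData; the class name `UsageStats` and its attributes are hypothetical, and the `Hooks` stub is a stand-in for environments without Inspect.

```python
from collections import defaultdict

try:
    from inspect_ai.hooks import Hooks
except ImportError:
    # Stand-in base class (an assumption) so the sketch runs without Inspect.
    class Hooks:
        pass


class UsageStats(Hooks):
    """Hypothetical hook that totals calls and call time per model."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = defaultdict(int)      # model_name -> successful calls
        self.seconds = defaultdict(float)  # model_name -> summed call_duration

    async def on_model_usage(self, data: "ModelUsageData") -> None:
        self.calls[data.model_name] += 1
        self.seconds[data.model_name] += data.call_duration
```

Since the hook is skipped on local cache hits, these totals reflect only calls that reached the provider.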

override_api_key

Optionally override an API key.

When overridden, this method may return a new API key value which will be used in place of the original one during the eval.

Source

def override_api_key(self, data: ApiKeyOverride) -> str | None

data ApiKeyOverride: API key override data.
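A minimal sketch of an override, assuming that returning None leaves the original key in place (the `str | None` return type suggests this, but it is not stated explicitly here). The class name `KeyOverrideHooks` and the in-memory mapping are hypothetical; a real hook might consult a secrets manager instead. The `Hooks` stub is a stand-in for environments without Inspect.

```python
from typing import Dict, Optional

try:
    from inspect_ai.hooks import Hooks
except ImportError:
    # Stand-in base class (an assumption) so the sketch runs without Inspect.
    class Hooks:
        pass


class KeyOverrideHooks(Hooks):
    """Hypothetical hook that swaps selected API keys for replacements."""

    def __init__(self, replacements: Optional[Dict[str, str]] = None) -> None:
        super().__init__()
        # Maps an environment variable name to the key that should replace it.
        self._replacements = dict(replacements or {})

    def override_api_key(self, data: "ApiKeyOverride") -> Optional[str]:
        # Assumption: None means "no override" for this variable.
        return self._replacements.get(data.env_var_name)
```

The constructor argument is optional because the @hooks decorator is described as instantiating the hook class itself, which implies it is called without arguments.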

hooks

Decorator for registering a hook subscriber.

Note

The @hooks decorator is available only in the development version of Inspect. To install the development version from GitHub:

pip install git+https://github.com/UKGovernmentBEIS/inspect_ai

Either decorate a subclass of Hooks, or a function which returns the type of a subclass of Hooks. This decorator will instantiate the hook class and store it in the registry.

Source

def hooks(name: str, description: str) -> Callable[..., Type[T]]

name str: Name of the subscriber (e.g. “audit logging”).
description str: Short description of the hook (e.g. “Copies eval files to S3 bucket for auditing.”).
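The two registration forms described above might look like the following sketch. The hook names, class names, and the `created` list are hypothetical and exist only to make the instantiation visible. The fallback `Hooks` and `hooks` definitions are stand-ins that mimic only the documented behaviour (instantiate the class and store it); they are not the library's implementation, and what the real decorator returns is not specified here, so the sketch avoids using the decorated names afterwards.

```python
try:
    from inspect_ai.hooks import Hooks, hooks
except ImportError:
    # Stand-ins (assumptions) so the sketch runs without Inspect installed.
    class Hooks:
        pass

    _registry = []

    def hooks(name: str, description: str):
        def wrapper(target):
            # Accept a Hooks subclass, or a function returning one.
            hook_type = target if isinstance(target, type) else target()
            _registry.append((name, description, hook_type()))
            return hook_type
        return wrapper


created = []  # records each instantiation, for illustration only


# Form 1: decorate a subclass of Hooks directly.
@hooks(name="audit_logging", description="Copies eval files to S3 bucket for auditing.")
class AuditHooks(Hooks):
    def __init__(self) -> None:
        super().__init__()
        created.append("AuditHooks")


class LazyAuditHooks(Hooks):
    def __init__(self) -> None:
        super().__init__()
        created.append("LazyAuditHooks")


# Form 2: decorate a function that returns the type (not an instance).
@hooks(name="lazy_audit_logging", description="Same idea, resolved via a function.")
def lazy_audit_hooks():
    return LazyAuditHooks
```

The function form can be handy when the hook class lives in a module that should only be imported at registration time.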

Hook Data

ApiKeyOverride

API key override hook event data.

Source

@dataclass(frozen=True)
class ApiKeyOverride

Attributes

env_var_name str: The name of the environment variable containing the API key (e.g. OPENAI_API_KEY).
value str: The original value of the environment variable.

ModelUsageData

Model usage hook event data.

Source

@dataclass(frozen=True)
class ModelUsageData

Attributes

model_name str: The name of the model that was used.
usage ModelUsage: The model usage metrics.
call_duration float: The duration of the model call in seconds. If HTTP retries were made, this is the time taken for the successful call. This excludes retry waiting (e.g. exponential backoff) time.

RunEnd

Run end hook event data.

Source

@dataclass(frozen=True)
class RunEnd

Attributes

run_id str: The globally unique identifier for the run.
logs EvalLogs: All eval logs generated during the run. If the run was an eval_set(), these may be headers only.

RunStart

Run start hook event data.

Source

@dataclass(frozen=True)
class RunStart

Attributes

run_id str: The globally unique identifier for the run.
task_names list[str]: The names of the tasks which will be used in the run.

SampleEnd

Sample end hook event data.

Source

@dataclass(frozen=True)
class SampleEnd

Attributes

run_id str: The globally unique identifier for the run.
eval_id str: The globally unique identifier for the task execution.
sample_id str: The globally unique identifier for the sample execution.
summary EvalSampleSummary: Summary of the sample that has run.

SampleStart

Sample start hook event data.

Source

@dataclass(frozen=True)
class SampleStart

Attributes

run_id str: The globally unique identifier for the run.
eval_id str: The globally unique identifier for the task execution.
sample_id str: The globally unique identifier for the sample execution.
summary EvalSampleSummary: Summary of the sample to be run.

TaskEnd

Task end hook event data.

Source

@dataclass(frozen=True)
class TaskEnd

Attributes

run_id str: The globally unique identifier for the run.
eval_id str: The globally unique identifier for the task execution.
log EvalLog: The log generated for the task. If the run was an eval_set(), this may be the header only.

TaskStart

Task start hook event data.

Source

@dataclass(frozen=True)
class TaskStart

Attributes

run_id str: The globally unique identifier for the run.
eval_id str: The globally unique identifier for this task execution.
spec EvalSpec: Specification of the task.