inspect_ai.hooks
Registration
Hooks
Base class for hooks.
Note that whenever hooks are called, they are wrapped in a try/except block to catch any exceptions that may occur. This is to ensure that a hook failure does not affect the overall execution of the eval. If a hook fails, a warning will be logged.
class Hooks
Methods
- enabled
Check if the hook should be enabled.
Default implementation returns True.
Hooks may wish to override this to e.g. check the presence of an environment variable or a configuration setting.
Will be called frequently, so consider caching the result if the computation is expensive.
def enabled(self) -> bool
- on_eval_set_start
On eval set start.
An “eval set” is an invocation of eval_set() for a log directory. Note that the eval_set_id will be stable across multiple invocations of eval_set() for the same log directory.
async def on_eval_set_start(self, data: EvalSetStart) -> None
data: EvalSetStart
Eval set start data.
- on_eval_set_end
On eval set end.
async def on_eval_set_end(self, data: EvalSetEnd) -> None
data: EvalSetEnd
Eval set end data.
- on_run_start
On run start.
A “run” is a single invocation of eval() or eval_retry() which may contain many Tasks, each with many Samples and many epochs. Note that eval_retry() can be invoked multiple times within an eval_set().
async def on_run_start(self, data: RunStart) -> None
data: RunStart
Run start data.
- on_run_end
On run end.
async def on_run_end(self, data: RunEnd) -> None
data: RunEnd
Run end data.
- on_task_start
On task start.
async def on_task_start(self, data: TaskStart) -> None
data: TaskStart
Task start data.
- on_task_end
On task end.
async def on_task_end(self, data: TaskEnd) -> None
data: TaskEnd
Task end data.
- on_sample_init
On sample init.
Called when a sample has been scheduled and is about to begin initialization, before sandbox environments are created. This hook can be used to gate sandbox resource provisioning.
If the sample errors and retries, this will not be called again.
If a sample is run for multiple epochs, this will be called once per epoch.
async def on_sample_init(self, data: SampleInit) -> None
data: SampleInit
Sample init data.
- on_sample_start
On sample start.
Called when a sample is about to start. If the sample errors and retries, this will not be called again.
If a sample is run for multiple epochs, this will be called once per epoch.
async def on_sample_start(self, data: SampleStart) -> None
data: SampleStart
Sample start data.
- on_sample_event
On sample event.
Called when a sample event is emitted. Pending events are not logged here (i.e. ToolEvent and ModelEvent are not logged until they are complete).
async def on_sample_event(self, data: SampleEvent) -> None
data: SampleEvent
Sample event.
- on_sample_end
On sample end.
Called when a sample has either completed successfully, or when a sample has errored and has no retries remaining.
If a sample is run for multiple epochs, this will be called once per epoch.
async def on_sample_end(self, data: SampleEnd) -> None
data: SampleEnd
Sample end data.
- on_sample_attempt_start
On sample attempt start.
Fired at the beginning of every attempt (including the first). Unlike on_sample_start, which fires once per sample, this fires on retries too.
async def on_sample_attempt_start(self, data: SampleAttemptStart) -> None
data: SampleAttemptStart
Sample attempt start data.
- on_sample_attempt_end
On sample attempt end.
Fired at the end of every attempt (including the last). Unlike on_sample_end, which fires once per sample, this fires on retries too.
async def on_sample_attempt_end(self, data: SampleAttemptEnd) -> None
data: SampleAttemptEnd
Sample attempt end data.
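The per-attempt semantics can be illustrated with a small sketch that counts failed attempts per sample and notes which samples ran out of retries. It reads only the documented attempt, error, will_retry and sample_id attributes. The RetryTracker class name and the make_event helper are hypothetical, the demo events are SimpleNamespace stand-ins rather than real SampleAttemptEnd objects (error is a plain string here, not an EvalError), and registration with the hooks decorator is omitted.

```python
from __future__ import annotations

import asyncio
from types import SimpleNamespace

try:
    from inspect_ai.hooks import Hooks, SampleAttemptEnd
except ImportError:
    Hooks = object  # stand-in base so the sketch runs without inspect_ai


class RetryTracker(Hooks):
    """Counts failed attempts per sample and records exhausted samples."""

    def __init__(self) -> None:
        super().__init__()
        self.failed_attempts: dict[str, int] = {}
        self.exhausted: list[str] = []

    async def on_sample_attempt_end(self, data: SampleAttemptEnd) -> None:
        if data.error is None:
            return  # this attempt finished without an error
        count = self.failed_attempts.get(data.sample_id, 0)
        self.failed_attempts[data.sample_id] = count + 1
        if not data.will_retry:
            # An error with no retry left: the sample has run out of attempts.
            self.exhausted.append(data.sample_id)


def make_event(sample_id: str, attempt: int, error, will_retry: bool):
    # Hypothetical helper building a stand-in for SampleAttemptEnd.
    return SimpleNamespace(
        sample_id=sample_id, attempt=attempt, error=error, will_retry=will_retry
    )


tracker = RetryTracker()
for event in [
    make_event("s1", 1, "timeout", True),   # fails, will be retried
    make_event("s1", 2, None, False),       # retry succeeds
    make_event("s2", 1, "timeout", False),  # fails with no retry left
]:
    asyncio.run(tracker.on_sample_attempt_end(event))
```

After the demo loop, tracker.failed_attempts holds one failure for each sample and tracker.exhausted lists only the sample whose last attempt errored.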
- on_model_usage
Called when a call to a model’s generate() method completes successfully without hitting Inspect’s local cache.
Note that this is not called when Inspect’s local cache is used and is a cache hit (i.e. if no external API call was made). Provider-side caching still results in this hook being called.
async def on_model_usage(self, data: ModelUsageData) -> None
data: ModelUsageData
Model usage data.
- on_model_cache_usage
Called when a call to a model’s generate() method completes successfully by hitting Inspect’s local cache.
async def on_model_cache_usage(self, data: ModelCacheUsageData) -> None
data: ModelCacheUsageData
Cached model usage data.
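The split between the two model hooks can be sketched as a simple aggregator: on_model_usage accumulates per-model call counts, durations and HTTP retries, while on_model_cache_usage only counts local cache hits. The sketch reads just the documented model_name, call_duration and retries attributes. The UsageStats class name and the "mock/model" name are hypothetical, the demo events are SimpleNamespace stand-ins for the real data classes, and registration with the hooks decorator is omitted.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from types import SimpleNamespace

try:
    from inspect_ai.hooks import Hooks, ModelCacheUsageData, ModelUsageData
except ImportError:
    Hooks = object  # stand-in base so the sketch runs without inspect_ai


class UsageStats(Hooks):
    """Aggregates model call statistics per model name."""

    def __init__(self) -> None:
        super().__init__()
        self.calls: dict[str, int] = defaultdict(int)
        self.seconds: dict[str, float] = defaultdict(float)
        self.retries: dict[str, int] = defaultdict(int)
        self.cache_hits = 0

    async def on_model_usage(self, data: ModelUsageData) -> None:
        # Fired for calls that were not served from Inspect's local cache.
        self.calls[data.model_name] += 1
        self.seconds[data.model_name] += data.call_duration
        self.retries[data.model_name] += data.retries

    async def on_model_cache_usage(self, data: ModelCacheUsageData) -> None:
        # Fired for local cache hits; only the count is tracked here.
        self.cache_hits += 1


stats = UsageStats()
# Stand-in events carrying only the attributes the sketch reads.
asyncio.run(stats.on_model_usage(
    SimpleNamespace(model_name="mock/model", call_duration=1.5, retries=0)))
asyncio.run(stats.on_model_usage(
    SimpleNamespace(model_name="mock/model", call_duration=0.5, retries=2)))
asyncio.run(stats.on_model_cache_usage(SimpleNamespace()))
```

Keeping cache hits in a separate counter mirrors the documented behaviour that a local cache hit never reaches on_model_usage.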
- on_sample_scoring
Called before the sample is scored.
Can be used by hooks to demarcate the end of solver execution and the start of scoring.
async def on_sample_scoring(self, data: SampleScoring) -> None
data: SampleScoring
Sample scoring data.
- override_api_key
Optionally override an API key.
When overridden, this method may return a new API key value which will be used in place of the original one during the eval.
def override_api_key(self, data: ApiKeyOverride) -> str | None
data: ApiKeyOverride
Api key override data.
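One possible use of override_api_key is resolving a placeholder stored in the environment variable into a real key. The sketch below looks up data.value in a hypothetical in-memory SECRETS table and returns None when there is no match, on the assumption (suggested by the str | None return type) that None leaves the original key in place. The VaultKeyHooks name, the "vault:" placeholder scheme and the key values are all invented for illustration, and registration with the hooks decorator is omitted.

```python
from __future__ import annotations

from types import SimpleNamespace

try:
    from inspect_ai.hooks import ApiKeyOverride, Hooks
except ImportError:
    Hooks = object  # stand-in base so the sketch runs without inspect_ai

# Hypothetical secret store mapping placeholders to real keys.
SECRETS = {"vault:openai": "sk-resolved-example"}


class VaultKeyHooks(Hooks):
    """Swaps placeholder values for keys held in SECRETS."""

    def override_api_key(self, data: ApiKeyOverride) -> str | None:
        # dict.get returns None for unknown values, which is assumed to mean
        # "no override" given the documented return type.
        return SECRETS.get(data.value)


hook = VaultKeyHooks()
# Stand-ins for ApiKeyOverride with the two documented attributes.
resolved = hook.override_api_key(
    SimpleNamespace(env_var_name="OPENAI_API_KEY", value="vault:openai"))
untouched = hook.override_api_key(
    SimpleNamespace(env_var_name="OPENAI_API_KEY", value="sk-plain"))
```

Note that override_api_key is a regular method rather than a coroutine, so any lookup it performs should be quick or cached.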
hooks
Decorator for registering a hook subscriber.
Either decorate a subclass of Hooks, or a function which returns the type of a subclass of Hooks. This decorator will instantiate the hook class and store it in the registry.
def hooks(name: str, description: str) -> Callable[..., Type[T]]
name: str
Name of the subscriber (e.g. “audit logging”).
description: str
Short description of the hook (e.g. “Copies eval files to S3 bucket for auditing.”).
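As a minimal sketch of registration, the class below is decorated with hooks() and overrides enabled() to check an environment variable, caching the result because enabled() is called frequently. The AuditHooks class, the "audit_logging" subscriber name and the AUDIT_HOOKS_ENABLED variable are hypothetical. The fallback definitions in the except branch are stand-ins, not the real API; they exist only so the sketch runs where inspect_ai is not installed.

```python
from __future__ import annotations

import os

try:
    from inspect_ai.hooks import Hooks, hooks
except ImportError:
    # Stand-ins (not the real API) so the sketch runs without inspect_ai.
    class Hooks:
        def enabled(self) -> bool:
            return True

    def hooks(name: str, description: str):
        return lambda cls: cls


@hooks(name="audit_logging", description="Records eval activity for auditing.")
class AuditHooks(Hooks):
    def __init__(self) -> None:
        super().__init__()
        self._enabled: bool | None = None  # cached result of the check

    def enabled(self) -> bool:
        # enabled() is called frequently, so read the environment only once.
        # AUDIT_HOOKS_ENABLED is a hypothetical variable name.
        if self._enabled is None:
            self._enabled = os.environ.get("AUDIT_HOOKS_ENABLED") == "1"
        return self._enabled


# Demo: the first call reads the environment; later calls reuse the cache.
os.environ["AUDIT_HOOKS_ENABLED"] = "1"
audit = AuditHooks()
first = audit.enabled()
os.environ["AUDIT_HOOKS_ENABLED"] = "0"
second = audit.enabled()  # still the cached value from the first call
```

Caching lazily inside enabled(), rather than in __init__, matters because the decorator instantiates the class at registration time, which may be before the environment is fully configured.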
Hook Data
ApiKeyOverride
Api key override hook event data.
@dataclass(frozen=True)
class ApiKeyOverride
Attributes
env_var_name: str
The name of the environment variable containing the API key (e.g. OPENAI_API_KEY).
value: str
The original value of the environment variable.
ModelUsageData
Model usage hook event data.
@dataclass(frozen=True)
class ModelUsageData
Attributes
model_name: str
The name of the model that was used.
usage: ModelUsage
The model usage metrics.
call_duration: float
The duration of the model call in seconds. If HTTP retries were made, this is the time taken for the successful call. This excludes retry waiting (e.g. exponential backoff) time.
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str | None
The globally unique identifier for the run (if any).
eval_id: str | None
The globally unique identifier for the task execution (if any).
task_name: str | None
The name of the task that generated this usage (if any).
retries: int
The number of HTTP retries made before the successful call.
EvalSetStart
Eval set start hook event data.
@dataclass(frozen=True)
class EvalSetStart
Attributes
eval_set_id: str
The globally unique identifier for the eval set. Note that the eval_set_id will be stable across multiple invocations of eval_set() for the same log directory.
log_dir: str
The log directory for the eval set.
EvalSetEnd
Eval set end hook event data.
@dataclass(frozen=True)
class EvalSetEnd
Attributes
eval_set_id: str
The globally unique identifier for the eval set. Note that the eval_set_id will be stable across multiple invocations of eval_set() for the same log directory.
log_dir: str
The log directory for the eval set.
RunEnd
Run end hook event data.
@dataclass(frozen=True)
class RunEnd
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
exception: BaseException | None
The exception that occurred during the run, if any. If None, the run completed successfully.
logs: EvalLogs
All eval logs generated during the run. Can be headers only if the run was an eval_set().
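The exception attribute is what distinguishes a failed run from a successful one. A hook could record that per run_id roughly as follows; the RunOutcomeHooks name and the outcome strings are hypothetical, the demo events are SimpleNamespace stand-ins for RunEnd, and registration with the hooks decorator is omitted.

```python
from __future__ import annotations

import asyncio
from types import SimpleNamespace

try:
    from inspect_ai.hooks import Hooks, RunEnd
except ImportError:
    Hooks = object  # stand-in base so the sketch runs without inspect_ai


class RunOutcomeHooks(Hooks):
    """Records a short outcome string for each run."""

    def __init__(self) -> None:
        super().__init__()
        self.outcomes: dict[str, str] = {}

    async def on_run_end(self, data: RunEnd) -> None:
        if data.exception is None:
            # No exception means the run completed successfully.
            self.outcomes[data.run_id] = "ok"
        else:
            name = type(data.exception).__name__
            self.outcomes[data.run_id] = f"failed: {name}"


recorder = RunOutcomeHooks()
# Stand-in events carrying only the attributes the sketch reads.
asyncio.run(recorder.on_run_end(SimpleNamespace(run_id="r1", exception=None)))
asyncio.run(recorder.on_run_end(
    SimpleNamespace(run_id="r2", exception=ValueError("bad config"))))
```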
RunStart
Run start hook event data.
@dataclass(frozen=True)
class RunStart
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
task_names: list[str]
The names of the tasks which will be used in the run.
SampleEnd
Sample end hook event data.
@dataclass(frozen=True)
class SampleEnd
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for the task execution.
sample_id: str
The globally unique identifier for the sample execution.
sample: EvalSample
The sample that has run.
SampleInit
Sample init hook event data.
@dataclass(frozen=True)
class SampleInit
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for the task execution.
sample_id: str
The globally unique identifier for the sample execution.
summary: EvalSampleSummary
Summary of the sample to be initialized.
SampleStart
Sample start hook event data.
@dataclass(frozen=True)
class SampleStart
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for the task execution.
sample_id: str
The globally unique identifier for the sample execution.
summary: EvalSampleSummary
Summary of the sample to be run.
SampleAttemptStart
Sample attempt start hook event data.
Fired at the beginning of every attempt (including the first). Unlike on_sample_start which fires once per sample, this fires on retries too.
@dataclass(frozen=True)
class SampleAttemptStart
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for the task execution.
sample_id: str
The globally unique identifier for the sample execution.
summary: EvalSampleSummary
Summary of the sample to be run.
attempt: int
1-based attempt number.
SampleAttemptEnd
Sample attempt end hook event data.
Fired at the end of every attempt (including the last). Unlike on_sample_end which fires once per sample, this fires on retries too.
@dataclass(frozen=True)
class SampleAttemptEnd
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for the task execution.
sample_id: str
The globally unique identifier for the sample execution.
summary: EvalSampleSummary
Summary of the sample.
attempt: int
1-based attempt number.
error: EvalError | None
The error from this attempt, if any.
will_retry: bool
Whether the sample will be retried after this attempt.
SampleEvent
Sample event hook event data.
@dataclass(frozen=True)
class SampleEvent
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for the task execution.
sample_id: str
The globally unique identifier for the sample execution.
event: Event
Sample events.
TaskEnd
Task end hook event data.
@dataclass(frozen=True)
class TaskEnd
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for the task execution.
log: EvalLog
The log generated for the task. Can be header only if the run was an eval_set().
TaskStart
Task start hook event data.
@dataclass(frozen=True)
class TaskStart
Attributes
eval_set_id: str | None
The globally unique identifier for the eval set (if any).
run_id: str
The globally unique identifier for the run.
eval_id: str
The globally unique identifier for this task execution.
spec: EvalSpec
Specification of the task.