Task Configuration
Overview
When running an evaluation, there are four layers where you can set or override task parameters. Each layer takes precedence over any before it:
- Task definition: defaults baked into the
@taskfunction andTask()constructor. - task_with(): programmatic overrides applied to a task object before passing it to eval().
- Environment variables /
.envfiles: project or session defaults set outside code. - eval() / CLI: explicit runtime overrides that take highest precedence.
| Lowest | Highest | ||
|---|---|---|---|
| Task definition | task_with() | .env / env vars |
eval() / CLI |
Understanding these layers is key to reusing and adapting tasks without modifying their source code. This article provides a complete reference for what can be overridden, where, and how. For task authoring patterns, see Tasks; for the full CLI and environment variable catalog, see Options; and for model-role-specific guidance, see Model Roles.
Configuration Layers
Layer 1: Task Definition
A @task-decorated function returns a Task object with all its defaults. These are the baseline values that apply when no overrides are specified:
@task
def security_guide() -> Task:
return Task(
dataset=json_dataset("security_guide.json"),
solver=[chain_of_thought(), generate()],
scorer=model_graded_fact(),
epochs=3,
message_limit=50,
)Task authors can also expose parameters on the @task function, which users can set with -T on the CLI or task_args in eval(). For a full guide to designing and using task parameters, see Parameters; here the focus is how they fit into the overall override model:
@task
def security_guide(
difficulty: str = "medium",
temperature: float = 0.0,
) -> Task:
dataset_file = f"security_guide_{difficulty}.json"
return Task(
dataset=json_dataset(dataset_file),
solver=[chain_of_thought(), generate()],
scorer=model_graded_fact(),
config=GenerateConfig(temperature=temperature),
)Users can then set the task parameters from the command line using the -T flag:
# CLI
inspect eval security_guide.py -T difficulty=hard -T temperature=1.0Duplicating framework parameters like temperature as task parameters is not recommended. They can be set directly using the framework’s built-in CLI flags or passing a GenerateConfig object to eval(my_task(), config=...), and this will override any value set in the task definition.
# CLI
inspect eval security_guide.py --temperature 1.0Layer 2: task_with()
Use task_with() when you want to adapt a task you don’t control (e.g. one imported from a package) before passing it to eval(). It modifies the task in place and returns it:
from inspect_ai import task_with
from inspect_evals.simpleqa import simpleqa
adapted = task_with(
simpleqa(),
solver=my_custom_solver(),
scorer=my_scorer(),
config=GenerateConfig(temperature=0.0),
epochs=5,
)task_with() is the only way to override dataset, scorer, setup, and cleanup at runtime; none of these have CLI flags or eval() parameters. For the broader pattern of adapting published tasks, see Task Reuse.
task_with() modifies the passed task in place. If you need multiple variations, create the underlying task multiple times:
# Correct: two independent tasks
task_a = task_with(simpleqa(), solver=agent_a())
task_b = task_with(simpleqa(), solver=agent_b())
# Wrong: both end up with agent_b's solver
base = simpleqa()
task_a = task_with(base, solver=agent_a())
task_b = task_with(base, solver=agent_b())See the Override Reference table below for the complete list of parameters that task_with() accepts. Note that defaults are NOT_GIVEN (a sentinel), not None; this means you can explicitly pass None to clear a value that the base task set. See the API reference for task_with() for the full signature.
Layer 3: Environment Variables / .env Files
Every CLI flag can also be set as an environment variable using the INSPECT_EVAL_ prefix (with hyphens converted to underscores). These can be set in the shell or placed in a .env file, which Inspect reads automatically from the current directory (searching parent directories if not found).
This layer is useful for setting project or session defaults — values you want applied across multiple eval runs without specifying them each time:
.env
INSPECT_EVAL_MODEL=anthropic/claude-sonnet-4-5
INSPECT_EVAL_TEMPERATURE=0.0
INSPECT_EVAL_MAX_CONNECTIONS=20
INSPECT_EVAL_MAX_RETRIES=5Environment variables set in the shell take precedence over values in .env files. See Options for full details on .env file handling.
# CLI
INSPECT_EVAL_LIMIT=1 inspect eval simpleqa.pyLayer 4: eval() / CLI
Parameters passed to eval() or on the inspect eval command line are the outermost overrides and take highest precedence. They apply to all tasks being evaluated in that call.
Python:
from inspect_ai import eval
eval(
simpleqa(),
model="anthropic/claude-sonnet-4-5",
temperature=0.0,
max_tokens=4096,
epochs=5,
limit=100,
message_limit=50,
model_roles={"grader": "google/gemini-2.0-flash"},
)CLI:
inspect eval inspect_evals/simpleqa \
--model anthropic/claude-sonnet-4-5 \
--temperature 0.0 \
--max-tokens 4096 \
--epochs 5 \
--limit 100 \
--message-limit 50 \
--model-role grader=google/gemini-2.0-flashSee Eval Options for the full list of CLI flags.
What Can Be Overridden Where
The table below shows every overridable parameter and which layers support it:
| Parameter | Task | task_with |
eval |
CLI flag |
|---|---|---|---|---|
| Task structure | ||||
dataset |
yes | yes | ||
setup |
yes | yes | ||
solver |
yes | yes | yes | --solver (name or file.py@name) |
cleanup |
yes | yes | ||
scorer |
yes | yes | ||
metrics |
yes | yes | ||
| Model | ||||
model |
yes | yes | yes | --model |
config GenerateConfig (includes temperature, max_tokens, etc.) |
yes | yes | yes (via **kwargs) |
individual flags or --generate-config |
model_roles |
yes | yes | yes | --model-role |
| Execution limits | ||||
epochs |
yes | yes | yes | --epochs |
message_limit |
yes | yes | yes | --message-limit |
token_limit |
yes | yes | yes | --token-limit |
time_limit |
yes | yes | yes | --time-limit |
working_limit |
yes | yes | yes | --working-limit |
cost_limit |
yes | yes | yes | --cost-limit |
early_stopping |
yes | yes | ||
| Error handling | ||||
fail_on_error |
yes | yes | yes | --fail-on-error |
continue_on_fail |
yes | yes | yes | --continue-on-fail |
retry_on_error |
yes | --retry-on-error |
||
debug_errors |
yes | --debug-errors |
||
| Environment | ||||
sandbox |
yes | yes | yes | --sandbox |
sandbox_cleanup |
yes | yes | --no-sandbox-cleanup |
|
approval |
yes | yes | yes | --approval |
| Task identity | ||||
name |
yes | yes | ||
version |
yes | yes | ||
metadata |
yes | yes (overwrites) | yes (merges) | --metadata |
tags |
yes | yes (overwrites) | yes (merges) | --tags |
| Sample selection | ||||
limit |
yes | --limit |
||
sample_id |
yes | --sample-id |
||
sample_shuffle |
yes | --sample-shuffle |
||
| Eval-level controls | ||||
task_args |
args/kwargs | yes | -T key=value |
|
score |
yes | --no-score |
||
score_display |
yes | --no-score-display |
||
trace |
yes | --trace |
Blank cells indicate that a parameter is not configurable at that layer. The task_args row indicates these fields are set as arguments of the Task object, as opposed to passing a task_args dictionary.
Generation Config
GenerateConfig parameters (temperature, max_tokens, top_p, etc.) can be set at every layer:
In the task definition via config:
Task(
...,
config=GenerateConfig(temperature=0.5, max_tokens=2048)
)With task_with() via config:
task_with(my_task(), config=GenerateConfig(temperature=0.0))With eval() as keyword arguments:
eval(my_task(), temperature=0.0, max_tokens=4096)On the CLI as individual flags:
inspect eval my_task.py --temperature 0.0 --max-tokens 4096On the CLI from a YAML/JSON file using --generate-config:
inspect eval my_task.py --generate-config config.yamlWhere config.yaml contains GenerateConfig fields:
config.yaml
temperature: 0.5
max_tokens: 2048The --generate-config option is useful when you want to bundle a set of generation parameters together, for example to capture the parameters specified in a paper for reproducibility. Individual CLI flags (e.g. --temperature) take precedence over values in the config file.
Model Roles
Model roles let you assign different models to named purposes within a task (e.g. a “grader” model for scoring). They can be configured on Task, with task_with(), with eval(), or on the CLI with --model-role; see the override table above for where each form fits into the precedence model.
For complete guidance, including inline YAML / JSON examples and role-resolution details, see Model Roles. Here is the most common pattern:
Task(..., model_roles={"grader": "openai/gpt-4o"})
eval(my_task(), model_roles={"grader": "google/gemini-2.0-flash"})Inside a solver or scorer, resolve the role with get_model():
model = get_model(role="grader", default="openai/gpt-4o")Solver Override
The solver can be overridden at every layer:
With task_with() — any solver or agent object:
task_with(my_task(), solver=my_custom_agent())With eval() — solver objects, SolverSpec, agents, or a list of solvers:
eval(my_task(), solver=my_custom_agent())On the CLI — by name or file.py@name reference:
# solver registered via @solver decorator
inspect eval my_task.py --solver my_registered_solver -S attempts=5
# solver defined in a file (file.py@function_name)
inspect eval my_task.py --solver solvers.py@ctf_agent -S attempts=5Any function decorated with @solver is automatically registered with Inspect and can be referenced by name (see Custom Solvers). The file.py@name syntax lets you reference a solver in any Python file without needing package registration. The -S flag passes arguments to the solver function; you can also use --solver-config to pass solver arguments from a YAML or JSON file. See Tasks and Solvers for the corresponding CLI options.
When a solver is overridden, it replaces the task’s solver entirely. Solvers do not merge or chain across layers. However, the task’s setup solver (if any) always runs before the overridden solver. See Setup Parameter for details.
Scorer Override
The scorer can only be overridden via task_with() during a live eval. There is no eval() parameter or CLI flag for scorers:
task_with(my_task(), scorer=my_custom_scorer())Some task authors expose scorer selection as a task parameter, which can then be set with -T:
inspect eval my_task.py -T scorer=originalThis is a convention, not a framework feature — the task’s @task function must explicitly handle the parameter.
You can re-score an existing log file with a different scorer using inspect score. The --scorer flag accepts a name (any function decorated with @scorer — see Custom Scorers) or a file.py@name reference:
# scorer registered via @scorer decorator
inspect score log_file.eval --scorer my_scorer
# scorer defined in a file
inspect score log_file.eval --scorer scorers.py@custom_scorerPrecedence
When the same parameter is set at multiple levels, the outermost level wins. The full precedence chain is:
| Lowest | Highest | |||
|---|---|---|---|---|
| Task definition | task_with() | .env file |
env var | CLI flag / eval() |
An explicit CLI flag overrides an environment variable, which overrides a value from a .env file, which overrides task_with(), which overrides the task definition.
For example, a task that sets temperature=0.5 internally can be overridden at runtime:
inspect eval my_task.py --temperature 0.0Or via environment variable:
export INSPECT_EVAL_TEMPERATURE=0.0
inspect eval my_task.pyOr in Python:
eval(my_task(), temperature=0.0)For GenerateConfig specifically, values from --generate-config (a YAML/JSON file) are merged with individual CLI flags, with individual flags taking precedence over the file.
Common Patterns
When consuming a task from a package (like inspect_evals) and you need to customise it, here is the recommended approach for each scenario:
| Need | How |
|---|---|
| Different model | eval() / --model |
| Different temperature or max_tokens | eval() / --temperature / --max-tokens |
| Bundle of generation params | --generate-config config.yaml |
| Different solver | eval(solver=...) / --solver / task_with() |
| Different scorer | task_with(task, scorer=...) |
| Different grader model | --model-role grader=... / eval(model_roles=) |
| Different metrics | task_with(task, metrics=[...]) |
| Subset of samples | --limit / --sample-id |
| Different epochs | --epochs |
Most components except scorer, dataset, and metrics can be overridden without modifying the task’s source code. If the task author uses get_model(role="grader") for model-graded scoring, the grader model becomes overridable at runtime via --model-role.