inspect_ai.analysis.beta
Analysis functions are currently in beta and are exported from the inspect_ai.analysis.beta module. The beta module will be preserved after final release so that code written against it now will continue to work after the beta.
Evals
evals_df
Read a dataframe containing evals.
def evals_df(
    logs: LogPaths,
    columns: list[Column] = EvalColumns,
    recursive: bool = True,
    reverse: bool = False,
    strict: bool = True,
) -> "pd.DataFrame" | tuple["pd.DataFrame", ColumnErrors]
logs
LogPaths-
One or more paths to log files or log directories.
columns
list[Column]-
Specification for what columns to read from log files.
recursive
bool-
Include recursive contents of directories (defaults to True).
reverse
bool-
Reverse the order of the dataframe (by default, items are ordered from oldest to newest).
strict
bool-
Raise import errors immediately. Defaults to True. If False, a tuple of DataFrame and errors is returned.
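A typical call followed by some downstream pandas, sketched here with a hand-built frame standing in for the evals_df() result (the column names come from the EvalInfo, EvalModel, and EvalResults groups below; the log path and model names are hypothetical):

```python
import pandas as pd

# Real usage (assuming inspect_ai is installed and ./logs contains eval logs):
#   from inspect_ai.analysis.beta import evals_df
#   df = evals_df("./logs")

# Stand-in frame with a few of the columns evals_df() yields by default.
df = pd.DataFrame(
    {
        "run_id": ["r1", "r2", "r3"],
        "model": ["model-a", "model-a", "model-b"],
        "status": ["success", "success", "error"],
        "score_headline_value": [0.8, 0.6, None],
    }
)

# Mean headline score per model, skipping errored runs.
summary = (
    df[df["status"] == "success"]
    .groupby("model")["score_headline_value"]
    .mean()
)
print(summary)
```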
EvalColumn
Column which maps to EvalLog.
class EvalColumn(Column)
EvalColumns
Default columns to import for evals_df().
EvalColumns: list[Column] = (
    EvalInfo
    + EvalTask
    + EvalModel
    + EvalDataset
    + EvalConfig
    + EvalResults
    + EvalScores
)
EvalInfo
Eval basic information columns.
EvalInfo: list[Column] = [
    EvalColumn("run_id", path="eval.run_id", required=True),
    EvalColumn("task_id", path="eval.task_id", required=True),
    EvalColumn("log", path=eval_log_location),
    EvalColumn("created", path="eval.created", type=datetime, required=True),
    EvalColumn("tags", path="eval.tags", default="", value=list_as_str),
    EvalColumn("git_origin", path="eval.revision.origin"),
    EvalColumn("git_commit", path="eval.revision.commit"),
    EvalColumn("packages", path="eval.packages"),
    EvalColumn("metadata", path="eval.metadata"),
]
EvalTask
Eval task configuration columns.
EvalTask: list[Column] = [
    EvalColumn("task_name", path="eval.task", required=True),
    EvalColumn("task_version", path="eval.task_version", required=True),
    EvalColumn("task_file", path="eval.task_file"),
    EvalColumn("task_attribs", path="eval.task_attribs"),
    EvalColumn("task_arg_*", path="eval.task_args"),
    EvalColumn("solver", path="eval.solver"),
    EvalColumn("solver_args", path="eval.solver_args"),
    EvalColumn("sandbox_type", path="eval.sandbox.type"),
    EvalColumn("sandbox_config", path="eval.sandbox.config"),
]
EvalModel
Eval model columns.
EvalModel: list[Column] = [
    EvalColumn("model", path="eval.model", required=True),
    EvalColumn("model_base_url", path="eval.model_base_url"),
    EvalColumn("model_args", path="eval.model_args"),
    EvalColumn("model_generate_config", path="eval.model_generate_config"),
    EvalColumn("model_roles", path="eval.model_roles"),
]
EvalConfig
Eval configuration columns.
EvalConfig: list[Column] = [
    EvalColumn("epochs", path="eval.config.epochs"),
    EvalColumn("epochs_reducer", path="eval.config.epochs_reducer"),
    EvalColumn("approval", path="eval.config.approval"),
    EvalColumn("message_limit", path="eval.config.message_limit"),
    EvalColumn("token_limit", path="eval.config.token_limit"),
    EvalColumn("time_limit", path="eval.config.time_limit"),
    EvalColumn("working_limit", path="eval.config.working_limit"),
]
EvalResults
Eval results columns.
EvalResults: list[Column] = [
    EvalColumn("status", path="status", required=True),
    EvalColumn("error_message", path="error.message"),
    EvalColumn("error_traceback", path="error.traceback"),
    EvalColumn("total_samples", path="results.total_samples"),
    EvalColumn("completed_samples", path="results.completed_samples"),
    EvalColumn("score_headline_name", path="results.scores[0].scorer"),
    EvalColumn("score_headline_metric", path="results.scores[0].metrics.*.name"),
    EvalColumn("score_headline_value", path="results.scores[0].metrics.*.value"),
]
EvalScores
Eval scores (one column per score/metric).
EvalScores: list[Column] = [
    EvalColumn("score_*_*", path=eval_log_scores_dict),
]
Samples
samples_df
Read a dataframe containing samples from a set of evals.
def samples_df(
    logs: LogPaths,
    columns: list[Column] = SampleSummary,
    recursive: bool = True,
    reverse: bool = False,
    strict: bool = True,
) -> "pd.DataFrame" | tuple["pd.DataFrame", ColumnErrors]
logs
LogPaths-
One or more paths to log files or log directories.
columns
list[Column]-
Specification for what columns to read from log files.
recursive
bool-
Include recursive contents of directories (defaults to True).
reverse
bool-
Reverse the order of the dataframe (by default, items are ordered from oldest to newest).
strict
bool-
Raise import errors immediately. Defaults to True. If False, a tuple of DataFrame and errors is returned.
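A quick per-epoch aggregation, again sketched with a stand-in frame shaped like the samples_df() output (the id/epoch columns come from SampleSummary below; score_accuracy stands in for a hypothetical score_* column produced by score_values):

```python
import pandas as pd

# Real usage (assuming inspect_ai is installed):
#   from inspect_ai.analysis.beta import samples_df
#   df = samples_df("./logs")

# Stand-in frame with a subset of SampleSummary columns.
df = pd.DataFrame(
    {
        "id": ["s1", "s2", "s1", "s2"],
        "epoch": [1, 1, 2, 2],
        "score_accuracy": [1.0, 0.0, 1.0, 1.0],
    }
)

# Mean accuracy per epoch across samples.
by_epoch = df.groupby("epoch")["score_accuracy"].mean()
print(by_epoch)
```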
SampleColumn
Column which maps to EvalSample or EvalSampleSummary.
class SampleColumn(Column)
SampleSummary
Sample summary columns.
SampleSummary: list[Column] = [
    SampleColumn("id", path="id", required=True, type=str),
    SampleColumn("epoch", path="epoch", required=True),
    SampleColumn("input", path="input", required=True, value=input_as_str),
    SampleColumn("target", path="target", required=True, value=list_as_str),
    SampleColumn("metadata_*", path="metadata"),
    SampleColumn("score_*", path="scores", value=score_values),
    SampleColumn("model_usage", path="model_usage"),
    SampleColumn("total_time", path="total_time"),
    SampleColumn("working_time", path="working_time"),
    SampleColumn("error", path="error"),
    SampleColumn("limit", path="limit"),
    SampleColumn("retries", path="retries"),
]
SampleMessages
Sample messages as a string.
SampleMessages: list[Column] = [
    SampleColumn("messages", path=sample_messages_as_str, required=True, full=True),
]
Messages
messages_df
Read a dataframe containing messages from a set of evals.
def messages_df(
    logs: LogPaths,
    columns: list[Column] = MessageColumns,
    filter: MessageFilter | None = None,
    recursive: bool = True,
    reverse: bool = False,
    strict: bool = True,
) -> "pd.DataFrame" | tuple["pd.DataFrame", ColumnErrors]
logs
LogPaths-
One or more paths to log files or log directories.
columns
list[Column]-
Specification for what columns to read from log files.
filter
MessageFilter | None-
List of message role types to include or callable that performs the filter.
recursive
bool-
Include recursive contents of directories (defaults to True).
reverse
bool-
Reverse the order of the dataframe (by default, items are ordered from oldest to newest).
strict
bool-
Raise import errors immediately. Defaults to True. If False, a tuple of DataFrame and errors is returned.
MessageFilter
Filter for messages_df() rows.
MessageFilter: TypeAlias = (
    list[Literal["system", "user", "assistant", "tool"]]
    | Callable[[ChatMessage], bool]
)
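The filter can be either a list of roles or a predicate over ChatMessage. The callable form is sketched below against a SimpleNamespace stand-in for ChatMessage (only a .role attribute is assumed here; a real predicate can inspect any message field):

```python
from types import SimpleNamespace

# List form (real usage, assuming inspect_ai is installed):
#   messages_df("./logs", filter=["assistant", "tool"])

# Callable form: keep only assistant messages.
def assistant_only(message) -> bool:
    return message.role == "assistant"

# Stand-in messages for illustration.
msgs = [SimpleNamespace(role="user"), SimpleNamespace(role="assistant")]
kept = [m for m in msgs if assistant_only(m)]
print(len(kept))
```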
MessageColumn
Column which maps to ChatMessage.
class MessageColumn(Column)
MessageContent
Message content columns.
MessageContent: list[Column] = [
    MessageColumn("role", path="role", required=True),
    MessageColumn("content", path=message_text),
    MessageColumn("source", path="source"),
]
MessageToolCalls
Message tool call columns.
MessageToolCalls: list[Column] = [
    MessageColumn("tool_calls", path=message_tool_calls),
    MessageColumn("tool_call_id", path="tool_call_id"),
    MessageColumn("tool_call_function", path="function"),
    MessageColumn("tool_call_error", path="error.message"),
]
MessageColumns
Chat message columns.
MessageColumns: list[Column] = MessageContent + MessageToolCalls
Columns
Column
Specification for importing a column into a dataframe.
Extract columns from an EvalLog path either using JSONPath expressions or a function that takes EvalLog and returns a value.
By default, columns are not required; pass required=True to make them required. Non-required columns are extracted as None; provide a default to yield an alternate value.
The type option serves as both a validation check and a directive to attempt to coerce the data into the specified type. Coercion from str to other types is done after interpreting the string using YAML (e.g. "true" -> True).
The value function provides an additional hook for transformation of the value read from the log before it is realized as a column (e.g. list to a comma-separated string).
The root option indicates which root eval log context the columns select from.
class Column(abc.ABC)
Attributes
name
str-
Column name.
path
JSONPath | None-
Path to column in EvalLog.
required
bool-
Is the column required? (error is raised if required columns aren’t found).
default
JsonValue | None-
Default value for the column when it is read from the log as None.
type
Type[ColumnType] | None-
Column type (import will attempt to coerce to the specified type).
Methods
- value
-
Convert extracted value into a column value (defaults to identity function).
def value(self, x: JsonValue) -> JsonValue
x
JsonValue-
Value to convert.
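The str-coercion behavior described above can be pictured with a minimal stand-in. This is a sketch, not the library's implementation: coerce_str is a hypothetical helper, and it parses scalars with json rather than YAML, so it covers only a subset of YAML scalars:

```python
import json
from datetime import datetime

def coerce_str(raw: str, target: type):
    """Hypothetical sketch of str -> typed coercion: interpret the
    string as a scalar (json standing in for YAML), then coerce/validate
    against the target type."""
    try:
        value = json.loads(raw)  # "true" -> True, "42" -> 42
    except json.JSONDecodeError:
        value = raw  # plain strings stay strings
    if target is datetime and isinstance(value, str):
        return datetime.fromisoformat(value)
    return value if isinstance(value, target) else target(value)

print(coerce_str("true", bool))
print(coerce_str("42", int))
```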
ColumnType
Valid types for columns.
Values of list and dict are converted into column values as JSON str.
ColumnType: TypeAlias = int | float | bool | str | date | time | datetime | None
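The list/dict-to-JSON conversion can be sketched in a couple of lines (to_column_value is a hypothetical helper illustrating the rule, not the library's internal function):

```python
import json

def to_column_value(value):
    """Non-scalar values (list/dict) become JSON strings in the dataframe;
    scalars pass through unchanged."""
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return value

print(to_column_value(["a", "b"]))
print(to_column_value(3))
```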
ColumnError
Error which occurred parsing a column.
@dataclass
class ColumnError
Attributes
column
str-
Target column name.
path
str | None-
Path to select column value.
message
str-
Error message.
ColumnErrors
Dictionary of column errors keyed by log file.
class ColumnErrors(dict[str, list[ColumnError]])
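With strict=False, the dataframe functions return a (df, errors) tuple where errors is keyed by log file. A sketch of reporting those errors, using a plain dict shaped like ColumnErrors (the log path and error entry here are illustrative, not real output):

```python
# Real usage (assuming inspect_ai is installed):
#   df, errors = evals_df("./logs", strict=False)

# Stand-in shaped like ColumnErrors (dict[str, list[ColumnError]]),
# with dicts standing in for ColumnError dataclass instances.
errors = {
    "logs/2025-01-01T12-00-00_task.eval": [
        {
            "column": "score_headline_value",
            "path": "results.scores[0].metrics.*.value",
            "message": "path not found",
        },
    ],
}

# One report line per failed column, prefixed by its log file.
lines = [
    f"{log}: {err['column']}: {err['message']}"
    for log, errs in errors.items()
    for err in errs
]
print("\n".join(lines))
```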