
Sandboxes

k8s Sandbox — UK AISI: Python package that provides a Kubernetes sandbox environment for Inspect.
EC2 Sandbox — UK AISI: Python package that provides an EC2 virtual machine sandbox environment for Inspect.
Modal Sandbox — Meridian: Serverless container sandbox for Inspect using Modal’s cloud infrastructure.
Proxmox Sandbox — UK AISI: Use virtual machines, running within a Proxmox instance, as Inspect sandboxes.
Inspect Policy Sandbox — Arnab Mitra: Sandbox wrapper that allows fine-grained control over command execution and file I/O.

Inspect Scout — Meridian: Transcript analysis for Inspect evaluations.
Inspect Viz — Meridian: Interactive data visualization for Inspect evaluations.
Docent — Transluce: Tools to summarize, cluster, and search over agent transcripts.
Lunette — Fulcrum Research: Platform for understanding and improving agents.
Inspect WandB — Arcadia: Integration with the Weights & Biases platform.

Inspect SWE — Meridian: Software engineering agents (Claude Code and Codex CLI) for Inspect.
Inspect Cyber — UK AISI: Python package that streamlines the process of creating agentic cyber evaluations in Inspect.
Petri — Anthropic: Framework for testing alignment hypotheses end‑to‑end, including automatic scenario generation.
Control Arena — UK AISI: Framework for running experiments on AI Control and Monitoring.

Inspect Flow — Meridian: Workflow orchestration for reproducibly running evals at scale.
Evaljobs — Hugging Face: Run evals on Hugging Face GPUs and share results and code on the Hugging Face Hub.
Inspect VS Code — Meridian: VS Code extension that assists with developing and debugging Inspect evaluations.

Inspect Evals — UK AISI: Over 1000 LLM evaluations covering safety, coding, reasoning, knowledge, and agent capabilities.
OpenBench — Groq: Standardized, reproducible benchmarking for LLMs across 30+ evals.
Inspect Harbor — Meridian: Evals from the Harbor framework, including terminal-bench, replicationbench, and compilebench.