Inspect Extensions
Sandboxes
- k8s Sandbox — UK AISI
- Python package that provides a Kubernetes sandbox environment for Inspect.
- EC2 Sandbox — UK AISI
- Python package that provides an EC2 virtual machine sandbox environment for Inspect.
- Modal Sandbox — Meridian
- Serverless container sandbox for Inspect using Modal’s cloud infrastructure.
- Proxmox Sandbox — UK AISI
- Use virtual machines running within a Proxmox instance as Inspect sandboxes.
- Inspect Policy Sandbox — Arnab Mitra
- Sandbox wrapper that allows fine-grained control over command execution and file I/O.
- Inspect Vagrant Sandbox — Jason Gwartz
- Use any virtual machine hypervisor supported by HashiCorp Vagrant as an Inspect sandbox.
- Podman Sandbox — Vector Institute and National Research Council of Canada
- Podman-backed sandbox environment for Inspect, enabling containerized tool calls without Docker.
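Each of these providers plugs into Inspect the same way: installing the package registers a named sandbox type that can be passed via a task's `sandbox` option. A minimal sketch, assuming the k8s sandbox package above is installed and registers the `"k8s"` name (substitute whichever provider you use):

```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate, use_tools
from inspect_ai.tool import bash


@task
def sandbox_demo():
    return Task(
        dataset=[Sample(input="Use the bash tool to print the kernel version.")],
        solver=[use_tools(bash()), generate()],
        # The sandbox name is contributed by the installed provider package;
        # "k8s" assumes the k8s sandbox above ("docker" is Inspect's built-in).
        sandbox="k8s",
    )
```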
Analysis
- Inspect Scout — Meridian
- Transcript analysis for Inspect evaluations.
- Inspect Viz — Meridian
- Interactive data visualization for Inspect evaluations.
- Docent — Transluce
- Tools to summarize, cluster, and search over agent transcripts.
- Lunette — Fulcrum Research
- Platform for understanding and improving agents.
- Inspect WandB — Arcadia
- Integration with the Weights & Biases platform.
- Inspect MLflow — Debu Sinha
- Experiment tracking, execution tracing, and artifact logging for Inspect AI evaluations.
- CJE — CIMO Labs
- Calibrated judge evaluation — calibrate model-graded scorer accuracy using causal inference with optional oracle labels.
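Tools in this category operate over Inspect's eval log format, which is also accessible directly through Inspect's log API for ad hoc analysis. A minimal sketch of loading a transcript (the log path is hypothetical):

```python
from inspect_ai.log import read_eval_log

# Load an eval log produced by a previous run (path is hypothetical).
log = read_eval_log("logs/example-task.eval")

# Each sample carries the full transcript for that input along with its scores.
print(log.eval.task, log.status)
for sample in log.samples or []:
    print(sample.id, sample.scores)
```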
Frameworks
- Inspect SWE — Meridian
- Software engineering agents (Claude Code and Codex CLI) for Inspect.
- Inspect Cyber — UK AISI
- Python package that streamlines the process of creating agentic cyber evaluations in Inspect.
- Petri — Anthropic
- Framework for testing alignment hypotheses end-to-end, including automatic scenario generation.
- Control Arena — UK AISI
- Framework for running experiments on AI Control and Monitoring.
Tooling
- Inspect Flow — Meridian
- Workflow orchestration for reproducibly running evals at scale.
- Evaljobs — Hugging Face
- Run evals on Hugging Face GPUs and share results and code on the Hugging Face Hub.
- Inspect VS Code — Meridian
- VS Code extension that assists with developing and debugging Inspect evaluations.
- Inspect Costs Plugin — Jason Gwartz
- Automatically load pricing data for models under test.
Evals
- Inspect Evals — UK AISI
- Over 1000 LLM evaluations covering safety, coding, reasoning, knowledge, and agent capabilities.
- OpenBench — Groq
- Standardized, reproducible benchmarking for LLMs across 30+ evals.
- Inspect Harbor — Meridian
- Evals from the Harbor framework, including terminal-bench, replicationbench, and compilebench.
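Published evals like these are typically run by task name once the corresponding package is installed. A minimal sketch using Inspect's `eval()` API, assuming `inspect-evals` is installed (the task and model names are illustrative):

```python
from inspect_ai import eval

# Run a small slice of a published eval; "inspect_evals/humaneval" follows
# the package's task-naming convention, and limit keeps the run cheap by
# evaluating only the first 10 samples.
eval("inspect_evals/humaneval", model="openai/gpt-4o", limit=10)
```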