Sandboxes
- k8s Sandbox — UK AISI
- Python package that provides a Kubernetes sandbox environment for Inspect.
- EC2 Sandbox — UK AISI
- Python package that provides an EC2 virtual machine sandbox environment for Inspect.
- Modal Sandbox — Meridian
- Serverless container sandbox for Inspect using Modal’s cloud infrastructure.
- Proxmox Sandbox — UK AISI
- Use virtual machines, running within a Proxmox instance, as Inspect sandboxes.
- Inspect Policy Sandbox — Arnab Mitra
- Sandbox wrapper that allows fine-grained control over command execution and file I/O.
Analysis
- Inspect Scout — Meridian
- Transcript analysis for Inspect evaluations.
- Inspect Viz — Meridian
- Interactive data visualization for Inspect evaluations.
- Docent — Transluce
- Tools to summarize, cluster, and search over agent transcripts.
- Lunette — Fulcrum Research
- Platform for understanding and improving agents.
- Inspect WandB — Arcadia
- Integration with the Weights and Biases platform.
Frameworks
- Inspect SWE — Meridian
- Software engineering agents (Claude Code and Codex CLI) for Inspect.
- Inspect Cyber — UK AISI
- Python package that streamlines the process of creating agentic cyber evaluations in Inspect.
- Petri — Anthropic
- Framework for testing alignment hypotheses end‑to‑end, including automatic scenario generation.
- Control Arena — UK AISI
- Framework for running experiments on AI Control and Monitoring.
Tooling
- Inspect Flow — Meridian
- Workflow orchestration for reproducibly running evals at scale.
- Evaljobs — Hugging Face
- Run evals on Hugging Face GPUs and share results and code on the Hugging Face Hub.
- Inspect VS Code — Meridian
- VS Code extension that assists with developing and debugging Inspect evaluations.
Evals
- Inspect Evals — UK AISI
- Over 1000 LLM evaluations covering safety, coding, reasoning, knowledge, and agent capabilities.
- OpenBench — Groq
- Standardized, reproducible benchmarking for LLMs across 30+ evals.
- Inspect Harbor — Meridian
- Evals from the Harbor framework, including terminal-bench, replicationbench, and compilebench.