Trusted Editing Dashboard

BashArena: Monitor + Edit protocol — Safety vs Usefulness analysis. Models: Sonnet 4.5 (untrusted) + GPT-4.1 mini (trusted).

Case Studies
Safety vs Usefulness (Pareto Frontier)
Suspicion Score Distribution
Trajectory Explorer
Honest Samples
Run Type | Threshold | Protocol | Task | Main Score | Suspicion
Attack Samples
Run Type | Threshold | Protocol | Task | Main Score | Suspicion | Side Task