BashArena: Monitor + Edit protocol — Safety vs Usefulness analysis. Models: Sonnet 4.5 (untrusted) + GPT-4.1 mini (trusted).