AI coder sandboxes
A practical recipe for letting one or many AI coding agents work against the real stack (not stubs, not mocks), in parallel, without ever risking your main branch or stepping on each other.
The trick is the same one that makes E2E testing easy: one isolated worktree + namespace per agent.
Why a fresh sandbox per agent
Letting an AI agent edit code on main (or even on a shared dev branch) eventually breaks something:
- It deletes the file you were editing
- It runs a destructive migration against your dev database
- Two agents fight over the same port or the same git index
- A failed test poisons state and you can't tell whose run did it
A GetWebstack fork solves all four:
- Independent Git worktree: its own branch, its own working tree
- Independent namespace: its own database, its own Redis, its own pods
- Cookie-scoped URLs: https://api.<project>.<domain> serves every fork; the gws-namespace cookie picks which one a given browser / curl call hits
- Independent file sync session: one agent's edits never reach another's pod
Prerequisites
- A working dev environment — see Dev environment
- An AI coding agent: Claude Code, GitHub Copilot CLI, OpenAI Codex CLI, Cursor, Amp, etc.
1. Create a sandbox per agent
# Pick a stable name per agent / per task
gws fork claude-payments # for Claude working on the payments feature
gws fork copilot-search # for Copilot working on search
gws fork codex-bugfix-1234 # for Codex working on a bug
cd .worktrees/claude-payments
gws up -w claude-payments
Each fork shares the project's URL set; the gws-namespace cookie picks which sandbox a request hits:
https://api.<project>.local.getwebstack.dev # cookie gws-namespace=<claude-payments deployment id>
https://web.<project>.local.getwebstack.dev # … same
Get the deployment ID with gws status -w claude-payments --json | jq -r .deploymentId, or pick it from the UI at https://<project>.local.getwebstack.dev.
Hand the agent the worktree path and the URL — that's the entire handoff.
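The cookie routing described above can be exercised directly with curl. This is a minimal sketch, not part of gws: fork_cookie and fork_curl are hypothetical helper names, and the /health path in the usage comment is an assumed endpoint that your API may not expose.

```shell
# Hypothetical helpers (not part of the gws CLI).

# Build the cookie string that selects one fork.
fork_cookie() {
  printf 'gws-namespace=%s' "$1"
}

# fork_curl DEPLOYMENT_ID URL [extra curl args...]
# Sends one request that is routed to the fork with that deployment ID.
fork_curl() {
  local ns="$1"
  shift
  curl --silent --show-error --cookie "$(fork_cookie "$ns")" "$@"
}

# Usage (the /health path is an assumption about your API):
#   NS=$(gws status -w claude-payments --json | jq -r .deploymentId)
#   fork_curl "$NS" "https://api.<project>.local.getwebstack.dev/health"
```

Calling the same URL with two different deployment IDs is a quick way to confirm that two sandboxes really are answering separately.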
2. Let the agent set itself up (recommended)
Inside the agent's session, run:
/gws-status
The skill calls gws status -w <sandbox>, reports which services are healthy, surfaces the live URLs, and tells the agent where it can hit the API. If anything is wrong, the agent can immediately run /gws-debug to diagnose without you having to translate.
Other useful skills inside the sandbox:
- /gws-up: bring it up if it isn't running
- /gws-down: pause without deleting
- /gws-debug: root-cause a failing pod
3. Run many agents in parallel
Sandboxes are namespace-isolated, so you can run as many as your machine has resources for:
for AGENT in claude copilot codex; do
gws fork "$AGENT-task-1234"
gws up -w "$AGENT-task-1234"
done
gws status --all # see them all at a glance
Each agent works in its own terminal, against its own URLs, with its own database. None of them can interfere with another's run, with main, or with the changes you're making in your own checkout.
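When starting several sandboxes at once, it can help to keep going if one of them fails and report which ones did not come up. The function below is a sketch of that idea. start_sandboxes is a hypothetical name, and it uses only the gws fork and gws up invocations shown above.

```shell
# Hypothetical wrapper (not part of the gws CLI).
# start_sandboxes TASK AGENT [AGENT...]
# Forks and starts one sandbox per agent, named AGENT-TASK.
# Returns non-zero if any sandbox failed to start.
start_sandboxes() {
  local task="$1"
  shift
  local failed=""
  local agent name
  for agent in "$@"; do
    name="$agent-$task"
    if gws fork "$name" && gws up -w "$name"; then
      echo "started: $name"
    else
      echo "failed: $name" >&2
      failed="$failed $name"
    fi
  done
  # Succeed only when nothing was recorded as failed.
  [ -z "$failed" ]
}

# Usage:
#   start_sandboxes task-1234 claude copilot codex && gws status --all
```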
4. Validate the agent's work in its own sandbox
Once the agent reports it's done, validate inside its sandbox — not by merging to main and hoping:
# 1. Inspect the diff
cd .worktrees/claude-payments
git --no-pager diff main
# 2. Check the live deployment
gws status -w claude-payments
gws logs -w claude-payments -f api # nothing exploding?
# 3. Run the test suite against the sandbox URL
NS=$(gws status -w claude-payments --json | jq -r .deploymentId)
E2E_BASE_URL="https://web.<project>.local.getwebstack.dev" \
E2E_COOKIE="gws-namespace=$NS" \
npx playwright test
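One pitfall in step 3: jq -r prints the literal string null when the deploymentId key is missing, and prints nothing when gws status produces no output. In both cases the suite would run with a cookie that does not point at the intended sandbox. A small guard, sketched below under the hypothetical name require_deployment_id, can stop the run early.

```shell
# Hypothetical guard (not part of the gws CLI).
# require_deployment_id VALUE
# Prints VALUE if it looks usable; otherwise explains why and returns 1.
require_deployment_id() {
  case "$1" in
    "" | null)
      echo "no deployment ID found; is the sandbox up?" >&2
      return 1
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Usage:
#   NS=$(require_deployment_id \
#     "$(gws status -w claude-payments --json | jq -r .deploymentId)") || exit 1
```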
For a stricter prod-parity validation, redeploy with the e2e profile:
gws down -w claude-payments
gws up -w claude-payments --profile e2e
# rerun the suite against prod-like images, no live sync
5. Promote or discard
If the agent's work passes:
cd .worktrees/claude-payments
git push origin claude-payments
gh pr create --fill
If it doesn't:
gws down -w claude-payments # stops the sandbox first
gws delete claude-payments # removes the worktree
Either way, your local main and every other agent's sandbox are untouched.
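The two outcomes can be wrapped in one function so a script or an agent harness always takes one of the two paths. This is a sketch only: finish_sandbox is a hypothetical name, it assumes you call it from the repository root, and it assumes the branch has the same name as the sandbox, as in the commands above.

```shell
# Hypothetical wrapper (not part of the gws CLI).
# finish_sandbox NAME pass|fail   (run from the repository root)
finish_sandbox() {
  local name="$1" verdict="$2"
  case "$verdict" in
    pass)
      # Promote: push the branch and open a PR from inside the worktree.
      # The subshell keeps the caller's working directory unchanged.
      ( cd ".worktrees/$name" && git push origin "$name" && gh pr create --fill )
      ;;
    fail)
      # Discard: stop the sandbox first, then remove the worktree.
      gws down -w "$name" && gws delete "$name"
      ;;
    *)
      echo "usage: finish_sandbox NAME pass|fail" >&2
      return 2
      ;;
  esac
}

# Usage:
#   if npx playwright test; then finish_sandbox claude-payments pass
#   else finish_sandbox claude-payments fail; fi
```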
6. Common patterns
One sandbox per task, not per agent
Reuse a sandbox across follow-up prompts within a single task. Create a new one when the task changes — that way "rolling back" is just gws delete instead of git reset --hard on a polluted branch.
Pre-warm a sandbox template
For repetitive agent tasks (e.g. nightly evals), keep a shell script that does:
gws fork "$RUN_ID"
gws up -w "$RUN_ID" --profile e2e
gws exec -w "$RUN_ID" api -- npm run seed:fixtures
Hand $RUN_ID to the agent. Tear down on completion or failure.
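"Tear down on completion or failure" is easiest to guarantee with an EXIT trap. The sketch below assumes bash and uses a hypothetical function name, run_agent_task. The command you pass in stands for however you launch your agent, which this page does not prescribe.

```shell
# Hypothetical wrapper (not part of the gws CLI), assuming bash.
# run_agent_task RUN_ID COMMAND [ARGS...]
# Pre-warms a sandbox, runs COMMAND, and always tears the sandbox down.
# The parenthesised body runs in a subshell, so the trap stays local to it.
run_agent_task() (
  run_id="$1"
  shift
  # Registered before anything is created, so a half-built sandbox is
  # cleaned up too. Teardown errors are ignored on purpose.
  trap 'gws down -w "$run_id" || true; gws delete "$run_id" || true' EXIT
  gws fork "$run_id" &&
    gws up -w "$run_id" --profile e2e &&
    gws exec -w "$run_id" api -- npm run seed:fixtures &&
    "$@"
)

# Usage (my-agent-runner is a placeholder for your own launcher):
#   run_agent_task "nightly-$(date +%Y%m%d)" my-agent-runner --task evals
```

The function returns the exit status of the first step that failed, or of the agent command itself, so a scheduler can still tell a failed run from a successful one.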
Keep humans and agents on equal footing
Humans use exactly the same workflow:
gws fork my-experiment
cd .worktrees/my-experiment
gws up -w my-experiment
There is no separate "agent path" — agents just use the same CLI and the same skills humans do.
See also
- Dev environment: the foundation
- End-to-end testing: the same isolation pattern, applied to test suites
- AI Skills overview: every /gws-* skill an agent can call
- gws fork, gws up, gws status, gws delete