
AI coder sandboxes

A practical recipe for letting one (or many) AI coding agents work in parallel against the real stack — not stubs, not mocks — without ever risking your main branch or stepping on each other.

The trick is the same one that makes E2E testing easy: one isolated worktree + namespace per agent.

Why a fresh sandbox per agent

Letting an AI agent edit code on main (or even on a shared dev branch) eventually breaks something:

  • It deletes the file you were editing
  • It runs a destructive migration against your dev database
  • Two agents fight over the same port or the same git index
  • A failed test poisons state and you can't tell whose run did it

A GetWebstack fork solves all four:

  • Independent Git worktree — its own branch, its own working tree
  • Independent namespace — its own database, its own Redis, its own pods
  • Cookie-scoped URLs: https://api.<project>.<domain> serves every fork; the gws-namespace cookie picks which one a given browser / curl call hits
  • Independent file sync session — one agent's edits never reach another's pod

Prerequisites

  • A working dev environment — see Dev environment
  • An AI coding agent: Claude Code, GitHub Copilot CLI, OpenAI Codex CLI, Cursor, Amp, etc.

1. Create a sandbox per agent

# Pick a stable name per agent / per task
gws fork claude-payments # for Claude working on the payments feature
gws fork copilot-search # for Copilot working on search
gws fork codex-bugfix-1234 # for Codex working on a bug

cd .worktrees/claude-payments
gws up -w claude-payments

Each fork shares the project's URL set; the gws-namespace cookie picks which sandbox a request hits:

https://api.<project>.local.getwebstack.dev   # cookie gws-namespace=<claude-payments deployment id>
https://web.<project>.local.getwebstack.dev # … same

Get the deployment ID with gws status -w claude-payments --json | jq -r .deploymentId, or pick it from the UI at https://<project>.local.getwebstack.dev.
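
Because every fork answers on the same hostnames, a request only reaches a given sandbox if it carries the right cookie. The helper below is a minimal sketch of that: it resolves the deployment ID with the command shown above and passes it to curl. The function names sandbox_id and sandbox_curl are illustrative, not part of the gws CLI, and the sketch assumes jq and curl are installed.

```shell
# Resolve the deployment ID for a sandbox (same command as in the text above).
sandbox_id() {
  gws status -w "$1" --json | jq -r .deploymentId
}

# Run curl against a shared project URL, pinned to one sandbox via the cookie.
# Usage: sandbox_curl <sandbox> <url> [extra curl args...]
sandbox_curl() {
  if [ "$#" -lt 2 ]; then
    echo "usage: sandbox_curl <sandbox> <url> [curl args...]" >&2
    return 2
  fi
  sandbox=$1; url=$2; shift 2
  dep_id=$(sandbox_id "$sandbox") || return 1
  # jq -r prints "null" when the key is missing; refuse to send that.
  if [ -z "$dep_id" ] || [ "$dep_id" = "null" ]; then
    echo "no deployment ID for sandbox '$sandbox'" >&2
    return 1
  fi
  curl --silent --show-error --cookie "gws-namespace=$dep_id" "$@" "$url"
}
```

For example, sandbox_curl claude-payments "https://api.<project>.local.getwebstack.dev" would hit the API of the claude-payments fork only, with <project> replaced by your project name.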

Hand the agent the worktree path, the URL, and the deployment ID for the gws-namespace cookie; that's the entire handoff.
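
One way to make the handoff repeatable is to print it as a short note that can be pasted into the agent's first prompt. This is a sketch only: handoff_note is a hypothetical helper, it assumes you run it from the repository root (so the worktree lives under .worktrees/), and it takes the project name as an argument because the URLs contain it.

```shell
# Print a plain-text brief for one sandbox.
# Usage: handoff_note <sandbox> <project>
handoff_note() {
  sandbox=$1; project=$2
  dep_id=$(gws status -w "$sandbox" --json | jq -r .deploymentId) || return 1
  cat <<EOF
Worktree: $PWD/.worktrees/$sandbox
API:      https://api.$project.local.getwebstack.dev
Web:      https://web.$project.local.getwebstack.dev
Cookie:   gws-namespace=$dep_id
EOF
}
```

The cookie line matters as much as the URLs: without it, the agent's requests are not pinned to its own fork.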

2. Check the sandbox from inside the agent

Inside the agent's session, run:

/gws-status

The skill calls gws status -w <sandbox>, reports which services are healthy, surfaces the live URLs, and tells the agent where it can hit the API. If anything is wrong, the agent can immediately run /gws-debug to diagnose without you having to translate.

Other useful skills inside the sandbox:

3. Run many agents in parallel

Sandboxes are namespace-isolated, so you can run as many as your machine has resources for:

for AGENT in claude copilot codex; do
  gws fork "$AGENT-task-1234"
  gws up -w "$AGENT-task-1234"
done

gws status --all # see them all at a glance

Each agent works in its own terminal, in its own namespace (selected by its gws-namespace cookie), with its own database. None of them can interfere with another's run, with main, or with the changes you're making in your own checkout.
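
When several sandboxes are started in one go, it helps to know which ones failed to come up without aborting the rest. The sketch below wraps the same gws fork / gws up pair in a function that keeps going and reports failures at the end. start_sandboxes is an illustrative name, and the "<agent>-<task>" naming simply mirrors the loop above.

```shell
# Fork and start one sandbox per agent for a task; keep going if one fails.
# Usage: start_sandboxes <task-suffix> <agent>...
# Prints failed names to stderr and returns non-zero if any failed.
start_sandboxes() {
  task=$1; shift
  failed=""
  for agent in "$@"; do
    name="$agent-$task"
    if gws fork "$name" && gws up -w "$name"; then
      echo "started $name"
    else
      failed="$failed $name"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed to start:$failed" >&2
    return 1
  fi
}
```

Calling start_sandboxes task-1234 claude copilot codex starts the same three sandboxes as the loop, but one broken fork no longer hides behind the others' output.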

4. Validate the agent's work in its own sandbox

Once the agent reports it's done, validate inside its sandbox — not by merging to main and hoping:

# 1. Inspect the diff
cd .worktrees/claude-payments
git --no-pager diff main

# 2. Check the live deployment
gws status -w claude-payments
gws logs -w claude-payments -f api # nothing exploding?

# 3. Run the test suite against the sandbox URL
NS=$(gws status -w claude-payments --json | jq -r .deploymentId)
E2E_BASE_URL="https://web.<project>.local.getwebstack.dev" \
E2E_COOKIE="gws-namespace=$NS" \
npx playwright test

For a stricter prod-parity validation, redeploy with the e2e profile:

gws down -w claude-payments
gws up -w claude-payments --profile e2e
# rerun the suite against prod-like images, no live sync
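
The checks in this section can be folded into a single gate that an orchestrating script calls once the agent says it is done. The following is a sketch under a few assumptions: validate_sandbox is a hypothetical helper, it is run from the repository root, and the project's Playwright setup already reads E2E_BASE_URL and E2E_COOKIE as in the snippet above.

```shell
# Gate for one sandbox: stop at the first failing check.
# Usage: validate_sandbox <sandbox> <project>
validate_sandbox() {
  sandbox=$1; project=$2

  # 1. There must be something to review. `git diff --quiet` exits 0 when
  #    the working tree matches main (untracked files are not counted).
  if git -C ".worktrees/$sandbox" diff --quiet main; then
    echo "no changes against main in $sandbox" >&2
    return 1
  fi

  # 2. The deployment must report a usable ID.
  ns=$(gws status -w "$sandbox" --json | jq -r .deploymentId) || return 1
  if [ -z "$ns" ] || [ "$ns" = "null" ]; then
    echo "no deployment ID for $sandbox" >&2
    return 1
  fi

  # 3. Run the suite from the agent's worktree, pinned to its sandbox.
  ( cd ".worktrees/$sandbox" &&
    E2E_BASE_URL="https://web.$project.local.getwebstack.dev" \
    E2E_COOKIE="gws-namespace=$ns" \
    npx playwright test )
}
```

The function's exit status is the suite's exit status, so it can feed directly into a promote-or-discard decision.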

5. Promote or discard

If the agent's work passes:

cd .worktrees/claude-payments
git push origin claude-payments
gh pr create --fill

If it doesn't:

gws down -w claude-payments   # stops the sandbox first
gws delete claude-payments    # removes the worktree

Either way, your local main and every other agent's sandbox are untouched.
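
Both outcomes can sit behind one function so the decision is a single argument. This is a minimal sketch: finish_sandbox is an illustrative name, it assumes it is run from the repository root, and it assumes the sandbox's branch has the same name as the sandbox, as in the push command above.

```shell
# Promote or discard a sandbox based on a verdict.
# Usage: finish_sandbox <sandbox> pass|fail
finish_sandbox() {
  sandbox=$1; verdict=$2
  case $verdict in
    pass)
      # Push the sandbox branch and open a PR from inside its worktree.
      ( cd ".worktrees/$sandbox" &&
        git push origin "$sandbox" &&
        gh pr create --fill )
      ;;
    fail)
      # Stop the sandbox first, then remove the worktree.
      gws down -w "$sandbox" && gws delete "$sandbox"
      ;;
    *)
      echo "usage: finish_sandbox <sandbox> pass|fail" >&2
      return 2
      ;;
  esac
}
```

The subshell around the pass branch keeps the cd from leaking into the caller's shell.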

6. Common patterns

One sandbox per task, not per agent

Reuse a sandbox across follow-up prompts within a single task. Create a new one when the task changes — that way "rolling back" is just gws delete instead of git reset --hard on a polluted branch.

Pre-warm a sandbox template

For repetitive agent tasks (e.g. nightly evals) keep a shell script that does:

gws fork "$RUN_ID"
gws up -w "$RUN_ID" --profile e2e
gws exec -w "$RUN_ID" api -- npm run seed:fixtures

Hand $RUN_ID to the agent. Tear down on completion or failure.
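
"Tear down on completion or failure" is easiest to guarantee with an EXIT trap. The sketch below wraps the three commands above and an arbitrary agent command in a subshell whose trap always runs gws down and gws delete. with_sandbox is a hypothetical wrapper name; the gws commands are the ones already shown in this recipe.

```shell
# Run one command against a pre-warmed sandbox and always tear it down.
# Usage: with_sandbox <run-id> <command> [args...]
with_sandbox() (
  run_id=$1; shift
  gws fork "$run_id" || exit 1
  # The function body is a subshell, so this EXIT trap fires when the
  # function ends, whether the steps below succeed or fail.
  trap 'gws down -w "$run_id"; gws delete "$run_id"' EXIT
  gws up -w "$run_id" --profile e2e || exit 1
  gws exec -w "$run_id" api -- npm run seed:fixtures || exit 1
  # Run the caller's command; its exit status becomes the function's.
  "$@"
)
```

A nightly job could then call with_sandbox "$RUN_ID" followed by whatever command launches the agent, and rely on the exit status to tell success from failure.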

Keep humans and agents on equal footing

Humans use exactly the same workflow:

gws fork my-experiment
cd .worktrees/my-experiment
gws up -w my-experiment

There is no separate "agent path" — agents just use the same CLI and the same skills humans do.
