AI coder sandboxes

A practical recipe for letting one (or many) AI coding agents work in parallel against the real stack — not stubs, not mocks — without ever risking your main branch or stepping on each other.

The trick is the same one that makes E2E testing easy: one isolated worktree + namespace per agent.

Why a fresh sandbox per agent

Letting an AI agent edit code on main (or even on a shared dev branch) eventually breaks something:

It deletes the file you were editing
It runs a destructive migration against your dev database
Two agents fight over the same port or the same git index
A failed test poisons state and you can't tell whose run did it

A GetWebstack fork solves all four:

Independent Git worktree — its own branch, its own working tree
Independent namespace — its own database, its own Redis, its own pods
Cookie-scoped URLs — https://api.<project>.<domain> serves every fork; the gws-namespace cookie picks which one a given browser / curl call hits
Independent file sync session — one agent's edits never reach another's pod

Prerequisites

A working dev environment — see Dev environment
An AI coding agent: Claude Code, GitHub Copilot CLI, OpenAI Codex CLI, Cursor, Amp, etc.

1. Create a sandbox per agent

# Pick a stable name per agent / per task
gws fork claude-payments       # for Claude working on the payments feature
gws fork copilot-search        # for Copilot working on search
gws fork codex-bugfix-1234     # for Codex working on a bug

cd .worktrees/claude-payments
gws up -w claude-payments

Each fork shares the project's URL set; the gws-namespace cookie picks which sandbox a request hits:

https://api.<project>.<org>.getwebstack.dev   # cookie gws-namespace=<claude-payments deployment id>
https://web.<project>.<org>.getwebstack.dev   # … same

Get the deployment ID with gws status -w claude-payments --json | jq -r .deploymentId, or pick it from the UI at https://<project>.<org>.getwebstack.dev.

Hand the agent the worktree path, the URL, and the deployment ID for the gws-namespace cookie; that's the entire handoff.
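The handoff can be scripted so every agent gets the same three facts. The sketch below is one possible shape, not part of the gws CLI: sandbox_handoff is a hypothetical helper name, the base URL is passed in by the caller, and it assumes jq is installed and that gws status --json exposes deploymentId as in the command above.

```shell
# Hypothetical helper: print everything an agent needs for one sandbox.
# Usage: sandbox_handoff <sandbox-name> <base-url>
sandbox_handoff() (
  name=$1
  base_url=$2
  # Same lookup as the gws status command above; assumes jq is installed.
  ns=$(gws status -w "$name" --json | jq -r .deploymentId) || exit 1
  # jq -r prints "null" when the key is missing, so treat that as a failure.
  if [ -z "$ns" ] || [ "$ns" = "null" ]; then
    echo "no deployment id for $name" >&2
    exit 1
  fi
  printf 'worktree: %s\n' ".worktrees/$name"
  printf 'url:      %s\n' "$base_url"
  printf 'cookie:   gws-namespace=%s\n' "$ns"
)
```

The function body runs in a subshell, so its variables do not leak into the calling shell. Paste the three printed lines into the agent's first prompt.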

2. Let the agent set itself up (recommended)

Inside the agent's session, run:

/gws-status

The skill calls gws status -w <sandbox>, reports which services are healthy, surfaces the live URLs, and tells the agent where it can hit the API. If anything is wrong, the agent can immediately run /gws-debug to diagnose without you having to translate.

Other useful skills inside the sandbox:

/gws-up — bring it up if it isn't running
/gws-down — pause without deleting
/gws-debug — root-cause a failing pod

3. Run many agents in parallel

Sandboxes are namespace-isolated, so you can run as many as your machine has resources for:

for AGENT in claude copilot codex; do
  gws fork "$AGENT-task-1234"
  gws up  -w "$AGENT-task-1234"
done

gws status --all              # see them all at a glance

Each agent works in its own terminal, against the shared URLs with its own gws-namespace cookie, and with its own database. None of them can interfere with another's run, with main, or with the changes you're making in your own checkout.
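When several sandboxes are started in one go, it helps to know which ones failed rather than stopping at the first error. A minimal sketch, using only the gws fork and gws up commands shown above; start_sandboxes is a hypothetical helper name:

```shell
# Hypothetical helper: start one sandbox per agent for a given task and
# report which ones failed to come up.
# Usage: start_sandboxes <task-suffix> <agent>...
start_sandboxes() (
  task=$1
  shift
  failed=""
  for agent in "$@"; do
    name="$agent-$task"
    if gws fork "$name" && gws up -w "$name"; then
      echo "up: $name"
    else
      echo "failed: $name" >&2
      failed="$failed $name"
    fi
  done
  # Non-zero exit status if any sandbox failed to start.
  [ -z "$failed" ]
)
```

For example, start_sandboxes task-1234 claude copilot codex && gws status --all only prints the overview when all three came up.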

Worktrees in action:

project/
├── main/                           # Your main codebase
├── .worktrees/
│   ├── feature-payments/           # Agent 1 works here (isolated)
│   ├── feature-auth/               # Agent 2 works here (isolated)
│   └── bug-fix-login/              # Agent 3 works here (isolated)

Each worktree gets its own:

Directory with a full copy of the code
Git branch
Isolated namespace (e.g. myapp-3f9a2b1c)
Isolated databases and services
Cookie-scoped routing (same hostnames as main; the gws-namespace cookie picks the fork)

4. Validate the agent's work in its own sandbox

Once the agent reports it's done, validate inside its sandbox — not by merging to main and hoping:

# 1. Inspect the diff
cd .worktrees/claude-payments
git --no-pager diff main

# 2. Check the live deployment
gws status -w claude-payments
gws logs   -w claude-payments -f api      # nothing exploding?

# 3. Run the test suite against the sandbox URL
NS=$(gws status -w claude-payments --json | jq -r .deploymentId)
E2E_BASE_URL="https://web.<project>.<org>.getwebstack.dev" \
E2E_COOKIE="gws-namespace=$NS" \
  npx playwright test
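Before spending minutes on a full suite, a quick request with the routing cookie confirms that the sandbox answers at all. This is a sketch under assumptions: sandbox_smoke is a hypothetical helper name, and it treats any 2xx or 3xx response as healthy, which may not match your application.

```shell
# Hypothetical pre-check: confirm the sandbox answers before running the suite.
# Usage: sandbox_smoke <url> <deployment-id>
sandbox_smoke() (
  url=$1
  ns=$2
  # -s silences progress output, -o /dev/null discards the body,
  # -w prints only the HTTP status code, and --cookie sends the
  # routing cookie that selects the fork.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --cookie "gws-namespace=$ns" "$url")
  case "$code" in
    2??|3??) echo "ok ($code)" ;;
    *)       echo "unexpected status: $code" >&2; exit 1 ;;
  esac
)
```

With NS set as in step 3 above, sandbox_smoke "$E2E_BASE_URL" "$NS" && npx playwright test skips the suite when the sandbox is unreachable.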

For a stricter prod-parity validation, redeploy with the e2e profile:

gws down -w claude-payments
gws up   -w claude-payments --profile e2e
# rerun the suite against prod-like images, no live sync

5. Promote or discard

If the agent's work passes:

cd .worktrees/claude-payments
git push origin claude-payments
gh pr create --fill

If it doesn't:

gws down   -w claude-payments   # stops the sandbox first
gws delete claude-payments      # removes the worktree

Either way, your local main and every other agent's sandbox are untouched.
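The two outcomes can be wrapped in one function so the decision is a single argument. A minimal sketch, assuming the branch is named after the sandbox as in the commands above; finish_sandbox is a hypothetical helper name, and it must be run from the project root that contains .worktrees/.

```shell
# Hypothetical wrapper around the promote and discard commands above.
# Usage: finish_sandbox <sandbox-name> pass|fail
finish_sandbox() (
  name=$1
  verdict=$2
  case "$verdict" in
    pass)
      # Assumes the branch carries the sandbox's name.
      cd ".worktrees/$name" || exit 1
      git push origin "$name" && gh pr create --fill
      ;;
    fail)
      # Stop the sandbox first, then remove the worktree.
      gws down -w "$name" && gws delete "$name"
      ;;
    *)
      echo "usage: finish_sandbox <name> pass|fail" >&2
      exit 2
      ;;
  esac
)
```

Because the body runs in a subshell, the cd does not change the caller's working directory.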

6. Common patterns

One sandbox per task, not per agent

Reuse a sandbox across follow-up prompts within a single task. Create a new one when the task changes — that way "rolling back" is just gws delete instead of git reset --hard on a polluted branch.

Pre-warm a sandbox template

For repetitive agent tasks (e.g. nightly evals) keep a shell script that does:

gws fork "$RUN_ID"
gws up -w "$RUN_ID" --profile e2e
gws exec -w "$RUN_ID" api -- npm run seed:fixtures

Hand $RUN_ID to the agent. Tear down on completion or failure.
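One way to guarantee the teardown is an EXIT trap around the three commands above. The sketch below is an assumption about how such a script could look, not a shipped gws feature: with_sandbox is a hypothetical name, and the command passed to it stands in for however you launch your agent or eval.

```shell
# Hypothetical wrapper: pre-warm a sandbox, run a command against it,
# and always tear it down afterwards.
# Usage: with_sandbox <run-id> <command> [args...]
with_sandbox() (
  run_id=$1
  shift
  gws fork "$run_id" || exit 1
  # From here on, tear down when this subshell exits, whether the
  # steps below succeed or fail.
  trap 'gws down -w "$run_id"; gws delete "$run_id"' EXIT
  gws up -w "$run_id" --profile e2e &&
    gws exec -w "$run_id" api -- npm run seed:fixtures &&
    RUN_ID=$run_id "$@"
)
```

The wrapped command sees the sandbox name in RUN_ID, and the function returns that command's exit status, so a nightly job can fail loudly while the sandbox is still cleaned up.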

Keep humans and agents on equal footing

Humans use exactly the same workflow:

gws fork my-experiment
cd .worktrees/my-experiment
gws up -w my-experiment

There is no separate "agent path" — agents just use the same CLI and the same skills humans do.
