Architecture

How Yak works under the hood. This page exists for the person who wants to understand the system before trusting it — or for someone debugging unexpected behavior who needs a mental model.

Yak is an autonomous coding agent for papercuts, a line-by-line PR reviewer, and a per-branch preview server. One shared sandbox fleet powers all three workflows.

┌──────────────────────────────────────────────┐
│  Workflows (surfaces)                        │
│                                              │
│  Coding Agent     PR Review       Branch     │
│  (papercuts)      (line-level     Deployments│
│                   comments)                  │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│  Substrate (shared)                          │
│                                              │
│  Repos ⟷ GitHub App ⟷ Ingress ⟷ Auth         │
│  Sandboxes (Incus + ZFS CoW + per-repo       │
│  templates from SetupYakJob)                 │
│  Jobs / Queues / Scheduler                   │
│  Dashboard / Artifacts / Logs / Costs        │
└──────────────────────────────────────────────┘

The substrate is the load-bearing part. Workflows are the user-facing surfaces. Future workflows plug into the same substrate without reshaping the base.

Every fix task goes through the same pipeline, regardless of where it came from:

[Diagram] Fix task flow — entry, branch creation, the parallel fork into video rendering and CI, and PR convergence

After the agent finishes, work splits into two parallel streams:

  • Stream B (push → CI → PR) pushes the branch, waits for CI, and on green creates the PR.
  • Stream A (video pipeline) renders the polished Remotion walkthrough from the raw webm artifact and patches the PR body once the mp4 is ready.

Both streams converge on the same PR — whichever finishes first writes its piece, the other fills in later. Tasks never block the queue while CI runs: the worker finishes a job, CI runs asynchronously, and a webhook triggers the next step.

The diagram source is fix-task-flow.dot (Graphviz). Regenerate with dot -Tpng docs/fix-task-flow.dot -o docs/fix-task-flow.png -Gdpi=150.

Yak uses two distinct AI layers with different models, different frameworks, and different responsibilities.

| Layer | Framework | Models | Purpose |
| --- | --- | --- | --- |
| Routing & Analysis | Laravel AI (Anthropic API) | Haiku, Sonnet | Webhook processing, request routing, communication with users in Slack/Linear, Sentry triage, task intake |
| Implementation | Claude Code CLI (claude -p) | Opus | Code changes, testing, committing, PR creation, research, ambiguity assessment |

The routing layer is lightweight — classify the request, detect the repo, format the prompt, post results back to the source. It runs on the Anthropic API via Laravel AI using your ANTHROPIC_API_KEY.

| Task | Model | Why |
| --- | --- | --- |
| Parse Slack message / webhook | Haiku | Fast, cheap, structured extraction |
| Detect repo from message | Haiku | Pattern matching against known slugs; falls back to natural-language routing using repo descriptions when no explicit mention is found (RepoRoutingAgent) |
| Summarize Sentry stacktrace | Sonnet | Needs actual code comprehension |
| Assemble task context from Linear/Sentry | Sonnet | Judgment about what context matters |
| Format and post results back to source | Haiku | Templated output |
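
To make the routing layer concrete, here is a minimal sketch of one intake call at the raw HTTP level. The app itself goes through Laravel AI rather than calling the API directly, and the model alias, config key, and prompt below are placeholders:

```php
<?php

use Illuminate\Support\Facades\Http;

// Hedged sketch: one routing-layer call against the Anthropic Messages API.
// The real code goes through Laravel AI; config key and model are placeholders.
$response = Http::withHeaders([
    'x-api-key'         => config('services.anthropic.key'), // backed by ANTHROPIC_API_KEY
    'anthropic-version' => '2023-06-01',
])->post('https://api.anthropic.com/v1/messages', [
    'model'      => 'claude-3-5-haiku-latest', // cheap, fast model for intake work
    'max_tokens' => 1024,
    'messages'   => [[
        'role'    => 'user',
        'content' => 'Extract the repo slug and a one-line task summary from this Slack message: ...',
    ]],
]);

$extracted = $response->json('content.0.text');
```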

Claude Code does the heavy lifting: reading files, assessing ambiguity with full codebase + MCP context, making changes, running tests, committing. It runs headlessly via claude -p with --dangerously-skip-permissions — no tool approval prompts, fully autonomous.

Claude Code is always Opus. Opus produces better first-attempt results, which means fewer retries and less total work than starting with Sonnet and escalating.

Implementation runs on a Claude Max subscription, not the API key. The subscription covers Claude Code usage; the API key covers the routing layer. These are separate auth mechanisms — see Setup → Log In To Claude Code for how each is configured.

Channel drivers are scoped to the task workflow: PR reviews and branch deployments operate directly on the GitHub integration rather than going through the channel driver interfaces.

Every external integration is a pluggable channel. Channels are enabled by the presence of credentials — no credentials, no channel. The app detects which channels are active at boot and registers only those routes, webhooks, and MCP servers.
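
A rough sketch of the credential-gated idea in a hypothetical service provider; the class, controller, and config names are illustrative, not the actual implementation:

```php
<?php

namespace App\Providers;

use Illuminate\Support\Facades\Route;
use Illuminate\Support\ServiceProvider;

class ChannelServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // A channel is "enabled" purely by the presence of its credentials;
        // nothing at all is registered for channels without them.
        if (config('services.slack.bot_token')) {
            Route::post('/webhooks/slack', \App\Http\Controllers\SlackWebhookController::class);
        }

        if (config('services.linear.api_key')) {
            Route::post('/webhooks/linear', \App\Http\Controllers\LinearWebhookController::class);
        }

        if (config('services.sentry.webhook_secret')) {
            Route::post('/webhooks/sentry', \App\Http\Controllers\SentryWebhookController::class);
        }
    }
}
```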

┌──────────────────────────────────────────────────────┐
│  Channel Drivers                                     │
│                                                      │
│  Input Drivers (how tasks arrive):                   │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌──────────────┐   │
│  │ Slack  │ │ Linear │ │ Sentry │ │ Manual CLI   │   │
│  │ opt.   │ │ opt.   │ │ opt.   │ │ always avail │   │
│  └────────┘ └────────┘ └────────┘ └──────────────┘   │
│                                                      │
│  CI Drivers (how build results return):              │
│  ┌─────────────────┐ ┌────────┐                      │
│  │ GitHub Actions  │ │ Drone  │  (per-repo setting)  │
│  └─────────────────┘ └────────┘                      │
│                                                      │
│  Notification Drivers (where results are posted):    │
│  ┌────────┐ ┌────────┐ ┌────────┐                    │
│  │ Slack  │ │ Linear │ │ GitHub │  (follows source)  │
│  │ opt.   │ │ opt.   │ │ PRs    │                    │
│  └────────┘ └────────┘ └────────┘                    │
└──────────────────────────────────────────────────────┘

Each channel implements one or more of these contracts, defined in app/Contracts/:

InputDriver          // Parse incoming webhook/event → normalized task description
CIDriver             // Parse build result webhook → pass/fail with failure output
NotificationDriver   // Post status updates and results to the source

A single channel can fill multiple roles. GitHub is both a CI driver (via Actions) and a notification driver (via PR bodies). Slack is both an input driver and a notification driver.
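
As an illustration of one channel filling two roles, a Slack-shaped driver might look like the sketch below. The interface signatures are assumptions for the sake of a self-contained example; the real contracts live in app/Contracts/:

```php
<?php

namespace App\Contracts;

use Illuminate\Http\Request;

// Illustrative contract shapes only; the real signatures live in app/Contracts/.
interface InputDriver
{
    /** Parse an incoming webhook/event into a normalized task description. */
    public function parseIncoming(Request $request): array;
}

interface NotificationDriver
{
    /** Post status updates and results back to the source. */
    public function notify(string $target, string $message): void;
}

// One channel class can fill both roles, as Slack does.
class SlackChannel implements InputDriver, NotificationDriver
{
    public function parseIncoming(Request $request): array
    {
        return [
            'source'      => 'slack',
            'external_id' => $request->input('event.ts'),
            'description' => $request->input('event.text'),
        ];
    }

    public function notify(string $target, string $message): void
    {
        // chat.postMessage back to the originating thread, omitted here.
    }
}
```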

The rule is respond where you were asked. Every task has a source column identifying its origin. Notifications always route back to that source; if the source channel is disabled (for historical tasks after removing a channel), notifications fall back to a PR comment.

See the Channels page for the full list of channels and their roles.

Task status is a fat enum (artisan-build/fat-enums) with transitions enforced at the model level. Setting $task->status = TaskStatus::AwaitingCi on a task that is currently Pending throws InvalidStateTransition — the enum enforces the rules, not the job code.

pending → running → awaiting_ci → success (fix tasks)
pending → running → success (research / setup tasks)

With these branches:

  • running → awaiting_clarification → running (Slack fix tasks only, 3-day TTL)
  • awaiting_ci → retrying → awaiting_ci (at most one retry, so at most two attempts)
  • any state → failed at any point where Claude errors, the budget is exceeded, or retries run out

Three terminal states: success, failed, expired.

Fix Task (primary path):

pending ──→ running ──→ awaiting_ci ──→ success
               │             │
               │             ├──→ retrying ──→ awaiting_ci
               │             │                      │
               │             └──→ failed ←──────────┘
               │
               ├──→ awaiting_clarification ──→ running
               │              │
               │              └──→ expired
               │
               ├──→ success   (research / setup — no CI)
               │
               └──→ failed    (error during any mode)

| From | To | Trigger |
| --- | --- | --- |
| Pending | Running | Job picked up by queue worker |
| Running | AwaitingCi | Claude completed, branch pushed (mode = fix) |
| Running | AwaitingClarification | Claude returned clarification JSON (source = slack) |
| Running | Success | Research or setup task completed |
| Running | Failed | Claude errored, budget exceeded, scope exceeded |
| AwaitingClarification | Running | User replied in Slack thread, session resumed |
| AwaitingClarification | Expired | clarification_expires_at passed (3-day TTL) |
| AwaitingCi | Success | CI green, PR created |
| AwaitingCi | Retrying | CI red, attempts < max_attempts |
| AwaitingCi | Failed | CI red, attempts >= max_attempts |
| Retrying | AwaitingCi | Retry completed, branch force-pushed |
| Retrying | Failed | Claude errored on retry |
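
A minimal sketch of what transition enforcement at the enum level can look like; this shows the shape of the idea, not the actual artisan-build/fat-enums API:

```php
<?php

namespace App\Enums;

class InvalidStateTransition extends \RuntimeException {}

enum TaskStatus: string
{
    case Pending               = 'pending';
    case Running               = 'running';
    case AwaitingClarification = 'awaiting_clarification';
    case AwaitingCi            = 'awaiting_ci';
    case Retrying              = 'retrying';
    case Success               = 'success';
    case Failed                = 'failed';
    case Expired               = 'expired';

    /** Allowed next states, mirroring the transition table above. */
    public function allowedTransitions(): array
    {
        return match ($this) {
            self::Pending               => [self::Running],
            self::Running               => [self::AwaitingCi, self::AwaitingClarification, self::Success, self::Failed],
            self::AwaitingClarification => [self::Running, self::Expired],
            self::AwaitingCi            => [self::Success, self::Retrying, self::Failed],
            self::Retrying              => [self::AwaitingCi, self::Failed],
            self::Success, self::Failed, self::Expired => [], // terminal states
        };
    }

    /** Intended to be called from a model mutator before status changes (wiring not shown). */
    public function assertCanTransitionTo(self $next): void
    {
        if (! in_array($next, $this->allowedTransitions(), strict: true)) {
            throw new InvalidStateTransition("{$this->value} → {$next->value}");
        }
    }
}
```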

Every Claude Code task runs in an isolated Incus system container. Each container has its own Docker daemon, network namespace, and filesystem — cloned from a ZFS copy-on-write snapshot in under 3 seconds.

Host (Hetzner Dedicated Server)
├─ Yak App (Docker container)
│   ├─ Web (nginx + php-fpm)
│   ├─ Queue workers (4x yak-claude, 3x default)
│   └─ Scheduler
├─ MariaDB (Docker container, yak-internal network)
├─ Incus (system container manager, ZFS-backed)
│   ├─ yak-tpl-{repo}/ready    ← snapshot per repo (setup result)
│   │
│   ├─ task-42                 ← clone of snapshot (CoW, isolated)
│   │   └─ Docker daemon, compose services, Claude Code
│   │
│   └─ task-43                 ← another clone (fully independent)
│       └─ Docker daemon, compose services, Claude Code

Three guarantees make sandboxed execution safe at scale:

  • Network isolation — sandbox containers are on a separate bridge (yak-sandbox) with firewall rules blocking access to the yak app and MariaDB. The agent cannot reach the yak database, period.
  • Port isolation — each container has its own network namespace. Port 8000 in container A doesn’t conflict with port 8000 in container B.
  • Filesystem isolation — ZFS copy-on-write means each container has its own writable filesystem. Changes in one container are invisible to others, so concurrent tasks on the same repo never collide.

The sandbox lifecycle has three phases (a rough command-level sketch follows the list):

  1. Setup — SetupYakJob creates a sandbox from the base template (yak-base), clones the repo, runs Claude’s setup (npm install, composer install, docker-compose up, etc.), then snapshots the result as yak-tpl-{repo}/ready.
  2. Task execution — RunYakJob clones from the repo snapshot (instant, ~2s). The agent works in a pristine copy of the fully-prepared environment.
  3. Cleanup — after the task completes (success or failure), the sandbox is destroyed. ZFS reclaims the space immediately.
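
Roughly, that cycle maps onto a handful of incus commands. This sketch uses Laravel's Process facade with placeholder names; the real IncusSandboxManager certainly does more (profiles, limits, readiness checks):

```php
<?php

use Illuminate\Support\Facades\Process;

$repo    = 'acme-app'; // placeholder repo slug
$sandbox = 'task-42';  // placeholder per-task container name

// Clone the per-repo "ready" snapshot; ZFS copy-on-write makes this near-instant.
Process::run(['incus', 'copy', "yak-tpl-{$repo}/ready", $sandbox])->throw();
Process::run(['incus', 'start', $sandbox])->throw();

// Push per-task files (credentials, config) before anything runs inside.
Process::run([
    'incus', 'file', 'push',
    '/host/path/to/.docker/config.json',            // placeholder host path
    "{$sandbox}/home/yak/.docker/config.json",
])->throw();

// ... Claude Code runs inside the sandbox via `incus exec` ...

// Destroy the sandbox once the task finishes; ZFS reclaims the space immediately.
Process::run(['incus', 'delete', $sandbox, '--force'])->throw();
```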

Repo dev environments using Docker Compose work natively inside Incus containers. security.nesting=true gives each container its own Docker daemon. There’s no shared Docker socket, no port override files, no DinD hacks.

Private registry auth follows the same push-then-start pattern as Claude config and MCP config: Ansible renders ~/.docker/config.json on the host from the docker_registries vault var, the Yak container bind-mounts it read-only, and IncusSandboxManager pushes the file into each fresh sandbox at /home/yak/.docker/config.json before docker pull ever runs. Repos that only need public images leave the vault var unset and the push is skipped entirely.

Two queues separate Claude Code work from everything else:

| Queue | Concurrency | Timeout | Jobs |
| --- | --- | --- | --- |
| yak-claude | 4 | 600s | RunYakJob, RetryYakJob, ResearchYakJob, SetupYakJob, ClarificationReplyJob |
| default | 3 | 30s | ProcessCIResultJob, webhook handlers, PR creation, notifications, cleanup |

The split exists to prevent a common failure mode: Task A’s CI passes, but Task A’s PR creation blocks for 10 minutes because Task B is mid-Opus on yak-claude. Putting coordination work (webhook processing, PR creation) on the default queue keeps it responsive even when Claude Code is busy.

With Incus sandbox isolation, Claude Code tasks run concurrently (4 workers by default). Each task gets its own isolated container — no shared ports, no shared filesystem, no shared Docker daemon. Throughput scales with available RAM (each sandbox uses ~4-8GB).
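
In Laravel terms, the split is simply which queue each job is dispatched onto. A sketch using job names that appear in the table above; the model class and worker invocation are assumptions:

```php
<?php

use App\Jobs\ProcessCIResultJob;
use App\Jobs\RunYakJob;
use App\Models\Task; // assumed Eloquent model for the tasks table

$task = Task::findOrFail(42);

// Heavy Claude Code work goes onto the dedicated queue...
RunYakJob::dispatch($task)->onQueue('yak-claude');

// ...while coordination work stays on the default queue, which keeps it
// responsive even when every yak-claude worker is mid-Opus.
ProcessCIResultJob::dispatch($task)->onQueue('default');

// Worker shape, one process per unit of concurrency (supervisor config not shown):
//   4x  php artisan queue:work --queue=yak-claude --timeout=600
//   3x  php artisan queue:work --queue=default    --timeout=30
```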

  • RunYakJob — the initial Claude Code session. Yak creates the branch (yak/{external_id}), then invokes Claude Code which writes code and commits locally. After Claude finishes, Yak pushes the branch and transitions the task to awaiting_ci. Claude Code never pushes or creates PRs — the system prompt explicitly forbids remote git operations.
  • ClarificationReplyJob — runs when a user replies to a Slack clarification. Resumes the original Claude session with --resume $session_id and the user’s chosen option. Claude already has full codebase context from the assessment phase — no ramp-up.
  • ProcessCIResultJob — runs when a CI webhook arrives. On green, it collects artifacts and Yak creates the PR via the GitHub App API, then notifies the source. On red, it either dispatches RetryYakJob (first failure) or marks the task failed (second failure).
  • RetryYakJob — resumes the original Claude session with CI failure output and runs a second attempt on the existing branch. Yak force-pushes the result.
  • ResearchYakJob — for research mode tasks. Read-only; no branch, no CI. Claude generates a standalone HTML findings page saved to .yak-artifacts/research.html.
  • SetupYakJob — the one-time dev environment setup task for a new repo. See Repositories → The Setup Task.
  • RunYakReviewJob — the PR review path. Runs Claude in the sandbox with a read-only prompt scoped to a PR’s diff, then posts the parsed findings as a GitHub review via the installation token. See PR Review.
  • PollPullRequestReactionsJob — scheduled hourly. Polls GitHub for 👍/👎 reactions on Yak-authored review comments within a configurable window and denormalizes counts onto pr_review_comments.
  • EnsureDailyBudget — job middleware that checks the daily_costs table before any Claude Code invocation. If today’s routing-layer cost exceeds daily_budget_usd, the job fails gracefully instead of starting. This prevents runaway alert storms from blowing the budget.

When a retry or clarification reply is needed, Yak uses claude -p --resume $session_id to continue the original Claude session. Claude retains its full context — files it read during assessment, approaches it considered, what it already tried.

This is the single biggest cost optimization in Yak. A fresh session starting from zero would re-read the codebase, re-check Sentry, re-analyze the stacktrace. Resuming skips all of that and jumps directly to the new prompt (the CI failure, or the user’s chosen clarification option).

session_id is stored on the task row and used for:

  • Retries after a first CI failure
  • Clarification replies when a Slack user picks an option
  • Post-hoc debugging — you can resume a completed task’s session manually if needed
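
A sketch of how the resume invocation could be assembled. Only flags named in this document are used; the working directory, prompt, and model class are placeholders:

```php
<?php

use App\Models\Task; // assumed Eloquent model
use Illuminate\Support\Facades\Process;

$task             = Task::findOrFail(42);
$ciFailureOutput  = '...tail of the CI log...'; // placeholder

$prompt = "CI failed on branch {$task->branch_name}. Failure output:\n" . $ciFailureOutput;

$result = Process::timeout(600)
    ->path('/work/' . $task->repo)                // placeholder checkout path
    ->run([
        'claude', '-p', $prompt,
        '--resume', $task->session_id,            // continue the original session
        '--dangerously-skip-permissions',         // no tool-approval prompts
        '--max-budget-usd', '5.00',               // per-task runaway guardrail
    ]);

$output = $result->output();
```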

Five tables, deliberately minimal. MariaDB is the backing store, running as a separate Docker container with its own persistent volume.

The primary record. Every incoming event creates a task row. Key columns:

| Column | Purpose |
| --- | --- |
| source | sentry, flaky-test, linear, slack, manual |
| repo | Slug joining to repositories.slug |
| external_id | Source-side ID (GEO-1234, SENTRY-98765). Unique with repo. |
| mode | fix, research, setup |
| status | Fat enum: pending, running, awaiting_clarification, awaiting_ci, retrying, success, failed, expired |
| branch_name | yak/{external_id} once created |
| session_id | Claude session ID for --resume |
| clarification_options | JSON array of option strings (Slack only) |
| pr_url, pr_merged_at, pr_closed_at | Outcome tracking |
| cost_usd, duration_ms, num_turns | Metrics |

UNIQUE(external_id, repo) enforces deduplication — re-opening the same Sentry issue won’t create a second task.
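
A migration sketch for the dedup constraint and a few of the columns above; the column types are assumptions:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('tasks', function (Blueprint $table) {
            $table->id();
            $table->string('source');                 // sentry, flaky-test, linear, slack, manual
            $table->string('repo');                   // joins to repositories.slug
            $table->string('external_id');            // GEO-1234, SENTRY-98765, ...
            $table->string('mode');                   // fix, research, setup
            $table->string('status')->default('pending');
            $table->string('branch_name')->nullable();
            $table->string('session_id')->nullable();
            $table->json('clarification_options')->nullable();
            $table->decimal('cost_usd', 8, 4)->nullable();
            $table->timestamps();

            // Re-opening the same upstream issue never creates a second task.
            $table->unique(['external_id', 'repo']);
        });
    }
};
```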

Append-only event log that powers the task detail page’s timeline. Each row is level (info/warning/error) + message + optional JSON metadata. Events are written at key lifecycle points: task created, picked up, assessment complete, fix pushed, CI result, PR created, task completed.

Rows reference screenshots, videos, and research HTML pages stored on disk. Served at /artifacts/{task}/{filename} via signed URL (for PR embedding) or authenticated request (for dashboard viewing).

One row per configured repo. Minimal schema — slug, name, path, default branch, CI system, Sentry mapping, setup status, notes. Everything else is auto-detected at task time from README.md and CLAUDE.md.

Primary key is date. Tracks routing-layer API costs for budget enforcement. Updated after each task completes. Read by the EnsureDailyBudget middleware before any Claude Code invocation.

One row per open PR on an opted-in repository. Key columns:

| Column | Purpose |
| --- | --- |
| status | State machine: pending, provisioning, running, hibernated, failed, destroyed |
| template_version | Pinned to the current_template_version of the repo at creation time. The deployment clones from that snapshot version for its whole lifetime. |
| last_accessed_at | Updated on every inbound request. Used as the idle signal for hibernation (15 minutes) and eviction ordering. |
| public_share_token_hash | SHA-256 hash of the raw share token. The raw token is shown once at mint time and never persisted. Null when no share link is active. |
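
The idle signal lends itself to a simple scheduled sweep; a sketch with an assumed model name and a hypothetical hibernate() helper:

```php
<?php

use App\Models\BranchDeployment; // assumed model backing this table

// Hibernate every running preview that has not served a request in 15 minutes.
BranchDeployment::query()
    ->where('status', 'running')
    ->where('last_accessed_at', '<', now()->subMinutes(15))
    ->each(function (BranchDeployment $deployment) {
        // Hypothetical helper: stops the preview's Incus container and
        // flips status to 'hibernated'; the next request resumes it.
        $deployment->hibernate();
    });
```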

Yak reviews pull requests in a dedicated workflow that runs alongside the coding agent. The review agent runs in a sandbox fork of the same per-repo template, reads the PR diff, and leaves line-level comments (suggestion blocks where the fix is obvious, prose when it needs explanation).

The review workflow reuses the substrate: same GitHub App, same sandbox infrastructure, same dashboard surface. It has its own state machine separate from the task state machine. Reviewer output is surfaced as native GitHub PR review comments.

For the full flow, see docs/pr-review.md.

Every open PR on an opted-in repo gets a live preview URL at <repo>-<branch>.<hostname>, wired up automatically from the pull_request.opened webhook. Previews are:

  • OAuth-gated by default, with optional time-boxed public-share tokens for external reviewers
  • Hibernated when idle after 15 minutes, resumed on the next request (first-hit latency 5 to 15 seconds)
  • Destroyed when the PR closes, merges, or is deleted, or after 30 days of inactivity

Each preview runs in its own Incus container cloned from the same per-repo template snapshot that powers task sandboxes. A reverse proxy (Caddy in the default install) handles wildcard TLS, OAuth enforcement via forward-auth, and upstream resolution per request.

Preview state is mirrored to GitHub’s native Deployments API, so the PR UI shows a “View deployment” button automatically.

For the end-to-end user guide, see docs/branch-deployments.md.

The safety guarantees are deliberate design choices, not afterthoughts.

--dangerously-skip-permissions Is Always On

Claude Code runs with --dangerously-skip-permissions on every invocation. No tool approval prompts, no human in the loop during execution. This is the only way unattended operation works at scale.

The safety boundary is the sandbox, not permission dialogs:

  • Dedicated server. Completely separate from production. No VPN, no Tailscale, no shared network.
  • No production access. No production databases, no customer data, no deployment pipelines.
  • Incus sandbox isolation. Each task runs in its own Incus system container with:
    • Own Docker daemon — no access to the host’s Docker socket.
    • Own network namespace — firewall rules block access to the yak app and MariaDB.
    • Own filesystem — ZFS copy-on-write from a snapshot. Changes are invisible to other tasks.
    • Own process tree — no visibility into host processes.
  • Short-lived credentials. GitHub App tokens are injected per-task and are short-lived. Claude Max auth tokens are copied read-only from the host.
  • Automatic cleanup. Sandbox containers are destroyed after each task. A cron job catches any that were missed.

Claude can do anything it wants inside that sandbox. The walls are real — Incus namespace isolation, not just user separation within a shared container.

Yak creates PRs. Humans merge them. Always. The GitHub App should NOT be in your branch protection bypass list and should NOT have permission to approve reviews.

This is non-negotiable by design. If you want to automate merging, don’t use Yak.

At most two attempts per task. Retries use Opus (same model as the initial attempt). If two attempts both fail, the task is marked failed and a human takes over.

The full test suite runs on real CI, not on self-reported output from Claude. Claude runs relevant tests locally to catch obvious issues before pushing, but the authoritative check is CI.

Three layers:

  • Per-task budget — --max-budget-usd 5.00 on every Claude CLI invocation, as a runaway guardrail. Implementation cost is covered by the subscription; this limit exists for safety.
  • Daily budget — daily_budget_usd (default $50) covers routing-layer API costs. Enforced by the EnsureDailyBudget middleware (sketched below) before any Claude Code job starts.
  • Deduplication — UNIQUE(external_id, repo) prevents repeat work on the same issue.
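
A sketch of the budget gate as Laravel job middleware; the cost column, config key, and fail-gracefully behavior are simplified assumptions:

```php
<?php

namespace App\Jobs\Middleware;

use Illuminate\Support\Facades\DB;

class EnsureDailyBudget
{
    public function handle(object $job, callable $next): void
    {
        // Column and config names are assumptions; the idea is a single read
        // of today's routing-layer spend before any Claude Code work starts.
        $spentToday = (float) DB::table('daily_costs')
            ->where('date', today()->toDateString())
            ->value('cost_usd');

        if ($spentToday >= config('yak.daily_budget_usd', 50)) {
            // Over budget: fail gracefully instead of invoking Claude Code.
            $job->fail('Daily routing budget exceeded');

            return;
        }

        $next($job);
    }
}
```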

PRs larger than large_change_threshold (default 200 LOC) get the yak-large-change label. Reviewers can use this to route reviews to more senior eyes or reject outright.

Google OAuth with a required domain allowlist (GOOGLE_OAUTH_ALLOWED_DOMAINS). There is no public dashboard. There are no roles — every team member behind the allowlist sees everything, including debug logs and session IDs, but nothing is reachable without signing in.

Artifacts embedded in GitHub PRs (screenshots, videos) use HMAC-SHA256 signed URLs with a 7-day expiry. After expiry, artifacts are still accessible through the authenticated dashboard.
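
Laravel's built-in temporary signed URLs cover this pattern directly; a sketch assuming a route named artifacts.show and an assumed Task model:

```php
<?php

use App\Models\Task; // assumed Eloquent model
use Illuminate\Support\Facades\URL;

$task = Task::findOrFail(42);

// HMAC-signed link for embedding in a PR body, valid for 7 days.
// Route name and parameters are assumptions; dashboard access stays auth-only.
$url = URL::temporarySignedRoute('artifacts.show', now()->addDays(7), [
    'task'     => $task->id,
    'filename' => 'walkthrough.mp4',
]);
```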

  • Not a merge bot. See above — no merge authority, no bypass.
  • Not horizontally scaled. Four concurrent workers on one server. The architecture supports future scaling to multiple hosts but doesn’t need it.
  • Not a long-running agent. Every task is one-shot. No back-and-forth refinement. If a task needs iterative discussion, use a normal Claude session.
  • Not a frontend framework. Dashboard is Livewire (server-rendered) with Livewire polling for live updates. No SPA, no websockets.
  • Not Kubernetes-anything. Two Docker containers (app + MariaDB) + Incus for sandboxed task execution on a dedicated server. Laravel’s database queue driver. Boring stack.
  • Not a production deploy platform. Previews are preview environments only. Merging a PR does not deploy it anywhere; the existing production deploy pipeline remains the source of truth.