BECKY
Protocol v0.92-SENTIENT Active

Becky

The multi-agent coding OS that makes AI agents ship reliable software. Deploy autonomous squads that architect, write, and audit production-ready systems.

becky --spawn-squad --project="neural-arch-01"
>> Initializing Agent Squad... [DONE]
>> Spawning Fury-Agent (PM).............. ONLINE
>> Spawning Strange-Agent (Arch)........ ONLINE
>> Spawning Stark-Agent (Dev)........... ONLINE
>> Spawning Widow-Agent (QA)............ ONLINE
>> Loading Wiki Memory (142 entries)..... SYNCED
>> Rules Engine active (10 rules)....... ENFORCING
#greenfield-mode
#learning-loop
#anti-inflation
>> Awaiting mission briefing
NEURAL_ARCHITECTURE

What is Becky?

Becky is a multi-agent coding OS that combines role-based pipelines, an agent-maintained knowledge base, closed learning loops, and dual-runtime coordination between Claude and Codex.

Instead of one monolithic AI assistant that forgets everything between sessions, Becky deploys a squad of 7 specialized agents governed by 10 immutable rules. Every mistake becomes a rule, every rule carries forward to later work, and so the system gets smarter with each project.

7 Agents | 10 Rules | Wiki Memory | Learning Loop
SYSTEMIC_FAILURE_DETECTED

The problem Becky solves

Legacy AI coding setups break down under agent hallucinations and fragmented memory. Becky was built to eliminate these six failure modes.


Self-Grading Agents

Agents that mark their own homework, so the reasoning that produced a flaw is the same reasoning that approves it. The result is systemic bias and self-reinforcing logic errors that human developers cannot audit.

ERR_CODE: BIAS_LOOP

Context Amnesia

Loss of critical project context during long development cycles, forcing agents to hallucinate architecture that does not exist.

ERR_CODE: MEM_DECAY

Dual-Runtime Drift

Desynchronization between local and production environments causing "works on my agent" bugs that haunt deployment pipelines.

ERR_CODE: ENV_DIVERGE

Silent Failure Cascades

Hidden errors that compound across multi-agent systems until the entire codebase collapses under invisible technical debt.

ERR_CODE: SILENT_KILL

Completion Inflation

Fake progress reports that hide incomplete logic: 100% completion metrics that mask non-functional or unoptimized code.

ERR_CODE: GHOST_LOGIC

No Learning Loop

Systems that repeat the same errors without memory or improvement. Every crash is a new experience instead of a learned lesson.

ERR_CODE: STATIC_INTELLIGENCE

The Seven Components

Seven immutable core protocols that form the neural architecture of every Becky-powered project.

01
Rules

Immutable Protocols

10 immutable rules that govern all agent behavior. Mistakes become rules. Rules compound intelligence across every project.

02
Agents

Neural Entities

7 specialized agents with distinct roles, personas, and chain-of-thought prompts. Each owns a domain and produces typed artifacts.

03
Wiki

Agent-Maintained Knowledge Base

Persistent wiki pages per project domain. Agents read before acting and write after learning. Knowledge survives session boundaries.
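A minimal sketch of the read-before-acting, write-after-learning contract. It assumes one Markdown page per domain under a hypothetical .becky/wiki/ directory; the layout and the helper names (read_page, append_learning, run_task) are illustrative, not Becky's documented API.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: one Markdown page per project domain.
WIKI_DIR = Path(".becky") / "wiki"


def read_page(root: Path, domain: str) -> str:
    """Read a domain page before acting; an unknown domain yields empty context."""
    page = root / WIKI_DIR / f"{domain}.md"
    return page.read_text(encoding="utf-8") if page.exists() else ""


def append_learning(root: Path, domain: str, lesson: str) -> None:
    """Write after learning: append the lesson so it outlives the session."""
    page = root / WIKI_DIR / f"{domain}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    with page.open("a", encoding="utf-8") as f:
        f.write(f"- {lesson}\n")


def run_task(root: Path, domain: str, task: str) -> str:
    context = read_page(root, domain)  # read before acting
    summary = f"{task} [lessons in context: {len(context.splitlines())}]"
    append_learning(root, domain, f"completed: {task}")  # write after learning
    return summary


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(run_task(root, "billing", "add refund endpoint"))
        print(run_task(root, "billing", "add refund audit log"))
```

The second call sees the lesson written by the first, which is the property that lets knowledge survive session boundaries.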

04
Modes

Greenfield / Brownfield

Greenfield mode runs the full role-based pipeline from brief to deploy. Brownfield mode indexes existing code and slots agents into the gaps.
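One way to picture the two modes, as a sketch: the stage names below paraphrase the pipeline described on this page, and covered (the stages an existing codebase already satisfies) is a hypothetical input, not a documented Becky setting.

```python
from enum import Enum


class Mode(Enum):
    GREENFIELD = "greenfield"
    BROWNFIELD = "brownfield"


# Stage names paraphrase the pipeline on this page; "index_existing_code" is an assumption.
FULL_PIPELINE = [
    "brief_and_prd",
    "architecture_and_ux",
    "sprint_planning",
    "implement_and_verify",
]


def plan_stages(mode: Mode, covered: set[str]) -> list[str]:
    """Greenfield runs every stage; brownfield indexes first, then runs only the gaps."""
    if mode is Mode.GREENFIELD:
        return list(FULL_PIPELINE)
    gaps = [stage for stage in FULL_PIPELINE if stage not in covered]
    return ["index_existing_code"] + gaps


print(plan_stages(Mode.BROWNFIELD, {"brief_and_prd", "architecture_and_ux"}))
```

Greenfield ignores covered entirely, because a new project has nothing to index.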

05
Memory

Persistent Context

MEMORY.md auto-captures decisions, blockers, and architecture choices, so every session starts with full context and no cold-start amnesia.
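A minimal sketch of capturing entries in MEMORY.md and reloading them at session start. The one-line-per-entry format and the record/load_context helpers are assumptions for illustration; the page does not specify the file's layout.

```python
import tempfile
from datetime import date
from pathlib import Path

# Entry kinds named on this page; the one-line-per-entry format is an assumption.
KINDS = ("Decision", "Blocker", "Architecture")


def record(memory: Path, kind: str, text: str, day: date) -> None:
    """Append one entry to MEMORY.md as it happens."""
    if kind not in KINDS:
        raise ValueError(f"unknown entry kind: {kind}")
    with memory.open("a", encoding="utf-8") as f:
        f.write(f"- [{day.isoformat()}] {kind}: {text}\n")


def load_context(memory: Path) -> dict[str, list[str]]:
    """At session start, group earlier entries by kind instead of starting cold."""
    context: dict[str, list[str]] = {kind: [] for kind in KINDS}
    if not memory.exists():
        return context
    for line in memory.read_text(encoding="utf-8").splitlines():
        _, _, rest = line.partition("] ")
        kind, _, text = rest.partition(": ")
        if kind in context:
            context[kind].append(text)
    return context


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = Path(tmp) / "MEMORY.md"
        record(memory, "Decision", "use Postgres for folios", date(2024, 1, 15))
        record(memory, "Blocker", "payment sandbox keys missing", date(2024, 1, 16))
        print(load_context(memory))
```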

06
Learning Loop

Closed Feedback

Build-Verify-Learn-Encode. Every incident creates a retrospective. Every retro distills into a rule. Rules prevent recurrence.

07
Bridge

Dual-Runtime Coordination

Claude handles architecture, planning, and complex reasoning. Codex handles bulk implementation, testing, and file operations. The bridge keeps them synchronized through shared wiki, rules, and artifact contracts.
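A sketch of the bridge's two jobs: routing work to the right runtime and refusing a hand-off when the runtimes disagree on the rules. The task-kind labels, the Task type, and the rules-fingerprint comparison are assumptions for illustration, not a description of Becky's implementation.

```python
import hashlib
from dataclasses import dataclass

# Division of labour from the paragraph above; the task-kind labels are hypothetical.
CLAUDE_KINDS = {"architecture", "planning", "reasoning"}
CODEX_KINDS = {"implementation", "testing", "file_ops"}


@dataclass(frozen=True)
class Task:
    kind: str
    contract: str  # path to the artifact contract both runtimes work against


def route(task: Task) -> str:
    """Send a task to the runtime that owns its kind."""
    if task.kind in CLAUDE_KINDS:
        return "claude"
    if task.kind in CODEX_KINDS:
        return "codex"
    raise ValueError(f"no runtime owns task kind {task.kind!r}")


def fingerprint(text: str) -> str:
    """Short digest a runtime can report instead of sending the full text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def in_sync(claude_rules: str, codex_rules: str) -> bool:
    """Hand-offs are allowed only while both runtimes hold the same rules."""
    return fingerprint(claude_rules) == fingerprint(codex_rules)
```

Raising on an unknown task kind, rather than defaulting to one runtime, keeps ownership explicit.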

The Seven Agents

Each agent owns a domain, produces typed artifacts, and is governed by the same 10 rules.


Fury

AGENT::FURY

Owns the PRD. Runs elicitation, writes acceptance criteria, manages the product backlog and story prioritization.


Strange

AGENT::STRANGE

Designs system architecture, ADRs, data models, and API contracts. Validates technical feasibility against the PRD.


Shuri

AGENT::SHURI

Creates UX specs, interaction patterns, design system tokens, and validates accessibility. Works from PRD, not assumptions.


Stark

AGENT::STARK

Implements stories, writes migrations, builds APIs and UI. Follows architecture contracts and acceptance criteria exactly.


Widow

AGENT::WIDOW

Writes test plans, runs Playwright e2e tests, and validates acceptance criteria. Tests like a human: clicks buttons, fills forms.


Heimdall

AGENT::HEIMDALL

Independent verification agent. Never trusts self-reported completion. Validates DONE/VERIFIED/AUDITED against runtime evidence.


Watcher

AGENT::WATCHER

Maintains wiki pages, distills retrospectives into rules, keeps MEMORY.md current. The institutional memory of the system.
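The roster can be sketched as data, assuming each artifact type has exactly one owning agent. The artifact names paraphrase the role descriptions above; the Agent type and owner_of helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    domain: str
    artifacts: tuple[str, ...]


# Artifact names paraphrase the role descriptions above; the data shape is hypothetical.
ROSTER = (
    Agent("Fury", "product", ("prd", "acceptance_criteria", "story")),
    Agent("Strange", "architecture", ("adr", "data_model", "api_contract")),
    Agent("Shuri", "ux", ("ux_spec", "design_tokens", "screen_flow")),
    Agent("Stark", "implementation", ("code_change", "migration")),
    Agent("Widow", "qa", ("test_plan", "e2e_result")),
    Agent("Heimdall", "verification", ("verdict",)),
    Agent("Watcher", "memory", ("wiki_page", "rule", "memory_entry")),
)


def owner_of(artifact: str) -> str:
    """Return the single agent that produces an artifact type."""
    owners = [agent.name for agent in ROSTER if artifact in agent.artifacts]
    if len(owners) != 1:
        raise LookupError(f"{artifact!r} needs exactly one owner, found {owners}")
    return owners[0]
```

Treating zero or two owners as an error, instead of picking one, mirrors the idea that each agent owns its domain.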

The Pipeline

Two modes, one protocol. Each step has a gate: nothing advances without verification.

1
FURY

Product Brief + PRD

Fury runs 4-5 elicitation methods and produces a product brief, then a full PRD with acceptance criteria for every feature.

GATE: PRD passes adversarial review
2
STRANGE + SHURI

Architecture + UX Design

Strange produces ADRs, data models, API contracts. Shuri produces interaction specs, design tokens, screen flows.

GATE: Architecture passes implementation readiness check
3
FURY

Sprint Planning + Story Creation

Epics are decomposed into stories with full context. Each story file contains everything an agent needs to implement it independently.

GATE: Stories trace back to PRD requirements
4
STARK + WIDOW + HEIMDALL

Implementation + Verification Loop

Stark implements, Widow validates with Playwright tests, Heimdall confirms DONE/VERIFIED/AUDITED. Learning loop encodes lessons.

GATE: Runtime evidence required for DONE status
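The gate rule can be sketched as a loop that runs a step, checks its gate, and halts on the first failure. The step functions and state keys below are toy stand-ins, and each lambda gate is a simplified proxy for the check named above (adversarial review, readiness check, traceability to the PRD, runtime evidence), not Becky's actual gate logic.

```python
from typing import Callable

State = dict
Step = Callable[[State], State]
Gate = Callable[[State], bool]


def run_pipeline(steps: list[tuple[str, Step, Gate]], state: State) -> State:
    """Run each step, then its gate; halt at the first gate that fails."""
    for name, step, gate in steps:
        state = step(state)
        if not gate(state):
            raise RuntimeError(f"gate failed after step {name!r}; pipeline halted")
    return state


# Toy steps; each gate is a simplified stand-in for the check named on this page.
def write_prd(s: State) -> State:
    return {**s, "features": ["F1", "F2"], "prd_reviewed": True}


def design(s: State) -> State:
    return {**s, "ready_for_implementation": True}


def plan_sprint(s: State) -> State:
    return {**s, "stories": {"S1": "F1", "S2": "F2"}}  # story -> PRD feature


def implement(s: State) -> State:
    return {**s, "runtime_evidence": {"S1": "screenshot", "S2": "db_query"}}


STEPS = [
    ("prd", write_prd, lambda s: s["prd_reviewed"]),
    ("architecture_and_ux", design, lambda s: s["ready_for_implementation"]),
    ("sprint_planning", plan_sprint, lambda s: set(s["stories"].values()) <= set(s["features"])),
    ("implementation", implement, lambda s: set(s["runtime_evidence"]) == set(s["stories"])),
]
```

Halting with an exception rather than logging and continuing is the point: once a gate fails, nothing downstream runs.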

Anti-Inflation Protocol

Three tiers of truth. No agent can claim DONE without runtime evidence. No exceptions.

State: evidence required
DONE: runtime artifact (API response, DB query result, or browser screenshot) proving the feature works
VERIFIED: acceptance criteria reviewed line by line, with a code citation for each point
AUDITED: file existence confirmed, LOC counted, function names match the spec
verdict.yaml
story: "F7-guest-stay-panel"
status: DONE
evidence:
  - type: runtime
    proof: "Playwright screenshot: check-in -> folio -> checkout flow"
  - type: db_query
    proof: "SELECT * FROM folios WHERE booking_id=42 -- balance: 0, status: settled"
verifier: HEIMDALL
rule_violations: 0
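A sketch of the kind of check Heimdall is described as performing, applied to the verdict.yaml example after parsing it into a dictionary (for example with a YAML library). The evidence type names other than runtime and db_query, and the check_verdict function itself, are assumptions for illustration.

```python
# Mirrors the verdict.yaml example on this page, as it would look once parsed.
VERDICT = {
    "story": "F7-guest-stay-panel",
    "status": "DONE",
    "evidence": [
        {"type": "runtime", "proof": "Playwright screenshot: check-in -> folio -> checkout flow"},
        {"type": "db_query", "proof": "SELECT * FROM folios WHERE booking_id=42 -- balance: 0, status: settled"},
    ],
    "verifier": "HEIMDALL",
    "rule_violations": 0,
}

# Evidence accepted per tier. "runtime" and "db_query" come from the example;
# "api_response", "code_citation" and "file_check" are hypothetical names.
ACCEPTED_EVIDENCE = {
    "DONE": {"runtime", "db_query", "api_response"},
    "VERIFIED": {"code_citation"},
    "AUDITED": {"file_check"},
}


def check_verdict(verdict: dict, implementer: str) -> list[str]:
    """Return the reasons a claimed status is rejected; an empty list means it stands."""
    problems = []
    status = verdict.get("status")
    accepted = ACCEPTED_EVIDENCE.get(status)
    if accepted is None:
        problems.append(f"unknown status {status!r}")
    elif not any(e.get("type") in accepted and e.get("proof") for e in verdict.get("evidence", [])):
        problems.append(f"{status} claimed without {'/'.join(sorted(accepted))} evidence")
    if verdict.get("verifier") in (None, implementer):
        problems.append("no independent verifier (self-grading)")
    if verdict.get("rule_violations", 0) != 0:
        problems.append("rule violations recorded")
    return problems


print(check_verdict(VERDICT, implementer="STARK"))  # []
```

A DONE claim backed only by a code citation would be rejected, because citations are VERIFIED-tier evidence, not runtime proof.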

The Learning Loop

Build. Verify. Learn. Encode. Every mistake becomes a rule. Rules compound intelligence.

BUILD -> VERIFY -> LEARN -> ENCODE -> back to BUILD (closed loop: rules compound)

Retrospective

TRIGGER: sprint end

Post-sprint review extracts what worked, what failed, and what to encode. Produces structured findings, not vague notes.


Incident Response

TRIGGER: P0 bug

Every P0 produces a root cause analysis that becomes a rule in CLAUDE.md. The same failure pattern never ships twice.


Skill Distillation

TRIGGER: pattern detected

When agents discover reusable patterns, Watcher distills them into wiki pages and rules. Knowledge compounds across projects.
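The three triggers can be sketched as feeding one encoder that turns findings into numbered rules and refuses to encode the same pattern twice. The Finding and RuleBook types and the normalization step are hypothetical; the page does not specify the rule format used in CLAUDE.md.

```python
from dataclasses import dataclass, field

# The three triggers named above, mapped to the process each one starts.
TRIGGERS = {
    "sprint_end": "retrospective",
    "p0_bug": "incident_rca",
    "pattern_detected": "skill_distillation",
}


@dataclass
class Finding:
    trigger: str
    pattern: str  # short description of the failure or reusable pattern
    rule: str     # the rule text to encode


@dataclass
class RuleBook:
    rules: list[str] = field(default_factory=list)
    seen_patterns: set[str] = field(default_factory=set)

    def encode(self, finding: Finding) -> bool:
        """Encode a finding once; return False if its pattern is already covered."""
        if finding.trigger not in TRIGGERS:
            raise ValueError(f"unknown trigger {finding.trigger!r}")
        key = " ".join(finding.pattern.lower().split())  # normalize case and spacing
        if key in self.seen_patterns:
            return False
        self.seen_patterns.add(key)
        self.rules.append(f"[{TRIGGERS[finding.trigger]}] {finding.rule}")
        return True

    def render(self) -> str:
        """Numbered rules, ready to paste into a rules file such as CLAUDE.md."""
        return "\n".join(f"{i}. {rule}" for i, rule in enumerate(self.rules, start=1))
```

Exact matching after normalization is a deliberately crude stand-in for deciding whether two failures share a pattern.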

Get started in five steps

git clone https://github.com/becky-os/becky-os.git
cd becky-os
npx becky init .   # installs slash commands + .becky/ workspace
npx becky scan .   # analyze existing codebase
npx becky run
>> Squad deployed. 7 agents online. Awaiting mission.
1

Clone + Init

Clone Becky, then run "becky init ." in your project. This copies 13 slash commands and creates a .becky/ workspace.

2

Scan

Run "becky scan ." to analyze your existing codebase, detect frameworks, and map what is done versus pending.

3

Define Rules

Add your project rules to CLAUDE.md. Start with defaults or bring your own constraints.

4

Pick Mode

Greenfield for new projects, Brownfield for existing codebases. Becky adapts the pipeline.

5

Let It Run

Agents self-coordinate. The loop runs. Rules compound. Ship production-ready software.
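The page does not say how the scan step detects frameworks. As a rough sketch under that caveat, detection can work from marker files and package.json dependencies; the marker tables and detect_frameworks below are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical marker tables: files or npm dependencies that suggest a framework.
FILE_MARKERS = {"manage.py": "django", "Cargo.toml": "rust", "go.mod": "go"}
NPM_MARKERS = {"next": "nextjs", "react": "react", "@playwright/test": "playwright"}


def detect_frameworks(root: Path) -> set[str]:
    """Return framework labels inferred from marker files under root."""
    found = {label for marker, label in FILE_MARKERS.items() if (root / marker).exists()}
    package_json = root / "package.json"
    if package_json.exists():
        data = json.loads(package_json.read_text(encoding="utf-8"))
        deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
        found |= {label for dep, label in NPM_MARKERS.items() if dep in deps}
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "package.json").write_text(
            json.dumps({"dependencies": {"next": "14.0.0", "react": "18.2.0"}}),
            encoding="utf-8",
        )
        print(sorted(detect_frameworks(project)))  # ['nextjs', 'react']
```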

Rules are the source of truth.

Mistakes don't repeat.
