AGENTS.md — Multi-Agent Coordination
The shared context layer for Claude Code, Gemini CLI, and Codex. All three agents read this at session start. It defines what we are building, how agents collaborate, who owns what, and how to hand off work. Target: $25K MRR by Month 12 · 7 products · 3 markets.
3 core CLI agents · 14-tool stack · 92% gross margin
Multi-World Local Stack — Manus & Perplexity now active
Two new agents join the stack: Perplexity (Research Agent · sonar-pro) and Manus (Build Agent · gpt-4o-mini). See the Ops Deck for the full Docker orchestration setup.
1. The Five Agents
Claude Code
claude-sonnet-4 → claude-haiku-3
Primary architect, security reviewer, and Slack Bridge owner. Claude designs before Codex builds.
  • System architecture & multi-file reasoning
  • Security review of all PRs before merge
  • Stripe billing integration (US market)
  • Slack Bridge — Bolt Socket Mode routing
  • SharedContext MCP — primary writer
  • Database schema design & normalization
  • API integration (owns external API contracts)
Gemini CLI
gemini-2.5-pro → gemini-2.0-flash
Research powerhouse and content engine. Handles long-context tasks and all content generation.
  • Market research, API docs, competitor analysis
  • SEOAgent — article pipeline primary owner
  • Long-context synthesis (reading many files)
  • User-facing documentation & READMEs
  • InvoiceAgent PDF generation
  • Spanish-language content (Costa Rica market)
  • Challenge generation for AgentBenchmark
Codex (OpenAI)
gpt-5.4 → gpt-4o
Algorithm specialist and test harness builder. Turns specs into production code with full test coverage.
  • Algorithm design & code synthesis from specs
  • Test generation — systematic edge case coverage
  • AgentBenchmark harness — primary owner
  • Paystack integration (Nigeria/Costa Rica)
  • UI components & embeddable widgets
  • Performance optimization (profiling-guided)
  • Writes failing tests first, then implements
Perplexity
sonar-pro · live web research
Research agent powered by Perplexity's sonar-pro model. Grounds the stack in real, current information: it queries the live web, synthesizes findings, and writes a structured research.md. First agent in every orchestration loop.
  • Live web research via sonar-pro API
  • Writes /data/docs/research.md for Manus
  • Deep competitive and technical analysis
  • API reference + documentation lookup
  • Market intelligence for SEOAgent pipeline
  • Vector store ingestion (Qdrant write)
Manus
gpt-4o-mini · build + revise
Build agent: reads research, implements, and revises based on feedback from the Judge (Claude). Two endpoints, /run and /revise, power the self-healing critique loop, which iterates until Claude's score reaches 8/10.
  • Reads research.md → builds implementation
  • Writes /data/code/implementation.py
  • /revise endpoint for improvement loops
  • Incorporates Claude review feedback directly
  • Cost-efficient: gpt-4o-mini for iteration speed
  • Docker worker · FastAPI · shares /data volume
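The /run → judge → /revise cycle can be sketched as a loop. This is a minimal sketch, not the Manus implementation: `run_build`, `judge`, and `revise` are hypothetical callables standing in for the Manus /run endpoint, Claude's scoring, and the Manus /revise endpoint, and the `max_rounds` cap is an assumption added so a stuck loop terminates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    code: str
    score: int
    rounds: int
    history: list[int] = field(default_factory=list)


def critique_loop(
    run_build: Callable[[str], str],          # hypothetical wrapper around Manus /run
    judge: Callable[[str], tuple[int, str]],  # hypothetical: Claude returns (score 0-10, feedback)
    revise: Callable[[str, str], str],        # hypothetical wrapper around Manus /revise
    research: str,
    target: int = 8,
    max_rounds: int = 5,                      # assumed safety cap; the spec names none
) -> LoopResult:
    """Build once, then judge and revise until the score reaches `target`."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    code = run_build(research)
    history: list[int] = []
    for rounds in range(1, max_rounds + 1):
        score, feedback = judge(code)
        history.append(score)
        # Stop on success, or when out of rounds (never revise code we will not judge).
        if score >= target or rounds == max_rounds:
            return LoopResult(code, score, rounds, history)
        code = revise(code, feedback)
    raise AssertionError("unreachable")
```

Returning the last judged version, rather than an unjudged revision, keeps the reported score tied to the code it describes.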
2. How to Use This File
Claude Code

Reads AGENTS.md natively. Invoke explicitly with:

# Claude reads this automatically
claude --context AGENTS.md "<task>"

Session start: read AGENTS.md → check Active Constraints → check Handoff Notes → identify current product.

Gemini CLI

Uses GEMINI.md natively. Pass AGENTS.md via the --context flag:

# Standard session start
gemini --context AGENTS.md "<task>"
# Interactive mode
gemini chat --context AGENTS.md
Codex

Reads AGENTS.md via the --instructions flag. Always check the Codex-specific section first:

# Pass AGENTS.md as instructions
codex --instructions AGENTS.md "<task>"
# Or pipe a task file
cat task.md | codex --instructions AGENTS.md
Session Start Checklist (All Agents)
  • Read AGENTS.md in full before any work
  • Check Active Constraints for current blockers
  • Check Cross-Agent Handoff Notes (bottom of file)
  • Identify which product is in active development
  • Confirm which phase of the build protocol applies
  • Verify Ollama is running before any inference call
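One way to satisfy the last checklist item is a quick reachability probe against OLLAMA_HOST. A minimal sketch, assuming Ollama's HTTP API, where GET /api/tags lists the locally pulled models; the function name and the 2-second timeout are illustrative choices.

```python
from __future__ import annotations

import json
import os
import urllib.request


def ollama_models(host: str | None = None, timeout: float = 2.0) -> list[str] | None:
    """Return local model names from Ollama's /api/tags, or None if unreachable."""
    # Assumes OLLAMA_HOST includes the scheme, e.g. http://localhost:11434.
    base = (host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")).rstrip("/")
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):  # connection refused, timeout, HTTP error, bad URL or JSON
        return None
    return [m.get("name", "") for m in payload.get("models", [])]
```

A session can stop early with a clear message when `ollama_models()` returns None, instead of failing on the first inference call.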
3. Seven Products: Build Order
Phase | Product | Description | Price | Agents
Ph.1 | AgentArena | Side-by-side live coding sessions · agents race on challenges | Free | Claude, Codex
Ph.1 | ChatDesk | AI customer support · KB upload → deploy · WhatsApp bridge | $49–$399/mo | Claude, Gemini
Ph.1 | AgentBenchmark | Weekly benchmark of all major AI coding agents · public leaderboard | Free | Codex, Claude
Ph.2 | ShipIt | Prompt → architecture → code → Plandex diff review → MVP | $49/mo | Claude, Codex
Ph.2 | SEOAgent | Keyword → research → outline → long-form article → SEO score → publish | $49–$99/mo | Gemini, Claude
Ph.3 | AgentOps | Slack /task → Claw-Empire orchestrates → agents build → PR · flagship product | $49–$299/mo | Claude, Codex
Ph.3 | InvoiceAgent | Auto-invoice → WhatsApp/SMS delivery → Paystack reconciliation · Nigeria/CR focus | $9–$25/mo | Gemini, Codex

Status pills: Not started · In progress · MVP done · Tests passing · Deployed. Build Phase 1 completely before starting Phase 2.
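The build-order rule can be checked mechanically. A minimal sketch, assuming a product counts as complete once it reaches Deployed (an assumption; pass a different `done` value to use another bar, such as the Phase 1 Gate criteria). The `PHASE` map is transcribed from the product list above; `can_start` is a hypothetical helper.

```python
from __future__ import annotations

# Status values in order, taken from the status pills above.
STATUSES = ["Not started", "In progress", "MVP done", "Tests passing", "Deployed"]

# Product -> phase, from the build-order list above.
PHASE = {
    "AgentArena": 1, "ChatDesk": 1, "AgentBenchmark": 1,
    "ShipIt": 2, "SEOAgent": 2,
    "AgentOps": 3, "InvoiceAgent": 3,
}


def can_start(phase: int, status: dict[str, str], done: str = "Deployed") -> bool:
    """True if every product in all earlier phases has reached at least `done`."""
    threshold = STATUSES.index(done)
    return all(
        STATUSES.index(status.get(product, "Not started")) >= threshold
        for product, p in PHASE.items()
        if p < phase
    )
```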

4. Agent Roles & Division of Labor
Task type | Primary agent | Reason
System architecture decisions | Claude Code | Best at multi-file reasoning and system design
Research (market, API, docs) | Gemini CLI | Best at long-context web research and synthesis
Algorithm design + code generation | Codex | Best at pure code synthesis from specifications
Refactoring large codebases | Claude Code | Best at understanding context and constraints
Test generation | Codex | Best at systematic edge case coverage
Content generation (articles, copy) | Gemini CLI | Best at long-form content and SEO research
Security review | Claude Code | Best at identifying subtle security issues
Performance optimization | Codex | Best at profiling-guided improvements
Documentation | Gemini CLI | Best at comprehensive doc generation
Integration work (APIs, webhooks) | Claude Code | Best at understanding external APIs from docs
Database schema design | Claude Code | Best at data modeling and normalization
UI/UX implementation | Codex | Best at translating specs to component code
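For dispatch tooling, the table above reduces to a lookup. A sketch only: the short task-type keys are hypothetical labels, and raising on an unknown type (rather than guessing an agent) is a design choice, not something the table specifies.

```python
from __future__ import annotations

# Primary agent per task type, transcribed from the table above (keys are hypothetical).
PRIMARY_AGENT = {
    "architecture": "Claude Code",
    "research": "Gemini CLI",
    "algorithm": "Codex",
    "refactor": "Claude Code",
    "tests": "Codex",
    "content": "Gemini CLI",
    "security-review": "Claude Code",
    "performance": "Codex",
    "docs": "Gemini CLI",
    "integration": "Claude Code",
    "schema": "Claude Code",
    "ui": "Codex",
}


def route(task_type: str) -> str:
    """Return the primary agent for a task type; unknown types are an error, not a guess."""
    key = task_type.strip().lower()
    if key not in PRIMARY_AGENT:
        raise ValueError(f"no primary agent for task type {task_type!r}")
    return PRIMARY_AGENT[key]
```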
Collaboration Patterns
Pattern 1: Research → Build → Review
  1. Gemini CLI researches API docs and existing solutions, then writes findings to the SharedContext namespace project:<name>:research
  2. Claude Code reads SharedContext, designs the architecture, and writes design.md to the project root
  3. Codex implements from design.md; Claude Code reviews the PR
Pattern 2: Parallel Feature Development
  1. Claude Code and Codex work on different features simultaneously, each in its own Vibe Kanban worktree
  2. Both update SharedContext with progress
  3. Claude Code integrates and resolves conflicts
Pattern 3: Spec → Test → Build → Verify
  1. Claude Code writes the specification (spec.md)
  2. Codex writes a test suite from the spec (0% passing at this point)
  3. Codex or Claude Code implements until all tests pass (100%)
  4. Gemini CLI writes documentation from the spec and implementation
5. Protocols
Research Protocol
  1. Search for existing open-source implementations (avoid reinventing)
  2. Check SharedContext namespace project:<name>:research for prior research
  3. Review Active Constraints for technology restrictions
  4. Verify the Ollama model assignment for this feature
  5. Check if any existing tool in the 14-tool stack already handles this
  6. Store findings: agent · timestamp · findings · pitfalls · references
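Step 6 can be captured as one JSON value per finding, in line with the Memory Rules (JSON strings that carry agent and timestamp). The field names mirror step 6 and the key follows the `project:<name>:research` namespace; the helper `research_entry` and the UTC ISO-8601 timestamp format are assumptions.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone


def research_entry(
    product: str,
    agent: str,
    findings: str,
    pitfalls: list[str],
    references: list[str],
) -> tuple[str, str]:
    """Build a (namespace, JSON value) pair for one research finding (hypothetical helper)."""
    namespace = f"project:{product}:research"
    value = json.dumps(
        {
            "agent": agent,
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "findings": findings,
            "pitfalls": pitfalls,
            "references": references,
        }
    )
    return namespace, value
```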
Build Protocol
  1. Check Active Constraints — no merges during freeze
  2. Read the product's acceptance criteria (Section 8)
  3. Check SharedContext for prior research or partial implementation
  4. Create a worktree via Vibe Kanban if working in isolation
  5. Use ESM imports, async/await, named exports, JSDoc
  6. Ollama-first: check OLLAMA_HOST before any cloud API call
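The Ollama-first rule amounts to trying the local server named by OLLAMA_HOST and falling back to a cloud provider only when it cannot serve the request. A minimal sketch: `local_call` and `cloud_call` are hypothetical stand-ins for real clients, and the backend label is returned so callers can track the local share targeted in the Test & Validation Protocol.

```python
from __future__ import annotations

import os
from typing import Callable


def infer(
    prompt: str,
    local_call: Callable[[str, str], str],  # hypothetical: (host, prompt) -> completion
    cloud_call: Callable[[str], str],       # hypothetical: prompt -> completion
) -> tuple[str, str]:
    """Return (completion, backend), preferring Ollama at OLLAMA_HOST."""
    host = os.environ.get("OLLAMA_HOST")
    if host:
        try:
            return local_call(host, prompt), "ollama"
        except OSError:
            pass  # local server down or unreachable: fall through to cloud
    return cloud_call(prompt), "cloud"
```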
Test & Validation Protocol
  1. Three test layers: smoke.sh · unit.test.js · integration.test.js
  2. Run: make test-<product>
  3. Happy path + failure path + edge cases + concurrency
  4. Billing: free tier enforced, paid tier unlocked correctly
  5. Performance: p80 response time <3s; Ollama serves ≥80% of inference requests
  6. Security: no keys logged, webhook signatures verified, rate limits enforced
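Both performance targets in item 5 can be computed from a request log. A sketch assuming each log record is a (latency in seconds, backend) pair; p80 uses the nearest-rank method, which is one of several percentile definitions.

```python
from __future__ import annotations


def p80(latencies: list[float]) -> float:
    """80th percentile using the nearest-rank method."""
    if not latencies:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies)
    rank = (80 * len(ordered) + 99) // 100  # ceil(0.8 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def meets_performance_gate(log: list[tuple[float, str]]) -> bool:
    """True if p80 latency is under 3 s and Ollama served at least 80% of requests."""
    latency = p80([seconds for seconds, _ in log])  # raises on an empty log
    ollama_share = sum(backend == "ollama" for _, backend in log) / len(log)
    return latency < 3.0 and ollama_share >= 0.80
```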
Handoff Protocol
  1. Update the product's status pill in Section 3
  2. Write a handoff note to the SharedContext namespace session:<YYYYMMDD-HHMM>
  3. Append timestamped entry to Cross-Agent Handoff Notes
  4. Commit: chore(<product>): handoff from <agent>
  5. Next agent: recall SharedContext + read git log --oneline -20
  6. Never skip handoff — agents build on each other's work
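Steps 2 and 4 have fixed shapes, so they are easy to generate. A sketch assuming the session key uses the `session:<YYYYMMDD-HHMM>` format from the namespace map and that the note's fields follow that map's description (completed, in progress, blocked, next steps); `handoff` is a hypothetical helper.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone


def handoff(product: str, agent: str, completed: list[str], in_progress: list[str],
            blocked: list[str], next_steps: list[str],
            now: datetime | None = None) -> tuple[str, str, str]:
    """Return (session namespace, JSON note, commit message) for one handoff."""
    now = now or datetime.now(timezone.utc)
    namespace = f"session:{now:%Y%m%d-%H%M}"
    note = json.dumps({
        "agent": agent,
        "timestamp": now.isoformat(timespec="seconds"),
        "product": product,
        "completed": completed,
        "in_progress": in_progress,
        "blocked": blocked,
        "next_steps": next_steps,
    })
    commit = f"chore({product}): handoff from {agent}"
    return namespace, note, commit
```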
6. Routing Directives
Prefix # · Claw-Kanban: Lightweight Task Dispatch
Use for single-agent tasks, bug fixes, and routine work. Routes to the correct tool via the task board. Example: # task: fix Ollama timeout in ChatDesk
Prefix $ · Claw-Empire: CEO Strategic Directive
Use for cross-product decisions, prioritization changes, and multi-agent orchestration. Example: $ directive: prioritize ChatDesk WhatsApp over ShipIt MVP
Prefix tool: · Direct Tool via Slack Bridge
Spawn a specific CLI agent directly. Available targets: tool:claude · tool:gemini · tool:codex · tool:droid · tool:aider
Prefix project: · Project Routing via Slack DM
Route a task to the owner of a specific product. Examples: project:chatdesk fix the confidence scoring threshold · project:invoiceagent add CRC currency support
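On the bridge side, the four prefixes can be told apart with a small parser. A sketch only: the `Directive` type, the rule that a `tool:` or `project:` target ends at the first space, and returning None for unprefixed messages are assumptions about how the Slack Bridge might parse messages, not its documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass

TOOL_TARGETS = {"claude", "gemini", "codex", "droid", "aider"}  # from the tool: target list


@dataclass(frozen=True)
class Directive:
    route: str           # "claw-kanban", "claw-empire", "tool", or "project"
    target: str | None   # tool name or product slug; None for # and $
    body: str


def parse_directive(message: str) -> Directive | None:
    """Classify a Slack message by its routing prefix; None means no prefix matched."""
    text = message.strip()
    if text.startswith("#"):
        return Directive("claw-kanban", None, text[1:].strip())
    if text.startswith("$"):
        return Directive("claw-empire", None, text[1:].strip())
    for prefix in ("tool:", "project:"):
        if text.startswith(prefix):
            target, _, body = text[len(prefix):].partition(" ")
            if prefix == "tool:" and target not in TOOL_TARGETS:
                raise ValueError(f"unknown tool target {target!r}")
            return Directive(prefix[:-1], target, body.strip())
    return None
```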
Active Constraints
Constraint | Added | Expires | Reason
Build order: complete Phase 1 before Phase 2 | 2026-03-17 | Phase 2 gate | Stage gates enforced; quality over speed
$49 minimum price floor | 2026-03-17 | Never | 77% churn below $50 (ChartMogul 2025)
Ollama-first for all inference | 2026-03-17 | Never | Protects the 92% gross margin; cloud inference costs margin
No auto-merge in AgentOps | 2026-03-17 | Never | Safety requirement: human approval always required
Pin Claw-Kanban/Empire to commit hash | 2026-03-17 | Until stable | Newer projects that may break on updates
Do Not Build List (Phase 1–3)

Do not build these until after Month 9, and only once the stage gate has passed. They are high-effort, high-distraction, or regulatory-risk products that would sink a solo operator:

❌ ComplianceBot, LegacyRevive, FleetOps — enterprise sales cycle too long
❌ TutorBot, HireScreen — high support burden, off-core
❌ AgentStore, WhiteLabel — requires existing user base first
❌ Medical Scribe, Legal Sanity Checker — regulatory complexity
SharedContext Namespace Map
Namespace | Owner | Contents | TTL
project:aishop | All agents | Global project state, key decisions, cross-product conventions | Permanent
project:<product> | Product owner agent | Product-specific state, bugs, architectural decisions | Permanent
project:<product>:research | Gemini primarily | Research findings: competitor analysis, API docs, market data | 30 days
project:<product>:test-results | Codex | Latest test run results: pass/fail, timing, coverage | 7 days
tool:claude | Claude Code only | Claude's private notes, architectural preferences, open questions | Permanent
tool:gemini | Gemini CLI only | Gemini's research queue, brand voice memory, content templates | Permanent
tool:codex | Codex only | Codex's questions for Claude, pending architectural clarifications | Permanent
session:<YYYYMMDD-HHMM> | Outgoing agent | Handoff notes: completed, in progress, blocked, next steps | 14 days
Memory Rules
✓ Read before writing — check if a fact exists first
✓ Own namespace only — don't write to another agent's tool: namespace
✗ No secrets — never store API keys, passwords, or PII
✓ Structured JSON — all values are JSON strings, include agent + timestamp
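The namespace map and memory rules can be enforced before any write. A minimal sketch with a plain dict standing in for the SharedContext store (the MCP server's real API is not described here). TTLs come from the namespace map; only `tool:` ownership is enforced, and the key-prefix regex is a crude, assumed secret check, not a real secret scanner.

```python
from __future__ import annotations

import json
import re
from datetime import timedelta

# Assumed patterns for a few well-known key prefixes (Stripe sk_live_/sk_test_,
# OpenAI-style sk-, Slack xox*/xapp-). Word boundary avoids matching e.g. "task-".
SECRET_PATTERN = re.compile(r"\b(?:sk_(?:live|test)_|sk-|xox[abp]-|xapp-)")


def ttl_for(namespace: str) -> timedelta | None:
    """TTL from the namespace map; None means permanent."""
    if namespace.startswith("session:"):
        return timedelta(days=14)
    if namespace.endswith(":research"):
        return timedelta(days=30)
    if namespace.endswith(":test-results"):
        return timedelta(days=7)
    return None


def write_fact(store: dict, agent: str, namespace: str, key: str, value: str) -> bool:
    """Apply the memory rules to one write; return False if the fact already exists."""
    # Own namespace only: tool:<agent> is private to that agent.
    if namespace.startswith("tool:") and namespace != f"tool:{agent}":
        raise PermissionError(f"{agent} may not write to {namespace}")
    # Structured JSON carrying agent + timestamp (json.loads raises on non-JSON).
    record = json.loads(value)
    if not isinstance(record, dict) or not all(k in record for k in ("agent", "timestamp")):
        raise ValueError("value must be a JSON object with agent and timestamp")
    # No secrets.
    if SECRET_PATTERN.search(value):
        raise ValueError("value appears to contain a secret")
    # Read before writing: keep the existing fact.
    slot = store.setdefault(namespace, {})
    if key in slot:
        return False
    slot[key] = {"value": value, "ttl": ttl_for(namespace)}
    return True
```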
7. Environment & Secrets
Agent API Keys (all agents)
ANTHROPIC_API_KEY · Claude Code
GOOGLE_API_KEY · Gemini CLI
OPENAI_API_KEY · Codex (OpenAI)
Billing (products)
STRIPE_SECRET_KEY · ChatDesk, ShipIt (US)
STRIPE_WEBHOOK_SECRET · Stripe webhook verification
PAYSTACK_SECRET_KEY · InvoiceAgent (NG/CR)
PAYSTACK_PUBLIC_KEY · InvoiceAgent frontend
Communication
SLACK_BOT_TOKEN · xoxb-... · AgentOps bridge
SLACK_APP_TOKEN · xapp-... · Socket Mode
SLACK_SIGNING_SECRET · Webhook verification
WHATSAPP_VERIFY_TOKEN · ChatDesk, InvoiceAgent
Inference & Memory
OLLAMA_HOST · http://localhost:11434
OLLAMA_DEFAULT_MODEL · qwen3-coder:30b
SHARED_MEMORY_RECOVERY_PHRASE · Auto-generated on first run
MM_BOT_TOKEN · Mattermost agent bus
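A startup check can fail fast when a variable is missing. A sketch: the per-product `REQUIRED` mapping is inferred from the variable list above and is an assumption (for example, assigning all Slack variables to AgentOps), while the xoxb-/xapp- prefixes come straight from that list. Problems are reported by name only, so no values ever reach the logs.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Per-product requirements inferred from the variable list above (an assumption).
REQUIRED = {
    "chatdesk": ["STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "WHATSAPP_VERIFY_TOKEN", "OLLAMA_HOST"],
    "shipit": ["STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "OLLAMA_HOST"],
    "invoiceagent": ["PAYSTACK_SECRET_KEY", "PAYSTACK_PUBLIC_KEY", "WHATSAPP_VERIFY_TOKEN", "OLLAMA_HOST"],
    "agentops": ["SLACK_BOT_TOKEN", "SLACK_APP_TOKEN", "SLACK_SIGNING_SECRET", "OLLAMA_HOST"],
}

# Expected token prefixes shown in the list above.
PREFIXES = {"SLACK_BOT_TOKEN": "xoxb-", "SLACK_APP_TOKEN": "xapp-"}


def env_problems(product: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """List missing or malformed variables for a product; never echoes values."""
    problems = []
    for name in REQUIRED[product]:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif name in PREFIXES and not value.startswith(PREFIXES[name]):
            problems.append(f"{name} should start with {PREFIXES[name]}")
    return problems
```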
8. Acceptance Criteria & Business Gates
Infrastructure Gates (must pass before building products)
  • make check-prereqs passes — Node ≥22.16.0, all CLIs, docker, ollama, ttyd installed
  • make install completes without manual intervention
  • make start runs all 14 tools + bridges + Ollama + dashboard
  • Dashboard at http://localhost:4000 shows health grid with all services
  • SharedContext MCP starts, memory.db created
  • Cross-tool memory: Claude stores fact → Crush recalls it correctly
Phase 1 Gate (must all pass before Phase 2)
  • AgentArena: page loads, challenge dispatches to 2+ agents, results stored in SharedContext
  • ChatDesk: KB ingestion works, Ollama inference <3s, Slack escalation triggers at <0.7 confidence
  • ChatDesk: WhatsApp bridge sends/receives, Stripe + Paystack billing both working
  • AgentBenchmark: 20 challenges in repo, 3 tools scored, leaderboard at :4000/benchmark
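The ChatDesk escalation criterion is a simple threshold. A sketch assuming each drafted answer comes with a confidence score in [0, 1]; `escalate_to_slack` is a hypothetical callback, not a documented ChatDesk function. A score of exactly 0.7 is answered, since the gate escalates only below 0.7.

```python
from __future__ import annotations

from typing import Callable

ESCALATION_THRESHOLD = 0.7  # from the Phase 1 gate: escalate below 0.7 confidence


def answer_or_escalate(draft: str, confidence: float,
                       escalate_to_slack: Callable[[str], None]) -> str | None:
    """Return the draft when confident enough; otherwise hand it to a human via Slack."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < ESCALATION_THRESHOLD:
        escalate_to_slack(draft)
        return None
    return draft
```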
Business Metrics Gates
Gate | Metric | Target | Month | If missed
Activation | Time to first success | < 10 min | All | Fix before scaling any marketing
Retention | Week-4 retention | ≥ 25% | Mo 3 | Stop marketing, fix product first
Conversion | Free-to-paid by Day 45 | ≥ 5% | Mo 6 | Kill or pivot the product
Revenue | MRR | $300 | Mo 3 | Reduce to 1 product focus
Revenue | MRR | $3,000 | Mo 6 | Pause new markets expansion
Revenue | MRR | $25,000 | Mo 12 | Plan Year 2
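The gate table can be evaluated against monthly metrics. A sketch that encodes each row as (metric, month checked, pass test, action if missed). The metric keys are hypothetical names, and three interpretations are assumptions: "All" means checked every month, a dated gate is checked only in its month, and a metric absent from the input is skipped rather than counted as missed.

```python
from __future__ import annotations

from typing import Callable

# (metric key, month checked or None for "All", pass test, action if missed).
# Rows transcribed from the table above; metric keys are hypothetical.
GATES: list[tuple[str, int | None, Callable[[float], bool], str]] = [
    ("time_to_first_success_min", None, lambda v: v < 10, "Fix before scaling any marketing"),
    ("week4_retention", 3, lambda v: v >= 0.25, "Stop marketing, fix product first"),
    ("free_to_paid_day45", 6, lambda v: v >= 0.05, "Kill or pivot the product"),
    ("mrr_usd", 3, lambda v: v >= 300, "Reduce to 1 product focus"),
    ("mrr_usd", 6, lambda v: v >= 3_000, "Pause new markets expansion"),
    ("mrr_usd", 12, lambda v: v >= 25_000, "Plan Year 2"),
]


def missed_gates(month: int, metrics: dict[str, float]) -> list[str]:
    """Actions for gates checked this month whose metric misses its target."""
    return [
        action
        for key, due, passes, action in GATES
        if (due is None or due == month) and key in metrics and not passes(metrics[key])
    ]
```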
14-Tool Stack Architecture
Orchestration Boards (7)
Claw-Kanban:4011
Claw-Empire:4018
Vibe Kanban:4012
KaibanJS:4013
AgentsBoard:4014
OpenClaw:4015
NanoClaw:4016
CLI Agents & Harnesses (7)
Droid (primary):7687
Pi:7688
Aider:7682
Cline:7683
Crush (LSP+MCP):7684
Plandex:7685
Amp (cloud-only):7686
Infrastructure
Dashboard:4000
Slack Bridge:4010
Mattermost:4020
MM Bridge:4021
Ollama (local AI):11434
SharedContext MCP:3100
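Because every service has a fixed port, a health grid like the dashboard's can start from a port table. A sketch that transcribes the ports above, checks them for collisions, and probes each with a plain TCP connect. The probe approach is an assumption about how such a grid might work, and a successful connect only shows that something is listening, not that the right service is healthy.

```python
from __future__ import annotations

import socket

# Service -> port, transcribed from the stack listing above.
PORTS = {
    "Claw-Kanban": 4011, "Claw-Empire": 4018, "Vibe Kanban": 4012, "KaibanJS": 4013,
    "AgentsBoard": 4014, "OpenClaw": 4015, "NanoClaw": 4016,
    "Droid": 7687, "Pi": 7688, "Aider": 7682, "Cline": 7683,
    "Crush": 7684, "Plandex": 7685, "Amp": 7686,
    "Dashboard": 4000, "Slack Bridge": 4010, "Mattermost": 4020, "MM Bridge": 4021,
    "Ollama": 11434, "SharedContext MCP": 3100,
}


def duplicate_ports(ports: dict[str, int]) -> dict[int, list[str]]:
    """Ports claimed by more than one service (should be empty)."""
    seen: dict[int, list[str]] = {}
    for name, port in ports.items():
        seen.setdefault(port, []).append(name)
    return {p: names for p, names in seen.items() if len(names) > 1}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_grid(ports: dict[str, int] = PORTS) -> dict[str, bool]:
    """Listening status per service."""
    return {name: is_listening(port) for name, port in ports.items()}
```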