Agents — AI Shop Dashboard

AGENTS.md — Multi-Agent Coordination

The shared context layer for Claude Code, Gemini CLI, and Codex. All three agents read this at session start. It defines what we are building, how agents collaborate, who owns what, and how to hand off work. Target: $25K MRR by Month 12 · 7 products · 3 markets.

5 AI Agents · 14-Tool Stack · 92% Gross Margin

🧠

Multi-World Local Stack — Manus & Perplexity now active

Two new agents join the stack: Perplexity (Research Agent · sonar-pro) and Manus (Build Agent · gpt-4o-mini). See the Ops Deck for the full Docker orchestration setup.


The Five Agents

Claude Code

claude-sonnet-4 → claude-haiku-3

Primary architect, security reviewer, and Slack Bridge owner. Claude designs before Codex builds.

System architecture & multi-file reasoning
Security review of all PRs before merge
Stripe billing integration (US market)
Slack Bridge — Bolt Socket Mode routing
SharedContext MCP — primary writer
Database schema design & normalization
API integration (owns external API contracts)

Gemini CLI

gemini-2.5-pro → gemini-2.0-flash

Research powerhouse and content engine. Handles long-context tasks and all content generation.

Market research, API docs, competitor analysis
SEOAgent — article pipeline primary owner
Long-context synthesis (reading many files)
User-facing documentation & READMEs
InvoiceAgent PDF generation
Spanish-language content (Costa Rica market)
Challenge generation for AgentBenchmark

Codex (OpenAI)

gpt-5.4 → gpt-4o

Algorithm specialist and test harness builder. Turns specs into production code with full test coverage.

Algorithm design & code synthesis from specs
Test generation — systematic edge case coverage
AgentBenchmark harness — primary owner
Paystack integration (Nigeria/Costa Rica)
UI components & embeddable widgets
Performance optimization (profiling-guided)
Writes failing tests first, then implements

Perplexity

sonar-pro · live web research

Research agent powered by Perplexity's sonar-pro model. Grounds the stack in real, current information — queries the live web, synthesizes findings, writes structured research.md. First agent in every orchestration loop.

Live web research via sonar-pro API
Writes /data/docs/research.md for Manus
Deep competitive and technical analysis
API reference + documentation lookup
Market intelligence for SEOAgent pipeline
Vector store ingestion (Qdrant write)

Manus

gpt-4o-mini · build + revise

Build agent — reads research, implements, and revises based on Judge feedback. Two endpoints (/run and /revise) power the self-healing critique loop. Iterates until Claude's score reaches 8/10.

Reads research.md → builds implementation
Writes /data/code/implementation.py
/revise endpoint for improvement loops
Incorporates Claude review feedback directly
Cost-efficient: gpt-4o-mini for iteration speed
Docker worker · FastAPI · shares /data volume
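
The build-and-revise loop described for Manus can be sketched roughly as follows. This is an illustration under assumptions, not the actual orchestrator: `runBuild`, `revise`, and `judge` are hypothetical callbacks standing in for the /run endpoint, the /revise endpoint, and Claude's review, and the `maxRounds` cap is an added assumption so the loop always terminates.

```javascript
/**
 * Sketch of the self-healing critique loop. All callbacks are hypothetical stand-ins.
 * @param {object} deps
 * @param {(research: string) => Promise<string>} deps.runBuild  stands in for /run
 * @param {(code: string, feedback: string) => Promise<string>} deps.revise  stands in for /revise
 * @param {(code: string) => Promise<{score: number, feedback: string}>} deps.judge  stands in for Claude's review
 * @param {string} research  contents of research.md
 * @param {number} [targetScore=8]  pass mark out of 10, as stated in the text
 * @param {number} [maxRounds=5]  assumed safety cap, not taken from the source
 */
async function critiqueLoop({ runBuild, revise, judge }, research, targetScore = 8, maxRounds = 5) {
  let code = await runBuild(research);
  let review = await judge(code);
  let rounds = 0;
  // Revise until the judge's score reaches the target or the cap is hit.
  while (review.score < targetScore && rounds < maxRounds) {
    code = await revise(code, review.feedback);
    review = await judge(code);
    rounds += 1;
  }
  return { code, score: review.score, rounds, passed: review.score >= targetScore };
}
```

Injecting the three callbacks keeps the loop testable without Docker or any model call; a real worker would presumably wrap HTTP requests to the two endpoints.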

How to Use This File

Claude Code

Reads AGENTS.md natively. Invoke explicitly with:

# Claude reads this automatically
claude --context AGENTS.md "<task>"
        

Session start: read AGENTS.md → check Active Constraints → check Handoff Notes → identify current product.

Gemini CLI

Uses GEMINI.md natively. Pass AGENTS.md via --context flag:

# Standard session start
gemini --context AGENTS.md "<task>"

# Interactive mode
gemini chat --context AGENTS.md
        

Codex

Reads via --instructions flag. Always check Codex-specific section first:

# Pass AGENTS.md as instructions
codex --instructions AGENTS.md "<task>"

# Or pipe a task file
cat task.md | codex --instructions AGENTS.md
        

✓

Session Start Checklist (All Agents)

Read AGENTS.md in full before any work
Check Active Constraints for current blockers
Check Cross-Agent Handoff Notes (bottom of file)
Identify which product is in active development
Confirm which phase of the build protocol applies
Verify Ollama is running before any inference call

7 Products — Build Order

Phase	Product	Description	Price	Agents
Ph.1	AgentArena	Side-by-side live coding sessions · agents race on challenges	Free	Claude, Codex
Ph.1	ChatDesk	AI customer support · KB upload → deploy · WhatsApp bridge	$49–$399/mo	Claude, Gemini
Ph.1	AgentBenchmark	Weekly benchmark of all major AI coding agents · public leaderboard	Free	Codex, Claude
Ph.2	ShipIt	Prompt → architecture → code → Plandex diff review → MVP	$49/mo	Claude, Codex
Ph.2	SEOAgent	Keyword → research → outline → long-form article → SEO score → publish	$49–$99/mo	Gemini, Claude
Ph.3	AgentOps	Slack /task → Claw-Empire orchestrates → agents build → PR · flagship product	$49–$299/mo	Claude, Codex
Ph.3	InvoiceAgent	Auto-invoice → WhatsApp/SMS delivery → Paystack reconciliation · Nigeria/CR focus	$9–$25/mo	Gemini, Codex

Status pills: Not started · In progress · MVP done · Tests passing · Deployed. Build Phase 1 completely before starting Phase 2.

Agent Roles & Division of Labor

Task Type	Primary Agent	Reason
System architecture decisions	Claude Code	Best at multi-file reasoning and system design
Research (market, API, docs)	Gemini CLI	Best at long-context web research and synthesis
Algorithm design + code generation	Codex	Best at pure code synthesis from specifications
Refactoring large codebases	Claude Code	Best at understanding context and constraints
Test generation	Codex	Best at systematic edge case coverage
Content generation (articles, copy)	Gemini CLI	Best at long-form content and SEO research
Security review	Claude Code	Best at identifying subtle security issues
Performance optimization	Codex	Best at profiling-guided improvements
Documentation	Gemini CLI	Best at comprehensive doc generation
Integration work (APIs, webhooks)	Claude Code	Best at understanding external APIs from docs
Database schema design	Claude Code	Best at data modeling and normalization
UI/UX implementation	Codex	Best at translating specs to component code

◆

Collaboration Patterns

Pattern 1: Research → Build → Review

1. Gemini CLI researches API docs + existing solutions → writes findings to SharedContext namespace project:<name>:research

2. Claude Code reads SharedContext, designs architecture → writes design.md to project root

3. Codex implements from design.md → Claude Code reviews PR

Pattern 2: Parallel Feature Development

1. Claude Code + Codex work on different features simultaneously, each in its own Vibe Kanban worktree

2. Both update SharedContext with progress

3. Claude Code integrates and resolves conflicts

Pattern 3: Spec → Test → Build → Verify

1. Claude Code writes the specification (spec.md)

2. Codex writes the test suite from the spec (0% of tests pass at this point)

3. Codex or Claude Code implements until all tests pass (100%)

4. Gemini CLI writes documentation from spec + implementation

Protocols

🔍

Research Protocol

Search for existing open-source implementations (avoid reinventing)
Check SharedContext namespace project:<name>:research for prior research
Review Active Constraints for technology restrictions
Verify the Ollama model assignment for this feature
Check whether any existing tool in the 14-tool stack already handles this
Store findings: agent · timestamp · findings · pitfalls · references
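
One way to store a finding in the shape listed above is sketched here. The field names mirror the list (agent, timestamp, findings, pitfalls, references) and the key follows project:<name>:research; the function name `buildResearchRecord` and the choice of ISO 8601 timestamps are assumptions, and the actual SharedContext write call is left out because its API is not shown in this file.

```javascript
/**
 * Builds the key and JSON-string value for one research finding (hypothetical helper).
 * @param {{project: string, agent: string, findings: string, pitfalls?: string[], references?: string[]}} input
 * @param {Date} [now]  injectable clock for tests
 * @returns {{key: string, value: string}}
 */
function buildResearchRecord({ project, agent, findings, pitfalls = [], references = [] }, now = new Date()) {
  if (!project || !agent || !findings) {
    throw new TypeError("project, agent, and findings are required");
  }
  const value = { agent, timestamp: now.toISOString(), findings, pitfalls, references };
  // Values are serialized because the Memory Rules require JSON strings.
  return { key: `project:${project}:research`, value: JSON.stringify(value) };
}
```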

🔨

Build Protocol

Check Active Constraints — no merges during freeze
Read the product's acceptance criteria (Section 4)
Check SharedContext for prior research or partial implementation
Create a worktree via Vibe Kanban if working in isolation
Use ESM imports, async/await, named exports, JSDoc
Ollama-first: check OLLAMA_HOST before any cloud API call
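
The Ollama-first rule could look roughly like this in code. The sketch calls Ollama's POST /api/generate with `stream: false` and falls back to a cloud call only when the local request fails; `cloudFallback` is a hypothetical callback (no specific cloud client is implied), and `fetchImpl` is an assumption added so the function can be exercised without a running server.

```javascript
/**
 * Ollama-first inference: try the local server, use the cloud only on failure.
 * @param {string} prompt
 * @param {object} opts
 * @param {string} opts.host   value of OLLAMA_HOST
 * @param {string} opts.model  value of OLLAMA_DEFAULT_MODEL
 * @param {(prompt: string) => Promise<string>} opts.cloudFallback  hypothetical cloud client
 * @param {typeof fetch} [opts.fetchImpl]  injectable for tests
 * @returns {Promise<{source: "ollama" | "cloud", text: string}>}
 */
async function infer(prompt, { host, model, cloudFallback, fetchImpl = fetch }) {
  try {
    const res = await fetchImpl(`${host}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    });
    if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
    const data = await res.json();
    return { source: "ollama", text: data.response };
  } catch {
    // Local inference unavailable: this is the only path that costs margin.
    return { source: "cloud", text: await cloudFallback(prompt) };
  }
}
```

Returning `source` alongside the text makes it straightforward to count how many requests stayed local, which the test protocol's Ollama share target depends on.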

🧪

Test & Validation Protocol

Three test layers: smoke.sh · unit.test.js · integration.test.js
Run: make test-<product>
Happy path + failure path + edge cases + concurrency
Billing: free tier enforced, paid tier unlocked correctly
Performance: response <3s p80, Ollama ≥80% of requests
Security: no keys logged, webhook signatures verified, rate limits
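
The performance line above (response <3s at p80, Ollama serving ≥80% of requests) can be checked with a small helper like the one below. It is a sketch: the request log shape `{latencyMs, source}` is hypothetical, and the nearest-rank percentile method is an assumption, since the source does not say how p80 is computed.

```javascript
/** Nearest-rank percentile of a non-empty array of numbers. */
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

/**
 * @param {{latencyMs: number, source: "ollama" | "cloud"}[]} requests  hypothetical log shape
 * @returns {{p80Ms: number, ollamaShare: number, passed: boolean}}
 */
function checkPerformanceGate(requests) {
  if (requests.length === 0) throw new RangeError("no requests to evaluate");
  const p80Ms = percentile(requests.map((r) => r.latencyMs), 80);
  const ollamaShare = requests.filter((r) => r.source === "ollama").length / requests.length;
  return { p80Ms, ollamaShare, passed: p80Ms < 3000 && ollamaShare >= 0.8 };
}
```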

🤝

Handoff Protocol

Update product status checkboxes in Section 4
Write handoff note to SharedContext namespace session:<date>
Append timestamped entry to Cross-Agent Handoff Notes
Commit: chore(<product>): handoff from <agent>
Next agent: recall SharedContext + read git log --oneline -20
Never skip handoff — agents build on each other's work
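
A handoff could be assembled as sketched below. The commit message format and the session:<YYYYMMDD-HHMM> key come from this file; the helper names, the note fields (taken from the namespace map's "completed, in-progress, blocked, next steps"), and the use of UTC for the session key are assumptions.

```javascript
/** Formats a Date as session:YYYYMMDD-HHMM in UTC (the time zone is an assumption). */
function sessionKey(date) {
  const iso = date.toISOString(); // e.g. 2026-03-17T14:05:00.000Z
  return `session:${iso.slice(0, 10).replace(/-/g, "")}-${iso.slice(11, 16).replace(":", "")}`;
}

/**
 * Builds the SharedContext note and the commit message for a handoff (hypothetical helper).
 * @param {{product: string, agent: string, completed?: string[], inProgress?: string[], blocked?: string[], nextSteps?: string[]}} note
 * @param {Date} [now]  injectable clock for tests
 */
function buildHandoff({ product, agent, completed = [], inProgress = [], blocked = [], nextSteps = [] }, now = new Date()) {
  return {
    key: sessionKey(now),
    value: JSON.stringify({ agent, timestamp: now.toISOString(), completed, inProgress, blocked, nextSteps }),
    commitMessage: `chore(${product}): handoff from ${agent}`,
  };
}
```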

Routing Directives

Claw-Kanban — Lightweight Task Dispatch

Use for single-agent tasks, bug fixes, and routine work. Routes to the correct tool via task board. Example: # task: fix Ollama timeout in ChatDesk

Claw-Empire — CEO Strategic Directive

Use for cross-product decisions, prioritization changes, and multi-agent orchestration. Example: $ directive: prioritize ChatDesk WhatsApp over ShipIt MVP

tool:

Direct Tool via Slack Bridge

Spawn a specific CLI agent directly. Available targets: tool:claude · tool:gemini · tool:codex · tool:droid · tool:aider

project:

Project Routing via Slack DM

Route a task to the owner of a specific product. Example: project:chatdesk fix the confidence scoring threshold · project:invoiceagent add CRC currency support
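
The four directive forms can be told apart with a small parser. This is a sketch of one possible reading, not the Slack Bridge's actual routing code: it assumes a leading `#` means Claw-Kanban, a leading `$` means Claw-Empire, and `tool:` or `project:` must open the message. The tool target list is taken from the text above.

```javascript
const TOOL_TARGETS = new Set(["claude", "gemini", "codex", "droid", "aider"]);

/**
 * Classifies a message by its routing prefix (hypothetical helper).
 * @param {string} message
 * @returns {{route: string, target?: string, body: string}}
 */
function parseDirective(message) {
  const text = message.trim();
  if (text.startsWith("#")) return { route: "claw-kanban", body: text.slice(1).trim() };
  if (text.startsWith("$")) return { route: "claw-empire", body: text.slice(1).trim() };
  const match = /^(tool|project):([a-z0-9-]+)\s*(.*)$/is.exec(text);
  if (!match) return { route: "unrouted", body: text };
  const kind = match[1].toLowerCase();
  const target = match[2].toLowerCase();
  // Unknown tool names are not dispatched; project names are not validated here.
  if (kind === "tool" && !TOOL_TARGETS.has(target)) return { route: "unrouted", body: text };
  return { route: kind, target, body: match[3].trim() };
}
```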

Active Constraints

Constraint	Added	Expires	Reason
Build order: complete Phase 1 before Phase 2	2026-03-17	Phase 2 gate	Stage gates enforced — quality over speed
$49 minimum price floor	2026-03-17	Never	77% churn below $50 (ChartMogul 2025)
Ollama-first for all inference	2026-03-17	Never	92% gross margin protection — cloud = margin loss
No auto-merge in AgentOps	2026-03-17	Never	Safety requirement — human approval always required
Pin Claw-Kanban/Empire to commit hash	2026-03-17	Until stable	Newer projects — may break on updates

Do Not Build List (Phase 1–3)

Do not build these until after Month 9 and only once the stage gate has passed. These are high-effort, high-distraction, or regulatory-risk products that would sink the solo operator:

❌ ComplianceBot, LegacyRevive, FleetOps — enterprise sales cycle too long

❌ TutorBot, HireScreen — high support burden, off-core

❌ AgentStore, WhiteLabel — requires existing user base first

❌ Medical Scribe, Legal Sanity Checker — regulatory complexity

◎

SharedContext Namespace Map

Namespace	Owner	Contents	TTL
project:aishop	All agents	Global project state, key decisions, cross-product conventions	Permanent
project:<product>	Product owner agent	Product-specific state, bugs, architectural decisions	Permanent
project:<product>:research	Gemini primarily	Research findings: competitor analysis, API docs, market data	30 days
project:<product>:test-results	Codex	Latest test run results: pass/fail, timing, coverage	7 days
tool:claude	Claude Code only	Claude's private notes, architectural preferences, open questions	Permanent
tool:gemini	Gemini CLI only	Gemini's research queue, brand voice memory, content templates	Permanent
tool:codex	Codex only	Codex's questions for Claude, pending architectural clarifications	Permanent
session:<YYYYMMDD-HHMM>	Outgoing agent	Handoff notes: completed, in-progress, blocked, next steps	14 days
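
The TTLs in the namespace map can be expressed as a lookup, sketched below. The patterns and day counts mirror the map; the helper names and the idea of checking expiry on the reader's side are assumptions, since this file does not say where SharedContext enforces TTLs.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Most specific patterns first; ttlDays null means permanent.
const TTL_RULES = [
  { pattern: /^project:[^:]+:research$/, ttlDays: 30 },
  { pattern: /^project:[^:]+:test-results$/, ttlDays: 7 },
  { pattern: /^session:\d{8}-\d{4}$/, ttlDays: 14 },
  { pattern: /^project:[^:]+$/, ttlDays: null },
  { pattern: /^tool:(claude|gemini|codex)$/, ttlDays: null },
];

/** Returns the TTL in days for a namespace, or null when it is permanent. */
function ttlDaysFor(namespace) {
  const rule = TTL_RULES.find((r) => r.pattern.test(namespace));
  if (!rule) throw new RangeError(`unknown namespace: ${namespace}`);
  return rule.ttlDays;
}

/** True when an entry written at `writtenAt` has outlived its namespace TTL. */
function isExpired(namespace, writtenAt, now = new Date()) {
  const ttlDays = ttlDaysFor(namespace);
  return ttlDays !== null && now.getTime() - writtenAt.getTime() > ttlDays * DAY_MS;
}
```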

Memory Rules

✓ Read before writing — check if a fact exists first

✓ Own namespace only — don't write to another agent's tool: namespace

✗ No secrets — never store API keys, passwords, or PII

✓ Structured JSON — all values are JSON strings, include agent + timestamp
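
A pre-write guard for the last three rules might look like the sketch below. It is illustrative only: `validateWrite` is a hypothetical helper, the agent identifiers are assumed to match the tool: suffixes (claude, gemini, codex), and the secret patterns are rough heuristics for a few common token shapes, not a complete detector.

```javascript
// Heuristic patterns for a few credential shapes; illustrative, not exhaustive.
const SECRET_PATTERNS = [
  /sk-[A-Za-z0-9_-]{16,}/,
  /xox[abp]-[A-Za-z0-9-]{10,}/,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
];

/**
 * Checks a proposed SharedContext write against the memory rules.
 * @param {string} agent      writer id, e.g. "claude"
 * @param {string} namespace  target namespace
 * @param {string} value      JSON string to store
 * @returns {{ok: boolean, errors: string[]}}
 */
function validateWrite(agent, namespace, value) {
  const errors = [];
  const toolMatch = /^tool:(.+)$/.exec(namespace);
  if (toolMatch && toolMatch[1] !== agent) {
    errors.push(`agent "${agent}" may not write to ${namespace}`);
  }
  let parsed;
  try {
    parsed = JSON.parse(value);
  } catch {
    errors.push("value is not valid JSON");
  }
  if (parsed !== undefined && (parsed === null || typeof parsed !== "object" || !parsed.agent || !parsed.timestamp)) {
    errors.push("value must be a JSON object with agent and timestamp");
  }
  if (SECRET_PATTERNS.some((p) => p.test(String(value)))) {
    errors.push("value looks like it contains a secret");
  }
  return { ok: errors.length === 0, errors };
}
```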

Environment & Secrets

Agent API Keys (All agents)

ANTHROPIC_API_KEY	Claude Code

GOOGLE_API_KEY	Gemini CLI

OPENAI_API_KEY	Codex (OpenAI)

Billing (Products)

STRIPE_SECRET_KEY	ChatDesk, ShipIt (US)

STRIPE_WEBHOOK_SECRET	Stripe webhook verify

PAYSTACK_SECRET_KEY	InvoiceAgent (NG/CR)

PAYSTACK_PUBLIC_KEY	InvoiceAgent frontend

Communication

SLACK_BOT_TOKEN	xoxb-... · AgentOps bridge

SLACK_APP_TOKEN	xapp-... · Socket Mode

SLACK_SIGNING_SECRET	Webhook verification

WHATSAPP_VERIFY_TOKEN	ChatDesk, InvoiceAgent

Inference & Memory

OLLAMA_HOST	http://localhost:11434

OLLAMA_DEFAULT_MODEL	qwen3-coder:30b

SHARED_MEMORY_RECOVERY_PHRASE	Auto-generated on first run

MM_BOT_TOKEN	Mattermost agent bus
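
A startup check for these variables could be sketched as follows. The variable names come from the lists above; the grouping keys, the helper name `missingEnv`, and the decision to leave out SHARED_MEMORY_RECOVERY_PHRASE (because it is auto-generated) are assumptions. The helper returns names only, so no secret value is ever logged.

```javascript
// Variable names are from this file; the group keys are illustrative.
const REQUIRED_ENV = {
  agents: ["ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY"],
  billing: ["STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "PAYSTACK_SECRET_KEY", "PAYSTACK_PUBLIC_KEY"],
  communication: ["SLACK_BOT_TOKEN", "SLACK_APP_TOKEN", "SLACK_SIGNING_SECRET", "WHATSAPP_VERIFY_TOKEN"],
  inference: ["OLLAMA_HOST", "OLLAMA_DEFAULT_MODEL", "MM_BOT_TOKEN"],
};

/**
 * Lists required variables that are unset or blank.
 * @param {Record<string, string | undefined>} env  typically process.env
 * @param {string[]} [groups]  subset of REQUIRED_ENV keys to check
 * @returns {string[]} names of missing variables (never their values)
 */
function missingEnv(env, groups = Object.keys(REQUIRED_ENV)) {
  return groups
    .flatMap((group) => REQUIRED_ENV[group] ?? [])
    .filter((name) => !env[name] || env[name].trim() === "");
}
```

Typical usage would be `missingEnv(process.env)` at process start, failing fast if the returned array is non-empty.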

Acceptance Criteria & Business Gates

Infrastructure Gates (must pass before building products)

make check-prereqs passes — Node ≥22.16.0, all CLIs, docker, ollama, ttyd installed
make install completes without manual intervention
make start runs all 14 tools + bridges + Ollama + dashboard
Dashboard at http://localhost:4000 shows health grid with all services
SharedContext MCP starts, memory.db created
Cross-tool memory: Claude stores fact → Crush recalls it correctly

Phase 1 Gate (must all pass before Phase 2)

AgentArena: page loads, challenge dispatches to 2+ agents, results stored in SharedContext
ChatDesk: KB ingestion works, Ollama inference <3s, Slack escalation triggers at <0.7 confidence
ChatDesk: WhatsApp bridge sends/receives, Stripe + Paystack billing both working
AgentBenchmark: 20 challenges in repo, 3 tools scored, leaderboard at :4000/benchmark

Business Metrics Gates

Gate	Metric	Target	Month	If Missed
Activation	Time-to-first-success	< 10 min	All	Fix before scaling any marketing
Retention	Week-4 retention	≥ 25%	Mo 3	Stop marketing, fix product first
Conversion	Free-to-paid by Day 45	≥ 5%	Mo 6	Kill or pivot the product
Revenue	MRR	$300	Mo 3	Reduce to 1 product focus
Revenue	MRR	$3,000	Mo 6	Pause new markets expansion
Revenue	MRR	$25,000	Mo 12	Plan Year 2
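
The gate table can be evaluated mechanically, as in the sketch below. Thresholds and months are copied from the table; the metric key names are hypothetical, treating a missing metric as a failure is an assumption, and so is reading each revenue target as "at least this much MRR".

```javascript
// month: null means the gate applies in every month ("All" in the table).
const GATES = [
  { gate: "Activation", metric: "timeToFirstSuccessMin", month: null, passes: (v) => v < 10 },
  { gate: "Retention", metric: "week4Retention", month: 3, passes: (v) => v >= 0.25 },
  { gate: "Conversion", metric: "freeToPaidByDay45", month: 6, passes: (v) => v >= 0.05 },
  { gate: "Revenue", metric: "mrrUsd", month: 3, passes: (v) => v >= 300 },
  { gate: "Revenue", metric: "mrrUsd", month: 6, passes: (v) => v >= 3000 },
  { gate: "Revenue", metric: "mrrUsd", month: 12, passes: (v) => v >= 25000 },
];

/**
 * Returns the gates due in `month` that the supplied metrics fail.
 * @param {number} month  months since launch
 * @param {Record<string, number>} metrics  hypothetical metric snapshot
 * @returns {string[]}
 */
function failedGates(month, metrics) {
  return GATES
    .filter((g) => g.month === null || g.month === month)
    .filter((g) => !(g.metric in metrics) || !g.passes(metrics[g.metric]))
    .map((g) => `${g.gate} (${g.metric})`);
}
```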

⚙

14-Tool Stack Architecture

Orchestration Boards (7)

Claw-Kanban:4011

Claw-Empire:4018

Vibe Kanban:4012

KaibanJS:4013

AgentsBoard:4014

OpenClaw:4015

NanoClaw:4016

CLI Agents & Harnesses (7)

Droid (primary):7687

Pi:7688

Aider:7682

Cline:7683

Crush (LSP+MCP):7684

Plandex:7685

Amp (cloud-only):7686

Infrastructure

Dashboard:4000

Slack Bridge:4010

Mattermost:4020

MM Bridge:4021

Ollama (local AI):11434

SharedContext MCP:3100