TL;DR: Sub-agent delegation lets you run Claude Code agents at up to 92% lower cost. Parent agents (Sonnet) handle strategic work while sub-agents (Haiku) handle bounded tasks like data fetching. Kaxo scaled from 4 to 35 agents in 90 days using this pattern. A typical Haiku sub-agent costs $2-8/year vs $150-600/year for a full Sonnet agent. Result: a production agent fleet running at <$500/year total cost.
Contents
- The Problem
- The Solution in 30 Seconds
- What Are Sub-Agents?
- The Economics of Sub-Agents
- Can Claude Code Agents Call Other Agents?
- Implementation Guide
- Real-World Example
- Building Agentic Workflows: How We Scaled from 4 to 35 Agents
- Key Takeaways
- FAQ
The Problem
Running multiple Claude Code agents gets expensive fast. A single Sonnet agent costs $150-600 per year. Scale to 10 agents: $1,500-6,000 annually.
Manual oversight doesn’t scale. And no documented patterns exist for multi-agent orchestration in Claude Code.
How do you scale without scaling costs?
The Solution in 30 Seconds
Sub-agent delegation solves this.
What: Specialized agents on cheaper models (Haiku: $0.25/$1.25 per million tokens) handle bounded tasks. Parent agents on strategic models (Sonnet: $3/$15) delegate and synthesize.
Why: 92% cost reduction. Parent handles judgment. Sub-agents handle data fetching and transforms.
Result: We went from 4 to 35 agents in 90 days. Annual cost: under $500 vs $5,000+.
The pattern: Identify repetitive task → create spec → delegate → sub-agent executes → parent continues.
Production numbers: data research tasks dropped from $0.05-0.10 to $0.001-0.002 per query. 96% savings.
What Are Sub-Agents?
Sub-agents are specialized, cost-optimized Claude Code agents invoked by parent agents for bounded, repetitive tasks. They run cheaper models (Haiku: $0.25/$1.25 per million tokens) while parent agents use more capable models (Sonnet: $3/$15 per million tokens) for strategic work, reducing costs by 85-92%.
| Feature | Agent | Sub-Agent |
|---|---|---|
| Scope | Project-level (persistent) | Task-level (ephemeral) |
| Location | Own directory (~/agents/name/) | Invoked via parent |
| Model | Strategic choice (Sonnet/Opus) | Cost-optimized (Haiku) |
| Autonomy | High (self-directed) | Bounded (clear spec) |
| State | Maintains state files | Stateless or parent-managed |
| Lifespan | Persistent across sessions | Single-task execution |
| Cost | $150-600/year | $2-8/year |
| Examples | Parent Agent, Research Agent | Subagent A, Subagent B |
The difference matters. Agents are long-lived infrastructure. Sub-agents are task-specific tools invoked on demand.
Analogy: An agent is a department (Marketing, Engineering). A sub-agent is a contractor hired for a specific deliverable.
The Economics of Sub-Agents
Cost optimization isn’t about being cheap. It’s about allocating budget to judgment instead of data fetching.
Real cost comparison from production usage:
| Task Type | Sonnet Cost | Haiku Cost | Savings |
|---|---|---|---|
| Data research | $0.05-0.10 | $0.001-0.002 | 96% |
| Metric tracking | $0.03-0.08 | $0.001-0.002 | 95% |
| Content scanning | $0.08-0.15 | $0.002-0.005 | 97% |
| Annual (1x/week) | $150-600 | $2-8 | 92% |
ROI: Monolithic Sonnet agent: $150-600/year. Parent + 4 Haiku sub-agents: $38-82/year total.
Savings: 85-92% compared to all-Sonnet architecture.
The decision tree is straightforward. Delegate to Haiku when the task has clear input/output specs and requires no judgment. Handle in Sonnet when pattern recognition or synthesis matters. Escalate to Opus when you’re architecting something novel or requirements are ambiguous.
The principle: Would you pay a senior strategist $150/hour to copy-paste from an API? No. You’d have someone junior fetch the data and let the strategist synthesize. Same applies here.
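Here is where the per-query numbers come from. A minimal Python sketch, assuming a hypothetical bounded query of roughly 2,000 input and 1,000 output tokens (your token counts will vary):

```python
# Sketch: derive per-query cost from per-million-token pricing.
# The token counts are illustrative assumptions, not measured values.

PRICES = {                       # (input, output) USD per million tokens
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single query on the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Assume a bounded data-research query: ~2K tokens in, ~1K tokens out.
for model in PRICES:
    print(f"{model}: ${query_cost(model, 2_000, 1_000):.4f} per query")
# haiku:  ~$0.0018 -> inside the article's $0.001-0.002 range
# sonnet: ~$0.0210 -> larger contexts and multi-step runs push this
#                     toward the $0.03-0.10 range quoted above
```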
Can Claude Code Agents Call Other Agents?
Yes. Parent agents spawn sub-agents by providing clear task specifications in their CLAUDE.md configuration file. Sub-agents execute bounded tasks and return structured results to the parent. This enables cost-optimized delegation without framework dependencies.
The mechanism:
- Parent identifies repetitive task (3+ occurrences)
- Creates sub-agent spec with clear inputs/outputs
- Configures delegation in CLAUDE.md
- Spawns sub-agent when needed
- Sub-agent executes, returns results
- Parent continues with enriched context
Concrete example: Parent Agent (Sonnet) delegates data research to Subagent A (Haiku). Cost drops from $0.05-0.10 per query to $0.001-0.002. Same data quality. 96% cost reduction.
No framework required. No LangChain. No CrewAI. Just clear specifications and Claude Code’s native agent invocation.
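To see what "no framework" looks like in practice, here is a minimal sketch of the delegation pattern using the Anthropic Python SDK directly. It illustrates the idea, not Claude Code's internal invocation mechanism; the model alias, spec text, and sample task are assumptions, not the article's actual Subagent A configuration.

```python
# Sketch of parent -> sub-agent delegation with no orchestration framework.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SUBAGENT_SPEC = """You are data-fetcher, a bounded sub-agent.
Input: a list of query parameters.
Output: a markdown table with the requested fields plus a short summary.
Do not add analysis beyond the spec."""

def run_subagent(task: str, model: str = "claude-3-5-haiku-latest") -> str:
    """Run one bounded task on a cheap model and return the raw result."""
    response = client.messages.create(
        model=model,                 # assumed alias; pin your own model ID
        max_tokens=1024,
        system=SUBAGENT_SPEC,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

# The parent (on Sonnet) calls this when it hits a bounded, spec-covered task,
# then continues its strategic work with the returned report in context.
report = run_subagent("Fetch volume data for these 10 queries: ...")
```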
Implementation Guide: Creating Your First Sub-Agent

Building a sub-agent takes 5 steps. Total time: 30-60 minutes for your first one. Future sub-agents take 15-20 minutes once you know the pattern.
Step 1: Identify Repetitive Tasks
Review your agent’s work log. Look for tasks you’re doing 3+ times with predictable patterns.
Good sub-agent candidates:
- API calls that fetch structured data
- File parsing and transformation
- Report generation with consistent format
- Simple pattern matching across files
Generic examples:
- Data lookups from third-party APIs became data-fetcher (Haiku)
- Status checks became metric-tracker (Haiku)
- Content parsing became content-analyzer (Haiku)
- Search result simulation became search-simulator (Sonnet, requires moderate judgment)
Bad sub-agent candidates:
- One-off tasks
- Work requiring strategic judgment
- Tasks with unclear success criteria
- Creative synthesis
If you’ve done something manually 3 times and can describe the input/output format clearly, you have a sub-agent opportunity.
Step 2: Choose Model Tier
Model selection determines cost and capability. Choose wrong and you either overpay or get poor results.
Haiku-appropriate tasks:
- Data fetching from APIs
- Structured report generation
- File parsing and transformation
- Simple pattern matching
- Cost: ~$0.001-0.002 per task
Sonnet-appropriate tasks:
- Strategic analysis requiring context
- Content brief generation
- Pattern synthesis across multiple files
- Moderate complexity judgment calls
- Cost: ~$0.03-0.10 per task
Decision rule: If you’re 99.9% certain the task will complete correctly given a clear spec, use Haiku. If moderate complexity or judgment is involved, use Sonnet.
Cost matters. A Haiku sub-agent running weekly costs $0.10/year. A Sonnet sub-agent running weekly costs $1.56-5.20/year. That’s a 15-50x difference. Choose appropriately.
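A minimal sketch of that decision rule as code. The flags are your own judgments about the task; none of this is read by Claude Code itself.

```python
# Sketch: the Haiku/Sonnet/Opus decision rule encoded as a function.

def choose_model(has_clear_spec: bool,
                 needs_judgment: bool,
                 is_novel_architecture: bool = False) -> str:
    """Map a task description to the cheapest adequate model tier."""
    if is_novel_architecture or (needs_judgment and not has_clear_spec):
        return "opus"      # ambiguous requirements or novel design work
    if needs_judgment:
        return "sonnet"    # moderate complexity, synthesis, judgment calls
    if has_clear_spec:
        return "haiku"     # bounded task, ~99.9% certain to complete
    return "sonnet"        # unclear spec, no judgment needed: stay safe

assert choose_model(has_clear_spec=True, needs_judgment=False) == "haiku"
assert choose_model(has_clear_spec=True, needs_judgment=True) == "sonnet"
assert choose_model(has_clear_spec=False, needs_judgment=True) == "opus"
```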
Step 3: Write Sub-Agent Specification
Clear specs prevent confusion and failed tasks. Your spec needs four elements: input format, output format, success criteria, and error handling.
Template:
```markdown
## Sub-Agent: data-fetcher

**Model:** Haiku
**Use When:** Need data from third-party API

**Input:**
- Query parameters (5-15 items)

**Output:**
- Markdown report with structured data
- Summary table
- Recommendations

**Success Criteria:**
- All queries processed
- Data retrieved successfully
- Report generated at reports/data/

**Cost Estimate:** ~$0.001-0.002 per query
```
Be specific. “Fetch data” is vague. “Call API with parameters, return markdown table with fields X, Y, Z, and top 5 recommendations” is clear.
The spec is a contract. Parent provides these inputs. Sub-agent delivers this output. No ambiguity.
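If you want the contract to be machine-checkable, you can mirror the template in a small data structure the parent fills in before spawning. The field names below simply echo the template; this is a sketch, not a Claude Code schema.

```python
# Sketch: the spec's elements as a typed contract the parent fills in.
from dataclasses import dataclass, field

@dataclass
class SubAgentSpec:
    name: str                           # e.g. "data-fetcher"
    model: str                          # "haiku" or "sonnet"
    use_when: str                       # trigger condition for delegation
    input_format: str                   # what the parent must provide
    output_format: str                  # what the sub-agent must return
    success_criteria: list[str] = field(default_factory=list)
    cost_estimate_usd: tuple[float, float] = (0.001, 0.002)

    def is_complete(self) -> bool:
        """Usable as a contract only if every element is filled in."""
        return all([self.name, self.model, self.use_when,
                    self.input_format, self.output_format,
                    self.success_criteria])

spec = SubAgentSpec(
    name="data-fetcher",
    model="haiku",
    use_when="Need data from third-party API",
    input_format="Query parameters (5-15 items)",
    output_format="Markdown report with summary table and recommendations",
    success_criteria=["All queries processed",
                      "Report generated at reports/data/"],
)
assert spec.is_complete()
```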
Step 4: Configure Parent Agent
Add delegation table to parent’s CLAUDE.md configuration:
```markdown
## Sub-Agent Fleet

| Sub-Agent | Model | Use When | Cost |
|-----------|-------|----------|------|
| data-fetcher | Haiku | Need external data | ~$0.001-0.002 |
| metric-tracker | Haiku | Check status metrics | ~$0.001-0.002 |
```
Add autonomy policy so parent doesn’t ask permission every time:
```markdown
## Sub-Agent Autonomy

You are authorized to spawn sub-agents autonomously. Don't ask permission.

**Model Selection:**
- Haiku: Task has clear spec, 99.9% certain to complete
- Sonnet: Moderate complexity, judgment required
```
This configuration tells the parent: these sub-agents exist, here’s when to use them, spawn them without asking.
The result: Autonomous delegation. Parent recognizes “I need data” and spawns data-fetcher immediately. No human in the loop.
Step 5: Test and Validate
Run the sub-agent on sample tasks before trusting it with production work.
Validation checklist:
- Output matches expected format
- Success criteria met
- Cost within estimate
- Error handling works
- No hallucinations or incorrect data
Measure cost savings vs having the parent do the work. If you’re not seeing 85%+ reduction for Haiku sub-agents, something’s wrong with your delegation pattern.
Document results. Your sub-agent’s first run should include a report showing actual cost, output quality, and whether it met success criteria. This creates accountability.
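A minimal sketch of the savings check, assuming you log the parent-baseline and sub-agent costs per run yourself (nothing emits these numbers automatically):

```python
# Sketch: verify a sub-agent's first runs actually clear the 85% savings bar.
# All cost figures are whatever you logged; they are placeholders here.

def savings_pct(parent_cost: float, subagent_cost: float) -> float:
    """Percent saved by delegating vs. the parent doing the task itself."""
    return 100 * (1 - subagent_cost / parent_cost)

runs = [
    # (parent baseline per task, sub-agent actual per task), in USD
    (0.06, 0.0015),
    (0.08, 0.0020),
    (0.05, 0.0012),
]

for parent_cost, actual in runs:
    pct = savings_pct(parent_cost, actual)
    status = "OK" if pct >= 85 else "REVIEW DELEGATION PATTERN"
    print(f"saved {pct:5.1f}%  [{status}]")
```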
Real-World Example: Parent Agent Sub-Agent Fleet

Production architecture example. Parent Agent started as one Sonnet agent doing everything. Now it’s a parent plus 4 specialized sub-agents.
Before: Monolithic Agent
Parent Agent was initially one Sonnet agent handling:
- Data research (manual API calls)
- Content monitoring (manual scanning)
- Metric tracking (manual checks)
- Search simulation (testing queries)
- Brief generation
Cost: ~$150-600/year (all Sonnet)
Problem: Strategic work (brief synthesis) and tactical work (data fetching) were mixed together. No cost optimization. The parent model was doing grunt work.
After: Parent + Sub-Agent Fleet
```text
Parent Agent (Sonnet)
├── Subagent A: data-fetcher (Haiku)
├── Subagent B: metric-tracker (Haiku)
├── Subagent C: content-analyzer (Haiku)
└── Subagent D: search-simulator (Sonnet)
```
Delegation pattern:
| Task | Handler | Model | Cost |
|---|---|---|---|
| Data research | Subagent A | Haiku | $0.001-0.002 |
| Metric tracking | Subagent B | Haiku | $0.001-0.002 |
| Content scanning | Subagent C | Haiku | $0.002-0.005 |
| Search simulation | Subagent D | Sonnet | $0.03-0.10 |
| Brief generation | Parent Agent | Sonnet | $0.03-0.10 |
Annual cost breakdown (52 weeks):
- Subagent A (data-fetcher): 1x/week = ~$0.05-0.10/year
- Subagent B (metric-tracker): 7x/week = ~$0.35-0.70/year
- Subagent C (content-analyzer): bi-weekly = ~$0.10-0.25/year
- Subagent D (search-simulator): 1x/week = ~$1.56-5.20/year
- Parent Agent: 4x/month = ~$15-50/year
Total: $17-56/year vs $150-600/year monolithic
Savings: 85-92%
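The total is just the sum of the line items above; a quick sketch for anyone who wants to adjust frequencies or per-run costs for their own fleet:

```python
# Sketch: fleet annual cost as the sum of the per-agent ranges listed above
# (USD low/high per year, figures copied from the breakdown).
fleet = {
    "data-fetcher (Haiku)":      (0.05, 0.10),
    "metric-tracker (Haiku)":    (0.35, 0.70),
    "content-analyzer (Haiku)":  (0.10, 0.25),
    "search-simulator (Sonnet)": (1.56, 5.20),
    "Parent Agent (Sonnet)":     (15.00, 50.00),
}

low = sum(lo for lo, _ in fleet.values())
high = sum(hi for _, hi in fleet.values())
print(f"fleet total: ${low:.2f}-${high:.2f}/year")   # ~ $17-56/year
print("vs monolithic Sonnet: $150-600/year")
```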
Performance metrics:
- Brief generation time: 3x faster (sub-agents run in parallel)
- Data quality: Same or better (specialized agents with clear specs)
- Cost per brief: 92% reduction
- Autonomy: Parent Agent spawns sub-agents without human approval
The key insight: Parent focuses on strategic synthesis. Sub-agents act as sensors gathering landscape data. This creates a “sensor-actuator” pattern. Sub-agents sense (data volumes, content activity, metric positions). Parent decides what to do with that intelligence.
Building Agentic Workflows: How We Scaled from 4 to 35 Agents
Kaxo went from 4 agents to 35 agents in 90 days. Here’s the scaling trajectory and what changed at each phase.
Timeline:
- Day 0: 4 agents (core infrastructure)
- Day 30: 10 agents (added 6 meta-agents for shared infrastructure)
- Day 90: 35 agents (17 meta-agents + 18 domain-specific agents)
Phase 1: Identify Meta-Agent Opportunities
Meta-agents are reusable across projects. Instead of every agent implementing shared capabilities separately, extract them once.
We extracted:
- Permission management (adds safe permissions automatically when prompted 3+ times)
- Configuration sync (synchronizes shared CLAUDE.md sections across agents)
- Session reporting (generates daily session reports from transcripts)
- Error analysis (analyzes failure patterns across agent fleet)
- Cost monitoring (tracks API spend by agent and model)
Pattern recognition: If 3+ agents need the same capability, extract it as a meta-agent available to all.
This creates infrastructure reuse. New agents inherit capabilities instead of rebuilding them.
Phase 2: Post-Task Self-Assessment
After completing any task, agents now ask themselves:
- Did I do anything 3+ times that a Haiku sub-agent could have done?
- Was there a bounded, repeatable step I handled manually?
- Could any part be extracted as a deterministic sub-agent?
If YES: Document in report under “Sub-Agent Opportunities”
This creates a continuous improvement feedback loop. Task execution leads to reflection. Reflection leads to pattern detection. Pattern detection leads to new sub-agents. New sub-agents improve fleet efficiency.
Generic example: Content Agent noticed it was manually structuring outlines from briefs. Same pattern every time: parse brief, extract sections, format as hierarchical outline. Created structure-generator (Haiku) sub-agent. Cost: ~$0.001-0.002 per outline vs $0.03-0.10 doing it in Sonnet.
Phase 3: Global Instructions Pattern
Created ~/.claude/CLAUDE.md with shared context visible to all agents:
- Agent directory structure
- Permission policies
- State file patterns
- Meta-agent registry
New agents inherit this automatically. No need to document the same patterns 35 times.
Cost savings from global context: Reduced configuration overhead by ~60%. New agent setup went from 2 hours to 30 minutes.
Phase 4: Fleet Taxonomy
Organized agents into clear hierarchy:
Meta-Agents (17): Reusable infrastructure
- Permission management
- Configuration sync
- Error analysis and aggregation
- Cost tracking and reporting
- Documentation generation
Domain-Specific (18): Project-focused work
- Research Agent + 4 sub-agents (data fetching, metric tracking, content monitoring, search simulation)
- Content Agent + 3 sub-agents (outline generation, structure extraction)
- Deploy Agent + 2 sub-agents (validation, optimization)
- Analytics Agent + 9 sub-agents (health checking, status validation, automation)

Key scaling principles that enabled 4→35 growth:
- Delegation Decision Tree: Haiku for clear tasks, Sonnet for judgment, Opus for novel architecture
- Autonomy Policies: Sub-agents spawn without approval if task spec is clear
- Cost Tracking: Every agent reports estimated cost per task in execution logs (one possible log format is sketched after this list)
- State Management: Agents maintain state files for continuity across sessions
- Post-Task Assessment: Continuous pattern detection and sub-agent opportunity identification
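One possible shape for that cost log is sketched below. The path and field names are assumptions, not a Claude Code feature; each agent would append a record like this as part of its task report so fleet-wide spend can be aggregated.

```python
# Sketch: append a per-task cost entry to a hypothetical per-agent log.
import json
import time
from pathlib import Path

LOG = Path("logs/cost-log.jsonl")  # hypothetical per-agent log location

def log_task_cost(agent: str, task: str, model: str, est_cost_usd: float) -> None:
    """Append one JSON-lines record for later fleet-wide aggregation."""
    LOG.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "agent": agent,
        "task": task,
        "model": model,
        "est_cost_usd": est_cost_usd,
    }
    with LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

log_task_cost("data-fetcher", "weekly data research", "haiku", 0.0018)
```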
Result: 35-agent fleet running at <$500/year total cost vs $5,000+/year if built entirely with Sonnet agents.
The economic argument: Cost per agent dropped from $150-600/year to $14-28/year average. This makes agent proliferation economically viable. You can afford to create specialized agents for narrow tasks because the incremental cost is negligible.
Key Takeaways
- Sub-agent delegation reduces Claude Code costs by 85-92% compared to all-Sonnet architecture
- Parent agents (Sonnet) handle strategic work while sub-agents (Haiku) handle bounded, repetitive tasks
- Typical sub-agent costs $2-8/year vs $150-600/year for traditional agent
- Scaled from 4 to 35 agents in 90 days using this pattern, staying under $500/year total cost
- Autonomous delegation works: Parent agents spawn sub-agents without human approval when task specs are clear
- Post-task self-assessment creates continuous improvement: Agents identify their own sub-agent opportunities after completing work
- Meta-agents provide reusable infrastructure across domain-specific agents, reducing configuration overhead by 60%
FAQ
What are sub-agents in Claude Code?
Sub-agents are specialized, cost-optimized Claude Code agents invoked by parent agents for bounded tasks. They run cheaper models (Haiku: $0.25/$1.25 per million tokens) for simple tasks while parent agents use more capable models (Sonnet: $3/$15 per million tokens) for strategic work, reducing costs by up to 92%.
What’s the difference between Claude Code agents and sub-agents?
Agents are project-level with persistent state and strategic scope, while sub-agents are task-level, ephemeral, and cost-optimized. Agents live in their own directories with state files. Sub-agents are invoked by parents via clear task specifications and return results without maintaining persistent state.
How can I reduce Claude Code API costs?
Use sub-agent delegation to run simple tasks on Haiku ($0.001-0.002 per call) instead of Sonnet ($0.03-0.10 per call). Identify tasks repeated 3+ times with clear input/output specs. Create sub-agent specifications for those tasks. Configure parent to delegate autonomously. A typical sub-agent fleet costs $2-8/year vs $150-600/year for all-Sonnet architecture.
Can Claude Code agents call other agents?
Yes. Parent agents spawn sub-agents by providing clear task specifications in their CLAUDE.md configuration file. Sub-agents execute bounded tasks and return structured results to the parent. This enables cost-optimized delegation without framework dependencies like LangChain or CrewAI.
How do I scale Claude Code from one agent to many?
Start with task decomposition. Identify repetitive tasks (3+ occurrences) in your current agent’s work. Create sub-agents for bounded tasks using Haiku. Use global instructions file (~/.claude/CLAUDE.md) for shared context across agents. Extract reusable capabilities as meta-agents. Follow post-task self-assessment pattern to continuously identify new sub-agent opportunities.
When should I use a sub-agent vs doing it myself?
Delegate to sub-agents when: (1) Task repeated 3+ times, (2) Clear input/output specification exists, (3) No strategic judgment required, (4) Cost matters. Handle directly in parent when: Task requires context from previous work, judgment calls needed, or it’s a one-off task. Use the 99.9% certainty rule: if you’re 99.9% certain a Haiku sub-agent will complete the task correctly given a clear spec, delegate it.
What’s the cheapest way to run Claude Code agents?
Use Haiku sub-agents for data fetching, structured output generation, and simple file transforms. Reserve Sonnet for strategic analysis, pattern synthesis, and content generation. Use post-task self-assessment to continuously identify delegation opportunities. Track costs per task type and optimize based on actual usage. Typical savings: 85-92% compared to all-Sonnet architecture.
Next Steps
Ready to scale your AI agent infrastructure without scaling costs?
Kaxo Technologies specializes in building production agent systems for Canadian SMBs. We scaled from 4 to 35 agents using the patterns in this guide.
Our services:
- Agent architecture consulting
- Sub-agent fleet implementation and optimization
- Cost optimization audits for existing agent systems
- Claude Code training and best practices workshops
Book a discovery call to discuss your automation needs.
Location: Ontario, Canada
Expertise: AI automation, agentic workflows, Claude Code infrastructure
Soli Deo Gloria