Scaling Claude Code: From 4 to 35 Agents with Sub-Agent Architecture

Learn how to build scalable Claude Code agent systems using sub-agent delegation. Reduce API costs by 92% while automating complex workflows—practical patterns from scaling 4 to 35 agents.

Scaling Claude Code: From 4 to 35 Agents with Sub-Agent Architecture

TL;DR: Sub-agent delegation lets you run Claude Code agents at 92% lower cost. Parent agents (Sonnet) handle strategic work while sub-agents (Haiku) handle bounded tasks like data fetching. Kaxo scaled from 4 to 35 agents in 90 days using this pattern. Typical sub-agent costs $2-8/year vs $150-600/year for all-Sonnet architecture. Result: production agent fleet running at <$500/year total cost.


Contents


The Problem

Running multiple Claude Code agents gets expensive fast. A single Sonnet agent costs $150-600 per year. Scale to 10 agents: $1,500-6,000 annually.

Manual oversight doesn’t scale. And no documented patterns exist for multi-agent orchestration in Claude Code.

How do you scale without scaling costs?

The Solution in 30 Seconds

Sub-agent delegation solves this.

What: Specialized agents on cheaper models (Haiku: $0.25/$1.25 per million tokens) handle bounded tasks. Parent agents on strategic models (Sonnet: $3/$15) delegate and synthesize.

Why: 92% cost reduction. Parent handles judgment. Sub-agents handle data fetching and transforms.

Result: We went from 4 to 35 agents in 90 days. Annual cost: under $500 vs $5,000+.

The pattern: Identify repetitive task → create spec → delegate → sub-agent executes → parent continues.

Production numbers: data research tasks dropped from $0.05-0.10 to $0.001-0.002 per query. 96% savings.

What Are Sub-Agents?

Sub-agents are specialized, cost-optimized Claude Code agents invoked by parent agents for bounded, repetitive tasks. They run cheaper models (Haiku: $0.25/$1.25 per million tokens) while parent agents use more capable models (Sonnet: $3/$15 per million tokens) for strategic work, reducing costs by 85-92%.

FeatureAgentSub-Agent
ScopeProject-level (persistent)Task-level (ephemeral)
LocationOwn directory (~/agents/name/)Invoked via parent
ModelStrategic choice (Sonnet/Opus)Cost-optimized (Haiku)
AutonomyHigh (self-directed)Bounded (clear spec)
StateMaintains state filesStateless or parent-managed
LifespanPersistent across sessionsSingle-task execution
Cost$150-600/year$2-8/year
ExamplesParent Agent, Research AgentSubagent A, Subagent B

The difference matters. Agents are long-lived infrastructure. Sub-agents are task-specific tools invoked on demand.

Analogy: An agent is a department (Marketing, Engineering). A sub-agent is a contractor hired for a specific deliverable.

The Economics of Sub-Agents

Cost optimization isn’t about being cheap. It’s about allocating budget to judgment instead of data fetching.

Real cost comparison from production usage:

Task TypeSonnet CostHaiku CostSavings
Data research$0.05-0.10$0.001-0.00296%
Metric tracking$0.03-0.08$0.001-0.00295%
Content scanning$0.08-0.15$0.002-0.00597%
Annual (1x/week)$150-600$2-892%

ROI: Monolithic Sonnet agent: $150-600/year. Parent + 4 Haiku sub-agents: $38-82/year total.

Savings: 85-92% compared to all-Sonnet architecture.

The decision tree is straightforward. Delegate to Haiku when the task has clear input/output specs and requires no judgment. Handle in Sonnet when pattern recognition or synthesis matters. Escalate to Opus when you’re architecting something novel or requirements are ambiguous.

The principle: Would you pay a senior strategist $150/hour to copy-paste from an API? No. You’d have someone junior fetch the data and let the strategist synthesize. Same applies here.

Can Claude Code Agents Call Other Agents?

Yes. Parent agents spawn sub-agents by providing clear task specifications in their CLAUDE.md configuration file. Sub-agents execute bounded tasks and return structured results to the parent. This enables cost-optimized delegation without framework dependencies.

The mechanism:

  1. Parent identifies repetitive task (3+ occurrences)
  2. Creates sub-agent spec with clear inputs/outputs
  3. Configures delegation in CLAUDE.md
  4. Spawns sub-agent when needed
  5. Sub-agent executes, returns results
  6. Parent continues with enriched context

Concrete example: Parent Agent (Sonnet) delegates data research to Subagent A (Haiku). Cost drops from $0.05-0.10 per query to $0.001-0.002. Same data quality. 96% cost reduction.

No framework required. No LangChain. No CrewAI. Just clear specifications and Claude Code’s native agent invocation.

Implementation Guide: Creating Your First Sub-Agent

Implementation guide illustration

Building a sub-agent takes 5 steps. Total time: 30-60 minutes for your first one. Future sub-agents take 15-20 minutes once you know the pattern.

Step 1: Identify Repetitive Tasks

Review your agent’s work log. Look for tasks you’re doing 3+ times with predictable patterns.

Good sub-agent candidates:

Generic examples:

Bad sub-agent candidates:

If you’ve done something manually 3 times and can describe the input/output format clearly, you have a sub-agent opportunity.

Step 2: Choose Model Tier

Model selection determines cost and capability. Choose wrong and you either overpay or get poor results.

Haiku-appropriate tasks:

Sonnet-appropriate tasks:

Decision rule: If you’re 99.9% certain the task will complete correctly given a clear spec, use Haiku. If moderate complexity or judgment is involved, use Sonnet.

Cost matters. A Haiku sub-agent running weekly costs $0.10/year. A Sonnet sub-agent running weekly costs $1.56-5.20/year. That’s a 15-50x difference. Choose appropriately.

Step 3: Write Sub-Agent Specification

Clear specs prevent confusion and failed tasks. Your spec needs four elements: input format, output format, success criteria, and error handling.

Template:

## Sub-Agent: data-fetcher

**Model:** Haiku
**Use When:** Need data from third-party API

**Input:**
- Query parameters (5-15 items)

**Output:**
- Markdown report with structured data
- Summary table
- Recommendations

**Success Criteria:**
- All queries processed
- Data retrieved successfully
- Report generated at reports/data/

**Cost Estimate:** ~$0.001-0.002 per query

Be specific. “Fetch data” is vague. “Call API with parameters, return markdown table with fields X, Y, Z, and top 5 recommendations” is clear.

The spec is a contract. Parent provides these inputs. Sub-agent delivers this output. No ambiguity.

Step 4: Configure Parent Agent

Add delegation table to parent’s CLAUDE.md configuration:

## Sub-Agent Fleet

| Sub-Agent | Model | Use When | Cost |
|-----------|-------|----------|------|
| data-fetcher | Haiku | Need external data | ~$0.001-0.002 |
| metric-tracker | Haiku | Check status metrics | ~$0.001-0.002 |

Add autonomy policy so parent doesn’t ask permission every time:

## Sub-Agent Autonomy

You are authorized to spawn sub-agents autonomously. Don't ask permission.

**Model Selection:**
- Haiku: Task has clear spec, 99.9% certain to complete
- Sonnet: Moderate complexity, judgment required

This configuration tells the parent: these sub-agents exist, here’s when to use them, spawn them without asking.

The result: Autonomous delegation. Parent recognizes “I need data” and spawns data-fetcher immediately. No human in the loop.

Step 5: Test and Validate

Run the sub-agent on sample tasks before trusting it with production work.

Validation checklist:

Measure cost savings vs having the parent do the work. If you’re not seeing 85%+ reduction for Haiku sub-agents, something’s wrong with your delegation pattern.

Document results. Your sub-agent’s first run should include a report showing actual cost, output quality, and whether it met success criteria. This creates accountability.

Real-World Example: Parent Agent Sub-Agent Fleet

Parent Agent architecture with cost labels

Production architecture example. Parent Agent started as one Sonnet agent doing everything. Now it’s a parent plus 4 specialized sub-agents.

Before: Monolithic Agent

Parent Agent was initially one Sonnet agent handling:

Cost: ~$150-600/year (all Sonnet) Problem: Strategic work (brief synthesis) and tactical work (data fetching) mixed together. No cost optimization. Parent model doing grunt work.

After: Parent + Sub-Agent Fleet

Parent Agent (Sonnet)
├── Subagent A: data-fetcher (Haiku)
├── Subagent B: metric-tracker (Haiku)
├── Subagent C: content-analyzer (Haiku)
└── Subagent D: search-simulator (Sonnet)

Delegation pattern:

TaskHandlerModelCost
Data researchSubagent AHaiku$0.001-0.002
Metric trackingSubagent BHaiku$0.001-0.002
Content scanningSubagent CHaiku$0.002-0.005
Search simulationSubagent DSonnet$0.03-0.10
Brief generationParent AgentSonnet$0.03-0.10

Annual cost breakdown (52 weeks):

Total: $17-56/year vs $150-600/year monolithic Savings: 85-92%

Performance metrics:

The key insight: Parent focuses on strategic synthesis. Sub-agents act as sensors gathering landscape data. This creates a “sensor-actuator” pattern. Sub-agents sense (data volumes, content activity, metric positions). Parent decides what to do with that intelligence.

Building Agentic Workflows: How We Scaled from 4 to 35 Agents

Kaxo went from 4 agents to 35 agents in 90 days. Here’s the scaling trajectory and what changed at each phase.

Timeline:

Phase 1: Identify Meta-Agent Opportunities

Meta-agents are reusable across projects. Instead of every agent implementing shared capabilities separately, extract them once.

We extracted:

Pattern recognition: If 3+ agents need the same capability, extract it as a meta-agent available to all.

This creates infrastructure reuse. New agents inherit capabilities instead of rebuilding them.

Phase 2: Post-Task Self-Assessment

After completing any task, agents now ask themselves:

  1. Did I do anything 3+ times that a Haiku sub-agent could have done?
  2. Was there a bounded, repeatable step I handled manually?
  3. Could any part be extracted as a deterministic sub-agent?

If YES: Document in report under “Sub-Agent Opportunities”

This creates a continuous improvement feedback loop. Task execution leads to reflection. Reflection leads to pattern detection. Pattern detection leads to new sub-agents. New sub-agents improve fleet efficiency.

Generic example: Content Agent noticed it was manually structuring outlines from briefs. Same pattern every time: parse brief, extract sections, format as hierarchical outline. Created structure-generator (Haiku) sub-agent. Cost: ~$0.001-0.002 per outline vs $0.03-0.10 doing it in Sonnet.

Phase 3: Global Instructions Pattern

Created ~/.claude/CLAUDE.md with shared context visible to all agents:

New agents inherit this automatically. No need to document the same patterns 35 times.

Cost savings from global context: Reduced configuration overhead by ~60%. New agent setup went from 2 hours to 30 minutes.

Phase 4: Fleet Taxonomy

Organized agents into clear hierarchy:

Meta-Agents (17): Reusable infrastructure

Domain-Specific (18): Project-focused work

Delegation decision tree

Key scaling principles that enabled 4→35 growth:

  1. Delegation Decision Tree: Haiku for clear tasks, Sonnet for judgment, Opus for novel architecture
  2. Autonomy Policies: Sub-agents spawn without approval if task spec is clear
  3. Cost Tracking: Every agent reports estimated cost per task in execution logs
  4. State Management: Agents maintain state files for continuity across sessions
  5. Post-Task Assessment: Continuous pattern detection and sub-agent opportunity identification

Result: 35-agent fleet running at <$500/year total cost vs $5,000+/year if built entirely with Sonnet agents.

The economic argument: Cost per agent dropped from $150-600/year to $14-28/year average. This makes agent proliferation economically viable. You can afford to create specialized agents for narrow tasks because the incremental cost is negligible.

Key Takeaways


FAQ

What are sub-agents in Claude Code?

Sub-agents are specialized, cost-optimized Claude Code agents invoked by parent agents for bounded tasks. They run cheaper models (Haiku: $0.25/$1.25 per million tokens) for simple tasks while parent agents use more capable models (Sonnet: $3/$15 per million tokens) for strategic work, reducing costs by up to 92%.

What’s the difference between Claude Code agents and sub-agents?

Agents are project-level with persistent state and strategic scope, while sub-agents are task-level, ephemeral, and cost-optimized. Agents live in their own directories with state files. Sub-agents are invoked by parents via clear task specifications and return results without maintaining persistent state.

How can I reduce Claude Code API costs?

Use sub-agent delegation to run simple tasks on Haiku ($0.001-0.002 per call) instead of Sonnet ($0.03-0.10 per call). Identify tasks repeated 3+ times with clear input/output specs. Create sub-agent specifications for those tasks. Configure parent to delegate autonomously. A typical sub-agent fleet costs $2-8/year vs $150-600/year for all-Sonnet architecture.

Can Claude Code agents call other agents?

Yes. Parent agents spawn sub-agents by providing clear task specifications in their CLAUDE.md configuration file. Sub-agents execute bounded tasks and return structured results to the parent. This enables cost-optimized delegation without framework dependencies like LangChain or CrewAI.

How do I scale Claude Code from one agent to many?

Start with task decomposition. Identify repetitive tasks (3+ occurrences) in your current agent’s work. Create sub-agents for bounded tasks using Haiku. Use global instructions file (~/.claude/CLAUDE.md) for shared context across agents. Extract reusable capabilities as meta-agents. Follow post-task self-assessment pattern to continuously identify new sub-agent opportunities.

When should I use a sub-agent vs doing it myself?

Delegate to sub-agents when: (1) Task repeated 3+ times, (2) Clear input/output specification exists, (3) No strategic judgment required, (4) Cost matters. Handle directly in parent when: Task requires context from previous work, judgment calls needed, or it’s a one-off task. Use the 99.9% certainty rule: if you’re 99.9% certain a Haiku sub-agent will complete the task correctly given a clear spec, delegate it.

What’s the cheapest way to run Claude Code agents?

Use Haiku sub-agents for data fetching, structured output generation, and simple file transforms. Reserve Sonnet for strategic analysis, pattern synthesis, and content generation. Use post-task self-assessment to continuously identify delegation opportunities. Track costs per task type and optimize based on actual usage. Typical savings: 85-92% compared to all-Sonnet architecture.


Next Steps

Ready to scale your AI agent infrastructure without scaling costs?

Kaxo Technologies specializes in building production agent systems for Canadian SMBs. We scaled from 4 to 35 agents using the patterns in this guide.

Our services:

Book a discovery call to discuss your automation needs.

Location: Ontario, Canada Expertise: AI automation, agentic workflows, Claude Code infrastructure


Soli Deo Gloria

Written by
The Kaxo Team
Last Updated: January 2, 2026
Back to Insights