Scaling Claude Code: 4 to 35 Agents in 90 Days



TL;DR: Sub-agent delegation lets you run Claude Code agents at 92% lower cost. Parent agents (Sonnet) handle strategic work while sub-agents (Haiku) handle bounded tasks like data fetching. Kaxo scaled from 4 to 35 agents in 90 days using this pattern. Typical sub-agent costs $2-8/year vs $150-600/year for all-Sonnet architecture. Result: production agent fleet running at <$500/year total cost.


What Is Sub-Agent Architecture? (Quick Answer)

Sub-agent architecture lets you delegate tasks from a main Claude Code agent to specialized “helper” agents, each with their own context and model. Key benefit: Reduce API costs by 92% by using cheaper models (Haiku) for routine tasks while reserving expensive models (Sonnet) for strategic work.

When to use:

  • ✅ Task repeated 3+ times
  • ✅ Clear inputs/outputs
  • ✅ Doesn’t need creative problem-solving

Real example: Kaxo Technologies scaled from 4 to 35 agents using this pattern, reducing annual costs from $6,500 to $500.



The Problem

Running multiple Claude Code agents gets expensive fast. A single Sonnet agent costs $150-600 per year. Scale to 10 agents: $1,500-6,000 annually.

Manual oversight doesn’t scale, and there are few documented patterns for multi-agent orchestration in Claude Code.

How do you scale without scaling costs?

The Solution in 30 Seconds

Sub-agent delegation solves this.

What: Specialized agents on cheaper models (Haiku: $0.25/$1.25 per million tokens) handle bounded tasks. Parent agents on strategic models (Sonnet: $3/$15) delegate and synthesize.

Why: 92% cost reduction. Parent handles judgment. Sub-agents handle data fetching and transforms.


Result: We went from 4 to 35 agents in 90 days. Annual cost: under $500 vs $5,000+.


The pattern: Identify repetitive task → create spec → delegate → sub-agent executes → parent continues.

Production numbers: data research tasks dropped from $0.05-0.10 to $0.001-0.002 per query. 96% savings.


What Are Sub-Agents?

Sub-agents are specialized, cost-optimized Claude Code agents invoked by parent agents for bounded, repetitive tasks. They run cheaper models (Haiku: $0.25/$1.25 per million tokens) while parent agents use more capable models (Sonnet: $3/$15 per million tokens) for strategic work, reducing costs by 85-92%.

| Feature | Agent | Sub-Agent |
|---------|-------|-----------|
| Scope | Project-level (persistent) | Task-level (ephemeral) |
| Location | Own directory (~/agents/name/) | Invoked via parent |
| Model | Strategic choice (Sonnet/Opus) | Cost-optimized (Haiku) |
| Autonomy | High (self-directed) | Bounded (clear spec) |
| State | Maintains state files | Stateless or parent-managed |
| Lifespan | Persistent across sessions | Single-task execution |
| Cost | $150-600/year | $2-8/year |
| Examples | Parent Agent, Research Agent | Subagent A, Subagent B |

The difference matters. Agents are long-lived infrastructure. Sub-agents are task-specific tools invoked on demand.

Analogy: An agent is a department (Marketing, Engineering). A sub-agent is a contractor hired for a specific deliverable.

The Economics of Sub-Agents

Cost optimization isn’t about being cheap. It’s about allocating budget to judgment instead of data fetching.

Real cost comparison from production usage:

| Task Type | Sonnet Cost | Haiku Cost | Savings |
|-----------|-------------|------------|---------|
| Data research | $0.05-0.10 | $0.001-0.002 | 96% |
| Metric tracking | $0.03-0.08 | $0.001-0.002 | 95% |
| Content scanning | $0.08-0.15 | $0.002-0.005 | 97% |
| Annual (1x/week) | $150-600 | $2-8 | 92% |

ROI: Monolithic Sonnet agent: $150-600/year. Parent + 4 Haiku sub-agents: $38-82/year total.

Savings: 85-92% compared to all-Sonnet architecture.


The decision tree is straightforward. Delegate to Haiku when the task has clear input/output specs and requires no judgment. Handle in Sonnet when pattern recognition or synthesis matters. Escalate to Opus when you’re architecting something novel or requirements are ambiguous.

The principle: Would you pay a senior strategist $150/hour to copy-paste from an API? No. You’d have someone junior fetch the data and let the strategist synthesize. Same applies here.
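The decision tree above can be sketched as a small routing function. This is a minimal illustration; the boolean flags are our own names for the criteria in the text, not anything Claude Code exposes:

```python
def choose_model(clear_spec: bool, needs_judgment: bool, novel_architecture: bool) -> str:
    """Route a task to a model tier per the delegation decision tree.

    Illustrative flags: `clear_spec` means the task has defined
    inputs/outputs, `needs_judgment` means pattern recognition or
    synthesis is required, `novel_architecture` means requirements
    are ambiguous or the design is new.
    """
    if novel_architecture:
        return "opus"    # architecting something novel, ambiguous requirements
    if needs_judgment or not clear_spec:
        return "sonnet"  # pattern recognition or synthesis matters
    return "haiku"       # bounded task with a clear input/output spec


# Examples matching the tree in the text:
assert choose_model(clear_spec=True, needs_judgment=False, novel_architecture=False) == "haiku"
assert choose_model(clear_spec=True, needs_judgment=True, novel_architecture=False) == "sonnet"
```

The ordering matters: novelty escalates before judgment, and a missing spec disqualifies Haiku even for routine work.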

Sub-Agent Cost Comparison (Verified Results)

| Metric | Before (All Sonnet) | After (Sub-Agent Mix) | Savings |
|--------|---------------------|-----------------------|---------|
| Monthly API Cost | $542 | $42 | 92% |
| Cost Per Task | $0.08 avg | $0.006 avg | 92.5% |
| Annual Projection | $6,500 | $500 | $6,000 saved |
| Agent Count | 4 | 35 | 775% increase |
| Tasks Automated | ~50/month | ~400/month | 700% increase |

Source: Kaxo Technologies internal infrastructure (Dec 2025 - Jan 2026)
Configuration: 1 Sonnet main agent + 34 Haiku sub-agents


Can Claude Code Agents Call Other Agents?

Yes. Parent agents spawn sub-agents by providing clear task specifications in their CLAUDE.md configuration file. Sub-agents execute bounded tasks and return structured results to the parent. This enables cost-optimized delegation without framework dependencies.

The mechanism:

  1. Parent identifies repetitive task (3+ occurrences)
  2. Creates sub-agent spec with clear inputs/outputs
  3. Configures delegation in CLAUDE.md
  4. Spawns sub-agent when needed
  5. Sub-agent executes, returns results
  6. Parent continues with enriched context

Concrete example: Parent Agent (Sonnet) delegates data research to Subagent A (Haiku). Cost drops from $0.05-0.10 per query to $0.001-0.002. Same data quality. 96% cost reduction.

No framework required. No LangChain. No CrewAI. Just clear specifications and Claude Code’s native agent invocation.
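A quick sanity check on the per-query numbers, using the per-million-token prices quoted earlier (Haiku $0.25 in / $1.25 out, Sonnet $3 / $15). The token counts are assumed for illustration: a ~1,000-token prompt and a ~500-token structured answer.

```python
# Prices per million tokens, as quoted in the text.
PRICING = {
    "haiku":  {"in": 0.25, "out": 1.25},
    "sonnet": {"in": 3.00, "out": 15.00},
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one call from token counts."""
    p = PRICING[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000

# A ~1,000-token prompt with a ~500-token response:
haiku = query_cost("haiku", 1_000, 500)    # ~$0.0009
sonnet = query_cost("sonnet", 1_000, 500)  # ~$0.0105
print(f"haiku ${haiku:.4f}, sonnet ${sonnet:.4f}, savings {1 - haiku/sonnet:.0%}")
```

At those assumed token counts, the Haiku call lands in the $0.001-0.002 range the text reports, and the gap to Sonnet works out to roughly 92%.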

Implementation Guide: Creating Your First Sub-Agent


Building a sub-agent takes 5 steps. Total time: 30-60 minutes for your first one. Future sub-agents take 15-20 minutes once you know the pattern.

Step 1: Identify Repetitive Tasks

Review your agent’s work log. Look for tasks you’re doing 3+ times with predictable patterns.

Good sub-agent candidates:

  • API calls that fetch structured data
  • File parsing and transformation
  • Report generation with consistent format
  • Simple pattern matching across files

Generic examples:

  • Data lookups from third-party APIs became data-fetcher (Haiku)
  • Status checks became metric-tracker (Haiku)
  • Content parsing became content-analyzer (Haiku)
  • Search result simulation became search-simulator (Sonnet, requires moderate judgment)

Bad sub-agent candidates:

  • One-off tasks
  • Work requiring strategic judgment
  • Tasks with unclear success criteria
  • Creative synthesis

If you’ve done something manually 3 times and can describe the input/output format clearly, you have a sub-agent opportunity.
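The 3+ rule is easy to automate against a work log. A minimal sketch, assuming a log format of one task label per entry (the log structure and threshold are ours, not a Claude Code feature):

```python
from collections import Counter

def subagent_candidates(task_log: list[str], threshold: int = 3) -> list[str]:
    """Return task labels repeated `threshold`+ times -- candidates for
    extraction into a sub-agent, pending a clear input/output spec."""
    counts = Counter(task_log)
    return sorted(task for task, n in counts.items() if n >= threshold)

log = ["fetch-pricing", "write-brief", "fetch-pricing", "check-metrics",
       "fetch-pricing", "check-metrics", "check-metrics"]
print(subagent_candidates(log))  # ['check-metrics', 'fetch-pricing']
```

Frequency alone flags the candidate; whether it qualifies still depends on the next filter, describing the input/output format clearly.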

Step 2: Choose Model Tier

Model selection determines cost and capability. Choose wrong and you either overpay or get poor results.

Haiku-appropriate tasks:

  • Data fetching from APIs
  • Structured report generation
  • File parsing and transformation
  • Simple pattern matching
  • Cost: ~$0.001-0.002 per task

Sonnet-appropriate tasks:

  • Strategic analysis requiring context
  • Content brief generation
  • Pattern synthesis across multiple files
  • Moderate complexity judgment calls
  • Cost: ~$0.03-0.10 per task

Decision rule: If you’re 99.9% certain the task will complete correctly given a clear spec, use Haiku. If moderate complexity or judgment is involved, use Sonnet.

Cost matters. A Haiku sub-agent running weekly costs $0.10/year. A Sonnet sub-agent running weekly costs $1.56-5.20/year. That’s a 15-50x difference. Choose appropriately.
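The annual figures in this step follow directly from per-task cost times runs per year. A quick sketch to reproduce them (the 52-week cadence and per-task costs are the ones stated above):

```python
def annual_cost(per_task_usd: float, runs_per_week: float, weeks: int = 52) -> float:
    """Project yearly spend for a sub-agent from per-task cost and cadence."""
    return per_task_usd * runs_per_week * weeks

# Weekly Haiku task at ~$0.002/run:
print(round(annual_cost(0.002, 1), 2))  # 0.1  (~$0.10/year)
# Weekly Sonnet task at $0.03-0.10/run:
print(round(annual_cost(0.03, 1), 2))   # 1.56
print(round(annual_cost(0.10, 1), 2))   # 5.2
```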

Step 3: Write Sub-Agent Specification

Clear specs prevent confusion and failed tasks. Your spec needs four elements: input format, output format, success criteria, and error handling.

Template:

```markdown
## Sub-Agent: data-fetcher

**Model:** Haiku
**Use When:** Need data from third-party API

**Input:**
- Query parameters (5-15 items)

**Output:**
- Markdown report with structured data
- Summary table
- Recommendations

**Success Criteria:**
- All queries processed
- Data retrieved successfully
- Report generated at reports/data/

**Cost Estimate:** ~$0.001-0.002 per query
```

Be specific. “Fetch data” is vague. “Call API with parameters, return markdown table with fields X, Y, Z, and top 5 recommendations” is clear.

The spec is a contract. Parent provides these inputs. Sub-agent delivers this output. No ambiguity.
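The four-element contract can also be enforced in tooling. A sketch of a spec object that renders the template above; the field names are ours for illustration, not a Claude Code API (error handling is folded into the success criteria here for brevity):

```python
from dataclasses import dataclass

def _bullets(items: list[str]) -> str:
    """Render a list as markdown bullet lines."""
    return "\n".join(f"- {item}" for item in items)

@dataclass
class SubAgentSpec:
    """Contract between parent and sub-agent: explicit inputs,
    outputs, success criteria, and a cost estimate."""
    name: str
    model: str
    use_when: str
    inputs: list[str]
    outputs: list[str]
    success_criteria: list[str]
    cost_estimate: str

    def to_markdown(self) -> str:
        return (
            f"## Sub-Agent: {self.name}\n\n"
            f"**Model:** {self.model}\n"
            f"**Use When:** {self.use_when}\n\n"
            f"**Input:**\n{_bullets(self.inputs)}\n\n"
            f"**Output:**\n{_bullets(self.outputs)}\n\n"
            f"**Success Criteria:**\n{_bullets(self.success_criteria)}\n\n"
            f"**Cost Estimate:** {self.cost_estimate}"
        )

spec = SubAgentSpec(
    name="data-fetcher", model="Haiku",
    use_when="Need data from third-party API",
    inputs=["Query parameters (5-15 items)"],
    outputs=["Markdown report with structured data", "Summary table", "Recommendations"],
    success_criteria=["All queries processed", "Data retrieved successfully",
                      "Report generated at reports/data/"],
    cost_estimate="~$0.001-0.002 per query",
)
print(spec.to_markdown())
```

Generating specs from a structure like this keeps every sub-agent's contract complete: a missing field fails at construction time instead of surfacing as a vague delegation.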

Step 4: Configure Parent Agent

Add delegation table to parent’s CLAUDE.md configuration:

```markdown
## Sub-Agent Fleet

| Sub-Agent | Model | Use When | Cost |
|-----------|-------|----------|------|
| data-fetcher | Haiku | Need external data | ~$0.001-0.002 |
| metric-tracker | Haiku | Check status metrics | ~$0.001-0.002 |
```

Add autonomy policy so parent doesn’t ask permission every time:

```markdown
## Sub-Agent Autonomy

You are authorized to spawn sub-agents autonomously. Don't ask permission.

**Model Selection:**
- Haiku: Task has clear spec, 99.9% certain to complete
- Sonnet: Moderate complexity, judgment required
```

This configuration tells the parent: these sub-agents exist, here’s when to use them, spawn them without asking.

The result: Autonomous delegation. Parent recognizes “I need data” and spawns data-fetcher immediately. No human in the loop.

Step 5: Test and Validate

Run the sub-agent on sample tasks before trusting it with production work.

Validation checklist:

  • Output matches expected format
  • Success criteria met
  • Cost within estimate
  • Error handling works
  • No hallucinations or incorrect data

Measure cost savings vs having the parent do the work. If you’re not seeing 85%+ reduction for Haiku sub-agents, something’s wrong with your delegation pattern.

Document results. Your sub-agent’s first run should include a report showing actual cost, output quality, and whether it met success criteria. This creates accountability.

Common Problems Solved by Sub-Agent Architecture

“My Claude Code costs are too high”

Solution: Delegate 80% of tasks to Haiku sub-agents ($0.001/query) instead of using Sonnet ($0.03/task) for everything. Result: 92% cost reduction.

Quick fix: Identify your top 5 most frequent tasks. If they have clear inputs/outputs, convert them to Haiku sub-agents this week.

“I hit API rate limits constantly”

Solution: Sub-agents run in parallel with separate rate limits. Distribute load across multiple Haiku instances instead of queuing everything through one Sonnet agent.

Quick fix: Create 3-4 sub-agents for your highest-volume tasks. Each gets its own rate limit allocation.

“My agent context gets too bloated”

Solution: Sub-agents use isolated context windows. Main agent stays focused on strategy while sub-agents handle execution details that don’t need to persist.

Quick fix: Move any task that generates >1000 tokens of output to a sub-agent. Main agent receives only the summary.
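That heuristic is one line of code. The token estimate and the 1,000-token threshold are the article's rule of thumb, not a Claude Code setting:

```python
def should_delegate(estimated_output_tokens: int, threshold: int = 1000) -> bool:
    """Delegate tasks whose output would bloat the parent's context;
    the parent then receives only the sub-agent's summary."""
    return estimated_output_tokens > threshold

assert should_delegate(4000) is True   # bulky output: hand to a sub-agent
assert should_delegate(200) is False   # small output: keep in the parent
```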

Real-World Example: Parent Agent Sub-Agent Fleet


Production architecture example. Parent Agent started as one Sonnet agent doing everything. Now it’s a parent plus 4 specialized sub-agents.

Before: Monolithic Agent

Parent Agent was initially one Sonnet agent handling:

  • Data research (manual API calls)
  • Content monitoring (manual scanning)
  • Metric tracking (manual checks)
  • Search simulation (testing queries)
  • Brief generation

Cost: ~$150-600/year (all Sonnet)
Problem: Strategic work (brief synthesis) and tactical work (data fetching) mixed together. No cost optimization. Parent model doing grunt work.

After: Parent + Sub-Agent Fleet

```
Parent Agent (Sonnet)
├── Subagent A: data-fetcher (Haiku)
├── Subagent B: metric-tracker (Haiku)
├── Subagent C: content-analyzer (Haiku)
└── Subagent D: search-simulator (Sonnet)
```

Delegation pattern:

| Task | Handler | Model | Cost |
|------|---------|-------|------|
| Data research | Subagent A | Haiku | $0.001-0.002 |
| Metric tracking | Subagent B | Haiku | $0.001-0.002 |
| Content scanning | Subagent C | Haiku | $0.002-0.005 |
| Search simulation | Subagent D | Sonnet | $0.03-0.10 |
| Brief generation | Parent Agent | Sonnet | $0.03-0.10 |

Annual cost breakdown (52 weeks):

  • Subagent A (data-fetcher): 1x/week = ~$0.05-0.10/year
  • Subagent B (metric-tracker): 7x/week = ~$0.35-0.70/year
  • Subagent C (content-analyzer): bi-weekly = ~$0.10-0.25/year
  • Subagent D (search-simulator): 1x/week = ~$1.56-5.20/year
  • Parent Agent: 4x/month = ~$15-50/year

Total: $17-56/year vs $150-600/year monolithic. Savings: 85-92%.
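Summing the per-agent ranges reproduces the fleet total. A quick check using the (low, high) annual ranges listed above:

```python
# (low, high) annual cost ranges in USD, from the breakdown above.
fleet = {
    "data-fetcher":     (0.05, 0.10),
    "metric-tracker":   (0.35, 0.70),
    "content-analyzer": (0.10, 0.25),
    "search-simulator": (1.56, 5.20),
    "parent-agent":     (15.00, 50.00),
}

low = sum(lo for lo, _ in fleet.values())
high = sum(hi for _, hi in fleet.values())
print(f"fleet total: ${low:.2f}-{high:.2f}/year")  # fleet total: $17.06-56.25/year
```

Note how the parent dominates the bill: the four sub-agents together contribute only about $2-6 of the $17-56 total.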

Performance metrics:

  • Brief generation time: 3x faster (sub-agents run in parallel)
  • Data quality: Same or better (specialized agents with clear specs)
  • Cost per brief: 92% reduction
  • Autonomy: Parent Agent spawns sub-agents without human approval

The key insight: Parent focuses on strategic synthesis. Sub-agents act as sensors gathering landscape data. This creates a “sensor-actuator” pattern. Sub-agents sense (data volumes, content activity, metric positions). Parent decides what to do with that intelligence.

Building Agentic Workflows: How We Scaled from 4 to 35 Agents

Kaxo went from 4 agents to 35 agents in 90 days. Here’s the scaling trajectory and what changed at each phase.

Timeline:

  • Day 0: 4 agents (core infrastructure)
  • Day 30: 10 agents (added 6 meta-agents for shared infrastructure)
  • Day 90: 35 agents (17 meta-agents + 18 domain-specific agents)

Phase 1: Identify Meta-Agent Opportunities

Meta-agents are reusable across projects. Instead of every agent implementing shared capabilities separately, extract them once.

We extracted:

  • Permission management (adds safe permissions automatically when prompted 3+ times)
  • Configuration sync (synchronizes shared CLAUDE.md sections across agents)
  • Session reporting (generates daily session reports from transcripts)
  • Error analysis (analyzes failure patterns across agent fleet)
  • Cost monitoring (tracks API spend by agent and model)

Pattern recognition: If 3+ agents need the same capability, extract it as a meta-agent available to all.

This creates infrastructure reuse. New agents inherit capabilities instead of rebuilding them.

The same principle applies beyond Claude Code. Platforms like OpenClaw demonstrate how autonomous agent architectures scale when capabilities are extracted into reusable skills rather than rebuilt per deployment.

Phase 2: Post-Task Self-Assessment

After completing any task, agents now ask themselves:

  1. Did I do anything 3+ times that a Haiku sub-agent could have done?
  2. Was there a bounded, repeatable step I handled manually?
  3. Could any part be extracted as a deterministic sub-agent?

If YES: Document in report under “Sub-Agent Opportunities”

This creates a continuous improvement feedback loop. Task execution leads to reflection. Reflection leads to pattern detection. Pattern detection leads to new sub-agents. New sub-agents improve fleet efficiency.

Generic example: Content Agent noticed it was manually structuring outlines from briefs. Same pattern every time: parse brief, extract sections, format as hierarchical outline. Created structure-generator (Haiku) sub-agent. Cost: ~$0.001-0.002 per outline vs $0.03-0.10 doing it in Sonnet.

Phase 3: Global Instructions Pattern

Created ~/.claude/CLAUDE.md with shared context visible to all agents:

  • Agent directory structure
  • Permission policies
  • State file patterns
  • Meta-agent registry

New agents inherit this automatically. No need to document the same patterns 35 times.

Cost savings from global context: Reduced configuration overhead by ~60%. New agent setup went from 2 hours to 30 minutes.

Phase 4: Fleet Taxonomy

Organized agents into clear hierarchy:

Meta-Agents (17): Reusable infrastructure

  • Permission management
  • Configuration sync
  • Error analysis and aggregation
  • Cost tracking and reporting
  • Documentation generation

Domain-Specific (18): Project-focused work

  • Research Agent + 4 sub-agents (data fetching, metric tracking, content monitoring, search simulation)
  • Content Agent + 3 sub-agents (outline generation, structure extraction)
  • Deploy Agent + 2 sub-agents (validation, optimization)
  • Analytics Agent + 9 sub-agents (health checking, status validation, automation)


Key scaling principles that enabled 4→35 growth:

  1. Delegation Decision Tree: Haiku for clear tasks, Sonnet for judgment, Opus for novel architecture
  2. Autonomy Policies: Sub-agents spawn without approval if task spec is clear
  3. Cost Tracking: Every agent reports estimated cost per task in execution logs
  4. State Management: Agents maintain state files for continuity across sessions
  5. Post-Task Assessment: Continuous pattern detection and sub-agent opportunity identification

Result: 35-agent fleet running at <$500/year total cost vs $5,000+/year if built entirely with Sonnet agents.

The economic argument: Cost per agent dropped from $150-600/year to $14-28/year average. This makes agent proliferation economically viable. You can afford to create specialized agents for narrow tasks because the incremental cost is negligible.

Key Takeaways

  • Sub-agent delegation reduces Claude Code costs by 85-92% compared to all-Sonnet architecture
  • Parent agents (Sonnet) handle strategic work while sub-agents (Haiku) handle bounded, repetitive tasks
  • Typical sub-agent costs $2-8/year vs $150-600/year for traditional agent
  • Scaled from 4 to 35 agents in 90 days using this pattern, staying under $500/year total cost
  • Autonomous delegation works: Parent agents spawn sub-agents without human approval when task specs are clear
  • Post-task self-assessment creates continuous improvement: Agents identify their own sub-agent opportunities after completing work
  • Meta-agents provide reusable infrastructure across domain-specific agents, reducing configuration overhead by 60%

FAQ

What are sub-agents in Claude Code?

Sub-agents are specialized, cost-optimized Claude Code agents invoked by parent agents for bounded tasks. They run cheaper models (Haiku: $0.25/$1.25 per million tokens) for simple tasks while parent agents use more capable models (Sonnet: $3/$15 per million tokens) for strategic work, reducing costs by up to 92%.

What’s the difference between Claude Code agents and sub-agents?

Agents are project-level with persistent state and strategic scope, while sub-agents are task-level, ephemeral, and cost-optimized. Agents live in their own directories with state files. Sub-agents are invoked by parents via clear task specifications and return results without maintaining persistent state.

How can I reduce Claude Code API costs?

Use sub-agent delegation to run simple tasks on Haiku ($0.001-0.002 per call) instead of Sonnet ($0.03-0.10 per call). Identify tasks repeated 3+ times with clear input/output specs. Create sub-agent specifications for those tasks. Configure parent to delegate autonomously. A typical sub-agent fleet costs $2-8/year vs $150-600/year for all-Sonnet architecture.

Can Claude Code agents call other agents?

Yes. Parent agents spawn sub-agents by providing clear task specifications in their CLAUDE.md configuration file. Sub-agents execute bounded tasks and return structured results to the parent. This enables cost-optimized delegation without framework dependencies like LangChain or CrewAI.

How do I scale Claude Code from one agent to many?

Start with task decomposition. Identify repetitive tasks (3+ occurrences) in your current agent’s work. Create sub-agents for bounded tasks using Haiku. Use global instructions file (~/.claude/CLAUDE.md) for shared context across agents. Extract reusable capabilities as meta-agents. Follow post-task self-assessment pattern to continuously identify new sub-agent opportunities.

When should I use a sub-agent vs doing it myself?

Delegate to sub-agents when: (1) Task repeated 3+ times, (2) Clear input/output specification exists, (3) No strategic judgment required, (4) Cost matters. Handle directly in parent when: Task requires context from previous work, judgment calls needed, or it’s a one-off task. Use the 99.9% certainty rule: if you’re 99.9% certain a Haiku sub-agent will complete the task correctly given a clear spec, delegate it.

What’s the cheapest way to run Claude Code agents?

Use Haiku sub-agents for data fetching, structured output generation, and simple file transforms. Reserve Sonnet for strategic analysis, pattern synthesis, and content generation. Use post-task self-assessment to continuously identify delegation opportunities. Track costs per task type and optimize based on actual usage. Typical savings: 85-92% compared to all-Sonnet architecture.


Next Steps

Ready to scale your AI agent infrastructure without scaling costs?

Kaxo Technologies specializes in building production agent systems for Canadian SMBs. We scaled from 4 to 35 agents using the patterns in this guide.

Our services:

  • Agent architecture consulting
  • Sub-agent fleet implementation and optimization
  • Cost optimization audits for existing agent systems
  • Claude Code training and best practices workshops

Book a discovery call to discuss your automation needs.

Service Areas: Kawartha Lakes | Peterborough | Durham Region
Expertise: AI automation, agentic workflows, Claude Code infrastructure


Soli Deo Gloria

About the Author

The Kaxo Team leads AI infrastructure development and autonomous agent deployment for Canadian businesses. Specializes in self-hosted AI security, multi-agent orchestration, and production automation systems. Based in Ontario, Canada.

Written by
The Kaxo Team
Last Updated: January 4, 2026