Part 6 of 6

A Complete Session

End-to-End Walkthrough

Follow a real session from first message to final response across 5 passes. Every AI decision is annotated with the actual thinking behind it. Based on Coulee Tech's List Builder Employee — generalized for any record-search domain.

The Session

User Goal

“Find 50 qualified enterprise records in Region A, category: Technology”

Employee

Record Analyst

Max Passes

Cost Budget

$1.00

Think Model

Gemini Flash

Escalation Model

Claude Sonnet

Before the Loop: Pre-Enrichment

Pre-Enrichment Phase

AI THINKING

The user wants records in Region A, Technology category. Before the loop starts, I should load overall stats so I know what I'm working with. I'll request data.getStats — that will tell me total record count and regional distribution.

data.getStats

Result: 2,847 total records across 4 regions. Region A: 634 records, 189 in Technology category. Pre-enriched context injected into Pass 1.

The 5 Passes

Confirm Criteria

Pass 1

~$0.003

AI THINKING

“I have the pre-enriched stats: 189 Technology records in Region A. That's more than 50, so this is achievable. But I should confirm the user's criteria before diving in — do they want any size, or specific company sizes? And should I exclude existing customers? Let me ask.”

LIVING DOCUMENT UPDATES

criteria

Region A, Technology category, 50 records needed

confidence

low

OUTCOME

AI responds with a clarifying question: 'I found 189 Technology records in Region A. Should I include all company sizes, or filter by size? Should I exclude existing customers from your list?'

User replies: “Any size, yes exclude existing customers”

Search & Gather

Pass 2

~$0.018

AI THINKING

“Criteria confirmed: Region A, Technology, any size, exclude customers. I have 189 candidates from the stats. Now I need to actually fetch them. I'll run two parallel searches — one for active records, one for pending records — plus check what customer records exist so I can exclude them. Three tools, all in parallel.”

TOOL CALLS (3 parallel)

data.searchRecords(region=A, category=Technology, status=active)

data.searchRecords(region=A, category=Technology, status=pending)

data.getCustomerList(region=A)

LIVING DOCUMENT UPDATES

criteria

Region A, Technology, any size, exclude customers

plan

Search active + pending, cross-reference against customers, score survivors

work_queue

156 record IDs gathered (active: 112, pending: 44)

OUTCOME

Three tools ran in parallel. 156 records gathered. 23 existing customers identified for exclusion. 133 candidates remaining.

Cross-Reference & Filter

Pass 3

~$0.024

AI THINKING

“I have 133 candidates and 23 customer IDs to exclude. I should run a batch cross-reference to filter out the customers in one call. I'll also cross-reference against any existing lists the user might have — don't want to add duplicates. Two tools.”

TOOL CALLS (2 parallel)

analysis.crossReference(recordIds=[...133], excludeIds=[...23])

data.getExistingLists()

LIVING DOCUMENT UPDATES

work_queue

108 records after customer exclusion (25 excluded)

findings

3 records already on existing lists — excluded. 105 clean candidates.

OUTCOME

Cross-reference complete. 105 clean candidates after all exclusions. Ready to score.
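
The exclusion step in this pass amounts to a set difference. The sketch below is only an illustration of that idea: the source does not show how analysis.crossReference is implemented, so `excludeRecords` and its return shape are hypothetical names.

```typescript
// Hypothetical sketch of an exclusion cross-reference; this is not the
// actual analysis.crossReference implementation.
interface CrossReferenceResult {
  kept: string[];     // candidate IDs that survived the exclusion
  excluded: string[]; // candidate IDs that matched the exclusion list
}

function excludeRecords(recordIds: string[], excludeIds: string[]): CrossReferenceResult {
  const blocked = new Set(excludeIds); // constant-time membership checks
  const kept: string[] = [];
  const excluded: string[] = [];
  for (const id of recordIds) {
    (blocked.has(id) ? excluded : kept).push(id);
  }
  return { kept, excluded };
}

// IDs in the exclusion list that are not candidates (rec-999) are simply ignored.
const crossRef = excludeRecords(
  ["rec-001", "rec-002", "rec-003", "rec-004"],
  ["rec-002", "rec-999"]
);
console.log(crossRef.kept);     // candidates that remain
console.log(crossRef.excluded); // candidates removed
```

Returning both lists, rather than only the survivors, lets the living document record how many records were excluded and why.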

Score & Rank

Pass 4

~$0.031

AI THINKING

“105 candidates, need top 50. I'll batch score all 105 in one call — scoring considers company size, activity level, and category fit. Then I'll update my results section with the top 50 by score.”

TOOL CALLS (1)

analysis.batchScore(recordIds=[...105], criteria={category: Technology, region: A})

LIVING DOCUMENT UPDATES

results

Top 50 by score: rec-001(94), rec-002(91), rec-003(89)... [50 records]

confidence

high

OUTCOME

All 105 records scored. Top 50 identified with scores ranging from 94 to 67. Confidence: high. Ready to respond.
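
Selecting the top 50 from a scored batch is a sort followed by a slice. A minimal sketch, assuming the scoring tool returns an ID and a numeric score per record; `ScoredRecord` and `topByScore` are hypothetical names, not part of the original system.

```typescript
// Hypothetical result shape for a batch scoring call.
interface ScoredRecord {
  id: string;
  score: number;
}

function topByScore(scored: ScoredRecord[], limit: number): ScoredRecord[] {
  // Copy before sorting so the caller's array is not mutated.
  return [...scored].sort((a, b) => b.score - a.score).slice(0, limit);
}

const ranked = topByScore(
  [
    { id: "rec-003", score: 89 },
    { id: "rec-001", score: 94 },
    { id: "rec-104", score: 41 },
    { id: "rec-002", score: 91 },
  ],
  3
);
console.log(ranked.map((r) => `${r.id}(${r.score})`).join(", "));
// prints "rec-001(94), rec-002(91), rec-003(89)"
```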

Final Response

Pass 5

~$0.011

AI THINKING

“I have a complete scored list of 50 records. Confidence is high. I should respond now with the full table — company name, score, region, category, status — plus a summary of what I did and why these 50 were selected. should_respond: true.”

OUTCOME

Final Markdown response emitted: summary paragraph + table of 50 records with scores + methodology note. Session complete.

Cost Breakdown

This session used Gemini Flash for all 5 passes (no escalation needed — confidence reached "high" by Pass 4). Total cost: ~$0.087 for a complete, scored list of 50 records.

Phase	Model	Input Tokens	Output Tokens	Cost
Pre-enrichment	Gemini Flash	~800	~120	~$0.001
Pass 1 (clarify)	Gemini Flash	~3,200	~280	~$0.003
Pass 2 (search)	Gemini Flash	~4,800	~420	~$0.018
Pass 3 (cross-ref)	Gemini Flash	~7,200	~380	~$0.024
Pass 4 (score)	Gemini Flash	~9,400	~520	~$0.031
Pass 5 (respond)	Gemini Flash	~11,200	~1,200	~$0.011
TOTAL	—	~36,600	~2,920	~$0.087

Why it's cheap

Gemini Flash is ~10x cheaper than Claude Sonnet per token. Using it for every decision pass (not just the final synthesis) is the key cost optimization. The escalation model only activates when needed — in this session, it never was.

The 8 Design Principles

These are the architectural decisions that make the Coulee Tech system work at production scale. Each principle is a direct response to a real problem the team encountered while building it.

Config, not code

Each Employee is a data object. The loop logic is shared. Adding a new Employee means writing a config file, not a new engine.
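
As an illustration, an Employee config might look like the sketch below. The source names WorkerDefinition and LoopConfig but does not show their fields, so every field name, the model label strings, and the maxPasses value are assumptions made for this example.

```typescript
// Hypothetical field names; the real interfaces are not shown in the source.
interface LoopConfig {
  maxPasses: number;
  costBudgetUsd: number;
  thinkModel: string;
  escalationModel: string;
}

interface WorkerDefinition {
  id: string;
  displayName: string;
  systemPrompt: string;
  tools: string[];
  loop: LoopConfig;
}

const recordAnalyst: WorkerDefinition = {
  id: "record-analyst",
  displayName: "Record Analyst",
  systemPrompt: "You find and rank records that match the user's criteria.",
  tools: [
    "data.getStats",
    "data.searchRecords",
    "data.getCustomerList",
    "data.getExistingLists",
    "analysis.crossReference",
    "analysis.batchScore",
  ],
  loop: {
    maxPasses: 10, // illustrative value only
    costBudgetUsd: 1.0,
    thinkModel: "gemini-flash",     // placeholder label, not a real model ID
    escalationModel: "claude-sonnet", // placeholder label, not a real model ID
  },
};

// The shared engine only reads the config; it contains no Employee-specific code.
function describeWorker(worker: WorkerDefinition): string {
  return `${worker.displayName}: ${worker.tools.length} tools, up to ${worker.loop.maxPasses} passes`;
}

console.log(describeWorker(recordAnalyst));
```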

Parallel tool execution

Every pass runs all requested tools simultaneously via Promise.allSettled. A 5-tool pass takes as long as the slowest tool, not the sum of all tools.
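
A minimal sketch of a parallel pass built on Promise.allSettled. The `ToolCall` and `ToolOutcome` shapes and the `delay` helper are hypothetical; the real ToolResult contract is not shown in the source.

```typescript
// Hypothetical shapes for this example only.
type ToolCall = { tool: string; run: () => Promise<unknown> };
type ToolOutcome = { tool: string; ok: boolean; value?: unknown; error?: string };

async function runToolsInParallel(calls: ToolCall[]): Promise<ToolOutcome[]> {
  // Start every tool before awaiting any of them.
  const settled = await Promise.allSettled(calls.map((c) => c.run()));
  // allSettled preserves input order, so index i maps back to calls[i].
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { tool: calls[i].tool, ok: true, value: result.value }
      : { tool: calls[i].tool, ok: false, error: String(result.reason) }
  );
}

// Simulates a tool that resolves after a delay.
const delay = <T>(ms: number, value: T): Promise<T> =>
  new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));

async function demoParallelPass(): Promise<ToolOutcome[]> {
  return runToolsInParallel([
    { tool: "data.searchRecords(active)", run: () => delay(30, ["rec-001"]) },
    { tool: "data.searchRecords(pending)", run: () => delay(10, ["rec-002"]) },
    { tool: "data.getCustomerList", run: () => Promise.reject(new Error("timeout")) },
  ]);
}

demoParallelPass().then((outcomes) => console.log(outcomes));
```

Unlike Promise.all, allSettled does not reject when one tool fails, so a single timeout still leaves the other results available to the next pass.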

Living document

The AI's working memory persists across passes as named sections. It doesn't need to re-state what it already knows. Each pass only updates what changed.
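
One way to picture this is a map of section names to text, merged pass by pass. The sketch assumes each pass returns only the sections it wants to change; the `LivingDocument` type and this `updateDocument` signature are illustrative guesses, not the original code.

```typescript
// Hypothetical representation: section name -> current text.
type LivingDocument = Record<string, string>;

// Returns a new document; sections missing from `updates` are carried over unchanged.
function updateDocument(doc: LivingDocument, updates: LivingDocument): LivingDocument {
  return { ...doc, ...updates };
}

const afterPass1 = updateDocument({}, {
  criteria: "Region A, Technology category, 50 records needed",
  confidence: "low",
});

const afterPass2 = updateDocument(afterPass1, {
  criteria: "Region A, Technology, any size, exclude customers",
  plan: "Search active + pending, cross-reference against customers, score survivors",
  work_queue: "156 record IDs gathered (active: 112, pending: 44)",
});

// confidence was not mentioned in Pass 2, so it is still "low".
console.log(afterPass2);
```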

Deduplication

A Set<string> of tool::params signatures prevents the AI from repeating a call it has already made (same tool, same parameters). If every tool requested in a pass is a duplicate, the loop exits.
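
A sketch of how such signatures could be built and checked. The exact signature format used in the original system is not shown; sorting the parameter keys before serializing is an assumption added here so that key order does not defeat the check.

```typescript
interface ToolRequest {
  tool: string;
  params: Record<string, unknown>;
}

// Builds a "tool::params" signature. Top-level keys are sorted so that
// {a, b} and {b, a} match. Nested objects are not normalized in this sketch.
function signatureOf(request: ToolRequest): string {
  const sortedKeys = Object.keys(request.params).sort();
  const normalized = sortedKeys.map((key) => [key, request.params[key]]);
  return `${request.tool}::${JSON.stringify(normalized)}`;
}

// Returns only the requests not seen before and records them in `seen`.
function filterNewRequests(requests: ToolRequest[], seen: Set<string>): ToolRequest[] {
  const fresh: ToolRequest[] = [];
  for (const request of requests) {
    const sig = signatureOf(request);
    if (!seen.has(sig)) {
      seen.add(sig);
      fresh.push(request);
    }
  }
  return fresh;
}

const seenSignatures = new Set<string>();
const firstBatch = filterNewRequests(
  [
    { tool: "data.searchRecords", params: { region: "A", status: "active" } },
    { tool: "data.getCustomerList", params: { region: "A" } },
  ],
  seenSignatures
);
// Same call again with the keys in a different order: treated as a duplicate.
const secondBatch = filterNewRequests(
  [{ tool: "data.searchRecords", params: { status: "active", region: "A" } }],
  seenSignatures
);
console.log(firstBatch.length, secondBatch.length); // prints "2 0"
```

An empty batch such as `secondBatch` is the condition under which the loop would exit.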

Permission tiers

Tools in the read and draft tiers auto-execute. Tools in the write and destructive tiers pause the loop and ask the user. The AI can never modify data without human approval.
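
The tier check itself can be very small. In this sketch the four tier names come from the text above, while `PendingTool`, `partitionByApproval`, and the `lists.saveList` tool are hypothetical names invented for the example.

```typescript
type PermissionTier = "read" | "draft" | "write" | "destructive";

interface PendingTool {
  tool: string;
  tier: PermissionTier;
}

function requiresApproval(tier: PermissionTier): boolean {
  return tier === "write" || tier === "destructive";
}

// Splits a pass's requested tools into those that can run immediately
// and those that must wait for the user.
function partitionByApproval(tools: PendingTool[]): {
  autoExecute: PendingTool[];
  needsApproval: PendingTool[];
} {
  return {
    autoExecute: tools.filter((t) => !requiresApproval(t.tier)),
    needsApproval: tools.filter((t) => requiresApproval(t.tier)),
  };
}

const partitioned = partitionByApproval([
  { tool: "data.searchRecords", tier: "read" },
  { tool: "lists.saveList", tier: "write" }, // hypothetical write tool
]);
console.log(partitioned.autoExecute.map((t) => t.tool));   // runs without asking
console.log(partitioned.needsApproval.map((t) => t.tool)); // pauses the loop
```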

Automatic escalation

The engine detects when the AI is stuck (stale confidence, no tools requested) and automatically switches to a more capable model.
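
The source does not spell out the exact stuck-detection rule, so the sketch below makes one up for illustration: a pass counts as stuck when confidence is unchanged from the previous pass, no tools were requested, and the model is not ready to respond. The "medium" confidence level and all type and function names are also assumptions.

```typescript
// Assumed confidence levels; the walkthrough itself shows only "low" and "high".
type Confidence = "low" | "medium" | "high";

interface PassSummary {
  confidence: Confidence;
  toolsRequested: number;
  shouldRespond: boolean;
}

// Assumed rule: stale confidence AND no tools requested AND not responding.
function isStuck(previous: PassSummary | undefined, current: PassSummary): boolean {
  if (previous === undefined || current.shouldRespond) return false;
  return previous.confidence === current.confidence && current.toolsRequested === 0;
}

// Picks the model for the next pass from the pass summaries so far.
function chooseModel(passes: PassSummary[], thinkModel: string, escalationModel: string): string {
  const current = passes[passes.length - 1];
  if (current === undefined) return thinkModel;
  const previous: PassSummary | undefined = passes[passes.length - 2];
  return isStuck(previous, current) ? escalationModel : thinkModel;
}

const progressing: PassSummary[] = [
  { confidence: "low", toolsRequested: 3, shouldRespond: false },
  { confidence: "medium", toolsRequested: 2, shouldRespond: false },
];
const stalled: PassSummary[] = [
  { confidence: "low", toolsRequested: 2, shouldRespond: false },
  { confidence: "low", toolsRequested: 0, shouldRespond: false },
];

console.log(chooseModel(progressing, "think-model", "escalation-model")); // prints "think-model"
console.log(chooseModel(stalled, "think-model", "escalation-model"));     // prints "escalation-model"
```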

Graceful synthesis

If the loop exits for any reason other than should_respond: true, the engine calls a separate synthesis step. The user always gets a response.
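
A sketch of that fallback. The exit reason names are guesses based on the exit conditions this series mentions, and `synthesizeFromDocument` is a stand-in for what would presumably be a separate model call in the real engine.

```typescript
// Hypothetical exit reasons for this example.
type ExitReason = "should_respond" | "max_passes" | "cost_budget" | "all_duplicates";

interface LoopOutcome {
  exit: ExitReason;
  response?: string;            // set only when the model chose to respond
  doc: Record<string, string>;  // living document at exit time
}

// Stand-in for a synthesis model call: summarizes whatever the document holds.
function synthesizeFromDocument(doc: Record<string, string>): string {
  const sections = Object.entries(doc).map(([section, body]) => `${section}: ${body}`);
  return `Here is what I found so far.\n${sections.join("\n")}`;
}

// Every exit path produces a response for the user.
function finalResponse(outcome: LoopOutcome): string {
  if (outcome.exit === "should_respond" && outcome.response !== undefined) {
    return outcome.response;
  }
  return synthesizeFromDocument(outcome.doc);
}

const budgetExit = finalResponse({
  exit: "cost_budget",
  doc: { findings: "105 clean candidates", confidence: "low" },
});
console.log(budgetExit);
```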

Streaming throughout

Every phase emits events. The UI shows tool activity, thinking, cost, and living document updates in real time. The user is never staring at a blank screen.
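
A minimal sketch of the streaming idea using an async generator whose events are framed as Server-Sent Events. The event types and payload fields are invented for illustration; only the SSE framing (an `event:` line, a `data:` line, then a blank line) is standard.

```typescript
// Hypothetical event shapes.
type LoopEvent =
  | { type: "thinking"; text: string }
  | { type: "tool_start"; tool: string }
  | { type: "cost"; usd: number }
  | { type: "done" };

// Stand-in for an engine pass that yields events as work progresses.
async function* runPassEvents(): AsyncGenerator<LoopEvent> {
  yield { type: "thinking", text: "Fetching candidates" };
  yield { type: "tool_start", tool: "data.searchRecords" };
  yield { type: "cost", usd: 0.018 };
  yield { type: "done" };
}

// Server-Sent Events framing for one event.
function toSseFrame(event: LoopEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// In a real server each frame would be written to the HTTP response as it
// arrives; here the frames are collected so the example stays self-contained.
async function collectFrames(): Promise<string[]> {
  const frames: string[] = [];
  for await (const event of runPassEvents()) {
    frames.push(toSseFrame(event));
  }
  return frames;
}

collectFrames().then((frames) => console.log(frames.join("")));
```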

Build Order for Your Own Implementation

Coulee Tech built this system incrementally. Here is the order they recommend:

1. Define your WorkerDefinition interface and LoopConfig types
2. Build 2-3 read-only tools with the ToolResult contract
3. Build a minimal engine: state init → single AI call → tool execution → response
4. Add the living document (sections + updateDocument)
5. Add exit conditions (maxPasses, costBudget)
6. Add pre-enrichment (optional but high-value)
7. Add permission tiers and the approval flow
8. Add model escalation
9. Add the streaming layer (async generator + SSE)
10. Write your first full WorkerDefinition with a real system prompt
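
The minimal engine and exit-condition steps can be sketched together as a loop skeleton. This is a sketch under assumptions: the pass itself is injected as a plain function, all names are hypothetical, the maxPasses value is illustrative, and the per-pass costs are the approximate figures from the cost table.

```typescript
interface PassResult {
  costUsd: number;
  shouldRespond: boolean;
}

type LoopExit = "should_respond" | "max_passes" | "cost_budget";

interface LoopSummary {
  exit: LoopExit;
  passes: number;
  spentUsd: number;
}

function runLoop(
  config: { maxPasses: number; costBudgetUsd: number },
  runPass: (pass: number) => PassResult
): LoopSummary {
  let spentUsd = 0;
  for (let pass = 1; pass <= config.maxPasses; pass++) {
    const result = runPass(pass);
    spentUsd += result.costUsd;
    // A ready response wins over the budget check on the same pass.
    if (result.shouldRespond) return { exit: "should_respond", passes: pass, spentUsd };
    if (spentUsd >= config.costBudgetUsd) return { exit: "cost_budget", passes: pass, spentUsd };
  }
  return { exit: "max_passes", passes: config.maxPasses, spentUsd };
}

// Approximate per-pass costs from the walkthrough; the model responds on pass 5.
const passCosts = [0.003, 0.018, 0.024, 0.031, 0.011];
const simulatePass = (pass: number): PassResult => ({
  costUsd: passCosts[pass - 1] ?? 0,
  shouldRespond: pass === 5,
});

const normalRun = runLoop({ maxPasses: 8, costBudgetUsd: 1.0 }, simulatePass);
const tightBudgetRun = runLoop({ maxPasses: 8, costBudgetUsd: 0.04 }, simulatePass);
console.log(normalRun.exit, normalRun.passes);           // prints "should_respond 5"
console.log(tightBudgetRun.exit, tightBudgetRun.passes); // prints "cost_budget 3"
```

With a $0.04 budget the loop stops after the third pass, which is the kind of exit where a separate synthesis step would produce the user's response.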
