Part 6 of 6
A Complete Session
End-to-End Walkthrough
Follow a real session from first message to final response across 5 passes. Every AI decision is annotated with the thinking behind it. The walkthrough is based on Coulee Tech's List Builder Employee, generalized for any record-search domain.
The Session
User Goal
“Find 50 qualified enterprise records in Region A, category: Technology”
Employee
Record Analyst
Max Passes
8
Cost Budget
$1.00
Think Model
Gemini Flash
Escalation Model
Claude Sonnet
Before the Loop: Pre-Enrichment
Pre-Enrichment Phase
Result: 2,847 total records across 4 regions. Region A: 634 records, 189 in Technology category. Pre-enriched context injected into Pass 1.
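A minimal sketch of what a pre-enrichment step might compute before the loop starts: aggregate counts that are rendered as text and prepended to the Pass 1 prompt. The record shape, `computeStats`, and `renderContext` are hypothetical names chosen for illustration; the source does not show the actual schema or implementation.

```typescript
// Hypothetical record shape; the real schema is not shown in the source.
interface RecordSummary {
  region: string;
  category: string;
}

interface PreEnrichedStats {
  total: number;
  byRegion: Record<string, number>;
  byRegionCategory: Record<string, number>; // key: "region::category"
}

// Count records overall, per region, and per region/category pair.
function computeStats(records: RecordSummary[]): PreEnrichedStats {
  const stats: PreEnrichedStats = { total: 0, byRegion: {}, byRegionCategory: {} };
  for (const r of records) {
    stats.total += 1;
    stats.byRegion[r.region] = (stats.byRegion[r.region] ?? 0) + 1;
    const key = `${r.region}::${r.category}`;
    stats.byRegionCategory[key] = (stats.byRegionCategory[key] ?? 0) + 1;
  }
  return stats;
}

// Render the stats as plain text that can be injected into the first pass.
function renderContext(stats: PreEnrichedStats, region: string, category: string): string {
  const inRegion = stats.byRegion[region] ?? 0;
  const inCategory = stats.byRegionCategory[`${region}::${category}`] ?? 0;
  return `Total records: ${stats.total}. ${region}: ${inRegion} records, ${inCategory} in ${category}.`;
}
```

Because the counts arrive before Pass 1, the model can judge feasibility (189 available versus 50 requested) without spending a tool call on it.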
The 5 Passes
Confirm Criteria
Pass 1
AI THINKING
“I have the pre-enriched stats: 189 Technology records in Region A. That's more than 50, so this is achievable. But I should confirm the user's criteria before diving in — do they want any size, or specific company sizes? And should I exclude existing customers? Let me ask.”
LIVING DOCUMENT UPDATES
Region A, Technology category, 50 records needed
Confidence: low
OUTCOME
AI responds with a clarifying question: 'I found 189 Technology records in Region A. Should I include all company sizes, or filter by size? Should I exclude existing customers from your list?'
User replies: “Any size, yes exclude existing customers”
Search & Gather
Pass 2
AI THINKING
“Criteria confirmed: Region A, Technology, any size, exclude customers. I have 189 candidates from the stats. Now I need to actually fetch them. I'll run two parallel searches — one for active records, one for pending records — plus check what customer records exist so I can exclude them. Three tools, all in parallel.”
TOOL CALLS (3 parallel)
LIVING DOCUMENT UPDATES
Region A, Technology, any size, exclude customers
Search active + pending, cross-reference against customers, score survivors
156 record IDs gathered (active: 112, pending: 44)
OUTCOME
Three tools ran in parallel. 156 records gathered. 23 existing customers identified for exclusion. 133 candidates remaining.
Cross-Reference & Filter
Pass 3
AI THINKING
“I have 133 candidates and 23 customer IDs to exclude. I should run a batch cross-reference to filter out the customers in one call. I'll also cross-reference against any existing lists the user might have — don't want to add duplicates. Two tools.”
TOOL CALLS (2 parallel)
LIVING DOCUMENT UPDATES
108 records after customer exclusion (25 excluded)
3 records already on existing lists — excluded. 105 clean candidates.
OUTCOME
Cross-reference complete. 105 clean candidates after all exclusions. Ready to score.
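The cross-reference step amounts to a set difference: drop every candidate ID that appears in any exclusion list. The sketch below assumes records are identified by string IDs; `excludeIds` is a hypothetical helper, not the tool the session actually called, and the sample IDs are made up.

```typescript
// Remove every candidate that appears in any exclusion list.
// Building one Set keeps the filter linear in the total number of IDs.
function excludeIds(candidates: string[], ...exclusions: string[][]): string[] {
  const blocked = new Set<string>();
  for (const list of exclusions) {
    for (const id of list) {
      blocked.add(id);
    }
  }
  return candidates.filter((id) => !blocked.has(id));
}

// Illustrative data only.
const candidates = ["rec-001", "rec-002", "rec-003", "rec-004", "rec-005"];
const customerIds = ["rec-002"];
const alreadyListed = ["rec-004", "rec-999"];

const clean = excludeIds(candidates, customerIds, alreadyListed);
// clean is ["rec-001", "rec-003", "rec-005"]
```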
Score & Rank
Pass 4
AI THINKING
“105 candidates, need top 50. I'll batch score all 105 in one call — scoring considers company size, activity level, and category fit. Then I'll update my results section with the top 50 by score.”
TOOL CALLS (1)
LIVING DOCUMENT UPDATES
Top 50 by score: rec-001(94), rec-002(91), rec-003(89)... [50 records]
Confidence: high
OUTCOME
All 105 records scored. Top 50 identified with scores ranging from 94 to 67. Confidence: high. Ready to respond.
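Selecting the top 50 from the scored batch is a sort and slice. This is a sketch under the assumption that the scoring tool returns an ID and a numeric score per record; `ScoredRecord` and `topN` are hypothetical names, and the tie-break by ID is an added assumption so that the output is deterministic.

```typescript
interface ScoredRecord {
  id: string;
  score: number;
}

// Return the n highest-scoring records without mutating the input.
// Ties are broken by ID so repeated runs give the same order.
function topN(scored: ScoredRecord[], n: number): ScoredRecord[] {
  return [...scored]
    .sort((a, b) => {
      if (b.score !== a.score) return b.score - a.score;
      return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
    })
    .slice(0, n);
}
```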
Final Response
Pass 5
AI THINKING
“I have a complete scored list of 50 records. Confidence is high. I should respond now with the full table — company name, score, region, category, status — plus a summary of what I did and why these 50 were selected. should_respond: true.”
OUTCOME
Final Markdown response emitted: summary paragraph + table of 50 records with scores + methodology note. Session complete.
Cost Breakdown
This session used Gemini Flash for all 5 passes (no escalation needed — confidence reached "high" by Pass 4). Total cost: ~$0.087 for a complete, scored list of 50 records.
| Phase | Model | Input Tokens | Output Tokens | Cost |
|---|---|---|---|---|
| Pre-enrichment | Gemini Flash | ~800 | ~120 | ~$0.001 |
| Pass 1 (clarify) | Gemini Flash | ~3,200 | ~280 | ~$0.003 |
| Pass 2 (search) | Gemini Flash | ~4,800 | ~420 | ~$0.018 |
| Pass 3 (cross-ref) | Gemini Flash | ~7,200 | ~380 | ~$0.024 |
| Pass 4 (score) | Gemini Flash | ~9,400 | ~520 | ~$0.031 |
| Pass 5 (respond) | Gemini Flash | ~11,200 | ~1,200 | ~$0.011 |
| TOTAL | — | ~36,600 | ~2,920 | ~$0.087 |
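The token totals in the table can be checked with a few lines of arithmetic. The sketch below copies the approximate per-phase token counts from the table and compares the reported total cost against the $1.00 budget; `PhaseUsage` and the variable names are illustrative only.

```typescript
interface PhaseUsage {
  phase: string;
  inputTokens: number;
  outputTokens: number;
}

// Approximate values taken from the table above.
const usage: PhaseUsage[] = [
  { phase: "Pre-enrichment", inputTokens: 800, outputTokens: 120 },
  { phase: "Pass 1", inputTokens: 3200, outputTokens: 280 },
  { phase: "Pass 2", inputTokens: 4800, outputTokens: 420 },
  { phase: "Pass 3", inputTokens: 7200, outputTokens: 380 },
  { phase: "Pass 4", inputTokens: 9400, outputTokens: 520 },
  { phase: "Pass 5", inputTokens: 11200, outputTokens: 1200 },
];

const totalInputTokens = usage.reduce((sum, u) => sum + u.inputTokens, 0); // 36600
const totalOutputTokens = usage.reduce((sum, u) => sum + u.outputTokens, 0); // 2920

// Share of the session's cost budget consumed, using the reported total.
const budgetUsd = 1.0;
const reportedCostUsd = 0.087;
const budgetUsedPercent = ((reportedCostUsd / budgetUsd) * 100).toFixed(1); // "8.7"
```

Input tokens grow with every pass because each pass carries the accumulated context forward, which is why the later passes dominate the input total.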
The 8 Design Principles
These are the architectural decisions that make the Coulee Tech system work at production scale. Each principle is a direct response to a real problem they encountered building it.
Config, not code
Each Employee is a data object. The loop logic is shared. Adding a new Employee means writing a config file, not a new engine.
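As a rough illustration of "config, not code", the session's Employee could be expressed as a plain data object. The type names `WorkerDefinition` and `LoopConfig` come from the build order in this guide, but every field name, the tool names, and the model identifier strings below are assumptions made for this sketch.

```typescript
// Field names are hypothetical; only the type names appear in the source.
interface LoopConfig {
  maxPasses: number;
  costBudgetUsd: number;
  thinkModel: string;
  escalationModel: string;
}

interface WorkerDefinition {
  id: string;
  name: string;
  systemPrompt: string;
  tools: string[];
  loop: LoopConfig;
}

// The session above, expressed as data. Model IDs are placeholders.
const recordAnalyst: WorkerDefinition = {
  id: "record-analyst",
  name: "Record Analyst",
  systemPrompt: "You find and rank records that match the user's criteria.",
  tools: ["search_records", "cross_reference", "score_records"],
  loop: {
    maxPasses: 8,
    costBudgetUsd: 1.0,
    thinkModel: "gemini-flash",
    escalationModel: "claude-sonnet",
  },
};

// A shared engine looks workers up by ID; adding one means adding an entry here.
const workers: Record<string, WorkerDefinition> = {
  [recordAnalyst.id]: recordAnalyst,
};

function getWorker(id: string): WorkerDefinition {
  const def = workers[id];
  if (!def) throw new Error(`Unknown worker: ${id}`);
  return def;
}
```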
Parallel tool execution
Every pass runs all requested tools concurrently via Promise.allSettled, so one failing tool does not abort the others. A 5-tool pass takes about as long as its slowest tool, not the sum of all five.
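A minimal sketch of parallel tool execution with `Promise.allSettled`. The `ToolRequest`, `ToolOutcome`, and registry shapes are assumptions for illustration; the source only names the `Promise.allSettled` call itself.

```typescript
type ToolFn = (params: Record<string, unknown>) => Promise<unknown>;

interface ToolRequest {
  tool: string;
  params: Record<string, unknown>;
}

interface ToolOutcome {
  tool: string;
  ok: boolean;
  data?: unknown;
  error?: string;
}

// Start every requested tool at once and wait for all of them to settle.
// A rejected tool becomes an error outcome instead of failing the whole pass.
async function runToolsInParallel(
  requests: ToolRequest[],
  registry: Record<string, ToolFn>
): Promise<ToolOutcome[]> {
  const settled = await Promise.allSettled(
    requests.map(async (req) => {
      const fn = registry[req.tool];
      if (!fn) throw new Error(`Unknown tool: ${req.tool}`);
      return fn(req.params);
    })
  );

  // allSettled preserves input order, so index i matches requests[i].
  return settled.map((result, i): ToolOutcome => {
    const tool = requests[i]?.tool ?? "unknown";
    if (result.status === "fulfilled") {
      return { tool, ok: true, data: result.value };
    }
    const reason: unknown = result.reason;
    return { tool, ok: false, error: reason instanceof Error ? reason.message : String(reason) };
  });
}
```

Returning error outcomes rather than throwing lets the next pass see which tools failed and decide whether to retry or work around them.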
Living document
The AI's working memory persists across passes as named sections. It doesn't need to re-state what it already knows. Each pass only updates what changed.
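One way to model the living document is a map of section names to text, with an update function that applies only the sections that actually changed. `updateDocument` is named in the build order of this guide, but its signature here, the return shape, and the section names used in the example are assumptions.

```typescript
// Section name -> current text. Section names are hypothetical.
type LivingDocument = Record<string, string>;

interface UpdateResult {
  doc: LivingDocument;
  changed: string[]; // names of sections whose text differs after the update
}

// Merge a pass's updates into the document without mutating the original.
// Sections that are absent from `updates` carry over untouched.
function updateDocument(doc: LivingDocument, updates: LivingDocument): UpdateResult {
  const next: LivingDocument = { ...doc };
  const changed: string[] = [];
  for (const name of Object.keys(updates)) {
    const value = updates[name];
    if (value !== undefined && next[name] !== value) {
      next[name] = value;
      changed.push(name);
    }
  }
  return { doc: next, changed };
}

const afterPass1 = updateDocument({}, {
  criteria: "Region A, Technology category, 50 records needed",
  confidence: "low",
});
const afterPass2 = updateDocument(afterPass1.doc, {
  criteria: "Region A, Technology, any size, exclude customers",
  confidence: "low", // unchanged, so it is not reported as changed
});
// afterPass2.changed is ["criteria"]
```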
Deduplication
A Set<string> of tool::params signatures prevents the AI from repeating a call it has already made with the same tool and parameters. If every tool requested in a pass is a duplicate, the loop exits.
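A sketch of signature-based deduplication. The `tool::params` format follows the description above; serializing parameters with sorted top-level keys is an assumption added here so that key order does not create false negatives, and `signatureOf` and `filterNewRequests` are hypothetical names.

```typescript
interface ToolRequest {
  tool: string;
  params: Record<string, unknown>;
}

// Build a stable "tool::params" signature. Top-level keys are sorted so that
// { a: 1, b: 2 } and { b: 2, a: 1 } map to the same string.
function signatureOf(req: ToolRequest): string {
  const parts = Object.keys(req.params)
    .sort()
    .map((key) => `${key}=${JSON.stringify(req.params[key])}`);
  return `${req.tool}::${parts.join("&")}`;
}

// Keep only requests not seen before, recording each new signature.
function filterNewRequests(requests: ToolRequest[], seen: Set<string>): ToolRequest[] {
  const fresh: ToolRequest[] = [];
  for (const req of requests) {
    const sig = signatureOf(req);
    if (!seen.has(sig)) {
      seen.add(sig);
      fresh.push(req);
    }
  }
  return fresh;
}

// The loop can exit when the model asked for tools but none of them are new.
function allDuplicates(requests: ToolRequest[], fresh: ToolRequest[]): boolean {
  return requests.length > 0 && fresh.length === 0;
}
```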
Permission tiers
Tools in the read and draft tiers auto-execute. Tools in the write and destructive tiers pause the loop and ask the user. The AI can never modify data without human approval.
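The tier check itself can be very small. The four tier names below come from the text; the `TieredRequest` shape, the function names, and the example tool names are assumptions for this sketch.

```typescript
type PermissionTier = "read" | "draft" | "write" | "destructive";

interface TieredRequest {
  tool: string;
  tier: PermissionTier;
}

// write and destructive tools need a human decision; read and draft do not.
function requiresApproval(tier: PermissionTier): boolean {
  return tier === "write" || tier === "destructive";
}

// Split a pass's requests into those that can run now and those that must wait.
function partitionByApproval(requests: TieredRequest[]): {
  autoExecute: TieredRequest[];
  needsApproval: TieredRequest[];
} {
  const autoExecute: TieredRequest[] = [];
  const needsApproval: TieredRequest[] = [];
  for (const req of requests) {
    (requiresApproval(req.tier) ? needsApproval : autoExecute).push(req);
  }
  return { autoExecute, needsApproval };
}
```

If `needsApproval` is non-empty, the engine would pause and surface those requests to the user before continuing.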
Automatic escalation
The engine detects when the AI is stuck (stale confidence, no tools requested) and automatically switches to a more capable model.
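The source names two stuck signals, stale confidence and no tools requested, but does not say how they combine. The sketch below assumes a pass counts as stuck only when both hold and the model has not chosen to respond, which is consistent with the session above never escalating. `PassRecord`, `isStuck`, `pickModel`, and the model ID strings are hypothetical.

```typescript
interface PassRecord {
  confidence: string; // e.g. "low" or "high"
  toolsRequested: number;
  shouldRespond: boolean;
}

// Assumed rule: stuck means the latest pass neither responded nor asked for
// tools, and its confidence did not move compared with the previous pass.
function isStuck(history: PassRecord[]): boolean {
  if (history.length < 2) return false;
  const last = history[history.length - 1];
  const prev = history[history.length - 2];
  if (!last || !prev) return false;
  return !last.shouldRespond && last.toolsRequested === 0 && last.confidence === prev.confidence;
}

// Choose the model for the next pass.
function pickModel(history: PassRecord[], thinkModel: string, escalationModel: string): string {
  return isStuck(history) ? escalationModel : thinkModel;
}
```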
Graceful synthesis
If the loop exits for any reason other than should_respond: true, the engine calls a separate synthesis step. The user always gets a response.
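A sketch of the fallback decision. The exit reason names other than `should_respond`, the `finalResponse` function, and the simple section-joining synthesizer are assumptions; in a real engine the synthesis step would presumably be another model call over the living document.

```typescript
type ExitReason = "should_respond" | "max_passes" | "cost_budget" | "all_duplicates";
type LivingDocument = Record<string, string>;

// Use the model's own response only on a clean exit; otherwise synthesize
// one from whatever the living document holds, so the user always gets output.
function finalResponse(
  exit: ExitReason,
  modelResponse: string | undefined,
  doc: LivingDocument,
  synthesize: (doc: LivingDocument) => string
): string {
  if (exit === "should_respond" && modelResponse !== undefined) {
    return modelResponse;
  }
  return synthesize(doc);
}

// Stand-in synthesizer for illustration: list each section as "name: text".
function summarizeSections(doc: LivingDocument): string {
  return Object.keys(doc)
    .map((name) => `${name}: ${doc[name] ?? ""}`)
    .join("\n");
}
```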
Streaming throughout
Every phase emits events. The UI shows tool activity, thinking, cost, and living document updates in real time. The user is never staring at a blank screen.
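One way to stream events is an async generator that yields as work progresses, with each event serialized as a Server-Sent Events frame. The event types, `runPassStream`, and `toSseFrame` are hypothetical; the tool execution is replaced by a resolved promise to keep the sketch self-contained.

```typescript
type LoopEvent =
  | { type: "pass_start"; pass: number }
  | { type: "tool_done"; tool: string }
  | { type: "cost"; totalUsd: number }
  | { type: "done" };

// Yield an event at each step so the UI can update while the pass runs.
async function* runPassStream(
  pass: number,
  tools: string[],
  costUsd: number
): AsyncGenerator<LoopEvent> {
  yield { type: "pass_start", pass };
  for (const tool of tools) {
    await Promise.resolve(); // stand-in for awaiting the real tool
    yield { type: "tool_done", tool };
  }
  yield { type: "cost", totalUsd: costUsd };
  yield { type: "done" };
}

// Serialize one event as an SSE frame: an event name, a data line, a blank line.
function toSseFrame(event: LoopEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Collect all frames for a pass; an HTTP handler would write each one as it arrives.
async function collectFrames(pass: number, tools: string[], costUsd: number): Promise<string[]> {
  const frames: string[] = [];
  for await (const event of runPassStream(pass, tools, costUsd)) {
    frames.push(toSseFrame(event));
  }
  return frames;
}
```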
Build Order for Your Own Implementation
Coulee Tech built this system incrementally. Here is the order they recommend:
1. Define your WorkerDefinition interface and LoopConfig types
2. Build 2-3 read-only tools with the ToolResult contract
3. Build a minimal engine: state init → single AI call → tool execution → response
4. Add the living document (sections + updateDocument)
5. Add exit conditions (maxPasses, costBudget)
6. Add pre-enrichment (optional but high-value)
7. Add permission tiers and the approval flow
8. Add model escalation
9. Add the streaming layer (async generator + SSE)
10. Write your first full WorkerDefinition with a real system prompt
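The minimal engine and exit-condition steps above can be sketched as a loop that stops on the first of three conditions: the model chooses to respond, the cost budget is spent, or the pass limit is reached. The `think` callback stands in for the AI call and is synchronous here only to keep the example short; `PassResult`, `LoopOutcome`, and `runLoop` are hypothetical names.

```typescript
interface PassResult {
  shouldRespond: boolean;
  costUsd: number;
}

type ExitReason = "should_respond" | "max_passes" | "cost_budget";

interface LoopOutcome {
  passes: number;
  totalCostUsd: number;
  exit: ExitReason;
}

// Run passes until the model responds, the budget is exhausted, or
// maxPasses is reached, whichever comes first.
function runLoop(
  think: (pass: number) => PassResult,
  maxPasses: number,
  costBudgetUsd: number
): LoopOutcome {
  let totalCostUsd = 0;
  for (let pass = 1; pass <= maxPasses; pass++) {
    const result = think(pass);
    totalCostUsd += result.costUsd;
    if (result.shouldRespond) {
      return { passes: pass, totalCostUsd, exit: "should_respond" };
    }
    if (totalCostUsd >= costBudgetUsd) {
      return { passes: pass, totalCostUsd, exit: "cost_budget" };
    }
  }
  return { passes: maxPasses, totalCostUsd, exit: "max_passes" };
}

// With the session's limits (8 passes, $1.00), a model that responds on
// pass 5 exits cleanly well inside both limits.
const outcome = runLoop((pass) => ({ shouldRespond: pass === 5, costUsd: 0.01 }), 8, 1.0);
// outcome.passes is 5 and outcome.exit is "should_respond"
```

Any exit other than `should_respond` is where a fallback synthesis step would take over.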