Part 6 of 6

A Complete Session

End-to-End Walkthrough

Follow a real session from first message to final response across 5 passes. Every AI decision is annotated with the actual thinking behind it. The session is based on Coulee Tech's List Builder Employee, generalized for any record-search domain.

The Session

User Goal

“Find 50 qualified enterprise records in Region A, category: Technology”

Employee

Record Analyst

Max Passes

8

Cost Budget

$1.00

Think Model

Gemini Flash

Escalation Model

Claude Sonnet

Before the Loop: Pre-Enrichment

Pre-Enrichment Phase
data.getStats

Result: 2,847 total records across 4 regions. Region A: 634 records, 189 in Technology category. Pre-enriched context injected into Pass 1.
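A minimal sketch of how the pre-enrichment result might be turned into context for Pass 1. Only the tool name data.getStats and the figures come from the walkthrough; the `Stats` shape, `buildPreEnrichedContext`, and `isAchievable` are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape for the data.getStats result; field names are assumptions.
interface RegionStats {
  total: number;
  byCategory: Record<string, number>;
}

interface Stats {
  totalRecords: number;
  regions: Record<string, RegionStats>;
}

// Figures taken from the walkthrough; the other three regions are omitted.
const stats: Stats = {
  totalRecords: 2847,
  regions: { A: { total: 634, byCategory: { Technology: 189 } } },
};

// Render the stats as a short text block the model sees before Pass 1.
function buildPreEnrichedContext(s: Stats, region: string, category: string): string {
  const r = s.regions[region];
  const inCategory = r?.byCategory[category] ?? 0;
  return [
    `Total records: ${s.totalRecords}`,
    `Region ${region}: ${r?.total ?? 0} records`,
    `Region ${region} / ${category}: ${inCategory} records`,
  ].join("\n");
}

// True when the pool is large enough to satisfy the request at all.
function isAchievable(s: Stats, region: string, category: string, needed: number): boolean {
  return (s.regions[region]?.byCategory[category] ?? 0) >= needed;
}

console.log(buildPreEnrichedContext(stats, "A", "Technology"));
console.log("50 achievable:", isAchievable(stats, "A", "Technology", 50));
```

Because the counts are already in context, the model can judge feasibility in Pass 1 without spending a tool call on it.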

The 5 Passes

1

Confirm Criteria

Pass 1

~$0.003

AI THINKING

I have the pre-enriched stats: 189 Technology records in Region A. That's more than 50, so this is achievable. But I should confirm the user's criteria before diving in — do they want any size, or specific company sizes? And should I exclude existing customers? Let me ask.

LIVING DOCUMENT UPDATES

criteria

Region A, Technology category, 50 records needed

confidence

low

OUTCOME

AI responds with a clarifying question: 'I found 189 Technology records in Region A. Should I include all company sizes, or filter by size? Should I exclude existing customers from your list?'

User replies: “Any size, yes exclude existing customers”

2

Search & Gather

Pass 2

~$0.018

AI THINKING

Criteria confirmed: Region A, Technology, any size, exclude customers. I have 189 candidates from the stats. Now I need to actually fetch them. I'll run two parallel searches — one for active records, one for pending records — plus check what customer records exist so I can exclude them. Three tools, all in parallel.

TOOL CALLS (3 parallel)

data.searchRecords(region=A, category=Technology, status=active)
data.searchRecords(region=A, category=Technology, status=pending)
data.getCustomerList(region=A)

LIVING DOCUMENT UPDATES

criteria

Region A, Technology, any size, exclude customers

plan

Search active + pending, cross-reference against customers, score survivors

work_queue

156 record IDs gathered (active: 112, pending: 44)

OUTCOME

Three tools ran in parallel. 156 records gathered. 23 existing customers identified for exclusion. 133 candidates remaining.

3

Cross-Reference & Filter

Pass 3

~$0.024

AI THINKING

I have 133 candidates and 23 customer IDs to exclude. I should run a batch cross-reference to filter out the customers in one call. I'll also cross-reference against any existing lists the user might have — don't want to add duplicates. Two tools.

TOOL CALLS (2 parallel)

analysis.crossReference(recordIds=[...133], excludeIds=[...23])
data.getExistingLists()

LIVING DOCUMENT UPDATES

work_queue

108 records after customer exclusion (25 excluded)

findings

3 records already on existing lists — excluded. 105 clean candidates.

OUTCOME

Cross-reference complete. 105 clean candidates after all exclusions. Ready to score.
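The exclusion step can be sketched as a set-based filter. The `crossReference` function below is a local stand-in written for illustration, not the real analysis.crossReference tool, and the IDs are toy data rather than the session's records.

```typescript
// Local stand-in for analysis.crossReference; the real tool's contract is not shown in the source.
interface CrossReferenceResult {
  kept: string[];
  excluded: string[];
}

function crossReference(recordIds: string[], excludeIds: string[]): CrossReferenceResult {
  const exclude = new Set(excludeIds); // constant-time membership checks
  const seen = new Set<string>(); // also drops duplicate IDs in the input
  const kept: string[] = [];
  const excluded: string[] = [];
  for (const id of recordIds) {
    if (seen.has(id)) continue;
    seen.add(id);
    (exclude.has(id) ? excluded : kept).push(id);
  }
  return { kept, excluded };
}

// Toy data, not the session's real IDs.
const candidates = ["rec-001", "rec-002", "rec-003", "rec-004", "rec-005"];
const customerIds = ["rec-002", "rec-999"];
const onExistingLists = ["rec-004"];

// First drop customers, then drop records already on an existing list.
const afterCustomers = crossReference(candidates, customerIds);
const clean = crossReference(afterCustomers.kept, onExistingLists);
console.log(clean.kept);
```

Returning the excluded IDs alongside the survivors lets the living document record how many records each filter removed.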

4

Score & Rank

Pass 4

~$0.031

AI THINKING

105 candidates, need top 50. I'll batch score all 105 in one call — scoring considers company size, activity level, and category fit. Then I'll update my results section with the top 50 by score.

TOOL CALLS (1)

analysis.batchScore(recordIds=[...105], criteria={category: Technology, region: A})

LIVING DOCUMENT UPDATES

results

Top 50 by score: rec-001(94), rec-002(91), rec-003(89)... [50 records]

confidence

high

OUTCOME

All 105 records scored. Top 50 identified with scores ranging from 94 to 67. Confidence: high. Ready to respond.
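Selecting the top records after scoring is a sort-and-slice. The sketch below assumes the batch scorer returns `{ id, score }` pairs; that shape, the `topN` helper, and the sample scores are illustrative assumptions.

```typescript
// Assumed result shape for a scored record.
interface ScoredRecord {
  id: string;
  score: number;
}

// Return the n highest-scoring records; ties are broken by id so the order is stable.
function topN(records: ScoredRecord[], n: number): ScoredRecord[] {
  return [...records]
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, n);
}

// Toy scores, not the session's data.
const scored: ScoredRecord[] = [
  { id: "rec-004", score: 67 },
  { id: "rec-002", score: 91 },
  { id: "rec-005", score: 67 },
  { id: "rec-001", score: 94 },
  { id: "rec-003", score: 89 },
];

const best = topN(scored, 3);
console.log(best.map((r) => `${r.id}(${r.score})`).join(", "));
```

Copying the array before sorting keeps the full scored set intact in case the user later asks for more than the first cut.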

5

Final Response

Pass 5

~$0.011

AI THINKING

I have a complete scored list of 50 records. Confidence is high. I should respond now with the full table — company name, score, region, category, status — plus a summary of what I did and why these 50 were selected. should_respond: true.

OUTCOME

Final Markdown response emitted: summary paragraph + table of 50 records with scores + methodology note. Session complete.

Cost Breakdown

This session used Gemini Flash for all 5 passes (no escalation needed — confidence reached "high" by Pass 4). Total cost: ~$0.087 for a complete, scored list of 50 records.

| Phase | Model | Input Tokens | Output Tokens | Cost |
| --- | --- | --- | --- | --- |
| Pre-enrichment | Gemini Flash | ~800 | ~120 | ~$0.001 |
| Pass 1 (clarify) | Gemini Flash | ~3,200 | ~280 | ~$0.003 |
| Pass 2 (search) | Gemini Flash | ~4,800 | ~420 | ~$0.018 |
| Pass 3 (cross-ref) | Gemini Flash | ~7,200 | ~380 | ~$0.024 |
| Pass 4 (score) | Gemini Flash | ~9,400 | ~520 | ~$0.031 |
| Pass 5 (respond) | Gemini Flash | ~11,200 | ~1,200 | ~$0.011 |
| TOTAL | | ~36,600 | ~2,920 | ~$0.087 |
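To make the arithmetic explicit, the sketch below totals the five per-pass costs and checks them against the session limits (8 passes, $1.00). Costs are held as integer thousandths of a dollar to avoid floating-point drift; `shouldStop` and the variable names are illustrative assumptions, not the engine's actual API.

```typescript
// Per-pass costs from the walkthrough, in thousandths of a dollar.
// Pre-enrichment (~$0.001) is not a pass and is left out here.
const passCostsMills = [3, 18, 24, 31, 11];

const maxPasses = 8; // session limit from the walkthrough
const budgetMills = 1000; // $1.00 cost budget

function sum(values: number[]): number {
  return values.reduce((total, v) => total + v, 0);
}

// Exit check evaluated before starting another pass.
function shouldStop(passesRun: number, spentMills: number): boolean {
  return passesRun >= maxPasses || spentMills >= budgetMills;
}

const spentMills = sum(passCostsMills);
console.log(`Spent across 5 passes: $${(spentMills / 1000).toFixed(3)}`);
console.log("Would stop before a 6th pass:", shouldStop(passCostsMills.length, spentMills));
```

At roughly nine cents, the session used under a tenth of its budget and five of its eight allowed passes, so neither exit condition came into play.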

The 8 Design Principles

These are the architectural decisions that make the Coulee Tech system work at production scale. Each principle is a direct response to a real problem they encountered building it.

1

Config, not code

Each Employee is a data object. The loop logic is shared. Adding a new Employee means writing a config file, not a new engine.
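As an illustration, an Employee expressed purely as data might look like the sketch below. The `WorkerDefinition` and `LoopConfig` names appear in the build order, but every field name here, the system prompt text, and the model identifier strings are assumptions, not Coulee Tech's actual schema.

```typescript
// Illustrative field names only; the real interfaces are not shown in the source.
interface LoopConfig {
  maxPasses: number;
  costBudgetUsd: number;
  thinkModel: string;
  escalationModel: string;
}

interface WorkerDefinition {
  id: string;
  displayName: string;
  systemPrompt: string;
  tools: string[];
  loop: LoopConfig;
}

// The session's Employee as a plain config object. Model strings are placeholders.
const recordAnalyst: WorkerDefinition = {
  id: "record-analyst",
  displayName: "Record Analyst",
  systemPrompt: "You build qualified record lists. Confirm criteria before searching.",
  tools: [
    "data.getStats",
    "data.searchRecords",
    "data.getCustomerList",
    "data.getExistingLists",
    "analysis.crossReference",
    "analysis.batchScore",
  ],
  loop: {
    maxPasses: 8,
    costBudgetUsd: 1.0,
    thinkModel: "gemini-flash",
    escalationModel: "claude-sonnet",
  },
};

// The shared engine only needs the definition; here it just reports what it would run.
function describeWorker(w: WorkerDefinition): string {
  return `${w.displayName}: ${w.tools.length} tools, up to ${w.loop.maxPasses} passes, $${w.loop.costBudgetUsd.toFixed(2)} budget`;
}

console.log(describeWorker(recordAnalyst));
```

A second Employee would be another object of the same type passed to the same engine, which is the point of the principle.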

2

Parallel tool execution

Every pass runs all requested tools simultaneously via Promise.allSettled. A 5-tool pass takes as long as the slowest tool, not the sum of all tools.
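A minimal sketch of that pattern, assuming each tool is an async function looked up in a registry. `Promise.allSettled` is the standard JavaScript API; `ToolCall`, `ToolOutcome`, `runToolsInParallel`, and the stub tools are hypothetical.

```typescript
interface ToolCall {
  tool: string;
  params: Record<string, string>;
}

interface ToolOutcome {
  tool: string;
  ok: boolean;
  data?: unknown;
  error?: string;
}

type ToolFn = (params: Record<string, string>) => Promise<unknown>;

// Run every requested tool concurrently; one failure never cancels the others.
async function runToolsInParallel(
  calls: ToolCall[],
  registry: Record<string, ToolFn>,
): Promise<ToolOutcome[]> {
  const settled = await Promise.allSettled(
    calls.map((call) => {
      const fn = registry[call.tool];
      return fn ? fn(call.params) : Promise.reject(new Error(`Unknown tool: ${call.tool}`));
    }),
  );
  // allSettled preserves input order, so index i matches calls[i].
  return calls.map((call, i): ToolOutcome => {
    const result = settled[i];
    if (result && result.status === "fulfilled") {
      return { tool: call.tool, ok: true, data: result.value };
    }
    return { tool: call.tool, ok: false, error: String(result?.reason) };
  });
}

// Stub tools for demonstration only.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const stubRegistry: Record<string, ToolFn> = {
  "data.searchRecords": async (p) => {
    await delay(20);
    return { status: p.status, ids: [] };
  },
  "data.getCustomerList": async () => {
    await delay(10);
    throw new Error("timeout");
  },
};

runToolsInParallel(
  [
    { tool: "data.searchRecords", params: { region: "A", status: "active" } },
    { tool: "data.getCustomerList", params: { region: "A" } },
  ],
  stubRegistry,
).then((outcomes) => console.log(outcomes.map((o) => `${o.tool}: ${o.ok ? "ok" : o.error}`)));
```

Unlike `Promise.all`, `Promise.allSettled` never rejects, so a failed tool shows up as an error entry the AI can reason about in the next pass.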

3

Living document

The AI's working memory persists across passes as named sections. It doesn't need to re-state what it already knows. Each pass only updates what changed.
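One way to picture the living document is as a map of named sections that each pass merges its changes into. The `updateDocument` name appears in the build order; its signature and the `LivingDocument` type below are assumptions made for this sketch.

```typescript
// Section name -> current text. The real storage format is not shown in the source.
type LivingDocument = Record<string, string>;

// Merge only the sections a pass changed; untouched sections carry over as-is.
function updateDocument(doc: LivingDocument, updates: LivingDocument): LivingDocument {
  return { ...doc, ...updates };
}

// Section values taken from Passes 1 and 2 of the walkthrough.
const afterPass1 = updateDocument({}, {
  criteria: "Region A, Technology category, 50 records needed",
  confidence: "low",
});

const afterPass2 = updateDocument(afterPass1, {
  criteria: "Region A, Technology, any size, exclude customers",
  plan: "Search active + pending, cross-reference against customers, score survivors",
  work_queue: "156 record IDs gathered (active: 112, pending: 44)",
});

console.log(afterPass2);
```

Pass 2 never mentions confidence, yet the section is still present afterwards, which is what lets each pass stay short.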

4

Deduplication

A Set<string> of tool::params signatures prevents the AI from repeating a call it has already made with the same parameters. If all requested tools are duplicates, the loop exits.
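A sketch of signature-based deduplication follows. The source names only the `tool::params` idea; the exact signature format, the key sorting, and the `CallDeduplicator` class are assumptions.

```typescript
interface ToolCall {
  tool: string;
  params: Record<string, unknown>;
}

// Build a canonical signature; keys are sorted so parameter order does not matter.
function callSignature(call: ToolCall): string {
  const canonical = Object.keys(call.params)
    .sort()
    .map((key) => `${key}=${JSON.stringify(call.params[key])}`)
    .join("&");
  return `${call.tool}::${canonical}`;
}

class CallDeduplicator {
  private readonly seen = new Set<string>();

  // Return only calls not seen before, and remember them for later passes.
  filterNew(calls: ToolCall[]): ToolCall[] {
    const fresh: ToolCall[] = [];
    for (const call of calls) {
      const sig = callSignature(call);
      if (this.seen.has(sig)) continue;
      this.seen.add(sig);
      fresh.push(call);
    }
    return fresh;
  }
}

const dedup = new CallDeduplicator();
const pass2Calls: ToolCall[] = [
  { tool: "data.searchRecords", params: { region: "A", category: "Technology", status: "active" } },
  { tool: "data.searchRecords", params: { region: "A", category: "Technology", status: "pending" } },
];
const firstRun = dedup.filterNew(pass2Calls);
const repeatRun = dedup.filterNew(pass2Calls);

console.log(firstRun.length, "new calls, then", repeatRun.length);
// An empty result means every requested call was a duplicate, so the loop can exit.
```

The two searches in Pass 2 share a tool name but differ in status, so both run; only an exact repeat is filtered out.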

5

Permission tiers

Tools in the read and draft tiers auto-execute. Tools in the write and destructive tiers pause the loop and ask the user, so the AI can never modify data without human approval.
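The tier check itself can be very small. In the sketch below the four tier names come from the text above; `ToolSpec`, `partitionByApproval`, and the `lists.saveList` and `lists.deleteList` tools are hypothetical examples.

```typescript
type PermissionTier = "read" | "draft" | "write" | "destructive";

interface ToolSpec {
  name: string;
  tier: PermissionTier;
}

// write and destructive tools must wait for a human decision.
function requiresApproval(tier: PermissionTier): boolean {
  return tier === "write" || tier === "destructive";
}

// Split a pass's requested tools into those that run now and those that pause the loop.
function partitionByApproval(tools: ToolSpec[]): { autoRun: string[]; needsApproval: string[] } {
  const autoRun: string[] = [];
  const needsApproval: string[] = [];
  for (const tool of tools) {
    (requiresApproval(tool.tier) ? needsApproval : autoRun).push(tool.name);
  }
  return { autoRun, needsApproval };
}

// lists.saveList and lists.deleteList are invented for this example.
const requested: ToolSpec[] = [
  { name: "data.searchRecords", tier: "read" },
  { name: "lists.saveList", tier: "write" },
  { name: "lists.deleteList", tier: "destructive" },
];

const split = partitionByApproval(requested);
console.log("auto-run:", split.autoRun, "awaiting approval:", split.needsApproval);
```

Declaring the tier on the tool rather than in the prompt means the guarantee holds even if the model ignores its instructions.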

6

Automatic escalation

The engine detects when the AI is stuck (stale confidence, no tools requested) and automatically switches to a more capable model.
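One possible reading of "stuck" is sketched below: confidence has not changed over the last few passes, and the latest pass neither requested tools nor chose to respond. The two-pass window, the "medium" confidence level, and the names `PassRecord`, `isStuck`, and `pickModel` are all assumptions; the source does not give the actual rule.

```typescript
type Confidence = "low" | "medium" | "high";

interface PassRecord {
  confidence: Confidence;
  toolsRequested: number;
  shouldRespond: boolean;
}

// Stuck = confidence unchanged across the window AND the last pass did nothing useful.
function isStuck(history: PassRecord[], staleWindow = 2): boolean {
  const recent = history.slice(-staleWindow);
  const last = recent[recent.length - 1];
  if (recent.length < staleWindow || !last) return false;
  const staleConfidence = recent.every((p) => p.confidence === last.confidence);
  return staleConfidence && last.toolsRequested === 0 && !last.shouldRespond;
}

// Choose the model for the next pass.
function pickModel(history: PassRecord[], thinkModel: string, escalationModel: string): string {
  return isStuck(history) ? escalationModel : thinkModel;
}

// Toy histories, not taken from the walkthrough.
const progressing: PassRecord[] = [
  { confidence: "low", toolsRequested: 3, shouldRespond: false },
  { confidence: "low", toolsRequested: 2, shouldRespond: false },
];
const stalled: PassRecord[] = [
  { confidence: "low", toolsRequested: 2, shouldRespond: false },
  { confidence: "low", toolsRequested: 0, shouldRespond: false },
];

console.log(pickModel(progressing, "Gemini Flash", "Claude Sonnet"));
console.log(pickModel(stalled, "Gemini Flash", "Claude Sonnet"));
```

In the walkthrough session this check never fired, which is why all five passes stayed on the cheaper model.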

7

Graceful synthesis

If the loop exits for any reason other than should_respond: true, the engine calls a separate synthesis step. The user always gets a response.
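The fallback can be sketched as a single branch on the exit reason. The exit reasons listed are inferred from the exit conditions mentioned in this article; `LoopExit`, `synthesize`, and `finalResponse` are hypothetical, and a real synthesis step would call a model rather than format a string.

```typescript
type ExitReason = "should_respond" | "max_passes" | "cost_budget" | "all_duplicates";

interface LoopExit {
  reason: ExitReason;
  response?: string;
  document: Record<string, string>;
}

// Stand-in for the separate synthesis step; a real engine would call a model here.
function synthesize(exit: LoopExit): string {
  const sections = Object.keys(exit.document)
    .map((name) => `${name}: ${exit.document[name]}`)
    .join("\n");
  return `Stopped early (${exit.reason}). Progress so far:\n${sections}`;
}

// The user always gets a response, whichever way the loop ended.
function finalResponse(exit: LoopExit): string {
  if (exit.reason === "should_respond" && exit.response !== undefined) {
    return exit.response;
  }
  return synthesize(exit);
}

const budgetExit: LoopExit = {
  reason: "cost_budget",
  document: { work_queue: "105 clean candidates", confidence: "low" },
};

console.log(finalResponse(budgetExit));
```

Because the living document already holds the partial findings, the synthesis step has material to summarize even when the loop is cut short.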

8

Streaming throughout

Every phase emits events. The UI shows tool activity, thinking, cost, and living document updates in real time. The user is never staring at a blank screen.
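A rough sketch of the event stream, using an async generator whose events are formatted as Server-Sent Events frames. The `event:` and `data:` framing follows the SSE format; the `LoopEvent` variants, `runPass`, and `toSseFrame` are assumptions about how such a stream could be shaped.

```typescript
// Hypothetical event types covering the activity the UI displays.
type LoopEvent =
  | { type: "pass_start"; pass: number }
  | { type: "tool_activity"; tool: string; status: "started" | "finished" }
  | { type: "document_update"; section: string; value: string }
  | { type: "cost"; passCostUsd: number }
  | { type: "response"; text: string };

// A scripted pass that yields events as work happens (values echo Pass 4).
async function* runPass(pass: number): AsyncGenerator<LoopEvent> {
  yield { type: "pass_start", pass };
  yield { type: "tool_activity", tool: "analysis.batchScore", status: "started" };
  yield { type: "tool_activity", tool: "analysis.batchScore", status: "finished" };
  yield { type: "document_update", section: "confidence", value: "high" };
  yield { type: "cost", passCostUsd: 0.031 };
}

// Format one event as an SSE frame: event name, JSON payload, blank-line terminator.
function toSseFrame(event: LoopEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Drain a stream into frames; a server would write each frame to the HTTP response instead.
async function collectFrames(events: AsyncIterable<LoopEvent>): Promise<string[]> {
  const frames: string[] = [];
  for await (const event of events) {
    frames.push(toSseFrame(event));
  }
  return frames;
}

collectFrames(runPass(4)).then((frames) => console.log(frames.join("")));
```

Yielding from inside the loop means the consumer sees each event as soon as it is produced, rather than after the pass finishes.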

Build Order for Your Own Implementation

Coulee Tech built this system incrementally. Here is the order they recommend:

  1. Define your WorkerDefinition interface and LoopConfig types

  2. Build 2-3 read-only tools with the ToolResult contract

  3. Build a minimal engine: state init → single AI call → tool execution → response

  4. Add the living document (sections + updateDocument)

  5. Add exit conditions (maxPasses, costBudget)

  6. Add pre-enrichment (optional but high-value)

  7. Add permission tiers and the approval flow

  8. Add model escalation

  9. Add the streaming layer (async generator + SSE)

  10. Write your first full WorkerDefinition with a real system prompt
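Steps 2, 3, and 5 of the build order can be sketched together as a minimal engine: initialize state, call the model, execute any requested tools, and stop on a response or at maxPasses. Everything here is an assumption made for illustration, including the `ToolResult` fields, the `ModelTurn` shape, and the stub model; tools run sequentially to keep the sketch short.

```typescript
// Hypothetical ToolResult contract: success flag plus data or an error message.
interface ToolResult {
  ok: boolean;
  data?: unknown;
  error?: string;
}

interface ModelTurn {
  toolCalls: { tool: string; params: Record<string, unknown> }[];
  shouldRespond: boolean;
  response?: string;
}

type Model = (context: string[]) => Promise<ModelTurn>;
type Tool = (params: Record<string, unknown>) => Promise<ToolResult>;

async function runEngine(
  goal: string,
  model: Model,
  tools: Record<string, Tool>,
  maxPasses: number,
): Promise<{ passes: number; response: string }> {
  const context: string[] = [`goal: ${goal}`]; // state init
  for (let pass = 1; pass <= maxPasses; pass++) {
    const turn = await model(context); // single AI call per pass
    if (turn.shouldRespond) {
      return { passes: pass, response: turn.response ?? "" };
    }
    for (const call of turn.toolCalls) {
      const tool = tools[call.tool];
      const result: ToolResult = tool
        ? await tool(call.params)
        : { ok: false, error: `unknown tool ${call.tool}` };
      context.push(`${call.tool} -> ${JSON.stringify(result)}`);
    }
  }
  return { passes: maxPasses, response: "Stopped at maxPasses without a final answer." };
}

// Stub model: asks for stats first, then responds once a tool result is in context.
const stubModel: Model = async (context) =>
  context.length > 1
    ? { toolCalls: [], shouldRespond: true, response: "Done: stats received." }
    : { toolCalls: [{ tool: "data.getStats", params: {} }], shouldRespond: false };

const stubTools: Record<string, Tool> = {
  "data.getStats": async () => ({ ok: true, data: { total: 2847 } }),
};

runEngine("Find 50 records", stubModel, stubTools, 8).then((result) => console.log(result));
```

Swapping the sequential tool loop for concurrent execution, adding the living document, and layering on the remaining steps can each be done without changing this outer shape.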

