
Recursive AI Tool Loop

A production-proven pattern for building AI systems that analyze data, decide what they need, fetch it, and refine their own answer — without human intervention. The loop is self-terminating: the AI reports its own confidence, and the orchestrator decides when to stop.

Deep Dive Available — Built by Coulee Tech

Coulee Tech built a production AI Employee system on this pattern with 91 tools and multiple specialized Employees. Their detailed case study is available as a 6-part deep dive with annotated code, flowcharts, and AI thinking callouts.


What This System Does

Given a piece of work (a support ticket, sales lead, code review, patient intake — any domain), the system:

  1. Reads the work item and its metadata
  2. Guesses which external data sources would help, and pre-fetches them
  3. Analyzes the work item using an LLM, with pre-fetched data injected into the prompt
  4. Decides whether it has enough information or needs more
  5. Fetches additional data using tools the LLM requested
  6. Re-analyzes with new data merged in, updating only the sections that changed
  7. Repeats steps 4–6 until confident, out of budget, or at the pass limit
  8. Saves the final analysis and makes it available to the human
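The eight steps above can be compressed into a minimal loop sketch. Everything here is illustrative — `analyze`, the stub confidence logic, and the stubbed tool execution are assumptions standing in for real LLM and API calls:

```typescript
type Confidence = "high" | "medium" | "low";

interface PassResult {
  confidence: Confidence;
  actions: string[];   // tool names the AI requested
  document: string;    // current analysis text
}

// Stub analyzer: becomes confident once two tool results are in context.
function analyze(item: string, enriched: string[]): PassResult {
  const confident = enriched.length >= 2;
  return {
    confidence: confident ? "high" : "low",
    actions: confident ? [] : [`fetch_tool_${enriched.length}`],
    document: `analysis of ${item} (${enriched.length} sources)`,
  };
}

function runLoop(item: string, maxPasses = 5): { passes: number; result: PassResult } {
  const enriched: string[] = [];        // accumulated tool markdown
  const completed = new Set<string>();  // dedup: tools already run
  let result = analyze(item, enriched); // Pass 1
  let passes = 1;

  while (passes < maxPasses) {
    // Decision gate: stop when confident or nothing new to fetch
    const newActions = result.actions.filter(a => !completed.has(a));
    if (result.confidence === "high" || newActions.length === 0) break;

    // Tool execution (stubbed): each action contributes markdown context
    for (const a of newActions) {
      completed.add(a);
      enriched.push(`## ${a} results`);
    }
    result = analyze(item, enriched);   // refinement pass
    passes++;
  }
  return { passes, result };
}
```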

Architecture Overview

Loop State Machine

text
┌──────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATOR                              │
│                                                                  │
│  1. Pre-Enrichment ─────────────────────────────────────────┐   │
│     │  Lightweight AI call picks which tools to run         │   │
│     │  Selected tools execute in parallel                   │   │
│     │  Results injected into Pass 1 system prompt           │   │
│     ▼                                                       │   │
│  2. Pass 1: Main Analysis ──────────────────────────────────┤   │
│     │  Streaming LLM call with full prompt + enriched data  │   │
│     │  Response parsed into section map (living document)   │   │
│     ▼                                                       │   │
│  3. Decision Gate ──────────────────────────────────────────┤   │
│     │  Extract confidence + ai_actions from response        │   │
│     │  Check exit conditions (confident? budget? max pass?) │   │
│     │  Filter to new, non-duplicate actions                 │   │
│     ▼                                                       │   │
│  4. Tool Execution ─────────────────────────────────────────┤   │
│     │  Run required + recommended tools in parallel         │   │
│     │  Each tool returns { summary, markdown, data }        │   │
│     │  Markdown accumulated into enriched context           │   │
│     ▼                                                       │   │
│  5. Refinement Pass (2, 3, ... up to 5) ───────────────────┤   │
│     │  System prompt = base rules + ALL enriched data       │   │
│     │  User message = current document + tool summaries     │   │
│     │  AI outputs ONLY sections that need changes           │   │
│     │  Sections merged into existing document               │   │
│     → Back to Decision Gate (step 3)                        │   │
│     ▼                                                       │   │
│  6. Complete ───────────────────────────────────────────────┘   │
│     Auto-save analysis to backend                               │
│     Restore model if escalated during the loop                  │
└──────────────────────────────────────────────────────────────────┘

Data Flow

text
Work Item (ticket, order, case, etc.)
    │
    ▼
Pre-Enrichment AI ──selects──▶ Tool Registry ──dispatches──▶ Individual Tools
                                                                   │
                                               graphqlFetch() / REST / AI sub-call
                                                                   │
                                                           ToolResult {
                                                             summary,   ← for UI
                                                             markdown,  ← for prompt injection
                                                             data       ← for structured access
                                                           }
                                                                   │
                                       Injected into LLM system prompt as context

The Five Key Components

1. The Orchestrator

Manages the loop lifecycle — phases, pass counting, budget tracking, tool deduplication, model escalation, and exit conditions.

2. The AI Analysis Engine

Sends the LLM call (streaming), manages prompt templates, and fires a completion callback when the stream finishes.

3. The Tool System

A registry of async functions that fetch data from external systems. Each tool takes standardized inputs and returns a standardized result.

4. The Pre-Enrichment Engine

Before the main analysis starts, a lightweight AI call decides which tools to run upfront — replacing keyword matching with intent reasoning.

5. Smart Tools

Tools that optionally use an AI sub-call to post-process their own results before returning — filtering 40 results down to the 2-3 most relevant.

Orchestrator Exit Conditions

Checked at every Decision Gate — the loop stops when any of these are true:

| Condition | Trigger |
|---|---|
| Confident | AI reports high confidence and has no required/recommended actions |
| No new actions | AI requested tools, but all were already run |
| Max passes | Hard limit (5 passes) to prevent infinite loops |
| Stale confidence | Same confidence level for 2+ consecutive passes |
| Budget exceeded | Total cost exceeds per-analysis budget ($0.10 default) |
| Manual stop | User disabled auto mode or clicked abort |
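These checks can be sketched as a single gate function. The loop-state field names here are illustrative assumptions, not the production API:

```typescript
type Confidence = "high" | "medium" | "low";

// Illustrative loop state — field names are assumptions.
interface LoopState {
  pass: number;
  totalCost: number;
  budget: number;                   // $0.10 per analysis by default
  confidence: Confidence;
  confidenceHistory: Confidence[];  // one entry per completed pass
  newActionCount: number;           // requested actions not already executed
  autoMode: boolean;
}

const MAX_PASSES = 5;

// Returns an exit reason, or null to continue looping.
function exitReason(s: LoopState): string | null {
  if (!s.autoMode) return "manual-stop";
  if (s.pass >= MAX_PASSES) return "max-passes";
  if (s.totalCost >= s.budget) return "budget-exceeded";
  if (s.newActionCount === 0) {
    return s.confidence === "high" ? "confident" : "no-new-actions";
  }
  // Same confidence for 2+ consecutive passes → the tools aren't helping
  const h = s.confidenceHistory;
  if (h.length >= 2 && h[h.length - 1] === h[h.length - 2]) {
    return "stale-confidence";
  }
  return null;
}
```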

Orchestrator Decision Gate (pseudocode)

pseudocode
on_analysis_complete(response, usage):
    update living document (build or merge sections)
    record pass in history

    if pass >= MAX_PASSES → exit("max-passes")
    if total_cost >= budget → exit("budget-exceeded")

    actions = extract_ai_actions(response)
    new_actions = actions.filter(not already completed)

    if new_actions is empty:
        if confidence == "high" → exit("confident")
        else → exit("no-new-actions")

    if pass >= 3 and confidence == last_confidence → exit("stale-confidence")

    run_tools(new_actions)
    → on tools complete → build refinement prompt → start next pass

The Tool System

Each tool is a function that fetches ONE type of data from ONE source. Think of them as sensors — each one looks at the world from a different angle.

| Type | Description | AI Cost | Example |
|---|---|---|---|
| Script tools | Pure data fetching. Call an API, format the response, return markdown. No AI involved. | $0.00 | fetch_device_info |
| Smart tools | Fetch data, then optionally use an AI sub-call to filter/rank before returning. Always has a code-based fallback. | ~$0.0003 | fetch_sop_docs |
Standardized ToolResult Interface

Tool Types

typescript
interface ToolResult {
  success: boolean;
  toolName: string;
  data: unknown;       // raw structured data (for UI or further processing)
  summary: string;     // short text for inline UI display
  markdown: string;    // full formatted output for prompt injection ← CRITICAL
  error?: string;
}

interface ToolContext {
  companyId: string;   // scoping — which customer/account
  deviceId?: string;   // scoping — which asset
  ticketId?: string;   // scoping — which work item
  contactEmail?: string;
  companyName: string;
  modelId?: string;    // optional — enables AI sub-calls inside tools
  ticketTitle?: string;
}

type ToolFunction = (
  params: ToolParams,
  context: ToolContext,
  token: string
) => Promise<ToolResult>;

Script Tool Template

typescript
import type { ToolFunction, ToolResult } from './types';

export const execute: ToolFunction = async (params, context, token) => {
  const fail = (msg: string): ToolResult => ({
    success: false,
    toolName: 'fetch_your_thing',
    data: null,
    summary: msg,
    markdown: `> **Your Thing:** ${msg}`,
    error: msg,
  });

  try {
    const rawData = await yourApiCall(context.companyId, params.target, token);

    if (!rawData || rawData.length === 0) {
      return fail('No data found');
    }

    const markdown = [
      `## Your Thing — ${context.companyName}`,
      `| Name | Status | Last Updated |`,
      `|------|--------|------------- |`,
      ...rawData.map(item => `| ${item.name} | ${item.status} | ${item.date} |`),
    ].join('\n');

    return {
      success: true,
      toolName: 'fetch_your_thing',
      data: rawData,
      summary: `Found ${rawData.length} items`,
      markdown,
    };
  } catch (err) {
    return fail(err instanceof Error ? err.message : 'Unknown error');
  }
};

Smart Tool Template (with AI sub-call)

typescript
import { chatComplete } from '@/lib/ai-call';
import type { ToolFunction } from './types';

const AI_FILTER_THRESHOLD = 5;
const FILTER_PROMPT = `You are filtering results for relevance to a work item.
Given the context and a list of results, select the 1-3 most relevant.
Respond with ONLY a JSON array: [{"index": 0, "reason": "why"}]`;

export const execute: ToolFunction = async (params, context, token) => {
  const rawResults = await fetchFromApi(params, context, token);

  // AI filtering when available and data is large enough
  if (context.modelId && rawResults.length >= AI_FILTER_THRESHOLD) {
    try {
      const aiResult = await chatComplete(context.modelId, [
        { role: 'system', content: FILTER_PROMPT },
        { role: 'user', content: formatForAI(rawResults, context) },
      ], { maxTokens: 512, timeoutMs: 10_000 });

      const selections = parseAiResponse(aiResult.content);
      if (selections.length > 0) {
        return buildFocusedResult(selections, rawResults);
      }
    } catch {
      // Fall through to code-based filtering — ALWAYS have a fallback
    }
  }

  return codeBasedResult(rawResults, params.target);
};

How to Build This for Your Application

1. Define Your Domain

What is your “work item”? A support ticket, sales lead, code review, patient intake, financial transaction? Everything flows from this.

Answer: What data is attached at the start? What external sources could enrich it? What does “done” look like?
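For a support-ticket domain, the answers might translate into shapes like these — field names are illustrative, not a prescribed schema:

```typescript
// Illustrative work-item shape for a support-ticket domain.
interface WorkItem {
  id: string;
  title: string;
  body: string;
  companyId: string;     // scoping for every tool call
  contactEmail?: string; // enables user-lookup tools
  deviceId?: string;     // linked asset, enables device tools
  createdAt: string;
}

// One answer to "what does done look like": a saved, confident analysis.
interface CompletedAnalysis {
  workItemId: string;
  sections: Record<string, string>; // heading → body (the living document)
  confidence: "high" | "medium" | "low";
  passes: number;
  totalCost: number;
}
```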

2. Design Your Tools

Each tool fetches ONE type of data from ONE source. Name them fetch_<what>. Design principles:

  • Single responsibility — one tool per data source
  • Scoped by context — tools receive the work item's context and scope their queries
  • Graceful failure — return {success: false, error: "..."}, never throw
  • Markdown output is the product — the markdown field is what the LLM sees
3. Build the Tool Registry

A central map of toolName → { execute, description }. Add aliases for names the LLM might generate.

Tool Registry

typescript
export const TOOLS: Record<string, ToolMeta> = {
  fetch_device_info: {
    name: 'fetch_device_info',
    description: 'Fetch device details — hardware, OS, network, status',
    execute: fetchDeviceInfo,
  },
  // Aliases — LLMs sometimes generate slightly different names
  fetch_devices: {
    name: 'fetch_device_info',
    description: 'Alias for fetch_device_info',
    execute: fetchDeviceInfo,
  },
};

// Single dispatcher — wraps any thrown exceptions into a failed ToolResult
async function executeTool(
  name: string,
  params: ToolParams,
  context: ToolContext,
  token: string
): Promise<ToolResult> {
  const tool = TOOLS[name] ?? TOOLS[Object.keys(TOOLS).find(k => k.includes(name)) ?? ''];
  if (!tool) return { success: false, toolName: name, data: null, summary: 'Unknown tool', markdown: '' };
  try {
    return await tool.execute(params, context, token);
  } catch (err) {
    return { success: false, toolName: name, data: null, summary: String(err), markdown: '' };
  }
}
4. Build the Pre-Enrichment Engine

A lightweight AI call that looks at the work item and decides which tools to run BEFORE the main analysis. This replaces keyword matching — the AI reasons about intent.

Pre-Enrichment System Prompt

typescript
const PRE_ENRICHMENT_PROMPT = `
You select data-gathering tools to run before a [work item] is analyzed.
Pick only tools that would provide genuinely useful context for THIS work item.

Available tools:
- fetch_device_info: Fetch device hardware, OS, network. Use when a device is mentioned.
- fetch_backup_status: Check backup health. Use for data loss, disk, or reliability issues.
- fetch_user_info: Look up user account details. Use when a user or email is mentioned.

Respond with ONLY a JSON array:
[{"tool": "tool_name", "params": {"target": "value"}, "reason": "why"}]
`;

// Execute selected tools in parallel with a hard timeout
const results = await Promise.allSettled(
  selectedTools.map(({ tool, params }) =>
    Promise.race([
      executeTool(tool, params, context, token),
      new Promise<ToolResult>((_, reject) =>
        setTimeout(() => reject(new Error('timeout')), 12_000)
      ),
    ])
  )
);
5. Build the Main Analysis Prompt

Two parts: a static system prompt (rules, tool descriptions, confidence contract) and a per-work-item user message (all metadata, linked entities, output format).

Confidence Contract (include in system prompt)

text
## Confidence Contract
After each analysis pass, you MUST output a confidence assessment in this JSON block:

```json
{
  "analysis_confidence": "high | medium | low",
  "confidence_reason": "why you chose this level",
  "ai_actions": [...]
}
```

- "high" = You have specific, actionable data. Not guessing.
- "medium" = Reasonable analysis but tool data would improve it.
- "low" = Mostly guessing. Critical context is missing.

When you report "high", the loop EXITS and your answer is shown.
Do not report "high" to end the loop early if you are uncertain.
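Extracting that block from a raw model response can be sketched as follows — this assumes the model wraps it in a fenced `json` block as instructed, and falls back defensively to "low" when it does not:

```typescript
// Sketch: pull the confidence block out of a raw model response.
interface ConfidenceBlock {
  analysis_confidence: "high" | "medium" | "low";
  confidence_reason: string;
  ai_actions: unknown[];
}

// Triple backtick, built at runtime to avoid nesting fences in this example.
const FENCE = "`".repeat(3);
const BLOCK_RE = new RegExp(FENCE + "json\\s*([\\s\\S]*?)" + FENCE, "g");

function parseConfidence(response: string): ConfidenceBlock {
  const fallback: ConfidenceBlock = {
    analysis_confidence: "low",
    confidence_reason: "no confidence block found",
    ai_actions: [],
  };
  // Take the last fenced json block — later blocks supersede earlier ones
  const blocks = [...response.matchAll(BLOCK_RE)];
  for (const match of blocks.reverse()) {
    try {
      const parsed = JSON.parse(match[1]);
      if (parsed.analysis_confidence) return { ...fallback, ...parsed };
    } catch {
      // malformed JSON → try the previous block, else fall back
    }
  }
  return fallback;
}
```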
6. Build the AI Actions Schema

The AI requests tools via a structured JSON block in its response. Define the exact schema and include it in the system prompt.

AI Actions Schema

json
{
  "ai_actions": [
    {
      "action_id": "unique-kebab-case-id",
      "action_type": "fetch_device_info",
      "reason": "How this data would change the recommendation",
      "parameters": { "target": "hostname-or-search-term" },
      "priority": "required | recommended | nice_to_have"
    }
  ]
}

// Priority rules:
// required + recommended → execute automatically
// nice_to_have → skip (reduces cost, the AI said it's not critical)
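Applying the priority rules together with the orchestrator's deduplication might look like this — the signature built from action type plus serialized parameters is an assumption about one workable format:

```typescript
// Sketch: filter requested actions to the ones worth executing.
interface AiAction {
  action_id: string;
  action_type: string;
  reason: string;
  parameters: Record<string, string>;
  priority: "required" | "recommended" | "nice_to_have";
}

function selectActions(actions: AiAction[], completed: Set<string>): AiAction[] {
  return actions.filter(action => {
    if (action.priority === "nice_to_have") return false; // skip — AI said not critical
    const signature = `${action.action_type}::${JSON.stringify(action.parameters)}`;
    if (completed.has(signature)) return false;           // already ran with same params
    completed.add(signature);
    return true;
  });
}
```

Because `completed` persists across passes, a tool requested again in Pass 3 with the same parameters is silently dropped.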
7. Build the Refinement Prompt

For Pass 2+, the prompt changes. The system prompt gets all accumulated tool data appended. The user message contains the current document and refinement instructions.

Refinement Instructions (user message)

text
## Refinement Instructions
- ONLY output sections that NEED CHANGES based on the new data.
- Sections you do not output will remain unchanged.
- Use the SAME section headers as your original output.
- NEVER re-request a tool that already ran.
- Output an empty ai_actions array if all needed data is gathered.

## Tools Already Executed
- fetch_device_info (target: HOSTNAME): Success — Found device with ...
- fetch_backup_status: Success — 3 appliances, 12 agents
- fetch_azure_user (target: [email protected]): Failed — User not found
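Assembling that user message from loop state can be sketched as below — the structure follows the instructions above, but the names and exact wording are illustrative:

```typescript
// Sketch of building the Pass 2+ user message.
interface ToolRun {
  name: string;
  target?: string;
  success: boolean;
  summary: string;
}

function buildRefinementMessage(currentDocument: string, runs: ToolRun[]): string {
  const toolLines = runs.map(r =>
    `- ${r.name}${r.target ? ` (target: ${r.target})` : ""}: ` +
    `${r.success ? "Success" : "Failed"} — ${r.summary}`
  );
  return [
    "## Current Analysis",
    currentDocument,
    "",
    "## Refinement Instructions",
    "- ONLY output sections that NEED CHANGES based on the new data.",
    "- Sections you do not output will remain unchanged.",
    "- NEVER re-request a tool that already ran.",
    "",
    "## Tools Already Executed",
    ...toolLines,
  ].join("\n");
}
```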
8. Build the Living Document (Section Map)

The analysis is a map of typed sections, not a monolithic string. On Pass 1, build it from scratch. On Pass 2+, merge — only sections present in the new response overwrite existing ones.

Section Map Types

typescript
interface SectionState {
  section: {
    heading: string;       // e.g. "🔍 Situation"
    headingText: string;   // e.g. "Situation"
    type: SectionType;     // enum: situation | findings | next-steps | ...
    body: string;
    emoji: string;
  };
  lastUpdatedPass: number;
  lastUpdatedAt: number;
}

type SectionMap = Map<SectionType, SectionState>;

// Merge function — only update sections present in the new response
function mergeSections(existing: SectionMap, updated: SectionMap): SectionMap {
  const merged = new Map(existing);
  for (const [type, state] of updated) {
    merged.set(type, state);
  }
  return merged;
}
9. Build the Orchestrator Loop

Tie it all together. The orchestrator is a state machine with 5 critical implementation details:

  1. Deduplication: Track action signatures (actionType::JSON(params)) in a Set. Never re-run the same tool with the same params.
  2. Budget tracking: Sum estimatedCost from every LLM call's usage stats. Check against budget at every Decision Gate.
  3. Stale confidence detection: If confidence hasn't improved after 2+ passes, the tools aren't helping. Stop.
  4. Model escalation: If confidence is still “low” after 2+ passes, temporarily switch to a more capable model. Restore when the loop completes.
  5. Parallel tool execution: Always run all requested tools in parallel (Promise.allSettled), never sequentially.
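Budget tracking (detail 2) can be sketched as a small accumulator. The per-token rates here are placeholders for illustration, not any provider's actual pricing:

```typescript
// Sketch: accumulate LLM spend across passes and gate on a budget.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

class BudgetTracker {
  private spent = 0;
  constructor(
    private budget = 0.10,                  // $ per analysis (article default)
    private inputRate = 0.10 / 1_000_000,   // $ per input token — placeholder
    private outputRate = 0.40 / 1_000_000,  // $ per output token — placeholder
  ) {}

  // Call once per LLM response, using the call's usage stats
  record(usage: Usage): void {
    this.spent +=
      usage.inputTokens * this.inputRate +
      usage.outputTokens * this.outputRate;
  }

  get total(): number { return this.spent; }
  get exceeded(): boolean { return this.spent >= this.budget; }
}
```

The orchestrator would call `record()` after every pass and check `exceeded` at each Decision Gate.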
10. Add Safety Rails

Context blowout prevention, graceful degradation, and timeout management:

Context Blowout Prevention
  • Cap each tool's markdown (4000 chars)
  • Limit pre-enrichment to 5 tools
  • Hard pass limit (5)
  • Cost budget with escalation
Graceful Degradation
  • Pre-enrichment fails → default tools
  • Smart tool AI fails → code fallback
  • Tool fails → return error as ToolResult
  • Streaming error → show what we have
Timeout Management
  • Pre-enrichment tools: 12s timeout
  • AI sub-calls in smart tools: 10s
  • Pre-enrichment AI picker: 8s
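A shared timeout wrapper for those budgets can be sketched as below (names are illustrative). Unlike a bare `Promise.race`, it clears its timer so a finished call doesn't keep the process alive:

```typescript
// Sketch: race any promise against a labeled timeout, cleaning up the timer.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `withTimeout(executeTool(name, params, context, token), 12_000, 'pre-enrichment tool')`.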

Context Management Deep Dive

The biggest challenge in a recursive AI loop is managing the context window. Each pass adds more data. Here's how to keep it under control:

System Prompt Growth
Pass 1 system prompt:
  Base rules (~4,000 tokens)
  + Pre-enriched tool data (~2,000 tokens)
  ≈ 6,000 tokens

Pass 3 system prompt:
  Base rules (~4,000 tokens)
  + Pre-enriched data (~2,000 tokens)
  + Pass 1 tool results (~3,000 tokens)
  + Pass 2 tool results (~3,000 tokens)
  ≈ 12,000 tokens
User Message Changes
Pass 1 user message:
  Full work item data + notes + contacts
  + format spec
  ≈ 3,000–8,000 tokens

Pass 2+ user message:
  Current document (section map) (~2,000)
  + Tool execution summary (~500)
  + Compact work item data (~1,500)
  + Refinement instructions (~500)
  ≈ 4,500 tokens
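Two helpers implied by these budgets, sketched below: a token estimate using the common chars/4 heuristic (a rough approximation, not a real tokenizer) and the 4,000-character per-tool markdown cap from the safety rails:

```typescript
// Rough token accounting — chars/4 is an approximation, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cap a tool's markdown before appending it to the system prompt.
function capMarkdown(markdown: string, maxChars = 4000): string {
  if (markdown.length <= maxChars) return markdown;
  // Truncate, then flag it so the LLM knows data was cut
  return markdown.slice(0, maxChars) + "\n\n> *(output truncated to fit context budget)*";
}
```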

Real-World Cost Analysis

Using Gemini Flash tier pricing as a reference point:

| Component | Tokens | Cost |
|---|---|---|
| Pre-enrichment AI picker | ~800 | $0.0005 |
| Pass 1 analysis | ~12,000 | $0.003 |
| Smart tool sub-call | ~1,000 | $0.0005 |
| Pass 2 refinement | ~15,000 | $0.004 |
| Pass 3 refinement | ~18,000 | $0.005 |
| Typical 2-pass analysis | — | ~$0.008 |
| Complex 4-pass analysis | — | ~$0.02 |

Adapting to Your Domain

Replace these concepts with your equivalents — the loop architecture stays the same:

| Our Concept | Your Equivalent | Examples |
|---|---|---|
| Support ticket | Work item | Sales lead, code PR, patient chart, order |
| Company | Account/scope | Customer, organization, project, repository |
| Device | Asset | Server, product, vehicle, instrument |
| Contact | Person | Lead, patient, author, stakeholder |
| Ticket notes | Activity log | Comments, events, messages, lab results |
| GraphQL API | Your data layer | REST API, database queries, file system |
| Auth token | Your auth | API key, OAuth token, session cookie |

Quick Reference: Build Order

If starting from scratch, build in this order to get something working as fast as possible:

1. ToolResult type + 2 simple tools — Get data flowing
2. Tool registry + dispatcher — Centralized lookup
3. Main analysis prompt — System prompt with confidence contract
4. Streaming LLM call — Get Pass 1 working end-to-end
5. AI Actions parser — Extract tool requests from response
6. Orchestrator loop — Wire up analyze → decision → tools → refine
7. Section map + merge — Make refinement passes update, not replace
8. Pre-enrichment — Reduce the number of passes needed
9. Smart tools — Add AI sub-processing to high-volume tools
10. Budget + safety rails — Cost tracking, dedup, stale detection