
Recursive AI Tool Loop

A production-proven pattern for building AI systems that analyze data, decide what they need, fetch it, and refine their own answer — without human intervention. The loop is self-terminating: the AI reports its own confidence, and the orchestrator decides when to stop.

Deep Dive Available — Built by Coulee Tech

Coulee Tech built a production AI Employee system on this pattern with 91 tools and multiple specialized Employees. Their detailed case study is available as a 6-part deep dive with annotated code, flowcharts, and AI thinking callouts.


What This System Does

Given a piece of work (a support ticket, sales lead, code review, patient intake — any domain), the system:

  1. Reads the work item and its metadata
  2. Guesses which external data sources would help, and pre-fetches them
  3. Analyzes the work item using an LLM, with pre-fetched data injected into the prompt
  4. Decides whether it has enough information or needs more
  5. Fetches additional data using tools the LLM requested
  6. Re-analyzes with new data merged in, updating only the sections that changed
  7. Repeats steps 4–6 until confident, out of budget, or at the pass limit
  8. Saves the final analysis and makes it available to the human
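The eight steps above can be compressed into a minimal loop sketch. Everything here is illustrative — `analyze`, the stub confidence logic, and the stubbed tool execution are assumptions standing in for real LLM and API calls:

```typescript
type Confidence = "high" | "medium" | "low";

interface PassResult {
  confidence: Confidence;
  actions: string[];   // tool names the AI requested
  document: string;    // current analysis text
}

// Stub analyzer: becomes confident once two tool results are in context.
function analyze(item: string, enriched: string[]): PassResult {
  const confident = enriched.length >= 2;
  return {
    confidence: confident ? "high" : "low",
    actions: confident ? [] : [`fetch_tool_${enriched.length}`],
    document: `analysis of ${item} (${enriched.length} sources)`,
  };
}

function runLoop(item: string, maxPasses = 5): { passes: number; result: PassResult } {
  const enriched: string[] = [];        // accumulated tool markdown
  const completed = new Set<string>();  // dedup: tools already run
  let result = analyze(item, enriched); // Pass 1
  let passes = 1;

  while (passes < maxPasses) {
    // Decision gate: stop when confident or nothing new to fetch
    const newActions = result.actions.filter(a => !completed.has(a));
    if (result.confidence === "high" || newActions.length === 0) break;

    // Tool execution (stubbed): each action contributes markdown context
    for (const a of newActions) {
      completed.add(a);
      enriched.push(`## ${a} results`);
    }
    result = analyze(item, enriched);   // refinement pass
    passes++;
  }
  return { passes, result };
}
```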

Architecture Overview

Loop State Machine

text
┌──────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATOR                              │
│                                                                  │
│  1. Pre-Enrichment ─────────────────────────────────────────┐   │
│     │  Lightweight AI call picks which tools to run         │   │
│     │  Selected tools execute in parallel                   │   │
│     │  Results injected into Pass 1 system prompt           │   │
│     ▼                                                       │   │
│  2. Pass 1: Main Analysis ──────────────────────────────────┤   │
│     │  Streaming LLM call with full prompt + enriched data  │   │
│     │  Response parsed into section map (living document)   │   │
│     ▼                                                       │   │
│  3. Decision Gate ──────────────────────────────────────────┤   │
│     │  Extract confidence + ai_actions from response        │   │
│     │  Check exit conditions (confident? budget? max pass?) │   │
│     │  Filter to new, non-duplicate actions                 │   │
│     ▼                                                       │   │
│  4. Tool Execution ─────────────────────────────────────────┤   │
│     │  Run required + recommended tools in parallel         │   │
│     │  Each tool returns { summary, markdown, data }        │   │
│     │  Markdown accumulated into enriched context           │   │
│     ▼                                                       │   │
│  5. Refinement Pass (2, 3, ... up to 5) ───────────────────┤   │
│     │  System prompt = base rules + ALL enriched data       │   │
│     │  User message = current document + tool summaries     │   │
│     │  AI outputs ONLY sections that need changes           │   │
│     │  Sections merged into existing document               │   │
│     → Back to Decision Gate (step 3)                        │   │
│     ▼                                                       │   │
│  6. Complete ───────────────────────────────────────────────┘   │
│     Auto-save analysis to backend                               │
│     Restore model if escalated during the loop                  │
└──────────────────────────────────────────────────────────────────┘

Data Flow

text
Work Item (ticket, order, case, etc.)
    │
    ▼
Pre-Enrichment AI ──selects──▶ Tool Registry ──dispatches──▶ Individual Tools
                                                                   │
                                               graphqlFetch() / REST / AI sub-call
                                                                   │
                                                           ToolResult {
                                                             summary,   ← for UI
                                                             markdown,  ← for prompt injection
                                                             data       ← for structured access
                                                           }
                                                                   │
                                       Injected into LLM system prompt as context

The Five Key Components

1. The Orchestrator

Manages the loop lifecycle — phases, pass counting, budget tracking, tool deduplication, model escalation, and exit conditions.

2. The AI Analysis Engine

Sends the LLM call (streaming), manages prompt templates, and fires a completion callback when the stream finishes.

3. The Tool System

A registry of async functions that fetch data from external systems. Each tool takes standardized inputs and returns a standardized result.

4. The Pre-Enrichment Engine

Before the main analysis starts, a lightweight AI call decides which tools to run upfront — replacing keyword matching with intent reasoning.

5. Smart Tools

Tools that optionally use an AI sub-call to post-process their own results before returning — filtering 40 results down to the 2-3 most relevant.

Orchestrator Exit Conditions

Checked at every Decision Gate — the loop stops when any of these are true:

| Condition | Trigger |
|---|---|
| Confident | AI reports high confidence and has no required/recommended actions |
| No new actions | AI requested tools, but all were already run |
| Max passes | Hard limit (5 passes) to prevent infinite loops |
| Stale confidence | Same confidence level for 2+ consecutive passes |
| Budget exceeded | Total cost exceeds per-analysis budget ($0.10 default) |
| Manual stop | User disabled auto mode or clicked abort |
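These checks can be sketched as a single gate function. The loop-state field names here are illustrative assumptions, not the production API:

```typescript
type Confidence = "high" | "medium" | "low";

// Illustrative loop state — field names are assumptions.
interface LoopState {
  pass: number;
  totalCost: number;
  budget: number;                   // $0.10 per analysis by default
  confidence: Confidence;
  confidenceHistory: Confidence[];  // one entry per completed pass
  newActionCount: number;           // requested actions not already executed
  autoMode: boolean;
}

const MAX_PASSES = 5;

// Returns an exit reason, or null to continue looping.
function exitReason(s: LoopState): string | null {
  if (!s.autoMode) return "manual-stop";
  if (s.pass >= MAX_PASSES) return "max-passes";
  if (s.totalCost >= s.budget) return "budget-exceeded";
  if (s.newActionCount === 0) {
    return s.confidence === "high" ? "confident" : "no-new-actions";
  }
  // Same confidence for 2+ consecutive passes → the tools aren't helping
  const h = s.confidenceHistory;
  if (h.length >= 2 && h[h.length - 1] === h[h.length - 2]) {
    return "stale-confidence";
  }
  return null;
}
```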

Orchestrator Decision Gate (pseudocode)

pseudocode
on_analysis_complete(response, usage):
    update living document (build or merge sections)
    record pass in history

    if pass >= MAX_PASSES → exit("max-passes")
    if total_cost >= budget → exit("budget-exceeded")

    actions = extract_ai_actions(response)
    new_actions = actions.filter(not already completed)

    if new_actions is empty:
        if confidence == "high" → exit("confident")
        else → exit("no-new-actions")

    if pass >= 3 and confidence == last_confidence → exit("stale-confidence")

    run_tools(new_actions)
    → on tools complete → build refinement prompt → start next pass

The Tool System

Each tool is a function that fetches ONE type of data from ONE source. Think of them as sensors — each one looks at the world from a different angle.

| Type | Description | AI Cost | Example |
|---|---|---|---|
| Script tools | Pure data fetching. Call an API, format the response, return markdown. No AI involved. | $0.00 | fetch_device_info |
| Smart tools | Fetch data, then optionally use an AI sub-call to filter/rank before returning. Always has a code-based fallback. | ~$0.0003 | fetch_sop_docs |
Standardized ToolResult Interface

Tool Types

typescript
interface ToolResult {
  success: boolean;
  toolName: string;
  data: unknown;       // raw structured data (for UI or further processing)
  summary: string;     // short text for inline UI display
  markdown: string;    // full formatted output for prompt injection ← CRITICAL
  error?: string;
}

interface ToolContext {
  companyId: string;   // scoping — which customer/account
  deviceId?: string;   // scoping — which asset
  ticketId?: string;   // scoping — which work item
  contactEmail?: string;
  companyName: string;
  modelId?: string;    // optional — enables AI sub-calls inside tools
  ticketTitle?: string;
}

type ToolFunction = (
  params: ToolParams,
  context: ToolContext,
  token: string
) => Promise<ToolResult>;

Script Tool Template

typescript
import type { ToolFunction, ToolResult } from './types';

export const execute: ToolFunction = async (params, context, token) => {
  const fail = (msg: string): ToolResult => ({
    success: false,
    toolName: 'fetch_your_thing',
    data: null,
    summary: msg,
    markdown: `> **Your Thing:** ${msg}`,
    error: msg,
  });

  try {
    const rawData = await yourApiCall(context.companyId, params.target, token);

    if (!rawData || rawData.length === 0) {
      return fail('No data found');
    }

    const markdown = [
      `## Your Thing — ${context.companyName}`,
      `| Name | Status | Last Updated |`,
      `|------|--------|------------- |`,
      ...rawData.map(item => `| ${item.name} | ${item.status} | ${item.date} |`),
    ].join('\n');

    return {
      success: true,
      toolName: 'fetch_your_thing',
      data: rawData,
      summary: `Found ${rawData.length} items`,
      markdown,
    };
  } catch (err) {
    return fail(err instanceof Error ? err.message : 'Unknown error');
  }
};

Smart Tool Template (with AI sub-call)

typescript
import { chatComplete } from '@/lib/ai-call';
import type { ToolFunction } from './types';

const AI_FILTER_THRESHOLD = 5;
const FILTER_PROMPT = `You are filtering results for relevance to a work item.
Given the context and a list of results, select the 1-3 most relevant.
Respond with ONLY a JSON array: [{"index": 0, "reason": "why"}]`;

export const execute: ToolFunction = async (params, context, token) => {
  const rawResults = await fetchFromApi(params, context, token);

  // AI filtering when available and data is large enough
  if (context.modelId && rawResults.length >= AI_FILTER_THRESHOLD) {
    try {
      const aiResult = await chatComplete(context.modelId, [
        { role: 'system', content: FILTER_PROMPT },
        { role: 'user', content: formatForAI(rawResults, context) },
      ], { maxTokens: 512, timeoutMs: 10_000 });

      const selections = parseAiResponse(aiResult.content);
      if (selections.length > 0) {
        return buildFocusedResult(selections, rawResults);
      }
    } catch {
      // Fall through to code-based filtering — ALWAYS have a fallback
    }
  }

  return codeBasedResult(rawResults, params.target);
};

How to Build This for Your Application

1. Define Your Domain

What is your “work item”? A support ticket, sales lead, code review, patient intake, financial transaction? Everything flows from this.

Answer: What data is attached at the start? What external sources could enrich it? What does “done” look like?
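For a support-ticket domain, the answers might translate into shapes like these — field names are illustrative, not a prescribed schema:

```typescript
// Illustrative work-item shape for a support-ticket domain.
interface WorkItem {
  id: string;
  title: string;
  body: string;
  companyId: string;     // scoping for every tool call
  contactEmail?: string; // enables user-lookup tools
  deviceId?: string;     // linked asset, enables device tools
  createdAt: string;
}

// One answer to "what does done look like": a saved, confident analysis.
interface CompletedAnalysis {
  workItemId: string;
  sections: Record<string, string>; // heading → body (the living document)
  confidence: "high" | "medium" | "low";
  passes: number;
  totalCost: number;
}
```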

2. Design Your Tools

Each tool fetches ONE type of data from ONE source. Name them fetch_<what>. Design principles:

  • Single responsibility — one tool per data source
  • Scoped by context — tools receive the work item's context and scope their queries
  • Graceful failure — return {success: false, error: "..."}, never throw
  • Markdown output is the product — the markdown field is what the LLM sees
3. Build the Tool Registry

A central map of toolName → { execute, description }. Add aliases for names the LLM might generate.

Tool Registry

typescript
export const TOOLS: Record<string, ToolMeta> = {
  fetch_device_info: {
    name: 'fetch_device_info',
    description: 'Fetch device details — hardware, OS, network, status',
    execute: fetchDeviceInfo,
  },
  // Aliases — LLMs sometimes generate slightly different names
  fetch_devices: {
    name: 'fetch_device_info',
    description: 'Alias for fetch_device_info',
    execute: fetchDeviceInfo,
  },
};

// Single dispatcher — wraps any thrown exceptions into a failed ToolResult
async function executeTool(
  name: string,
  params: ToolParams,
  context: ToolContext,
  token: string
): Promise<ToolResult> {
  const tool = TOOLS[name] ?? TOOLS[Object.keys(TOOLS).find(k => k.includes(name)) ?? ''];
  if (!tool) return { success: false, toolName: name, data: null, summary: 'Unknown tool', markdown: '' };
  try {
    return await tool.execute(params, context, token);
  } catch (err) {
    return { success: false, toolName: name, data: null, summary: String(err), markdown: '' };
  }
}
4. Build the Pre-Enrichment Engine

A lightweight AI call that looks at the work item and decides which tools to run BEFORE the main analysis. This replaces keyword matching — the AI reasons about intent.

Pre-Enrichment System Prompt

typescript
const PRE_ENRICHMENT_PROMPT = `
You select data-gathering tools to run before a [work item] is analyzed.
Pick only tools that would provide genuinely useful context for THIS work item.

Available tools:
- fetch_device_info: Fetch device hardware, OS, network. Use when a device is mentioned.
- fetch_backup_status: Check backup health. Use for data loss, disk, or reliability issues.
- fetch_user_info: Look up user account details. Use when a user or email is mentioned.

Respond with ONLY a JSON array:
[{"tool": "tool_name", "params": {"target": "value"}, "reason": "why"}]
`;

// Execute selected tools in parallel with a hard timeout
const results = await Promise.allSettled(
  selectedTools.map(({ tool, params }) =>
    Promise.race([
      executeTool(tool, params, context, token),
      new Promise<ToolResult>((_, reject) =>
        setTimeout(() => reject(new Error('timeout')), 12_000)
      ),
    ])
  )
);
5. Build the Main Analysis Prompt

Two parts: a static system prompt (rules, tool descriptions, confidence contract) and a per-work-item user message (all metadata, linked entities, output format).

Confidence Contract (include in system prompt)

text
## Confidence Contract
After each analysis pass, you MUST output a confidence assessment in this JSON block:

```json
{
  "analysis_confidence": "high | medium | low",
  "confidence_reason": "why you chose this level",
  "ai_actions": [...]
}
```

- "high" = You have specific, actionable data. Not guessing.
- "medium" = Reasonable analysis but tool data would improve it.
- "low" = Mostly guessing. Critical context is missing.

When you report "high", the loop EXITS and your answer is shown.
Do not report "high" to end the loop early if you are uncertain.
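Extracting that block from a raw model response can be sketched as follows — this assumes the model wraps it in a fenced `json` block as instructed, and falls back defensively to "low" when it does not:

```typescript
// Sketch: pull the confidence block out of a raw model response.
interface ConfidenceBlock {
  analysis_confidence: "high" | "medium" | "low";
  confidence_reason: string;
  ai_actions: unknown[];
}

// Triple backtick, built at runtime to avoid nesting fences in this example.
const FENCE = "`".repeat(3);
const BLOCK_RE = new RegExp(FENCE + "json\\s*([\\s\\S]*?)" + FENCE, "g");

function parseConfidence(response: string): ConfidenceBlock {
  const fallback: ConfidenceBlock = {
    analysis_confidence: "low",
    confidence_reason: "no confidence block found",
    ai_actions: [],
  };
  // Take the last fenced json block — later blocks supersede earlier ones
  const blocks = [...response.matchAll(BLOCK_RE)];
  for (const match of blocks.reverse()) {
    try {
      const parsed = JSON.parse(match[1]);
      if (parsed.analysis_confidence) return { ...fallback, ...parsed };
    } catch {
      // malformed JSON → try the previous block, else fall back
    }
  }
  return fallback;
}
```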
6. Build the AI Actions Schema

The AI requests tools via a structured JSON block in its response. Define the exact schema and include it in the system prompt.

AI Actions Schema

json
{
  "ai_actions": [
    {
      "action_id": "unique-kebab-case-id",
      "action_type": "fetch_device_info",
      "reason": "How this data would change the recommendation",
      "parameters": { "target": "hostname-or-search-term" },
      "priority": "required | recommended | nice_to_have"
    }
  ]
}

// Priority rules:
// required + recommended → execute automatically
// nice_to_have → skip (reduces cost, the AI said it's not critical)
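Applying the priority rules together with the orchestrator's deduplication might look like this — the signature built from action type plus serialized parameters is an assumption about one workable format:

```typescript
// Sketch: filter requested actions to the ones worth executing.
interface AiAction {
  action_id: string;
  action_type: string;
  reason: string;
  parameters: Record<string, string>;
  priority: "required" | "recommended" | "nice_to_have";
}

function selectActions(actions: AiAction[], completed: Set<string>): AiAction[] {
  return actions.filter(action => {
    if (action.priority === "nice_to_have") return false; // skip — AI said not critical
    const signature = `${action.action_type}::${JSON.stringify(action.parameters)}`;
    if (completed.has(signature)) return false;           // already ran with same params
    completed.add(signature);
    return true;
  });
}
```

Because `completed` persists across passes, a tool requested again in Pass 3 with the same parameters is silently dropped.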
7. Build the Refinement Prompt

For Pass 2+, the prompt changes. The system prompt gets all accumulated tool data appended. The user message contains the current document and refinement instructions.

Refinement Instructions (user message)

text
## Refinement Instructions
- ONLY output sections that NEED CHANGES based on the new data.
- Sections you do not output will remain unchanged.
- Use the SAME section headers as your original output.
- NEVER re-request a tool that already ran.
- Output an empty ai_actions array if all needed data is gathered.

## Tools Already Executed
- fetch_device_info (target: HOSTNAME): Success — Found device with ...
- fetch_backup_status: Success — 3 appliances, 12 agents
- fetch_azure_user (target: [email protected]): Failed — User not found
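Assembling that user message from loop state can be sketched as below — the structure follows the instructions above, but the names and exact wording are illustrative:

```typescript
// Sketch of building the Pass 2+ user message.
interface ToolRun {
  name: string;
  target?: string;
  success: boolean;
  summary: string;
}

function buildRefinementMessage(currentDocument: string, runs: ToolRun[]): string {
  const toolLines = runs.map(r =>
    `- ${r.name}${r.target ? ` (target: ${r.target})` : ""}: ` +
    `${r.success ? "Success" : "Failed"} — ${r.summary}`
  );
  return [
    "## Current Analysis",
    currentDocument,
    "",
    "## Refinement Instructions",
    "- ONLY output sections that NEED CHANGES based on the new data.",
    "- Sections you do not output will remain unchanged.",
    "- NEVER re-request a tool that already ran.",
    "",
    "## Tools Already Executed",
    ...toolLines,
  ].join("\n");
}
```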
8. Build the Living Document (Section Map)

The analysis is a map of typed sections, not a monolithic string. On Pass 1, build it from scratch. On Pass 2+, merge — only sections present in the new response overwrite existing ones.

Section Map Types

typescript
interface SectionState {
  section: {
    heading: string;       // e.g. "🔍 Situation"
    headingText: string;   // e.g. "Situation"
    type: SectionType;     // enum: situation | findings | next-steps | ...
    body: string;
    emoji: string;
  };
  lastUpdatedPass: number;
  lastUpdatedAt: number;
}

type SectionMap = Map<SectionType, SectionState>;

// Merge function — only update sections present in the new response
function mergeSections(existing: SectionMap, updated: SectionMap): SectionMap {
  const merged = new Map(existing);
  for (const [type, state] of updated) {
    merged.set(type, state);
  }
  return merged;
}
9. Build the Orchestrator Loop

Tie it all together. The orchestrator is a state machine with 5 critical implementation details:

  1. Deduplication: Track action signatures (actionType::JSON(params)) in a Set. Never re-run the same tool with the same params.
  2. Budget tracking: Sum estimatedCost from every LLM call's usage stats. Check against budget at every Decision Gate.
  3. Stale confidence detection: If confidence hasn't improved after 2+ passes, the tools aren't helping. Stop.
  4. Model escalation: If confidence is still “low” after 2+ passes, temporarily switch to a more capable model. Restore when the loop completes.
  5. Parallel tool execution: Always run all requested tools in parallel (Promise.allSettled), never sequentially.
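Budget tracking (detail 2) can be sketched as a small accumulator. The per-token rates here are placeholders for illustration, not any provider's actual pricing:

```typescript
// Sketch: accumulate LLM spend across passes and gate on a budget.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

class BudgetTracker {
  private spent = 0;
  constructor(
    private budget = 0.10,                  // $ per analysis (article default)
    private inputRate = 0.10 / 1_000_000,   // $ per input token — placeholder
    private outputRate = 0.40 / 1_000_000,  // $ per output token — placeholder
  ) {}

  // Call once per LLM response, using the call's usage stats
  record(usage: Usage): void {
    this.spent +=
      usage.inputTokens * this.inputRate +
      usage.outputTokens * this.outputRate;
  }

  get total(): number { return this.spent; }
  get exceeded(): boolean { return this.spent >= this.budget; }
}
```

The orchestrator would call `record()` after every pass and check `exceeded` at each Decision Gate.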
10. Add Safety Rails

Context blowout prevention, graceful degradation, and timeout management:

Context Blowout Prevention
  • Cap each tool's markdown (4000 chars)
  • Limit pre-enrichment to 5 tools
  • Hard pass limit (5)
  • Cost budget with escalation
Graceful Degradation
  • Pre-enrichment fails → default tools
  • Smart tool AI fails → code fallback
  • Tool fails → return error as ToolResult
  • Streaming error → show what we have
Timeout Management
  • Pre-enrichment tools: 12s timeout
  • AI sub-calls in smart tools: 10s
  • Pre-enrichment AI picker: 8s
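A shared timeout wrapper for those budgets can be sketched as below (names are illustrative). Unlike a bare `Promise.race`, it clears its timer so a finished call doesn't keep the process alive:

```typescript
// Sketch: race any promise against a labeled timeout, cleaning up the timer.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `withTimeout(executeTool(name, params, context, token), 12_000, 'pre-enrichment tool')`.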

Context Management Deep Dive

The biggest challenge in a recursive AI loop is managing the context window. Each pass adds more data. Here's how to keep it under control:

System Prompt Growth
Pass 1 system prompt:
  Base rules (~4,000 tokens)
  + Pre-enriched tool data (~2,000 tokens)
  ≈ 6,000 tokens

Pass 3 system prompt:
  Base rules (~4,000 tokens)
  + Pre-enriched data (~2,000 tokens)
  + Pass 1 tool results (~3,000 tokens)
  + Pass 2 tool results (~3,000 tokens)
  ≈ 12,000 tokens
User Message Changes
Pass 1 user message:
  Full work item data + notes + contacts
  + format spec
  ≈ 3,000–8,000 tokens

Pass 2+ user message:
  Current document (section map) (~2,000)
  + Tool execution summary (~500)
  + Compact work item data (~1,500)
  + Refinement instructions (~500)
  ≈ 4,500 tokens
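Two helpers implied by these budgets, sketched below: a token estimate using the common chars/4 heuristic (a rough approximation, not a real tokenizer) and the 4,000-character per-tool markdown cap from the safety rails:

```typescript
// Rough token accounting — chars/4 is an approximation, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cap a tool's markdown before appending it to the system prompt.
function capMarkdown(markdown: string, maxChars = 4000): string {
  if (markdown.length <= maxChars) return markdown;
  // Truncate, then flag it so the LLM knows data was cut
  return markdown.slice(0, maxChars) + "\n\n> *(output truncated to fit context budget)*";
}
```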

Real-World Cost Analysis

Using Gemini Flash tier pricing as a reference point:

| Component | Tokens | Cost |
|---|---|---|
| Pre-enrichment AI picker | ~800 | $0.0005 |
| Pass 1 analysis | ~12,000 | $0.003 |
| Smart tool sub-call | ~1,000 | $0.0005 |
| Pass 2 refinement | ~15,000 | $0.004 |
| Pass 3 refinement | ~18,000 | $0.005 |
| Typical 2-pass analysis | — | ~$0.008 |
| Complex 4-pass analysis | — | ~$0.02 |

Adapting to Your Domain

Replace these concepts with your equivalents — the loop architecture stays the same:

| Our Concept | Your Equivalent | Examples |
|---|---|---|
| Support ticket | Work item | Sales lead, code PR, patient chart, order |
| Company | Account/scope | Customer, organization, project, repository |
| Device | Asset | Server, product, vehicle, instrument |
| Contact | Person | Lead, patient, author, stakeholder |
| Ticket notes | Activity log | Comments, events, messages, lab results |
| GraphQL API | Your data layer | REST API, database queries, file system |
| Auth token | Your auth | API key, OAuth token, session cookie |

Quick Reference: Build Order

If starting from scratch, build in this order to get something working as fast as possible:

1. ToolResult type + 2 simple tools — Get data flowing
2. Tool registry + dispatcher — Centralized lookup
3. Main analysis prompt — System prompt with confidence contract
4. Streaming LLM call — Get Pass 1 working end-to-end
5. AI Actions parser — Extract tool requests from response
6. Orchestrator loop — Wire up analyze → decision → tools → refine
7. Section map + merge — Make refinement passes update, not replace
8. Pre-enrichment — Reduce the number of passes needed
9. Smart tools — Add AI sub-processing to high-volume tools
10. Budget + safety rails — Cost tracking, dedup, stale detection