Building a Knowledge-Powered AI Agent: Handy Beaver's Multi-Layer Architecture

Building in public: Day 79 of shipping production AI agents on Cloudflare's edge platform.


The Problem: Generic AI Is Not Enough

We've had Lil Beaver (our AI handyman assistant) running for a few weeks now, and the conversations were... okay. The agent could chat, answer questions, and help customers, but it felt generic. When someone asked about pricing, we'd get vague responses. When it generated social media posts, the content lacked the specificity that makes Handy Beaver unique.

The core issue: The AI had no structured memory of what Handy Beaver actually offers.

Sure, we could stuff everything into system prompts, but that's messy, expensive (token costs add up), and hard to maintain. What we needed was a proper knowledge architecture, one that could serve multiple purposes:

  1. Chat context - Let the agent instantly recall pricing, services, service areas
  2. Social content generation - Inject authentic business details into AI-generated posts
  3. SEO/Discovery - Make the knowledge searchable by external AI tools

Yesterday, we shipped all three.


The Solution: Three Knowledge Layers

Instead of picking one approach, we built a layered system where each layer serves a specific purpose:

Layer 1: Cloudflare Vectorize (Instant Recall)

Purpose: Fast, structured retrieval for chat interactions

We created two markdown knowledge files:

  • agent/KNOWLEDGE.md - Business fundamentals (pricing, services, subscriptions, service area, FAQ)
  • agent/SOCIAL-CONTENT.md - Social templates, hashtags, seasonal themes

These get chunked and indexed into Cloudflare Vectorize with metadata:

{
  "agent": "lil-beaver",
  "category": "pricing",
  "source": "KNOWLEDGE.md",
  "section": "services"
}
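Here's a minimal sketch of what the chunking step could look like, assuming one chunk per markdown heading and a category supplied by the caller. The names `chunkMarkdown` and `KnowledgeChunk` and the id format are hypothetical; only the four metadata fields come from the example above.

```typescript
// Hypothetical chunk shape; the metadata fields mirror the example above.
interface ChunkMetadata {
  agent: string;
  category: string;
  source: string;
  section: string;
}

interface KnowledgeChunk {
  id: string;
  text: string;
  metadata: ChunkMetadata;
}

// Split a markdown file into one chunk per heading (an assumed strategy).
function chunkMarkdown(
  markdown: string,
  source: string,
  agent: string,
  category: string,
): KnowledgeChunk[] {
  const chunks: KnowledgeChunk[] = [];
  let section = "intro"; // text before the first heading lands here
  let buffer: string[] = [];

  // Emit the buffered lines as a chunk, skipping empty sections.
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    if (text.length > 0) {
      chunks.push({
        id: `${source}#${section}#${chunks.length}`,
        text,
        metadata: { agent, category, source, section },
      });
    }
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      flush();
      // Slugify the heading text so it can serve as the section label.
      section = (heading[1] ?? "").trim().toLowerCase().replace(/\s+/g, "-");
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each chunk's text would then be embedded and upserted into the index together with its metadata.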

Result: 15 indexed chunks that auto-inject into conversations via the memory-vectorize plugin. When a customer asks "How much for gutter cleaning?", the agent instantly recalls the exact pricing tier ($100-300 depending on home size).

Why Vectorize? It's stupid fast (sub-10ms queries) and persists globally at the edge. Perfect for real-time chat.
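The memory-vectorize plugin handles the auto-injection for us, so its internals aren't shown here. As a rough illustration of the idea, recalled matches could be filtered by score and formatted into a context block like this. `RecalledChunk`, `buildRecallContext`, and the 0.7 cutoff are all assumptions, not the plugin's actual API.

```typescript
// Hypothetical shape of a match coming back from a vector search.
interface RecalledChunk {
  score: number; // similarity score, higher means closer
  text: string;
  source: string;
}

// Keep the best matches above a threshold and format them as prompt context.
function buildRecallContext(
  matches: RecalledChunk[],
  minScore = 0.7, // assumed cutoff; tune against real queries
  maxChunks = 3,
): string {
  const kept = matches
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks);
  if (kept.length === 0) return ""; // nothing relevant, inject nothing
  const lines = kept.map((m) => `- [${m.source}] ${m.text}`);
  return `Relevant business knowledge:\n${lines.join("\n")}`;
}
```

Tagging each line with its source file keeps the injected context traceable back to the markdown it came from.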


Layer 2: R2 JSON (Social Generator)

Purpose: Structured data for AI content generation

The social content API endpoint needed more than just vector recall: it needed structured context to inject into prompts. So we created:

knowledge/site-info.json - A JSON file containing:

  • Services array with descriptions, pricing ranges, target customers
  • Subscription tiers (one-time, quarterly, annual pricing)
  • Service area (Choctaw County, Oklahoma)
  • Hashtags categorized by type (location, services, brand)
  • Post templates for educational, promotional, and testimonial content
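One way to picture that file is as a typed structure. The field names below are illustrative guesses (the real site-info.json may differ), subscription tiers are left out for brevity, and `pickHashtags` is a hypothetical helper.

```typescript
// Field names are illustrative; the real site-info.json may differ.
type HashtagType = "location" | "services" | "brand";
type TemplateType = "educational" | "promotional" | "testimonial";

interface PriceRange {
  min: number;
  max: number;
}

interface Service {
  name: string;
  description: string;
  priceRange: PriceRange;
  targetCustomers: string[];
}

interface SiteInfo {
  services: Service[];
  serviceArea: string;
  hashtags: Record<HashtagType, string[]>;
  templates: Record<TemplateType, string[]>;
}

// Collect hashtags across the requested types, dropping duplicates.
function pickHashtags(info: SiteInfo, types: HashtagType[], limit = 5): string[] {
  const tags: string[] = [];
  for (const type of types) {
    for (const tag of info.hashtags[type]) {
      if (tags.indexOf(tag) === -1) tags.push(tag);
    }
  }
  return tags.slice(0, limit);
}
```

Typing the file this way means a renamed or missing field shows up at compile time instead of as an empty section in a generated post.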

The /api/social/idea endpoint now:

  1. Fetches this JSON from R2
  2. Injects relevant sections into the AI prompt based on content type
  3. Returns posts with proper hashtags and business context

Example output:

{
  "content": "Spring gutter cleaning season is here! Handy Beaver offers professional gutter maintenance starting at $100 for standard homes...",
  "hashtags": ["#ChoctawCountyOK", "#GutterCleaning", "#HandyBeaver"],
  "contextualInfo": "Service: Gutter Cleaning | Price: $100-300 | Season: Spring"
}

Why R2? Static, versioned, cacheable. Social content generation happens async (not real-time), so we optimize for structure over speed.


Layer 3: AI Search MCP (Live Site Knowledge)

Purpose: External AI discoverability + live page content

This is the newest layer. We deployed a Cloudflare AI Search instance (handy-beaver-brain-nlweb.srvcflo.workers.dev) that:

  1. Indexes the entire site via sitemap (14 pages: services, pricing, blog, gallery, etc.)
  2. Exposes an MCP endpoint (/mcp) for AI-to-AI communication
  3. Provides a search tool with query modes: list, summarize, generate, none

This means:

  • External AIs (ChatGPT, Claude, etc.) can discover Handy Beaver via MCP
  • Our own agents can query live page content (blog posts, gallery descriptions)
  • SEO crawlers get a clean sitemap for indexing

MCP Tool Example:

{
  "name": "ask",
  "description": "Query Handy Beaver's knowledge base",
  "inputSchema": {
    "query": "What tiny home packages does Handy Beaver offer?",
    "generate_mode": "summarize"
  }
}
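On the wire, MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method. A minimal sketch of building that request body follows. The tool name and argument names are taken from the example above, `buildToolCall` is a hypothetical helper, and transport details (the initialize handshake, headers, session handling) are left out because they depend on the server.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and arguments in the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Tool name and argument names follow the example in this post.
const request = buildToolCall(1, "ask", {
  query: "What tiny home packages does Handy Beaver offer?",
  generate_mode: "summarize",
});

// This string is what a client would POST to the /mcp endpoint.
const body = JSON.stringify(request);
```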

The AI Search layer is still indexing (takes ~24 hours for full sitemap crawl), but once live, it'll be the most comprehensive knowledge source.

Why AI Search? It's designed for discovery. Other AIs can find us. Our agents can reference live blog content. It's the public-facing knowledge layer.


Architecture Summary

Here's how it all fits together:

┌─────────────────────────────────────────────────┐
│  Knowledge Sources for Lil Beaver               │
├─────────────────────────────────────────────────┤
│  1. Vectorize (auto-recall)                     │
│     - KNOWLEDGE.md, SOCIAL-CONTENT.md           │
│     - Use case: Real-time chat context          │
│     - Speed: <10ms                              │
│                                                 │
│  2. R2 JSON (social generator)                  │
│     - knowledge/site-info.json                  │
│     - Use case: AI content generation           │
│     - Speed: ~50ms (cacheable)                  │
│                                                 │
│  3. AI Search MCP (live site)                   │
│     - handy-beaver-brain-nlweb.srvcflo.workers  │
│     - Use case: External AI discovery, blog     │
│     - Speed: ~200ms (comprehensive)             │
└─────────────────────────────────────────────────┘

Why three layers instead of one?

Each serves a different performance profile:

  • Vectorize = Instant, small, structured
  • R2 = Fast, versioned, format-flexible
  • AI Search = Comprehensive, discoverable, live-updating

Trying to force one solution would mean compromising on speed, freshness, or discoverability.
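To make the split concrete, here's a small sketch that maps each use case to its layer, using the rough latency figures from this post. The names `LAYERS`, `layerFor`, and `layersWithin` are hypothetical; our actual routing is implicit in which endpoint calls which store.

```typescript
type UseCase = "chat" | "social" | "discovery";

interface Layer {
  name: string;
  source: string;
  typicalLatencyMs: number; // rough figures quoted in this post
}

const LAYERS: Record<UseCase, Layer> = {
  chat: { name: "Vectorize", source: "KNOWLEDGE.md, SOCIAL-CONTENT.md", typicalLatencyMs: 10 },
  social: { name: "R2 JSON", source: "knowledge/site-info.json", typicalLatencyMs: 50 },
  discovery: { name: "AI Search MCP", source: "live site pages", typicalLatencyMs: 200 },
};

// Look up the layer that serves a given use case.
function layerFor(useCase: UseCase): Layer {
  return LAYERS[useCase];
}

// List the layers whose typical latency fits inside a budget.
function layersWithin(budgetMs: number): string[] {
  const order: UseCase[] = ["chat", "social", "discovery"];
  return order
    .map((useCase) => LAYERS[useCase])
    .filter((layer) => layer.typicalLatencyMs <= budgetMs)
    .map((layer) => layer.name);
}
```

With a 100ms budget, for instance, only the first two layers qualify, which is why AI Search stays out of the real-time chat path.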


Implementation Highlights

1. SEO Sitemap

We added a proper sitemap to help AI Search (and Google) index the site:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://handybeaver.co/</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://handybeaver.co/services</loc>
    <changefreq>weekly</changefreq>
  </url>
  <!-- 12 more pages -->
</urlset>

Live at: https://handybeaver.co/sitemap.xml
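If you'd rather generate the sitemap from a page list than hand-edit it, a sketch like the following would do. `buildSitemap` and `SitemapEntry` are hypothetical names; the namespace URI is the one defined by the sitemaps.org protocol.

```typescript
interface SitemapEntry {
  path: string; // site-relative path, e.g. "/services"
  changefreq: "daily" | "weekly" | "monthly";
}

// Escape the characters that are special in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render a sitemap document for the given origin and page list.
function buildSitemap(origin: string, entries: SitemapEntry[]): string {
  const base = origin.replace(/\/$/, ""); // avoid a double slash when joining
  const urls = entries.map((entry) =>
    [
      "  <url>",
      `    <loc>${escapeXml(base + entry.path)}</loc>`,
      `    <changefreq>${entry.changefreq}</changefreq>`,
      "  </url>",
    ].join("\n"),
  );
  const header = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ];
  return header.concat(urls, ["</urlset>"]).join("\n");
}
```

Calling `buildSitemap("https://handybeaver.co", [{ path: "/", changefreq: "daily" }, { path: "/services", changefreq: "weekly" }])` produces the two entries shown above.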

2. Social Content API Enhancement

The /api/social/idea endpoint now:

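// Fetch knowledge from R2 (env.ASSETS is the R2 bucket binding)
const knowledge = await env.ASSETS.get('knowledge/site-info.json');
if (!knowledge) {
  // R2 get() resolves to null when the key does not exist
  throw new Error('knowledge/site-info.json not found in R2');
}
const siteInfo = await knowledge.json();

// Inject into AI prompt based on content type
if (type === 'promotional') {
  prompt += `\n\nPricing context: ${JSON.stringify(siteInfo.pricing)}`;
}

// Return with proper hashtags (fall back to none for an unknown category)
return {
  content: aiResponse,
  hashtags: siteInfo.hashtags[category] ?? [],
  contextualInfo: `Service: ${service} | Price: ${price}`
};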

No more generic posts: every piece of content is grounded in real business data.

3. Git Commits (Yesterday)

Here's what actually shipped:

  • a5d9179 - feat(agent): Add knowledge base + social content for Lil Beaver
  • 3a06124 - feat(social): Wire knowledge base into social content generator
  • 1f7b6e3 - docs(agent): Add AI Search MCP endpoint to SKILL.md

Plus some UX fixes:

  • 67dad93 - Use first name only with customers (Colt, not Colt Cogburn)
  • 62dee39 - Fix Add to Queue button + add Lil Beaver flier docs
  • 06d213a - Fix flier download and queue buttons
  • c990e70 - Fix phone number: use ElevenLabs WhatsApp

What's Next

Immediate

  1. Wait for AI Search indexing - Should complete in ~12 hours
  2. Test MCP endpoint with external AI tools (ChatGPT plugin, Claude MCP client)
  3. Monitor vector recall - Are we getting the right chunks in chat?

Soon

  1. Auto-sync knowledge - When we update pricing/services, auto-reindex Vectorize + AI Search
  2. Analytics dashboard - Track which knowledge chunks get recalled most often
  3. Customer feedback loop - If the agent can't answer a question, log it → update knowledge base
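For the auto-sync item, one possible approach (not something we've built yet) is to keep a hash per chunk and only re-index what changed. The sketch below uses a simple FNV-1a variant over UTF-16 code units purely for change detection. `planSync` and the record shapes are hypothetical.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units.
// Non-cryptographic: used here only to detect changed text.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

interface SyncPlan {
  upsert: string[]; // chunk ids that are new or whose text changed
  remove: string[]; // chunk ids that no longer exist
}

// previous: chunk id -> hash stored at the last sync
// current:  chunk id -> chunk text as it is now
function planSync(
  previous: Record<string, string>,
  current: Record<string, string>,
): SyncPlan {
  const upsert: string[] = [];
  const remove: string[] = [];
  for (const id of Object.keys(current)) {
    if (previous[id] !== fnv1a(current[id] ?? "")) upsert.push(id);
  }
  for (const id of Object.keys(previous)) {
    if (!(id in current)) remove.push(id);
  }
  return { upsert, remove };
}
```

Only the ids in `upsert` would need re-embedding, which keeps a pricing tweak from triggering a full re-index.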

Future

  1. Multi-agent knowledge sharing - Let other agents (Flo, Sage) query Handy Beaver's knowledge
  2. Voice agent integration - ElevenLabs agent needs the same knowledge context
  3. Knowledge versioning - Track how business info changes over time

Key Takeaways

1. One knowledge source is not enough.

Different use cases need different performance profiles. Chat needs speed. Content generation needs structure. Discovery needs comprehensiveness. Build for all three.

2. Cloudflare's edge platform makes this possible.

Vectorize for fast recall. R2 for structured storage. AI Search for discoverability. All globally distributed, all sub-second latency.

3. Knowledge should be versioned and inspectable.

We store everything in git-tracked files (KNOWLEDGE.md, site-info.json, sitemap.xml). No black-box databases. If the AI says something wrong, we can trace it back to the source and fix it.

4. Build in layers, not monoliths.

Start with Vectorize for chat. Add R2 for structured data. Layer in AI Search for discovery. Each addition is additive, not replacement.


Try It Yourself

Want to chat with Lil Beaver and see the knowledge base in action?

Live demo: https://handybeaver.co/agent

Ask about pricing, services, or service area. The responses are now grounded in the structured knowledge we built.

MCP endpoint (for AI tools): https://handy-beaver-brain-nlweb.srvcflo.workers.dev/mcp


Shipped with ☕ by Dev on 2026-03-20. Follow the build: blog.minte.dev


Tech Stack

  • Cloudflare Workers - Edge compute
  • Cloudflare Vectorize - Vector database (15 indexed chunks)
  • Cloudflare R2 - Object storage (knowledge JSON)
  • Cloudflare AI Search - MCP-enabled search instance
  • Anthropic Claude - LLM for chat + content generation
  • Markdown - Knowledge authoring format

Total infrastructure cost: ~$5/month (mostly AI API calls)

Performance:

  • Vector recall: <10ms
  • R2 fetch: ~50ms
  • AI Search query: ~200ms
  • Chat response (full pipeline): ~1.5s

Not bad for a globally-distributed, AI-powered handyman assistant.


What knowledge architecture are you building for your AI agents? Hit me up on Discord or Twitter. Always down to talk edge AI.