In the evolving landscape of Retrieval-Augmented Generation (RAG), where large language models are enhanced by context-relevant information from a vector store, much of the focus falls on cloud-scale infrastructure and proprietary APIs. But what if you could build a fully private, zero-dependency MVP of RAG—entirely in the browser?
Using the WebAssembly-enabled all-MiniLM-L6-v2 model from Xenova, this whitepaper walks through an implementation of the MVP of the MVP: a fully client-side embedding generator and vector search interface. We explore what makes this approach valuable, how far it can take you, and where its limitations begin.
1. Introduction: What Is RAG at Its Core?
RAG is a technique that combines language generation with vector-based retrieval. It consists of:
- Embedding content into high-dimensional vectors
- Storing those vectors in a retrievable structure
- Querying new inputs as embeddings
- Matching them to stored vectors via similarity (e.g. cosine)
- Feeding top matches back into a language model as context
While most implementations rely on server-hosted embedding models and vector databases like Pinecone or Weaviate, the essence of RAG can be distilled into a far simpler form for accessibility and learning.
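To see those fundamentals as code before diving into the browser specifics, here is a minimal sketch of the whole loop in JavaScript. The `embed`, `generate`, and `cosineSimilarity` functions are placeholders; concrete versions appear later in this paper.

```js
// The five RAG steps as one function. embed(), generate(), and
// cosineSimilarity() are placeholders that later sections make concrete.
async function ragAnswer(query, vectorStore) {
  const queryVector = await embed(query);              // embed the query
  const matches = vectorStore                          // stored vectors
    .map(item => ({ ...item, score: cosineSimilarity(queryVector, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3);                                      // top matches by similarity
  const context = matches.map(m => m.text).join('\n');
  return generate(`Context:\n${context}\n\nQuestion: ${query}`); // feed back as context
}
```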
2. The Core Tool: MiniLM for the Browser
all-MiniLM-L6-v2 is a compact 384-dimensional embedding model distilled from larger Transformer architectures. It balances performance with size, making it ideal for client-side use.
Key features:
- Model size: ~30MB
- Embedding dimension: 384
- Latency: <1s for short texts
- Normalization: Built-in cosine compatibility via `normalize: true`
- Hosting: Runs entirely in-browser via WebAssembly (no API keys or cloud calls)
This enables a fully offline, privacy-preserving semantic search tool.
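In practice, loading the model takes only a few lines with Xenova's Transformers.js. A minimal sketch; the CDN URL is illustrative and should be version-pinned in real use:

```js
// Inside a <script type="module"> block in the browser.
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';

// First call downloads and caches the WASM runtime and ~30MB of weights.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pool token embeddings and normalize to unit length for cosine search.
const output = await extractor('a heartfelt drama about memory', {
  pooling: 'mean',
  normalize: true,
});
const vector = Array.from(output.data); // Float32Array -> plain array of 384 floats
```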
3. Browser MVP: The Working Implementation
The MVP consists of a single HTML file that:
- Loads the Xenova MiniLM model using JS modules
- Accepts text input and a pasted vector store
- Generates a query embedding and computes cosine similarity
- Sorts and displays the top N results
Sample Vector Store Format
Each entry pairs the source text with its 384-dimensional embedding (vectors truncated here for readability):

```json
[
  { "text": "Movie 1: a short plot summary…", "embedding": [0.02, -0.04, 0.01, …] },
  { "text": "Movie 2: a short plot summary…", "embedding": [-0.01, 0.03, -0.02, …] },
  { "text": "Movie 3: a short plot summary…", "embedding": [0.005, -0.016, 0.04, …] }
]
```
These three vectorized films are embedded locally and used as a toy dataset for querying, comparison, and interactive learning.
Query Flow
- User enters a search phrase
- Embedding is generated client-side via MiniLM
- Vector is compared to all stored vectors
- Top results are displayed with similarity scores
No API keys. No servers. Just raw semantic search.
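A sketch of that flow, assuming the `extractor` pipeline from section 2 and a `vectorStore` array in the format shown earlier:

```js
// Compare two embeddings by the cosine of the angle between them.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed the query client-side, score every stored vector, return the top N.
async function search(query, vectorStore, topN = 5) {
  const output = await extractor(query, { pooling: 'mean', normalize: true });
  const queryVector = Array.from(output.data);
  return vectorStore
    .map(item => ({ text: item.text, score: cosineSimilarity(queryVector, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```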
4. The MVP App
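A minimal single-file sketch of the app described in section 3. The element IDs, layout, and top-3 cutoff are illustrative choices, not requirements:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Browser RAG MVP</title>
</head>
<body>
  <textarea id="store" rows="8" cols="80"
    placeholder="Paste vector store JSON here"></textarea><br />
  <input id="query" size="60" placeholder="Enter a search phrase" />
  <button id="go">Search</button>
  <ol id="results"></ol>

  <script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';

    // Load once; the browser caches the model after the first visit.
    const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

    function cosineSimilarity(a, b) {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    document.getElementById('go').addEventListener('click', async () => {
      const store = JSON.parse(document.getElementById('store').value);
      const query = document.getElementById('query').value;

      // Generate the query embedding entirely client-side.
      const output = await extractor(query, { pooling: 'mean', normalize: true });
      const queryVector = Array.from(output.data);

      // Score, sort, and show the top 3 matches.
      const top = store
        .map(item => ({ text: item.text, score: cosineSimilarity(queryVector, item.embedding) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, 3);

      document.getElementById('results').innerHTML = top
        .map(r => `<li>${r.text} (score: ${r.score.toFixed(3)})</li>`)
        .join('');
    });
  </script>
</body>
</html>
```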
5. Why Start Here?
This minimalist approach provides:
- Privacy: All processing is local
- Speed: Instant feedback on small datasets
- Portability: Single-file deployment
- Transparency: Easy to inspect and debug
It’s ideal for:
- Local knowledge bases (e.g., Obsidian, Zettelkasten)
- Prototyping embedded interfaces
- Educational tools for understanding semantic similarity
- MVP validation before investing in backend infrastructure
6. Tiers of Vector Search and Model Use
Embedding Model Tiers
Tier | Model | Dim | Hosting | Strengths
---|---|---|---|---
1 | MiniLM (Browser) | 384 | WebAssembly | Lightweight, instant, private
2 | Sentence Transformers | 768 | Node.js | Stronger abstraction, server-deployable
3 | OpenAI, Cohere | 1536+ | Cloud | Domain-tuned, high-performance
Vector Store Tiers
Tier | Storage Type | Access Scope | Use Case
---|---|---|---
1 | In-file JSON | Local only | Prototyping, demos, education
2 | Remote JSON endpoint | Internal API | MVPs with small teams
3 | Cloud DB / Elastic | Scalable API | Production-grade applications
This flexibility means you can scale both model and storage independently as your app matures.
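One way to preserve that independence in code is to hide the model and the store behind two small interfaces, so either tier can be swapped without touching the search logic. A hypothetical sketch; `extractor`, `vectorStore`, and `cosineSimilarity` come from the earlier sections:

```js
// Hypothetical seams between tiers: anything with an embed() can replace
// the model; anything with a nearest() can replace the store.
const browserEmbedder = {
  // Tier 1 model: MiniLM running in WebAssembly.
  embed: async (text) => {
    const out = await extractor(text, { pooling: 'mean', normalize: true });
    return Array.from(out.data);
  },
};

const inFileStore = {
  // Tier 1 storage: the pasted JSON array, scored in memory.
  nearest: async (vector, topN) =>
    vectorStore
      .map(item => ({ text: item.text, score: cosineSimilarity(vector, item.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topN),
};

// The search logic never changes, whichever tiers sit behind it.
async function searchWith(embedder, store, query, topN = 5) {
  return store.nearest(await embedder.embed(query), topN);
}
```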
7. Cosine Similarity vs. Other Search Methods
At the core of vector search lies the idea of comparing high-dimensional embeddings to find relevance. Cosine similarity is the most widely used method in RAG pipelines—especially in compact, low-resource setups like our in-browser MiniLM MVP.
What Is Cosine Similarity?
Cosine similarity compares the angle between two vectors rather than their distance. It’s ideal when the direction of a vector matters more than its magnitude—true for most embedding models that represent semantic meaning.
Formula:
```text
cosine_similarity(a, b) = (a · b) / (||a|| * ||b||)
```
This produces a value between -1 and 1:
- 1 → vectors are identical in direction (perfect semantic match)
- 0 → vectors are orthogonal (unrelated)
- -1 → opposite meaning
Why Cosine Works for MiniLM
- MiniLM embeddings are already normalized to unit length
- Focuses on semantic similarity, not raw distance
- Performs well even on small, dense datasets like the 3-movie local JSON
- Requires only a dot product, which is cheap and fast in JavaScript (see the sketch after this list)
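Since the vectors are unit length, the denominator of the cosine formula is 1 and the comparison collapses to a bare dot product:

```js
// For unit vectors, ||a|| = ||b|| = 1, so:
//   cosine_similarity(a, b) = (a · b) / (1 * 1) = a · b
const dot = (a, b) => a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// With embeddings generated using normalize: true, dot(a, b) returns
// the same score as the full cosineSimilarity(a, b) from section 3.
```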
Other Metrics
Metric | Description | Pros | Cons |
---|---|---|---|
Cosine | Angle between vectors | Best for semantics | Less useful for unnormalized data |
Euclidean (L2) | Straight-line distance | Good for spatial layouts | Sensitive to magnitude |
Dot Product | Raw projection of one vector on another | Very fast | Biases toward longer vectors |
Manhattan (L1) | Sum of absolute differences | Robust to outliers | Less intuitive in high-D space |
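For reference, minimal JavaScript versions of the two distance metrics (the dot product appears just above). Unlike the similarity scores, lower values here mean closer matches:

```js
// Euclidean (L2): straight-line distance; sensitive to vector magnitude.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Manhattan (L1): sum of absolute differences; robust to outliers.
function manhattan(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum;
}
```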
When to Use What
- Use cosine when you’re matching semantic meaning and vectors are normalized (like in MiniLM).
- Use L2 or inner product when you’re working with approximate nearest neighbor systems (e.g., FAISS).
- Consider hybrid scoring (e.g., vector + keyword) for production-grade retrieval; a toy version is sketched below.
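A toy illustration of that hybrid idea, blending the vector score with naive keyword overlap. The 0.7/0.3 split is arbitrary; production systems typically pair embeddings with BM25 and tuned weights:

```js
// Fraction of query words that literally appear in the document text.
function keywordScore(query, text) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const haystack = text.toLowerCase();
  const hits = words.filter(w => haystack.includes(w)).length;
  return words.length ? hits / words.length : 0;
}

// Weighted blend of semantic (vector) and lexical (keyword) evidence.
function hybridScore(queryVector, query, item, alpha = 0.7) {
  return alpha * cosineSimilarity(queryVector, item.embedding)
       + (1 - alpha) * keywordScore(query, item.text);
}
```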
8. Beyond Search: Context Injection into LLMs
The final step of the RAG loop is feeding your best-matched context back into a language model. Even within the browser MVP, this can be demonstrated by:
- Displaying top matched results with accompanying text
- Using those results as a preamble to a prompt for a local or cloud LLM (sketched below)
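Assembling that preamble is plain string work. A sketch reusing the `search` function from section 3; the prompt template is illustrative:

```js
// Build an augmented prompt from the top matches, ready to hand to
// whatever LLM is available (local or cloud).
async function buildPrompt(query, vectorStore) {
  const matches = await search(query, vectorStore, 3);
  const context = matches
    .map((m, i) => `[${i + 1}] ${m.text} (score: ${m.score.toFixed(3)})`)
    .join('\n');
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`;
}
```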
Future versions can integrate:
- GPT-4 or Claude via API
- LLaMA or Mistral models in Node.js
- Client-based LLMs like WebLLM for a full local RAG pipeline
The power of the system isn’t just in the matching—it’s in what happens next. Injecting retrieved knowledge improves fluency, factual grounding, and relevance.
9. Conclusion
If RAG is the future of personalized, context-rich AI interaction, then MiniLM-in-the-browser is the future of accessible prototyping. It empowers developers, tinkerers, and curious minds to understand and deploy semantic search without any infrastructure at all.
A minimal vector store. A 30MB model. Three embedded movies. And a single HTML file. That’s all it takes to start building the future—today.
From here, you can expand upward: from in-file vector stores to remote APIs and databases, from MiniLM to stronger and larger embedding models, and from simple scoring to powerful AI-enhanced context generation.
Start small. Grow smart. Build forward.