Hike News

What is Ingestion and How do LLMs talk to your documents?

Document Ingestion

Let’s demystify “giving documents to an LLM.” We’ll walk through the exact steps our playground app performs (render → extract → chunk → embed → retrieve), then show clear upgrade paths for accuracy, speed, and stability. Everything runs entirely in the browser, giving the curious developer a starting point for understanding the fundamentals of document ingestion for LLM context engineering.

Navigate to the full playground here

Upload a document

I start by uploading a file. In this explainer, the app ingests PDFs.
rendered pdf

Extract what’s inside

I pull out the elements (primarily text) and preserve useful structure:
extracted text

  • Pages (for citations)
  • Paragraphs
  • Headings / sections (for meaning and navigation)

If you need high‑fidelity parsing (scanned PDFs, tables, multi‑column layouts), swap in a stronger extractor later with no change to the rest of the flow.

Split into chunks

I break the extracted text into smaller pieces so we can retrieve only what’s relevant.
chunking

Heuristics that work well for prose

  • Size: around 400 (think ≈400 tokens; roughly 800–1,600 characters)
  • Overlap: about 15% to keep thoughts intact across boundaries
  • The more semantic the split (ends of sentences/paragraphs/sections), the better the downstream results

In the playground you can chunk by characters, words, or sentences, as sketched below. Start simple, then move toward semantic chunking as needed.
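A minimal sketch of that baseline (not the playground’s exact code): split on sentence boundaries, pack sentences into chunks near the target size, and carry a short tail forward as overlap. The numbers simply mirror the heuristics above.

// Sentence-based chunking with overlap; maxChars ≈ 400 tokens worth of text, overlap ≈ 15%.
function chunkBySentences(text, maxChars = 1200, overlap = 180) {
  // Split on sentence endings; keeps any trailing fragment without punctuation.
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = current.slice(-overlap); // repeat the tail so thoughts stay intact across boundaries
    }
    current += sentence + ' ';
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}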

Review the chunks

Skim to make sure the chunks and overlaps make logical sense (no mid‑sentence cuts, tables split in half, etc.). Clicking a chunk scrolls both the extracted text and the PDF page to the right spot.

Turn chunks into embeddings

  • An embedding is a numeric fingerprint of meaning. Similar text → similar vectors.
  • In the playground: I use a lightweight browser model so everything runs locally.
  • In production: call an embeddings API that fits your data (e.g., legal, medical, support). Store the vector and metadata (doc, page, section) for citations later.
  • I embed the user’s query and find the Top‑K most similar chunks (see the sketch after this list)
  • Top K (vec): how many results to keep (also helps manage token usage when sending to the LLM)
  • Min vec: minimum similarity score to filter weak, off‑topic matches
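To make the vector step concrete, here is a small sketch of cosine-similarity top-K with a minimum-score filter. It assumes each chunk object carries a vector array produced by the embedding model (MiniLM in the playground, an API in production) and that the query was embedded the same way.

// Top-K retrieval by cosine similarity with a minimum-score filter.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(queryVec, chunks, k = 5, minScore = 0.3) {
  return chunks
    .map(chunk => ({ ...chunk, score: cosine(queryVec, chunk.vector) }))
    .filter(c => c.score >= minScore)  // Min vec: drop weak, off-topic matches
    .sort((a, b) => b.score - a.score)
    .slice(0, k);                      // Top K: cap what gets sent to the LLM
}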

Search by words (keyword / phrase)

Semantic search can miss exact strings (product codes, IDs, proper nouns). To cover that, I also build a reverse index for keyword search (the exact-phrase side is sketched after this list):

  • Exact phrase: literal substring match (case‑insensitive)
  • BM25: classic keyword ranking when you don’t want strict exactness
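The exact-phrase side is simple enough to sketch: a case-insensitive scan over the same chunk objects (assumed here to carry a text field). BM25 would replace this raw counting with term-frequency and length-normalized scoring, typically via a small library.

// Case-insensitive exact-phrase search; ranks chunks by how often the literal phrase appears.
function exactPhraseSearch(phrase, chunks) {
  const needle = phrase.toLowerCase();
  return chunks
    .map(chunk => {
      const haystack = chunk.text.toLowerCase();
      let count = 0, idx = haystack.indexOf(needle);
      while (idx !== -1) {
        count++;
        idx = haystack.indexOf(needle, idx + needle.length);
      }
      return { ...chunk, matches: count };
    })
    .filter(c => c.matches > 0)
    .sort((a, b) => b.matches - a.matches);
}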

Hybrid search (best of both)

I combine both result sets using RRF (Reciprocal Rank Fusion) so that a highly relevant item from either vector or keyword search rises to the top. This hybrid list is what we feed into the LLM as extra context.
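RRF itself is only a few lines. A sketch, assuming each ranked list is ordered best-first and every chunk has a stable id:

// Reciprocal Rank Fusion: score = sum over lists of 1 / (k + rank).
// k (commonly ~60) damps how much any single list’s top ranks dominate.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((item, i) => {
      const entry = scores.get(item.id) || { item, score: 0 };
      entry.score += 1 / (k + i + 1);
      scores.set(item.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map(entry => entry.item);
}

// e.g. const hybrid = rrfFuse([vectorResults, keywordResults]).slice(0, 8);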

Send to the LLM (with receipts)

I provide the top chunks along with their page/section references, and manually prompt the LLM to quote and cite them. If similarity scores are too low, instruct it to respond with “not enough context” instead of guessing.
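As one illustration (the exact wording is yours to tune), the context block can be assembled like this, with page references inline and the explicit fallback instruction. The page and text fields are assumed from the chunking step.

// Assemble the final prompt from retrieved chunks, with citations and a fallback.
function buildPrompt(question, retrievedChunks) {
  const context = retrievedChunks
    .map((c, i) => `[${i + 1}] (page ${c.page}) ${c.text}`)
    .join('\n\n');
  return [
    'Answer the question using ONLY the context below.',
    'Quote your sources and cite them as [n] with their page numbers.',
    'If the context is insufficient, reply exactly: "not enough context".',
    '',
    `Context:\n${context}`,
    '',
    `Question: ${question}`
  ].join('\n');
}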

Why this works

  • Clean extraction → better input
  • Sensible chunking → better recall and less waste
  • Good embeddings → better semantic matches
  • Hybrid search → covers both meaning and exact terms
  • Citations → trustworthy answers

Nice upgrades when you’re ready

  • Better extractors (up to and including an internal pipeline like “Dolphin”) for layout, tables, and scanned pages
  • Semantic chunking with an LLM (split by meaning, not just size)
  • Domain‑tuned embeddings via API
  • Hierarchical summaries (section/document‑level nodes) for fast, stable retrieval on long docs
  • One‑stop frameworks (e.g., LlamaIndex) to assemble these strategies with less plumbing

The End-to-End Picture

Pipeline: Ingest → Extract → Normalize → Chunk → Embed → Index → Retrieve → Assemble Context → Generate → Evaluate/Monitor.

This PDF playground implements a minimal but working slice of this pipeline:

  • Render PDFs with PDF.js and extract text
  • Chunk by characters/words/sentences with overlap
  • Embed chunks with a local MiniLM model
  • Search via cosine similarity and preview result chunks
  • Jump from any chunk to its position in the PDF (traceability)

Principle: start with something observable and end‑to‑end; then layer sophistication under measurement.


Quick Roadmap (Baseline → Better → Best)

Stage: Baseline (in this playground) → Better → Best

  • Extract: PDF.js text → Unstructured / GROBID / OCR → layout‑aware + custom heuristics (e.g., “Dolphin”)
  • Clean: whitespace + page tags → de‑hyphenation, boilerplate removal → doc‑type normalizers per template
  • Chunk: fixed windows → semantic + structure‑aware → parent‑child + hierarchical summaries
  • Embed: MiniLM‑L6‑v2 → bge‑base / e5‑large / gte‑base → domain or premium (text‑embedding‑3‑large)
  • Index: in‑mem → FAISS / pgvector → hybrid (BM25⊕vec) + HNSW/PQ + filters
  • Retrieve: top‑K cosine → MMR, query expansion → reranker (cross‑encoder), dynamic K
  • Context: concat → window stitch + cite → hierarchical assembly with parent summaries
  • Generate: plain prompt → scaffold + cite → verification + tool‑augmented answers
  • Evaluate: manual → eval sets + dashboards → canaries + regression gates

Appendix A — How this ties to our PDF Playground

  • Upload & Render: PDF.js draws canvases per page; we tag each with page ids for scrolling/highlighting.
  • Extract: pull text items, normalize whitespace, insert page markers.
  • Chunk: configurable unit (chars/words/sentences), size, overlap; each chunk stores source offsets.
  • Jump: click a chunk to select its span in the text and scroll to the PDF page.
  • Embed & Search: MiniLM embeddings in‑browser; cosine top‑K with min‑score filter and result list.
  • Export: save chunks to JSON/Markdown for downstream use.

Next features we can add to the app to match the whitepaper:

  • OCR for scanned PDFs; table extraction; figure captioning.
  • Semantic/LLM chunker; parent‑child windows; summary node creation.
  • Hybrid search (BM25 + vector) and reranking.
  • Citation renderer that highlights exact PDF spans and section headers.
  • Built‑in mini evaluation suite (seeded Q/A pairs, hit‑rate, nDCG, and faithfulness checks).

Appendix B — Practical Defaults & Checklists

Chunking defaults

  • Size: 600–1,000 tokens; Overlap: 10–20%; Break on headings/bullets; Keep paragraphs intact.

Retrieval defaults

  • K: 4–8; Min cosine: 0.3–0.4; MMR λ: 0.3; Rerank top 20–50 when needed.

Embedding ops

  • L2‑normalize; track model name+version; re‑embed on model change; snapshot indices (see the sketch below).
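A tiny sketch of the first two items; the metadata fields are illustrative, not a required schema.

// L2-normalize a vector and record which model produced it, so re-embedding and
// index snapshots stay traceable when the model changes.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0)) || 1;
  return vec.map(x => x / norm);
}

const record = {
  vector: l2Normalize([0.12, -0.34, 0.56]), // toy vector; real ones come from your model
  model: 'all-MiniLM-L6-v2@1',              // track model name + version
  doc: 'whitepaper.pdf', page: 12, section: '2.1'
};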

Prompting

  • Require citations with page numbers; allow “I don’t know” on low‑score queries.

Security

  • PII redaction where required; per‑tenant encryption keys; audit logs on retrieval events.

AI Evolution: The Five Frontiers of LLMs

For years, tokenization has been a silent bottleneck of large language models (LLMs). Every interaction with ChatGPT, Gemini, or LLaMA begins not with words, but with a process that breaks text into “tokens”—numeric fragments that stand in for characters, syllables, or whole words. It’s a clever compression hack, but also a structural flaw.

Andrej Karpathy, one of the earliest voices in deep learning and former AI lead at Tesla, has gone so far as to say that much of the “pain” in language modeling comes from tokenization itself.

But tokenization is only one challenge. To really understand where LLMs are heading, it helps to map the improvements into five distinct frontiers.


The Landscape of LLM Improvement

Below is a MECE analysis of the spaces of advancement in LLMs today. Essentially all work falls either within the Model itself or within the Ecosystem that solves challenges outside the model.

1. Model

Representation Improvements (Detokenization and Semantic Chunking)

  • Moving beyond fixed tokenization toward byte-level, learned segmentation (e.g. Dynamic Chunking / H-Net).
  • SpreadsheetLLM: moving from static encodings to schema-adaptive chunking.
    • Goal: Faithful, efficient representation that aligns with human-like semantic structure.

Processing Power and Efficiency (Distillation, Training, Hardware)

  • Distillation, pruning, quantization to shrink models.
  • Segmentation strategies like DeepSeek for modular efficiency.
  • Hardware & training innovation: sparse attention, custom accelerators, curriculum learning.
  • Goal: More capability per unit of compute, making models sustainable.

Generative Paradigm Shifts (Diffusion LLMs)

  • A new modeling approach: instead of predicting the next token sequentially, the model refines a noisy sequence toward coherent text.
  • Inspired by image diffusion models (e.g. Stable Diffusion).
  • Promises:
    • Parallelism – avoids strict left-to-right generation, potentially faster inference.
    • Global coherence – generates sequences that reflect structure at all scales, not just local next-token probabilities.
    • Robustness to noise – models that can revise and refine rather than only autocomplete.
  • Still early: less mature than autoregressive transformers, but a clear signal that the paradigm itself isn’t fixed.

2. Ecosystem

Memory and Context Management (Tooling Around the LLM)

  • Vector stores, RAG, windowed memory, long-context transformers.
  • Keeps conversations coherent and useful across both short- and long-term horizons.
  • Goal: Expand usable memory without ballooning cost.

Orchestration and Reliability (Agent Frameworks)

  • Prompt chaining, orchestrator-worker agents, evaluator-optimizer loops.
  • Ensure consistency, task decomposition, and enterprise reliability.
  • Goal: Make LLM outputs repeatable, auditable, and scalable.

The Most Profound Advancements

Based on the model above, we can surmise that the most profound advancements in AI will occur in the Model space. And within that, they will come from insights into the true challenges at a fundamental level and how to approach them. The rest of this blog explores the current research in this space that is representative of the types of work that will provide the most transformative advancements in the coming years.

The Bitter Lesson and Why Tokenization Hurts

Richard Sutton’s “Bitter Lesson” tells us that methods which scale with compute almost always beat handcrafted shortcuts. Tokenization is exactly the kind of shortcut that doesn’t scale.

Consider:

  • Loss of semantic fidelity – Words like strawberry get split into meaningless chunks, leading even top models to miscount letters (“How many R’s?” remains a classic failure case).
  • Inefficiency across languages – In English, “hi” might cost a single token. In Shan, a language of Myanmar, the same greeting could cost 14 tokens. That means users literally pay 14× more for the same greeting. Even Spanish carries a penalty: ~1.55× the token load of English.
  • Bias in computation cost – More tokens mean slower responses, higher API bills, and shorter effective context windows.
  • Structural blind spots – In Chinese, radicals carry meaning. Tokenizers often miss these, breaking characters in ways that obscure semantic hints.

All of this adds up to an unfair, inefficient, and brittle foundation.


Dynamic Chunking: Teaching Models to Split on Their Own

The proposed alternative is Dynamic Chunking, a new architecture known as H-Net. Instead of starting with tokens, H-Net begins at the byte level—the smallest stable unit of digital text. From there, it lets the model learn its own segmentation through layered chunking modules.

Key innovations include:

  • Byte-level encoding – Preserves maximum information across alphabets, symbols, and even genomic sequences.
  • Routing and smoothing modules – The model proposes cut points, then corrects them if confidence is low, ensuring splits that align with natural language units.
  • Hierarchical abstraction – Just as humans first learn letters, then words, then concepts, H-Net builds meaning through successive chunk layers. With multiple stages, it tends to converge on human-like divisions (“backbone” stays whole instead of splitting into back + bone).

The outcome is an end-to-end learned segmentation strategy. No hand-crafted vocabulary. No arbitrary cut points.


Benchmarks and Tradeoffs

The research shows promising results:

  • Performance – H-Net matches or surpasses standard transformers at the same compute scale, especially beyond ~30B training bytes.
  • Domains – Gains are most visible in Chinese, code, and DNA sequences—areas where tokenization fails hardest.
  • Robustness – Better at handling noisy text, capitalization variations, and character-level queries.

But challenges remain:

  • Training cost – H-Net is slower to train than isotropic transformer baselines.
  • Engineering complexity – New inference paths and runtime optimizations would be needed for production.
  • Incremental gains – At small scale, the difference isn’t dramatic—making it unlikely that major labs will overhaul their pipelines in the short term.

Strategic Outlook: Why This Matters

Dynamic Chunking is less about immediate replacement and more about pointing toward the long arc of scaling. Sutton’s Bitter Lesson predicts that approaches which better exploit compute will eventually win. Removing tokenization aligns with that trajectory.

For enterprises, the implications are worth tracking:

  • Fairness and global adoption – A byte-level system reduces the hidden tax non-English speakers pay today.
  • Domain generalization – From legal codes to genetic data, models that aren’t shackled to token vocabularies can adapt more fluidly.
  • Future efficiency – As context windows expand, the cost of tokenization overhead compounds. A learned segmentation system may scale more gracefully.

Tokenization got us to where we are today through the early transformer era. But like all first steps, it carries hidden costs. Dynamic Chunking represents a first serious attempt to replace it with something learned, scalable, and fair.

Whether H-Net itself becomes the new standard is less important than the progress that comes from focusing on the next biggest bottleneck. After this challenge is tackled and the dust settles, the next one to address will become clear.

The Bitter Lesson suggests that architectures embracing raw compute will win. By working at the byte level and letting models learn their own abstractions, we may finally be turning the page.

Agentic Frameworks: From Fundamentals to Enterprise-Grade Orchestration


AI agents aren’t magic. They’re structured systems that combine reasoning, tools, and orchestration to deliver consistent, production-ready outcomes.
The best implementations don’t just “let the AI run”—they shape its autonomy with frameworks, guardrails, and feedback loops.

Anthropic’s resource on Building Effective Agents provides an excellent technical foundation for thinking about how to structure these systems.


1. Fundamentals

Basic LLM

basic llm

A Large Language Model on its own can:

  • Interpret input
  • Generate responses
    But without memory, tooling, or orchestration, it’s limited in scope, prone to hallucination, and inconsistent for complex, multi-step workflows.

Augmented LLM

augmented llm

An augmented LLM adds:

  • Memory (short-term or vector-based)
  • Tools (APIs, databases, automation triggers)
  • Orchestration logic (control over when and how to call the LLM)
    This transforms the model from a single-turn assistant into the foundation of an agent.

2. Basic Agentic Frameworks

Prompt Chaining

prompt chaining

Sequentially passing the output of one agent into the input of the next.

  • Best for: High control, clear sequential steps.
  • Cons: Requires understanding and documentation of the process. Not as agile as other solutions. Changes to the process often require structural changes to the flow.

Routing

router

A classification step sends the task to the most appropriate downstream agent.

  • Best for: High-volume, multi-category workflows.
  • Risk: Misclassification cascades into wrong outputs.

Parallelization

parallelization
Multiple agents handle sub-tasks simultaneously; results are merged.

  • Best for: Multi-dimensional analysis; latency reduction.
  • Risk: Conflicting outputs require robust merge logic.

Orchestrator

orchestrator
A parent agent that decides which agents to call, and in what order.

  • Best for: Complex workflows requiring dynamic sequencing.
  • Cons: As the execution is more flexible, the tooling around the discrete components must be well designed and implemented.

Evaluator–Optimizer Loop

evaluator optimizer

An evaluator checks if output meets criteria; if not, an optimizer revises and resubmits.

  • Best for: Quality-critical outputs.
  • Risk: The loop stalls if criteria are vague, and poor design can result in expensive iteration counts (a minimal sketch of the loop follows).
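In code the loop itself is short; the value lives in the evaluation criteria. A sketch where generate(task, feedback) and evaluate(draft) are placeholders for your own LLM calls, not any specific framework’s API:

// Evaluator–Optimizer loop. maxIters guards against expensive runaway iteration.
async function evaluatorOptimizer(task, generate, evaluate, maxIters = 3) {
  let feedback = null;
  let draft = null;
  for (let i = 0; i < maxIters; i++) {
    draft = await generate(task, feedback);
    const result = await evaluate(draft);  // expected shape: { pass: boolean, feedback: string }
    if (result.pass) return draft;
    feedback = result.feedback;            // feed the critique into the next attempt
  }
  return draft;                            // best effort once the iteration budget is spent
}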

3. Complex Enterprise Patterns

Above are the fundamental patterns; depending on the complexity of the problem, they are synthesized into more complex patterns for solving different problems at scale.

Orchestrator + Evaluator–Optimizer

orchestrator evaluator optimizer
Using Evaluator–Optimizer as each node within an Orchestrator pattern:

  • Ensures iterated, production-level outputs at every stage.
  • Common in regulated industries or high-brand-risk environments.
  • Often coupled with guardrails frameworks to enforce:
    • Output quality
    • Output structure
    • Compliance with policy

Routing + Tool Agents

routing tool agents
Using a Routing pattern to select the correct tool-specialized agent:

  • Each agent operates in a sub-collection of tools it’s expert in.
  • Ensures specialization while maintaining flexible coverage of domains.
  • Example: Customer support routing to agents specialized in billing, technical troubleshooting, or escalation handling.

4. Strategic Takeaways

  • Two categories of problem drive framework choice:

    1. Known workflow and fixed steps:
      • The sequence of actions will remain the same across problems.
      • Prompt Chaining and Parallelization excel here due to their predictability, control, and ease of optimization.
    2. Variable workflow and steps:
      • The sequence may change depending on the problem.
      • Requires more flexibility and places more reasoning and iteration burden on the agent.
      • This is where Parallelization can be upgraded to an Orchestrator–Workers pattern, and where the Evaluator–Optimizer loop shines.
      • In these cases, agents can dynamically create and delegate tasks, evaluate each other’s work, and iterate until outputs meet strict requirements and guardrails.
  • Evaluator–Optimizer is the most common enterprise pattern.
    It’s often embedded inside other frameworks for complex tasks.

  • Guardrails are critical in production to enforce output quality and consistency.

  • Mix frameworks to match your operational priorities:

    • Use Routing for triage.
    • Parallelization for speed, options, and hedging against hallucination.
    • Orchestrator for sequencing.
    • Evaluator–Optimizer for quality control and consistent output.

Like any architectural and software solution, there are patterns and best practices. But the right solution will be dependent on your pain points, goals, current system, and future system states.

By understanding and deliberately applying these frameworks, you can design AI systems that are:

  • Reliable
  • Auditable
  • Scalable
  • Enterprise-grade
  • Built to fit

Context didn't kill Prompt Engineering, it Abstracted it. And it isn't the future, it's the Present.

I’ve been seeing a lot of posts about “death of prompt engineering” and the rise of “context engineering.” In reality, from a fundamental standpoint the AI, or the model, hasn’t changed at all. What’s changed is the tools we wrap around the LLM.

Context engineering isn’t a rejection of prompt engineering. It’s a reframing—one that better mirrors how organizations achieve scale: not by shouting louder into the prompt, but by building intelligent systems around it.

And it’s already happening.


Context Engineering ≠ Prompt Engineering v2

Let’s make a distinction that most posts miss:

  • Prompt Engineering is about crafting better text as input for LLMs to get better text as output.
  • Context Engineering is about crafting better systems that generate, manage, and shape the input text to get better text as output.

It’s the difference between:

  • Writing the perfect message to a colleague (prompting), and
  • Having a process, workflow, template, and examples so the colleague doesn’t need you to tell them what to do.

Context engineering does not replace prompt engineering. It pushes it down a level. Under the hood, the system we’ve built still uses text to prompt the LLM to return text. You’ve just moved the craft upstream—into tools, metadata layers, agent memory, and procedural scaffolding.

Where Prompt Engineering Still Lives

Even in mature context-engineered systems, someone still has to:

  • Tune few-shot examples
  • Handle hallucination through guided output format
  • Encode instructions in system prompts or manifest schemas
  • Define MCP contracts

Context engineering is just making that invisible to the end user.


How Context Engineering Works

Let’s break down what context engineering actually looks like in the field:

  • Pre-context: input filtration, intent classification (e.g., command routers, UI affordances)
  • Orchestration: deciding what tools & memory to load (e.g., LangChain, AutoGen, n8n)
  • Contextual Injection: assembling the input to feed the model (e.g., system prompts, history, embeddings)
  • Post-processing: constraining or refining model output (e.g., validators, type coercion, agents)

Each step adds structure around the prompt so the human doesn’t need to think about prompt engineering—but the machine still benefits from it.


Why This Shift Mirrors Cloud Transformation

We’ve seen this before with Cloud computing. It didn’t eliminate infrastructure. It abstracted it.

  • The work didn’t disappear.
  • The mental model for how to get value changed.
  • Value flowed toward those who knew how to orchestrate the pieces.

Prompt engineering isn’t dead. It just moved.

Now it lives in:

  • Vector retrieval pipelines
  • API manifest wrappers
  • Memory system heuristics
  • Modular agents with role-specific context
  • Function-calling scaffolds with input normalization

That is prompt engineering—just staged and systematized.

The Cloud Parallel: Abstraction, Not Elimination

Context engineering’s trajectory mirrors cloud’s evolution in one crucial way: it doesn’t remove complexity—it relocates and hides it.

With the cloud, we didn’t stop caring about load balancers, security, or networking—we just exposed those concerns through new abstractions: AWS IAM, Lambda, Terraform. Each abstraction spawned entire industries.

Context engineering is doing the same.

  • Instead of learning to shape the raw prompt, practitioners now specialize in building memory layers, orchestrators, vector stores, function routers.
  • New roles are emerging: AI Toolsmiths, PromptOps Engineers, Retrieval Engineers, Manifest Architects.
  • Companies are creating product verticals not just on model choice, but on context stack fidelity.

Just as DevOps redefined how we deploy, we’re now seeing the birth of PromptOps and ContextOps—focused not on writing prompts, but deploying context.

Further Abstractions from the Cloud Era

History tells us what happens next:

  • Infrastructure as Code (IaC): Codifying system architecture into version-controlled declarations (e.g., Terraform, Pulumi). In GenAI, this becomes context-as-code: defining reusable, composable context modules.
  • Serverless: Abstracting server management entirely (e.g., AWS Lambda). In GenAI, this parallels agentless prompts—functions triggered dynamically by context without persistent agents.
  • Platform-as-a-Service (PaaS): Reducing cognitive load of deployment and environment setup (e.g., Heroku, App Engine). The analog in GenAI: no-code AI stacks that expose intent-routing, embeddings, and RAG as configuration.
  • Containerization & Docker: Encapsulating environments to guarantee consistency and portability. In GenAI: modular context packages (prompts, tools, vector schemas) bundled and reused across teams and apps.
  • Observability & APM: Monitoring performance, health, and failure points. In GenAI: context telemetry and traceability—where did this answer come from, what memory shaped it, which agent touched it.

Each abstraction moved complexity into reusable, composable, and opinionated layers. GenAI is following the same pattern. And the winners won’t just be good prompt engineers—they’ll be the architects of these next layers.


Context Engineering Is Already Here

If you’ve used:

  • Slackbot integrations
  • GitHub Copilot
  • ReAct agents
  • AI copilots with long-term memory

…you’ve already encountered systems where the prompt is built for you—based on context, not input.

We’re not speculating about the future.

Context engineering is the present tense of applied LLMs.


Don’t get distracted by the headline wars of “prompt vs context.”
They’re not one vs the other. They’re stacked layers.

You still need prompt engineering to make a model useful.

But you need context engineering to make it usable, repeatable, and invisible.

And just like the shift from on-prem infra to the cloud, the future belongs to those who understand the system around the model, not just the syntax inside it.

Speaking Through the Machine: How to Get LLMs to Write in Your Voice

A common question in the age of language models is: How do I get ChatGPT to write like me? But the real question is deeper: What does it mean for a machine to think in my voice?

I’d like to explore how to reliably steer large language models (LLMs) toward producing content that reflects your voice, values, and conceptual style.


Not Just Statistical Madness: The Anthropic Insight

A persistent myth in the AI space is that LLMs are nothing more than statistical parrots—models that guess the next word in a sequence based on raw probability. While technically grounded in next-token prediction, this description radically understates the complexity of what’s going on inside a transformer.

Recent research from Anthropic challenges this view head-on. In a study of model behavior, researchers demonstrated that the model didn’t simply guess the next word; it planned toward an outcome it anticipated, even if it couldn’t execute it immediately. For instance, when tasked with completing a poem, the model planned to end a line with “rabbit”—but couldn’t output that word right away. Instead, it began generating the line with preceding words that would make sense in retrospect.

Even more striking: the “intent” to say “rabbit” was not stored as a word or a token—it existed as a diffuse activation pattern within the model. Suppressing this activation didn’t just change the last word—it changed the whole sentence structure. The sentence wasn’t built one word at a time by chance. It was constructed as a distributed, planful act that persisted only through the model’s own generated text.

This insight was further validated through a simple yet powerful intervention: giving the model an explicit space to “think” before acting. Researchers discovered that inserting a thinking tag—essentially a scratchpad where the model can write internal notes before deciding on a final answer—drastically improved reasoning and planning. The model wasn’t altered in any structural way. It was still doing next-token prediction. But when allowed to treat part of its output as provisional thought, it revealed deeper abstractions and more coherent logical structure. This implies the model has latent capabilities for complex cognition—it simply needs the right affordances to express them.

This reveals something profound:

The model is using language (serial output) as its working memory, and the continuity of its own output becomes the substrate for deeper abstraction.

In other words, it’s not just statistically completing the next word. It is building and executing plans, compressing and abstracting ideas, and unrolling those abstractions across layers of a deep neural network. Its reasoning is distributed, emergent, and entangled in the very medium it outputs—text.

The model’s “thinking” is real, but alien. Not because it mimics us, but because it does something stranger—it learns to encode generalizations through recursive contextual compression, and then decode them on demand through next-token generation.

This isn’t just computation. It’s a form of cognition.

Flattened Depth: The Neural Mirror

At a foundational level, a transformer is a flattened deep neural network—an unrolled, feed-forward cascade of attention and activation layers. During inference, it doesn’t “remember” in the human sense. Instead, it simulates a kind of emergent intention by moving through a space of learned activation patterns.

When we prompt the model to write in our voice, what we’re doing is this:

Activating the region of its latent space that most closely resembles the slope and space of our own cognitive and stylistic profile.

This is why it’s more than just feeding it a paragraph to mimic. We’re initiating a descent—across a multidimensional field of learned associations—that ideally lands on the same conceptual hilltops we’d arrive at ourselves.


Techniques to Anchor Style and Perspective

There are several structured ways to get LLMs to write like you; I’ll explore them in the context of the most popular option, ChatGPT:

1. Use 2-shot and Multi-shot Prompting

Feed the model examples of how you write. Two good examples of a post or article, followed by a prompt to continue in the same style, often outperform abstract instruction.

2. Use the System Prompt to Define Your Frame

The system prompt (sometimes hidden behind “custom instructions” in ChatGPT) is where you tell the model:

  • Your background
  • Your tone (e.g., precise, irreverent, strategic)
  • Your purpose (e.g., persuade, provoke, synthesize)
  • Your personal brand
  • Your Pillars
  • Essentially who you are and what you care about that makes your perspective unique

The more concretely you define your POV, the better the LLM can localize the slope of your meaning space.

3. Use Projects with Prior Work as Vector Anchors

ChatGPT Projects are a high-leverage opportunity. If you upload your previous writing—LinkedIn posts, blogs, frameworks—the model builds a session-long semantic memory of your work.

This allows it to:

  • Perform comparative style analysis
  • Find conceptual throughlines
  • Build a working vector store of your thought architecture

It’s less about quoting back and more about aligning to the energy of your thinking.
I find it most helpful to have the LLM do a deep analysis of work you have already produced, especially the work you’re most proud of, and check these chats into a project.

4. Articulate Your POV, Style, and Value Proposition

Before you can get a model to express your voice, you must be able to express it yourself. Most people struggle here—not with the LLM, but with themselves.

Helpful questions:

  • What’s a belief you hold that others in your field might disagree with?
  • What do people say your writing sounds like?
  • What type of thinking do you want to be known for?
  • Does your expertise fall at the unique intersection of a diverse set of interdisciplinary fields?

This defines the contours of your cognitive terrain—the exact thing the LLM will try to replicate.

5. Prompt as Guardrails, Not Instructions

Don’t tell the model exactly what to say. Give it shape. Boundaries. A tone. A tension to explore.

Example:

Instead of: “Write a post about the importance of leadership.”

Try: “I believe most corporate leadership advice is vague and performative. Explore a post in my voice and style that challenges the industry, using precise metaphors and clear examples.”

6. Start With a Sharp, Specific Thought

The strongest content emerges not from asking the LLM to be creative—but from giving it something real to push against.

I find that rather than having the LLM generate topic ideas, it’s better to consume industry research and keep up with current news, which helps you spark your own unique take on something that’s already buzzing in the community.

Read. Reflect. Form an opinion. Then ask the model:

  • What are 3 counterpoints to this?
  • What’s a metaphor that captures this tension?
  • What’s a hook to start a post like this?

7. Use Outline-First Writing to Iterate Toward Alignment

To maximize alignment with your voice and structure, start every writing session by having the model generate a high-level outline. Then:

Review the outline and tweak it until it reflects your intended arc, tone, and structure.

Ask the model to expand on one section at a time—paragraph by paragraph or point by point.

Interject your own language and opinions as you go, keeping the model within the contours of your thought.

This method acts like scaffolding: it ensures that the model is not just echoing tone, but following your intellectual structure and flow.

This turns the LLM into a thinking partner—not a ghostwriter.


LLMs as Thought Amplifiers, Not Creative Replacements

The real power of AI isn’t in generating original ideas.
It’s in pressure-testing, accelerating, and scaffolding your thinking.

Done well, the LLM becomes a prosthetic for faster alignment:

  • Between intention and expression
  • Between intuition and structure
  • Between your style and the final post

LLMs help you make your point of view sharper, faster, and more shareable—but the spark must be yours.


The Gradient of You

Getting an LLM to speak in your voice is not mimicry. It’s resonance. You’re lighting up a zone in its meaning space that bends toward your values, tone, and intellectual DNA.

This is an iterative, embodied, co-creative act. Not prompt, but alignment.

The more you understand your own slope of thought, the better the machine can match it.

Let it echo you. Not replace you.

That’s where the real leverage lives.

The Examples in Action

Below we’ll see segments generated with these techniques, from simple to more powerful, and see how the outputs move from generic articles to more unique points of view.

Zero Shot

Prompt: write a popular article about LLM that would have a high click through rate

A very generic result. Anyone could produce this.

⚠️ What’s the Catch?
Like any powerful tool, LLMs come with risks:
Misinformation: They can sound confident but be wrong.
Bias: They can unintentionally reflect harmful stereotypes.
Job disruption: They’re starting to replace tasks done by humans.
Governments and companies are scrambling to keep up with regulation — while users rush to adopt.

System Prompt

Prompt: write a popular article about LLM that would have a high click through rate

This already takes my people-focused mindset and the idea I’ve previously explored of AI replacing effort, not people, per se.

AI Isn’t Taking Your Job—It’s Taking Your Workflow
Think of LLMs as interns on steroids. They can summarize 100 pages in seconds, generate legal briefs, write ad campaigns, and even debug code faster than most junior engineers. That’s not the future. That’s now.
Companies are quietly replacing entire teams with LLM-powered tools. Productivity is skyrocketing. But so is anxiety.
“We didn’t fire anyone,” one startup founder told me. “We just didn’t need to hire anyone.”

Projects and POV

Prompt: write a popular article about LLM that would have a high click through rate

Here you can see it accessing recent topics I’ve explored, and it is much closer to my conversational writing style. It has also incorporated my opinions on the way conceptualizing models affects how you interact with them and their output quality.

“You’re Not Talking to a Robot. You’re Talking to 175 Billion Trained Gut Feelings.”
What Large Language Models Really Are—and Why Most People Are Using Them Wrong
If you’ve ever yelled at ChatGPT for sounding too generic, this article is for you.
Right now, millions of people are trying to get LLMs like GPT-4 or Claude to “sound like them,” to write better code, or to make strategic decisions. And most of them are failing—not because the models are bad, but because we’re thinking about them the wrong way.

Projects, POV, System Prompt, Multi-shot

Prompt: Explore an article about LLMs and how currently integrating them to own small segments in strict automation flows is more powerful than pursuing a 1 prompt complex agent

Here we’re much closer. It explores the details around an opinion that is uniquely mine. It touches on how the technical details play a part. And it uses points I’ve made previously to explore my personal prescriptive bounds for effectively using GenAI.

You Don’t Need a Smart Agent—You Need a Sharp Tool
Why embedding LLMs into structured flows beats chasing the 1-prompt AI unicorn
There’s a seductive idea floating around the generative AI world:
“Just ask it once. Let the agent figure the rest out.”
It’s the holy grail of prompt engineering. One master prompt, one orchestration brain, infinite downstream automation.
But if you’ve tried to build anything real with that approach—especially in high-stakes environments like healthcare, finance, or enterprise consulting—you’ve probably run into the same thing the rest of us have:
Hallucinations.
Inconsistency.
Execution plans that fall apart on edge cases.
And hours spent debugging the AI’s “clever” decisions.
The Myth of the All-Knowing Agent
The idea that one complex agent can understand your goals, craft an optimal plan, delegate tasks, manage APIs, and handle real-world variability is an oversimplification of what LLMs are actually good at.
And ironically, it underuses the true power of the model.
LLMs don’t think.
They complete.
They pattern-match based on a giant latent space of human text.
So while they can absolutely simulate “agency” in bounded environments, they are not yet robust general-purpose agents. They make great sprinters for local reasoning, not marathon runners across entire pipelines.

Agentic AI Enablement Framework (AAEF)

A mental-model-driven consulting playbook to deploy Agentic AI for real enterprise value


Philosophical Shift: Replace Effort, Not Humans

Most AI initiatives start with the wrong prompt:
“How can we replace people with AI?”

This skips the most valuable question:

“Where is human effort being overused, underleveraged, or misaligned with business value?”

This framework begins where good consulting does—by clarifying assumptions, mapping effort, and testing hypotheses—not shipping code.


Framework Pillars: Mental Models in Motion

Phase: core mental model(s) applied

  • Process Discovery: Empathy Mapping, Root Cause, Hypothesis Testing
  • Effort Decomposition: First Principles, Jobs To Be Done, Critical Thinking (Deconstruction)
  • Agent Design: MECE Thinking, Risk Surfacing, Second-Order Thinking
  • Pilot & Feedback: Hypothesis Iteration, JTBD Validation, Interpretability Loops
  • Orchestration & Scale: Systems Thinking, Leverage Point Identification, ROI Framing

Phase 1: Process Discovery & Hypothesis

“If you haven’t asked why five times, you’re not at the root yet.”

Mental Models Used:

  • Root Cause Analysis: What problem are we really solving?
  • Empathy Mapping: How do different roles experience the process?
  • Hypothesis Thinking: Where do we believe agentic value exists?
  • Stakeholder Lens Shifting: Who wins and loses if this changes?

Actions:

  • Conduct stakeholder interviews and shadowing
  • Document workflows as-is, including informal and exception-based flows
  • Build value hypotheses on which efforts are ripe for AI

Phase 2: Effort Decomposition & Classification

“Jobs are not roles. Jobs are what people are actually hired to do.”

Mental Models Used:

  • Jobs to Be Done (JTBD): Break work into outcome-focused chunks
  • First Principles Thinking: Strip roles to their atomic tasks
  • MECE (Mutually Exclusive, Collectively Exhaustive): Discrete step classification
  • Critical Thinking – Deconstruction: Challenge how and why steps are performed

Actions:

  • Classify each task as:
    • 🔁 Automatable
    • 🤝 Collaboratively assisted
    • 🔒 Judgment-bound
  • Identify bottlenecks, high-friction, or repeatable substeps
  • Map inputs/outputs for each agent to isolate dependencies

Phase 3: Agent Design & Guardrail Mapping

“Don’t just automate logic—automate judgment boundaries.”

Mental Models Used:

  • Second-Order Thinking: What are downstream impacts of automation?
  • Explainability & Risk Mapping: What happens when it fails?
  • Decision-Making Framing: Who holds final accountability?

Actions:

  • Write Agent Playbooks: role, goal, trigger, constraints
  • Map failure modes and escalation routes
  • Align output formats to human interpretability standards
  • Build in safeguards that protect users from hallucinations or bad logic

Phase 4: Pilot, Feedback & Interpretability

“The purpose of a pilot is not success. It’s learning.”

Mental Models Used:

  • Hypothesis Testing: What assumptions are we validating?
  • JTBD Revisited: Did the agent actually fulfill the job outcome?
  • Inference & Evaluation: Are results explainable and trustworthy?

Actions:

  • Deploy agents in controlled slices of the workflow
  • Measure delta in effort saved, errors avoided, and risk surfaced
  • Collect interpretability feedback from real users
  • Refactor the agent’s logic or scope based on real-world use

Phase 5: Orchestration & Strategic Scale

“You’re not building an agent. You’re building a team of them.”

Mental Models Used:

  • Systems Thinking: Where do agents plug into your ecosystem?
  • Value Loops: Are we compounding or flattening returns?
  • Strategic Leverage Point Identification: Where is one effort worth 10x?

Actions:

  • Introduce orchestration layers (e.g., LangGraph, CrewAI, custom logic)
  • Formalize handoff protocols to human reviewers or leads
  • Use each agent’s outputs to backfill documentation, institutional knowledge, and SOPs
  • Codify a hyperbolic acceleration loop: every agent adds structure, and every structure increases agent value

The Consultant’s Edge

This framework does not treat Agentic AI as a one-off automation trick. It treats it as a lever for clarity, acceleration, and standardization.

The key is not the AI model. It’s the mental model.

Consultants who apply this approach will consistently outperform:

  • By reframing work as effort to be optimized, not heads to be cut
  • By generating documentation and insight as a side effect of implementation
  • By surfacing risk, inconsistency, and unspoken rules—then designing agents around them

Final Thought

If you’ve ever asked:

  • “How do we know what to automate?”
  • “How do we avoid AI hallucinations in high-risk workflows?”
  • “How do we get value without losing control?”

Then this framework gives you a path.
Because when you lead with mental clarity and consulting rigor, Agentic AI becomes not just a tool—but a force multiplier for transformation.

Augment, Don’t Replace: A Smarter Path to Agentic AI in the Enterprise

“The future isn’t man or machine. It’s man with machine—pointed at the right problem.”


The Problem Statement Most Organizations Are Starting From

The prevailing narrative around AI adoption in the enterprise is blunt: replace human labor with AI.
Strip away the nuance, ignore the context, and sell the vision of cost savings through headcount reduction.

But this is not a transformation.
It’s a shortcut. And like most shortcuts, it often leads to missed opportunities, brittle solutions, and a failure to scale.


A Better Mental Model: Replace Human Effort, Not Humans

What organizations really want is to free up their most valuable human capital—those who think critically, solve real problems, and create momentum.

Agentic AI, when done right, is not about replacing people.
It’s about replacing effort—especially the redundant, implicit, or misaligned effort that weighs down your best people.

To unlock that, we need to reframe the challenge:

How do we identify the most valuable—and automatable—human effort in the organization?


The Real Work Begins: Consulting Before Coding

It turns out, applying Agentic AI is not primarily a tech problem.
It’s a consulting problem. And here’s why:

  • Much of an organization’s work isn’t documented. It’s passed along through Slack threads, hallway conversations, and “muscle memory.”
  • Even when it is documented, the logic often relies on human intuition, exception-handling, or silent heuristics no one has ever articulated.
  • The most crucial steps are frequently buried in the implicit process layer—what people do but don’t say.

To get Agentic AI to work, you have to:

  • Decompose real workflows and isolate the tasks that are rule-bound, repetitive, or deterministic enough to hand off.
  • Make hypotheses about which steps are ripe for automation—and validate them.
  • Design agent handoff points with the right interpretability, fail-safes, and escalation paths so humans trust and supervise correctly.

Sometimes, the end result is elegant:
You can distill a seemingly messy process into a single autonomous agent that runs with precision.

More often, though, a single worker’s domain translates into a team of agents, each owning a distinct subtask:

  • One validates inputs
  • One handles routing logic
  • One generates a summary
  • One escalates exceptions

This isn’t automate the role.
This is orchestrate the responsibility.

And orchestration is where the real complexity lies.


Discovering Hyperbolic Acceleration

With the right approach, something profound starts to happen.
I call it hyperbolic acceleration—a nonlinear gain in productivity and clarity when the right Agentic AI is inserted at the right time.

It shows up in surprising places:

  • You’re forced to document things that were previously tribal knowledge.
  • You realize what your actual best practices are—because you have to codify them.
  • You uncover and patch risk gaps that were silently accepted.
  • Your strongest contributors are unshackled from repetitive tasks and can focus on strategy, leadership, and innovation.

This is the hidden ROI of Agentic AI:
it disciplines your processes, not just speeds them up.


You Can’t Just Set an Agent Loose

Here’s the mistake we see too often:
Leaders buy into the Agentic AI dream, load up a tool, and expect it to act like a new hire.

But Agentic AI isn’t a plug-and-play employee.
It’s more like an intern with infinite speed but no judgment.

You have to architect its environment. Define its edges. Guide its purpose.

Done well, this forces better organizational hygiene.
And done iteratively, with the right stakeholders and feedback loops, you create not just faster processes—but better ones.


The Real Lever: Mindset, Not Model

At the end of the day, success with Agentic AI doesn’t come from having the right LLM or vector store.
It comes from having the right strategy, frameworks, and mindset.

You either need internal leaders with:

  • A consulting mindset for process decomposition
  • The facilitation skill to surface tacit knowledge
  • The technical fluency to bridge AI capability with business context

Or you need to partner with a team that does.

Because this isn’t about bots doing your work.
It’s about building a smarter, more focused, more valuable version of your organization—one agent at a time.

The Agentic AI Enablement Framework

If you want to go deeper, the Agentic AI Enablement Framework (AAEF) is a detailed playbook on how to iteratively get to the most Agentic AI value.

Owning the Work: How to Get Consistent Value Out of Coding Agents

The Problem with Magic

The biggest lie about AI coding agents is that they’ll do your job for you.

They don’t. (yet)
And they shouldn’t. (yet)

Tools like Cursor, Copilot, Cody, and Gemini don’t replace developers. They replace the least important parts of what developers do. And when used well, that’s a very good thing.

But to use them well, you have to stop treating them like developers.
They’re not junior developers.
They’re not your replacement.
They’re not even collaborators.

They’re filters—designed to let you pass off low-leverage work so you can spend more time owning the parts that matter.


The Consultant’s Dilemma

In software delivery, especially in consulting, our value isn’t in shipping more tickets.
It’s in solving the right problems with strategic clarity.

But modern development is flooded with noise:

  • Boilerplate
  • Repetitive integration tasks
  • Documentation scaffolds
  • Typing and DTOs
  • Verbose tests for standard logic
  • State and wiring, not substance

The more time we spend shoveling complexity, the less time we have to own architecture, clarify contracts, or lead design conversations.

AI coding agents, when framed properly, are focus accelerators:
They reduce the surface area of distraction so we can concentrate our critical thinking where it matters.


Reframing the Tool: Agents as Delegation Systems

“Don’t ask what they can do. Ask what they should do, so you can focus on what only you can do.”

The point of a good AI assistant isn’t production, it’s delegation.

Used well, the division of labor looks like this:

  • You drive architectural clarity; the agent generates boilerplate code
  • You define abstractions & contracts; the agent stubs out interface and class structure
  • You make system-wide tradeoffs; the agent wires components together
  • You lead naming and semantics; the agent autocompletes function logic
  • You write expressive tests; the agent outputs first drafts of test cases
  • You prioritize business logic; the agent offloads standard data wrangling

The Prescriptive Model: Owning Focus

To consistently extract value, your relationship with a coding agent needs to be intentional. Here’s how to do it well:

1. Start With a Friction Audit

Ask:

  • Where am I doing rote work?
  • What feels like labor, not thinking?
  • What keeps me from design, modeling, or code I’m proud of?

That’s your delegation map.


2. Shift Your Role to System Owner

Don’t treat yourself like an implementer. Treat yourself like a systems lead.

  • Define the shape of the thing
  • Write contract-first APIs
  • Give components names before they have logic
  • Think in layers—agent handles local logic, you handle flow and interface

Your job is to define the system. The agent’s job is to color inside the lines.


3. Use Guardrails, Not Prompts

The best use of an agent isn’t a 2-paragraph prompt. It’s a constrained structure:

  • Function names
  • Type signatures
  • Interface boundaries
  • Comments that describe edge cases, not syntax

You’re not writing instructions for a robot. You’re defining spaces where decisions have already been made.
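For instance (names and types are illustrative, not from a real codebase), a constrained stub handed to the agent might look like this. The signature, JSDoc types, and edge-case comments are the guardrails; the agent only fills in the body.

/**
 * @param {Array<{id: string, amountCents: number, currency: string}>} lineItems
 * @param {string} taxRegion  ISO country code; unknown regions must throw, not default to zero tax
 * @returns {{ totalCents: number, taxCents: number }}
 */
function computeInvoiceTotal(lineItems, taxRegion) {
  // Edge cases: an empty lineItems array returns zero totals; negative amounts are refunds and allowed.
  // The agent implements the body within these constraints.
  throw new Error('not implemented');
}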


4. Refactor Like You’re Reviewing a Template

Think of the agent’s output as:

  • A first draft
  • A macro-generated snippet
  • An intern’s best effort with perfect memory and no taste

Your job is to:

  • Prune
  • Rename
  • Refactor
  • Wrap
  • Own the nuance

The goal isn’t clean generation—it’s fast handoff.


5. Codify What Works

Capture high-value agent interactions the same way you would snippets, patterns, or macros:

  • Good prompt → Save it
  • Clear function structure → Clone it
  • Better naming → Promote it

Build a team prompt book. Create starter kits for common problems. Reduce cold starts for everyone.

If the value ends with one generation, you’re not scaling. You’re just playing.


Failure Patterns to Avoid

Anti-pattern: why it fails

  • “Just build me a dashboard” prompts: too vague → hallucinated logic
  • Over-generating without integration: local utility without system context
  • Writing from the bottom-up: no architectural shape to scaffold
  • Using it to avoid thinking: you ship faster, but solve nothing
  • Treating agent output as production-ready: the illusion of completeness

What It Feels Like When It Works

  • You spend less time switching files and more time clarifying flows
  • The logic snaps into place instead of sprawling out
  • You feel like a conductor, not a typist
  • Tests come together faster than they ever have before
  • You end the day with more energy, not less

That’s not automation. That’s intention. That’s focus.


Conclusion: The Work That Still Belongs to Us

Software isn’t made better by typing faster.
It’s made better by making better decisions.

If you use AI tools to avoid the hard parts, they’ll give you forgettable output.
If you use them to clear the underbrush, you’ll have the space to do your best work.

Coding agents don’t replace developers.
They remove the excuses that keep developers from doing their best work.

The value isn’t in the agent.
It’s in the time, energy, and clarity you reclaim by using it with ownership.

That’s how we build systems worth standing behind.

Demystifying MCP: What is it and How do I get started?

The Model Context Protocol (MCP) is emerging as a foundational pattern for how large language models (LLMs) interact with tools. At its core, MCP is not magic—it is simply a new protocol, much like REST or XML—but one specifically tailored for the era of AI agents and assistant-based computing.

Where REST was designed for humans and machines to exchange structured data via well-defined endpoints, MCP is designed to let LLMs reason over tool capabilities, decide when and how to invoke them, and understand the shape of the response they’ll receive. It formalizes function-level access to real-world services, whether you’re querying the weather or updating a CRM.

The POC

mcp

Part 1: Creating the Netlify MCP Wrapper

1. Define the MCP Manifest

Here is the functions.json manifest that exposes three tools:

  • get_weather_by_location: current weather for a city or coordinates
  • get_forecast_by_location: daily or hourly forecast
  • compare_weather_between_locations: compares multiple locations

Each entry follows OpenAI tool spec format, with parameter schemas for LLM parsing.

[
  {
    "name": "get_weather_by_location",
    "description": "Returns the current weather for a given city or coordinates.",
    "parameters": {
      "type": "object",
      "properties": {
        "latitude": { "type": "number", "description": "Latitude of the location" },
        "longitude": { "type": "number", "description": "Longitude of the location" },
        "locationName": { "type": "string", "description": "Name of the location (e.g. Tokyo)" }
      },
      "required": ["latitude", "longitude", "locationName"]
    }
  },
  {
    "name": "get_forecast_by_location",
    "description": "Returns the daily or hourly forecast for a given city or coordinates.",
    "parameters": {
      "type": "object",
      "properties": {
        "latitude": { "type": "number", "description": "Latitude of the location" },
        "longitude": { "type": "number", "description": "Longitude of the location" },
        "locationName": { "type": "string", "description": "Name of the location" },
        "forecastType": { "type": "string", "enum": ["daily", "hourly"], "description": "Type of forecast to return" }
      },
      "required": ["latitude", "longitude", "locationName"]
    }
  },
  {
    "name": "compare_weather_between_locations",
    "description": "Compares current weather between multiple locations and identifies which is hotter and windier.",
    "parameters": {
      "type": "object",
      "properties": {
        "locations": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "latitude": { "type": "number" },
              "longitude": { "type": "number" },
              "locationName": { "type": "string" }
            },
            "required": ["latitude", "longitude", "locationName"]
          }
        }
      },
      "required": ["locations"]
    }
  }
]
2. Build the Netlify Functions

For each endpoint, I created a Netlify function. Here’s an outline of get-weather.js:

export async function handler(event) {
  const { latitude, longitude, locationName } = JSON.parse(event.body || '{}');

  if (!latitude || !longitude || !locationName) {
    return { statusCode: 400, body: JSON.stringify({ error: 'Missing parameters' }) };
  }

  const url = `https://api.open-meteo.com/v1/forecast?latitude=${latitude}&longitude=${longitude}&current_weather=true`;
  const res = await fetch(url);
  const data = (await res.json()).current_weather;

  const mcpRecord = {
    id: `weather:${locationName.toLowerCase()}-${data.time}`,
    type: 'entity/weather-observation',
    location: { name: locationName, latitude, longitude },
    timestamp: new Date(data.time).toISOString(),
    attributes: {
      temperature_celsius: data.temperature,
      windspeed_kph: data.windspeed,
      wind_direction_deg: data.winddirection,
      weather_code: data.weathercode
    },
    source: url
  };

  return { statusCode: 200, body: JSON.stringify(mcpRecord) };
}

Part 2: Integrating into n8n as an LLM Toolchain

The goal was to allow GPT-4 (via the OpenAI Chat node) to reason over which MCP tool to call, and to execute it via httpRequestTool nodes in n8n. Here are the workflow stages:

1. Webhook Trigger

A generic webhook node accepts a JSON body including a prompt. This starts the conversation.
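
For example, starting the conversation is just a POST with a prompt field. A minimal sketch is shown below; the webhook URL is a placeholder for whatever your n8n instance generates:

// Kick off the workflow with a single prompt (URL is a placeholder)
const res = await fetch('https://your-n8n-instance.example.com/webhook/mcp-weather', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: "What's the weather in Tokyo right now?" })
});
console.log(await res.json()); // the agent's final answer from the Respond to Webhook node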

2. Retrieve MCP Manifest

An HTTP Request node fetches our functions.json from Netlify and passes it to the LLM.

3. AI Agent with Tool Access

I set up n8n’s AI Agent node, referencing GPT-4 with a system message:

You have access to these MCP tools: {{ $json.manifest }}

4. Define Tool Nodes

Each of our MCP endpoints was added as a separate httpRequestTool node:

  • Weather: get current weather
  • Forecast: get forecast
  • Compare Weather: compare multiple cities

Each tool uses:

  • Method: POST
  • URL: respective Netlify function endpoint
  • JSON schema matching the MCP manifest
  • Auto-generated body via $fromAI() to support function-calling mode (see the sketch below)
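
As a rough sketch, the Weather tool's JSON body can hand each parameter over to the model with $fromAI(), so the agent fills in the values at call time. The exact expression and templating syntax varies by n8n version, so treat the snippet below as illustrative rather than copy-paste:

{
  "latitude": {{ $fromAI('latitude', 'Latitude of the location', 'number') }},
  "longitude": {{ $fromAI('longitude', 'Longitude of the location', 'number') }},
  "locationName": "{{ $fromAI('locationName', 'Name of the location', 'string') }}"
}
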
5. Connect Tools to Agent

Each tool node is wired to the agent via ai_tool, allowing GPT to invoke it during a multi-step reasoning process.

6. Return the Response

Finally, a Respond to Webhook node outputs the assistant’s final answer (either raw JSON or a summarized string).


Lessons Learned

  • Tool schemas must match exactly, or the agent will fail with “input did not match expected schema”.
  • Registering multiple tools unlocks more flexibility, but increases risk of model confusion—clear naming and descriptions help.
  • You can serve just the functions.json if your tools map directly to existing APIs.
  • n8n makes it easy to integrate GPT-4’s function-calling with zero-code tool execution.

Three Ways to Build MCP

There are three primary paths to creating MCP endpoints:

1. Wrapping Existing REST APIs

This is the path most of us will take first. You define an MCP function schema (name, parameters, description) that maps to an existing REST endpoint. Then, you build a thin wrapper that:

  • Accepts JSON-formatted arguments
  • Calls the real REST API behind the scenes
  • Returns the output in a structured, model-friendly response

Example: the POC above demonstrates this with the Open-Meteo weather API. I wrapped it using Netlify Functions and defined three MCP tools: get_weather_by_location, get_forecast_by_location, and compare_weather_between_locations. These tools give LLMs clear affordances for querying live weather data.

2. MCP-Native Applications

You can also design your application from the ground up to expose MCP-style functions. In this model, your server or microservice is built specifically for LLM use:

  • Every capability is exposed as a named function with clear JSON schemas
  • Responses follow consistent modular patterns (e.g. entity/observation, relation/comparison; see the sketch below)
  • Designed for model predictability, not just REST idioms

These systems behave more like callable libraries than resource-driven APIs.
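
As an illustration of that second point, a relation/comparison response from the compare tool could look like the record below. This is a sketch that reuses the naming conventions of the weather record earlier; the exact fields are an assumption, not part of the POC:

{
  "id": "comparison:tokyo-vs-oslo-<timestamp>",
  "type": "relation/comparison",
  "subjects": ["weather:tokyo-<timestamp>", "weather:oslo-<timestamp>"],
  "attributes": { "hotter": "Tokyo", "windier": "Oslo" },
  "source": "compare_weather_between_locations"
}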

3. Specialized MCP Servers

A third pattern is exemplified by tools like Azure SQL MCP Server, where an existing database or enterprise system is exposed through a dedicated MCP-compatible interface. These servers:

  • Translate LLM tool calls into structured queries or commands
  • Enforce permissions and constraints
  • Return results in structured, language-model-friendly forms

In this mode, your legacy system becomes a controllable environment for AI, without the need to rewrite core business logic.


What Does MCP Enable?

The benefit of MCP is clarity. It gives LLMs a way to know what tools exist, how they behave, and how to use them correctly.

An LLM can:

  • Read a manifest of tools (like functions.json)
  • Decide which one to use based on context
  • Generate valid input
  • Interpret structured responses

This turns AI assistants into more powerful and consistent agents, capable of completing tasks without needing prompt hacks or simulated form-filling.
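
Outside of n8n, the same manifest drives plain function calling. Here is a minimal sketch using the OpenAI Node SDK; the manifest URL and the prompt are placeholders:

import OpenAI from 'openai';

// Load the MCP manifest and expose each entry as a callable tool
const manifest = await (await fetch('https://your-site.netlify.app/functions.json')).json();
const tools = manifest.map((fn) => ({ type: 'function', function: fn }));

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Is Tokyo or Oslo windier right now?' }],
  tools
});

// If the model decided to call a tool, its name and JSON-encoded arguments arrive here.
// POST them to the matching Netlify function, then return the result to the model
// as a 'tool' message to get the final answer.
const call = response.choices[0].message.tool_calls?.[0];
if (call) {
  console.log(call.function.name, call.function.arguments);
}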


Deploying Your Own MCP Stack

In this example, I used Netlify Functions to wrap the Open-Meteo API, and connected it to n8n with OpenAI function calling. But you could also:

  • Use Vercel Edge Functions, Cloudflare Workers, or AWS Lambda
  • Expose MCP endpoints directly via your app’s own SDK
  • Serve just the functions.json manifest if your existing RESTful API is sufficient for read-only access

MCP doesn’t require a complex architecture. It just requires that you think in terms of function affordances—what the model can do, and how it can do it.


Closing Thoughts

MCP is not just another protocol—it is the beginning of a new interface paradigm. One where models don’t just consume APIs but interact with systems.

By designing clearly named, parameterized, and purpose-driven functions, you’re not just building backend endpoints—you’re teaching your assistant how to help.

And that makes all the difference.

The Problem with Design Patterns: It’s Not What You Think, From Patterns to Wholeness

Modern software design borrows much from architecture—but rarely do we return the favor of revisiting architectural insights. Among those most overlooked is Christopher Alexander, the visionary architect whose ideas shaped the foundations of design patterns in software but whose deeper contributions remain underapplied. His focus on emergence, wholeness, and the life-giving quality of spaces is not just a philosophy for buildings. It is a roadmap for humane, sustainable, and resonant systems—including software.

Below, we explore Alexander’s intellectual journey from “A Pattern Language” to “The Nature of Order,” and examine how his thinking can reframe the future of technology consulting, especially in legacy modernization and iterative system design. Crucially, we highlight the dangers of reducing his work to a mere checklist of patterns. True application requires engaging with the living principles of emergence and strong centers. It requires humility, responsiveness, and an orientation toward wholeness.


1. Patterns as Problem-Solution Relationships

Alexander’s breakthrough in A Pattern Language was to recognize repeatable problem-solution pairs in architecture that transcended culture, era, or geography. A pattern was not a formula. It was a relationship: a recurring situation that called for a contextual response.

In software, the translation was instant and powerful. The Gang of Four book adapted these ideas into what we now recognize as design patterns: Observer, Factory, Decorator, etc. At their best, these patterns provide a shared language for developers to tackle recurring challenges with elegance and coherence.

But somewhere along the way, the soul of Alexander’s vision was lost.

We began to treat patterns as mechanical solutions. We prized abstraction over human-scale purpose. And we forgot that patterns are not inherently generative unless embedded in a system that values life, coherence, and feedback.


2. The Missing Principles: Wholeness and Emergence

Alexander himself recognized this limitation. While A Pattern Language inspired thousands, he saw that something essential was missing. The results of pattern application were often flat, lifeless, and disconnected from their users.

That led to his four-volume magnum opus, The Nature of Order, where he argued that:

  • Beauty is not a surface quality but an emergent property.
  • Spaces (or systems) are made up of “centers” that strengthen or weaken each other.
  • Fifteen structural properties (like strong centers, positive space, levels of scale) govern whether something feels alive or dead.

These principles apply just as powerfully in codebases as in courtyards.

Take strong centers: a concept of local coherence that pulls attention and reinforces structure. In software, this could map to modules with a clear, purposeful reason to exist—internally coherent and externally resonant.

Or levels of scale: the natural nesting of elements, from large to small. This is echoed in system architecture, where UI, API, business logic, and infrastructure layers must relate harmoniously.

When a codebase lacks balance, we feel it. It resists change, confuses contributors, and breeds fragmentation. Often the insight is not that we lack structure, but rather that our structures lack life: the vitality of the whole arises from the care of its parts.


3. Emergent System Design in Technology Consulting

In legacy modernization, cloud architecture, or GenAI integration, consultants face a familiar challenge:

How do we scope change in a way that honors the integrity of the system?

Patterns help, but only when applied with care. The real work is in sensing:

  • Which centers are weak and why?
  • Where is the structure failing to support human use?
  • What patterns strengthen existing life rather than imposing new abstraction?

This is where Alexander’s ideas of emergence and iterative unfolding become vital. Instead of designing top-down, we tune into what the system wants to become. We observe what emerges from simple, humane moves.

In practice, this might mean:

  • Running design spikes that feel like placing a bench in a park—small but catalytic.
  • Choosing architecture patterns that respect what came before.
  • Leading stakeholder workshops that surface unspoken needs and human rhythms.

Just as Alexander sat with farmers before drawing plans, modern consultants must begin by listening to systems and users.


4. The Pitfalls of Checklisted Thinking

Alexander warned that the pattern language could fail if treated like IKEA instructions. In software, the same danger exists:

  • Adopting microservices just because they’re trendy.
  • Refactoring code to meet a pattern name, rather than a user need.
  • Breaking apart monoliths without first sensing where coherence already lives.

A consultant’s job is not to implement patterns. It is to diagnose why those patterns might or might not produce life.

Wholeness is not a feature we toggle on. It is the result of many small decisions, aligned toward human purpose, over time.


5. A Call for a New Pattern Language in Tech

Alexander’s original gift to the programming world was a deeper mindset: systems thinking that honors emergence and purpose. As we apply GenAI, build platforms, and migrate legacy systems, we must move beyond design patterns into a more human, iterative practice.

The future of software architecture will not be written in patterns alone. It will be shaped by how we:

  • Engage with the spirit behind patterns
  • Sense what structures want to emerge
  • Strengthen strong centers rather than impose new form
  • Create layers of experience that harmonize, rather than stack
  • Ensure energy (or effort) flows through the system like chi through a home

In this way, the most transformative software designs will not be the most abstract or modular. They will be the most alive.


Closing Thoughts

Christopher Alexander did not set out to change programming. But by centering life, beauty, and emergence in design, he gave us a path that leads well beyond architecture.

In software, as in cities, success lies not merely in scale, but in whether the system truly serves the community that uses it.

Let us apply patterns, yes. But let us also return to Alexander’s deeper lesson:

Build with life in mind.

Because the systems we build will, in turn, build us.