Another Minimum Viable RAG: Browser API with Distilbert

This browser prototype demonstrates how to:

  • Perform live web search using SerpAPI
  • Retrieve top search results and combine them into a contextual passage
  • Run a transformer-based question-answering model entirely in the browser
  • Extract answers from the retrieved text using Hugging Face’s transformers.js

The goal is to make RAG approachable by removing dependencies on server-based LLMs, proprietary models, or large-scale infrastructure.

The Mini App

Key Technologies

SerpAPI

SerpAPI is a real-time web search API that returns structured search results from engines like Google. For this app, it is used to:

  • Perform a live web query from a user prompt
  • Extract organic search snippets (titles + previews)
  • Simulate the “retrieval” portion of RAG using real web data

Because SerpAPI does not support CORS, we wrapped it in a Netlify Function proxy, which keeps the API key on the server and makes the endpoint callable from the frontend.
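On the frontend, the browser never talks to SerpAPI directly; it sends the raw query string to the proxy. A minimal sketch, assuming the function is deployed at the conventional `/.netlify/functions/serp` path and reads the request body as the query (the helper name `searchViaProxy` is ours, for illustration):

```javascript
// Send the user's query to the Netlify proxy, which holds the SerpAPI key.
async function searchViaProxy(query) {
  const response = await fetch('/.netlify/functions/serp', {
    method: 'POST',
    headers: { 'Content-Type': 'text/plain' },
    body: query, // the function reads event.body as the raw query string
  });
  if (!response.ok) throw new Error(`Proxy error: ${response.status}`);
  return response.json(); // SerpAPI's JSON payload, passed through unchanged
}
```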

Browser API            What is it?
SerpAPI (used here)    Google-accurate results + raw snippets
Tavily                 LLM-native summaries, structured output
Brave API              Privacy-focused, Google-alternative results

The Serp Function

// netlify/functions/serp.js
export async function handler(event) {
  // Answer the browser's CORS preflight request
  if (event.httpMethod === 'OPTIONS') {
    return {
      statusCode: 200,
      headers: {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Allow-Methods': 'POST, OPTIONS',
      },
      body: '',
    };
  }

  // The request body is the raw query string
  const query = event.body;
  const apiKey = process.env.SERPAPI_KEY;

  if (!apiKey) {
    return { statusCode: 500, body: 'Missing API key' };
  }

  const url = `https://serpapi.com/search.json?q=${encodeURIComponent(query)}&api_key=${apiKey}&num=3`;

  try {
    const response = await fetch(url); // native fetch (Node 18+), no import needed
    const data = await response.json();
    return {
      statusCode: 200,
      headers: {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Allow-Methods': 'POST, OPTIONS',
      },
      body: JSON.stringify(data),
    };
  } catch (err) {
    return {
      statusCode: 500,
      body: JSON.stringify({ error: err.message }),
    };
  }
}
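Once the proxy returns SerpAPI's JSON, the app only needs the organic results. A small helper sketch (the field names follow SerpAPI's `organic_results` schema; the default `limit` of 3 mirrors the `num=3` parameter above, and the helper name `buildContext` is ours):

```javascript
// Pull "title: snippet" pairs out of a SerpAPI response and join them
// into a single passage for the QA model.
function buildContext(data, limit = 3) {
  const results = data.organic_results || [];
  return results
    .slice(0, limit)
    .map(r => `${r.title}: ${r.snippet || ''}`.trim())
    .join('\n');
}
```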

Xenova/distilbert-base-cased-distilled-squad

This is a compact transformer model distilled from BERT and fine-tuned on the SQuAD v1.1 dataset, making it ideal for extractive question answering.

  • It does not generate new text — instead, it selects a span of text from a passage that best answers the question.
  • It runs entirely in-browser using Hugging Face’s transformers.js, a WebAssembly/WebGPU-powered library for client-side inference.
  • The QA functionality is built on the question-answering pipeline.

This model was selected for its speed, size, and open availability — ideal for demo and learning purposes.

try {
  // Load the extractive QA pipeline (cached after the first download)
  const qa = await pipeline('question-answering', 'Xenova/distilbert-base-cased-distilled-squad');
  const result = await qa(query, resultsText); // (question, context)
  output.textContent = result.answer;
  status.textContent = '✅ Done!';
} catch (err) {
  output.textContent = '❌ Failed to generate response.';
  status.textContent = '❌ LLM error.';
  console.error(err);
}

System Flow

  1. User enters a natural language question
  2. A request is sent to a Netlify Function which forwards it to SerpAPI
  3. The top 3 search results (title + snippet) are extracted and concatenated
  4. The question + text are passed into the browser QA model
  5. The model returns a highlighted extractive answer from the search content
[ User Input ]
      ↓
[ Netlify Function ]
      ↓
[ SerpAPI → Top 3 Snippets ]
      ↓
[ QA Pipeline (transformers.js in browser) ]
      ↓
[ Extracted Answer ]
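The five steps can be glued together in one function. A sketch, assuming the `/.netlify/functions/serp` proxy path and a `qaPipeline` argument holding the transformers.js question-answering pipeline (passed in so the model loads only once; the function name `answerFromWeb` is ours):

```javascript
// End-to-end flow: question → proxy → context passage → extractive answer.
// qaPipeline: a transformers.js 'question-answering' pipeline, e.g.
//   await pipeline('question-answering', 'Xenova/distilbert-base-cased-distilled-squad')
async function answerFromWeb(question, qaPipeline) {
  // Steps 1-2: forward the question through the Netlify proxy to SerpAPI
  const response = await fetch('/.netlify/functions/serp', {
    method: 'POST',
    body: question,
  });
  const data = await response.json();

  // Step 3: concatenate the top three titles + snippets into one passage
  const context = (data.organic_results || [])
    .slice(0, 3)
    .map(r => `${r.title}: ${r.snippet || ''}`)
    .join('\n');

  // Steps 4-5: run extractive QA in-browser and return the selected span
  const result = await qaPipeline(question, context);
  return result.answer;
}
```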

Learning Outcomes

  • Understand the difference between extractive and generative RAG
  • Learn how to interact with live search APIs like SerpAPI
  • See how transformer models can run 100% in-browser using ONNX/WebAssembly
  • Grasp how real-world RAG tools combine multiple subsystems: retrieval, context merging, and inference

Next Steps for Learners

If you want to expand this idea:

  • Swap in a generative model like phi-2 or t5-small using the text2text-generation pipeline
  • Add a source citation mechanism by preserving links to original search snippets
  • Add local document search using MiniSearch or transformers.js embeddings, as covered in our previous guide, Minimum Viable RAG: Embeddings and Vector Search in the Browser with MiniLM
  • Tune stop tokens, temperature, and prompt structure for better output control
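For the generative swap, most of the change is in prompt construction, since a text2text model needs the question and context folded into one input. A sketch (the prompt template and helper name `buildPrompt` are our own; the model name in the usage comment is an assumed ONNX conversion, and loading it triggers a download):

```javascript
// Fold the question and retrieved context into a single prompt for a
// text2text-generation model. The template is an assumption, not part
// of the original app.
function buildPrompt(query, resultsText) {
  return `Answer the question using only the context below.\n\n` +
         `Context: ${resultsText}\n\nQuestion: ${query}`;
}

// Usage with transformers.js (generation options shown are illustrative):
//   const generator = await pipeline('text2text-generation', 'Xenova/flan-t5-small');
//   const [out] = await generator(buildPrompt(query, resultsText), { max_new_tokens: 64 });
//   output.textContent = out.generated_text;
```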

Conclusion

This project is a teaching tool, not a product. It illustrates how Retrieval-Augmented Generation can work at a small scale, using modern browser APIs, public search, and efficient transformer models.

It empowers new ML engineers, students, and tinkerers to understand how RAG systems are wired — and gives them a sandbox to explore.