Another Minimum Viable RAG: Browser API with Distilbert

This browser prototype demonstrates how to:

  • Perform live web search using SerpAPI
  • Retrieve top search results and combine them into a contextual passage
  • Run a transformer-based question-answering model entirely in the browser
  • Extract answers from the retrieved text using Hugging Face’s transformers.js

The goal is to make RAG approachable by removing dependencies on server-based LLMs, proprietary models, or large-scale infrastructure.

The Mini App

Key Technologies

SerpAPI

SerpAPI is a real-time web search API that returns structured search results from engines like Google. For this app, it is used to:

  • Perform a live web query from a user prompt
  • Extract organic search snippets (titles + previews)
  • Simulate the “retrieval” portion of RAG using real web data

Because SerpAPI does not support CORS, we wrapped it in a Netlify Function proxy, which keeps the API key on the server and makes the endpoint callable from the frontend.
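On the frontend, the browser never talks to SerpAPI directly; it sends the raw query string to the proxy. A minimal sketch, assuming the function is deployed at the conventional `/.netlify/functions/serp` path and reads the request body as the query (the helper name `searchViaProxy` is ours, for illustration):

```javascript
// Send the user's query to the Netlify proxy, which holds the SerpAPI key.
async function searchViaProxy(query) {
  const response = await fetch('/.netlify/functions/serp', {
    method: 'POST',
    headers: { 'Content-Type': 'text/plain' },
    body: query, // the function reads event.body as the raw query string
  });
  if (!response.ok) throw new Error(`Proxy error: ${response.status}`);
  return response.json(); // SerpAPI's JSON payload, passed through unchanged
}
```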

Browser API            What is it?
SerpAPI (used here)    Google-accurate results + raw snippets
Tavily                 LLM-native summaries, structured output
Brave API              Privacy-focused, Google-alternative results

The Serp Function

// netlify/functions/serp.js
export async function handler(event) {
  // Answer the browser's CORS preflight request
  if (event.httpMethod === 'OPTIONS') {
    return {
      statusCode: 200,
      headers: {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Allow-Methods': 'POST, OPTIONS',
      },
      body: '',
    };
  }

  // The request body is the raw query string
  const query = event.body;
  const apiKey = process.env.SERPAPI_KEY;

  if (!apiKey) {
    return { statusCode: 500, body: 'Missing API key' };
  }

  const url = `https://serpapi.com/search.json?q=${encodeURIComponent(query)}&api_key=${apiKey}&num=3`;

  try {
    const response = await fetch(url); // native fetch (Node 18+), no import needed
    const data = await response.json();
    return {
      statusCode: 200,
      headers: {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Allow-Methods': 'POST, OPTIONS',
      },
      body: JSON.stringify(data),
    };
  } catch (err) {
    return {
      statusCode: 500,
      body: JSON.stringify({ error: err.message }),
    };
  }
}
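Once the proxy returns SerpAPI's JSON, the app only needs the organic results. A small helper sketch (the field names follow SerpAPI's `organic_results` schema; the default `limit` of 3 mirrors the `num=3` parameter above, and the helper name `buildContext` is ours):

```javascript
// Pull "title: snippet" pairs out of a SerpAPI response and join them
// into a single passage for the QA model.
function buildContext(data, limit = 3) {
  const results = data.organic_results || [];
  return results
    .slice(0, limit)
    .map(r => `${r.title}: ${r.snippet || ''}`.trim())
    .join('\n');
}
```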

Xenova/distilbert-base-cased-distilled-squad

This is a compact transformer model distilled from BERT and fine-tuned on the SQuAD v1.1 dataset, making it ideal for extractive question answering.

  • It does not generate new text — instead, it selects a span of text from a passage that best answers the question.
  • It runs entirely in-browser using Hugging Face’s transformers.js, a WebAssembly/WebGPU-powered library for client-side inference.
  • The QA functionality is built on the question-answering pipeline.

This model was selected for its speed, size, and open availability — ideal for demo and learning purposes.

try {
  // Load the extractive QA pipeline (cached after the first download)
  const qa = await pipeline('question-answering', 'Xenova/distilbert-base-cased-distilled-squad');
  const result = await qa(query, resultsText); // (question, context)
  output.textContent = result.answer;
  status.textContent = '✅ Done!';
} catch (err) {
  output.textContent = '❌ Failed to generate response.';
  status.textContent = '❌ LLM error.';
  console.error(err);
}

System Flow

  1. User enters a natural language question
  2. A request is sent to a Netlify Function which forwards it to SerpAPI
  3. The top 3 search results (title + snippet) are extracted and concatenated
  4. The question + text are passed into the browser QA model
  5. The model returns a highlighted extractive answer from the search content
[ User Input ]
      ↓
[ Netlify Function ]
      ↓
[ SerpAPI → Top 3 Snippets ]
      ↓
[ QA Pipeline (transformers.js in browser) ]
      ↓
[ Extracted Answer ]
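The five steps can be glued together in one function. A sketch, assuming the `/.netlify/functions/serp` proxy path and a `qaPipeline` argument holding the transformers.js question-answering pipeline (passed in so the model loads only once; the function name `answerFromWeb` is ours):

```javascript
// End-to-end flow: question → proxy → context passage → extractive answer.
// qaPipeline: a transformers.js 'question-answering' pipeline, e.g.
//   await pipeline('question-answering', 'Xenova/distilbert-base-cased-distilled-squad')
async function answerFromWeb(question, qaPipeline) {
  // Steps 1-2: forward the question through the Netlify proxy to SerpAPI
  const response = await fetch('/.netlify/functions/serp', {
    method: 'POST',
    body: question,
  });
  const data = await response.json();

  // Step 3: concatenate the top three titles + snippets into one passage
  const context = (data.organic_results || [])
    .slice(0, 3)
    .map(r => `${r.title}: ${r.snippet || ''}`)
    .join('\n');

  // Steps 4-5: run extractive QA in-browser and return the selected span
  const result = await qaPipeline(question, context);
  return result.answer;
}
```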

Learning Outcomes

  • Understand the difference between extractive and generative RAG
  • Learn how to interact with live search APIs like SerpAPI
  • See how transformer models can run 100% in-browser using ONNX/WebAssembly
  • Grasp how real-world RAG tools combine multiple subsystems: retrieval, context merging, and inference

Next Steps for Learners

If you want to expand this idea:

  • Swap in a generative model like phi-2 or t5-small using the text2text-generation pipeline
  • Add a source citation mechanism by preserving links to original search snippets
  • Add local document search using MiniSearch or transformers.js embeddings, as covered in our previous guide, Minimum Viable RAG: Embeddings and Vector Search in the Browser with MiniLM
  • Tune stop tokens, temperature, and prompt structure for better output control
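For the generative swap, most of the change is in prompt construction, since a text2text model needs the question and context folded into one input. A sketch (the prompt template and helper name `buildPrompt` are our own; the model name in the usage comment is an assumed ONNX conversion, and loading it triggers a download):

```javascript
// Fold the question and retrieved context into a single prompt for a
// text2text-generation model. The template is an assumption, not part
// of the original app.
function buildPrompt(query, resultsText) {
  return `Answer the question using only the context below.\n\n` +
         `Context: ${resultsText}\n\nQuestion: ${query}`;
}

// Usage with transformers.js (generation options shown are illustrative):
//   const generator = await pipeline('text2text-generation', 'Xenova/flan-t5-small');
//   const [out] = await generator(buildPrompt(query, resultsText), { max_new_tokens: 64 });
//   output.textContent = out.generated_text;
```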

Conclusion

This project is a teaching tool, not a product. It illustrates how Retrieval-Augmented Generation can work at a small scale, using modern browser APIs, public search, and efficient transformer models.

It empowers new ML engineers, students, and tinkerers to understand how RAG systems are wired — and gives them a sandbox to explore.