This browser prototype demonstrates how to:
- Perform live web search using SerpAPI
- Retrieve top search results and combine them into a contextual passage
- Run a transformer-based question-answering model entirely in the browser
- Extract answers from the retrieved text using Hugging Face’s transformers.js
The goal is to make RAG approachable by removing dependencies on server-based LLMs, proprietary models, or large-scale infrastructure.
The Mini App
Key Technologies
SerpAPI
SerpAPI is a real-time web search API that returns structured search results from engines like Google. For this app, it is used to:
- Perform a live web query from a user prompt
- Extract organic search snippets (titles + previews)
- Simulate the “retrieval” portion of RAG using real web data
Because SerpAPI does not support CORS, we wrapped it in a Netlify Function proxy, which keeps the API key on the server and allows the frontend to call it directly.
| Search API | What is it? |
| --- | --- |
| SerpAPI (used here) | Google-accurate results + raw snippets |
| Tavily | LLM-native summaries, structured output |
| Brave API | Privacy-focused, Google-alternative results |
The Serp Function
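Below is a minimal sketch of what such a proxy in `netlify/functions/serp.js` can look like. It assumes Node 18+ (for the global `fetch`) and a `SERPAPI_KEY` environment variable; the helper name, parameters, and error handling are illustrative, not the original implementation:

```javascript
// netlify/functions/serp.js — proxies SerpAPI so the key never ships
// to the browser. Minimal sketch; names and error handling are assumptions.

// Build the SerpAPI request URL (https://serpapi.com/search.json).
function buildSerpUrl(query, key) {
  const params = new URLSearchParams({
    q: query,
    api_key: key,
    engine: 'google',
  });
  return `https://serpapi.com/search.json?${params}`;
}

async function handler(event) {
  const query = event.queryStringParameters && event.queryStringParameters.q;
  if (!query) {
    return { statusCode: 400, body: JSON.stringify({ error: 'Missing ?q=' }) };
  }
  // The key stays in process.env on the server side.
  const res = await fetch(buildSerpUrl(query, process.env.SERPAPI_KEY));
  const data = await res.json();
  return {
    statusCode: 200,
    headers: { 'Access-Control-Allow-Origin': '*' }, // let the frontend call us
    body: JSON.stringify(data),
  };
}

// Netlify picks up the exported handler.
if (typeof module !== 'undefined') module.exports = { handler };
```

The frontend then requests `/.netlify/functions/serp?q=...` instead of hitting SerpAPI directly, which sidesteps the CORS restriction entirely.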
Xenova/distilbert-base-cased-distilled-squad
This is a compact transformer model distilled from BERT and fine-tuned on the SQuAD v1.1 dataset, making it ideal for extractive question answering.
- It does not generate new text — instead, it selects a span of text from a passage that best answers the question.
- It runs entirely in-browser using Hugging Face’s transformers.js, a WebAssembly/WebGPU-powered library for client-side inference.
- The QA functionality is built on the question-answering pipeline.
This model was selected for its speed, size, and open availability — ideal for demo and learning purposes.
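The in-browser side can be sketched as follows, assuming the `@xenova/transformers` package; the helper names are illustrative, not the original code:

```javascript
// Sketch of in-browser extractive QA. The pipeline downloads the model's
// ONNX weights on first use and runs them via WebAssembly/WebGPU.
async function answerFromContext(question, context) {
  const { pipeline } = await import('@xenova/transformers');
  const qa = await pipeline(
    'question-answering',
    'Xenova/distilbert-base-cased-distilled-squad'
  );
  // Returns { answer, score }; `answer` is a literal span copied out of
  // `context`, not generated text.
  return qa(question, context);
}

// What "extractive" means in practice: any valid answer must appear
// verbatim somewhere in the context passage.
function isExtractive(answer, context) {
  return context.includes(answer);
}
```

Because the answer is always a span of the retrieved text, a low `score` is a useful signal that the search snippets simply did not contain the answer.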
System Flow
- User enters a natural language question
- A request is sent to a Netlify Function which forwards it to SerpAPI
- The top 3 search results (title + snippet) are extracted and concatenated
- The question + text are passed into the browser QA model
- The model returns a highlighted extractive answer from the search content
```
[ User Input ]
      |
      v
[ Netlify Function ] --> [ SerpAPI ]
      |
      v
[ Top 3 snippets -> combined context ]
      |
      v
[ In-browser QA model (transformers.js) ]
      |
      v
[ Extractive answer ]
```
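Step 3 of the flow, merging the top results into a single passage, reduces to a small pure function. The `results` shape below mirrors SerpAPI’s `organic_results` entries (title/snippet fields); the helper name is an assumption:

```javascript
// Merge the top N search results (title + snippet) into one context
// passage for the QA model.
function buildContext(results, limit = 3) {
  return results
    .slice(0, limit)                       // keep only the top N hits
    .map(r => `${r.title}. ${r.snippet}`)  // title gives the snippet context
    .join('\n');                           // one result per line
}
```

Keeping titles alongside snippets matters: a snippet like “It was founded in 1998” is only answerable when the title says what “it” refers to.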
Learning Outcomes
- Understand the difference between extractive and generative RAG
- Learn how to interact with live search APIs like SerpAPI
- See how transformer models can run 100% in-browser using ONNX/WebAssembly
- Grasp how real-world RAG tools combine multiple subsystems: retrieval, context merging, and inference
Next Steps for Learners
If you want to expand this idea:
- Swap in a generative model like t5-small using the text2text-generation pipeline (or a decoder-only model like phi-2 with the text-generation pipeline)
- Add a source citation mechanism by preserving links to the original search snippets
- Add local document search using MiniSearch or transformers.js embeddings, as in our previous guide, Minimum Viable RAG: Embeddings and Vector Search in the Browser with MiniLM
- Tune stop tokens, temperature, and prompt structure for better output control
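The generative swap suggested above can be sketched like this, again assuming `@xenova/transformers`; the `Xenova/t5-small` checkpoint, helper names, and prompt format are assumptions for illustration:

```javascript
// Prompt structure is one of the knobs worth tuning; this simple
// "question: ... context: ..." format follows T5-style conventions.
function buildPrompt(question, context) {
  return `question: ${question} context: ${context}`;
}

// Unlike the extractive pipeline, this writes new text conditioned on
// the prompt instead of selecting a span from it.
async function generateAnswer(question, context) {
  const { pipeline } = await import('@xenova/transformers');
  const generator = await pipeline('text2text-generation', 'Xenova/t5-small');
  const [output] = await generator(buildPrompt(question, context), {
    max_new_tokens: 64, // cap answer length
    do_sample: true,    // enable sampling so temperature has an effect
    temperature: 0.7,   // lower = more deterministic output
  });
  return output.generated_text;
}
```

The trade-off is the usual one: generative output reads more naturally but can hallucinate, which is exactly why the citation mechanism above becomes important.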
Conclusion
This project is a teaching tool, not a product. It illustrates how Retrieval-Augmented Generation can work at a small scale, using modern browser APIs, public search, and efficient transformer models.
It empowers new ML engineers, students, and tinkerers to understand how RAG systems are wired — and gives them a sandbox to explore.