Developer Builds Fully Offline RAG Agent Using LangGraph, Ollama, and Embedded Qdrant
A developer has demonstrated how to run a complete Retrieval-Augmented Generation (RAG) agent entirely offline on a laptop, with no API keys, no Docker, and no cloud services. The setup uses Ollama to serve two local models (Qwen3.5:9b for chat and bge-m3 for embeddings) alongside an embedded Qdrant vector store that persists data to a local directory. A provider-swap architecture built in an earlier project phase allows switching between local and cloud backends by changing a single config variable, without modifying application code. The ingestion pipeline detects the embedding dimension at runtime, so the vector collection is created with the correct size whichever provider is active. In a test run on the fully local stack, five markdown documents were split into 53 chunks and stored as 1024-dimensional vectors.
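To make the provider swap and the runtime dimension check more concrete, here is a minimal Python sketch of the idea. It is not the developer's code: the `RAG_PROVIDER` variable, the function names, and both stub embedders are hypothetical, and the 1536-dimensional "cloud" size is an arbitrary value chosen only to differ from the 1024 dimensions reported for the local stack. The stubs return fixed-length vectors so the example runs without Ollama or Qdrant installed.

```python
import os
from typing import Callable, Dict, List, Optional

# An embedder is any function that maps text to a vector of floats.
Embedder = Callable[[str], List[float]]


def stub_local_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a local embedding call (1024 dims, as in the article)."""
    return [0.0] * 1024


def stub_cloud_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a hosted embedding model (1536 is an arbitrary size)."""
    return [0.0] * 1536


# Registry of providers; application code never names a backend directly.
PROVIDERS: Dict[str, Embedder] = {
    "local": stub_local_embed,
    "cloud": stub_cloud_embed,
}


def get_embedder(provider: Optional[str] = None) -> Embedder:
    """Pick the embedder from an explicit name or the RAG_PROVIDER variable (assumed name)."""
    name = provider or os.environ.get("RAG_PROVIDER", "local")
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    return PROVIDERS[name]


def detect_dimension(embed: Embedder) -> int:
    """Embed a short probe string and measure the vector length instead of hard-coding it."""
    return len(embed("dimension probe"))


def collection_settings(provider: Optional[str] = None) -> Dict[str, int]:
    """Settings a vector collection would need; the size follows the active provider."""
    return {"vector_size": detect_dimension(get_embedder(provider))}
```

Calling `collection_settings("local")` yields a vector size of 1024 and `collection_settings("cloud")` yields 1536, with no change to the calling code. In a real pipeline the measured size would be passed to the vector store when the collection is created, which is what keeps the collection consistent with whichever embedding model is active.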
This is an AI-generated summary. ShortSingh links to the original source for the complete article.