Developer Builds Fully Offline RAG Agent Using LangGraph, Ollama, and Embedded Qdrant


A developer has demonstrated how to run a complete Retrieval-Augmented Generation (RAG) agent entirely offline on a laptop, with no API keys, no Docker, and no cloud services. The setup uses Ollama to serve two local models (Qwen3.5:9b for chat and bge-m3 for embeddings) alongside an embedded Qdrant vector store that persists data to a local directory. A provider-swap architecture built in an earlier project phase allows switching between local and cloud backends by changing a single config variable, without modifying application code. The ingestion pipeline detects the embedding dimension at runtime, so the vector collection is created with the correct size regardless of which provider is active. In a test run on the fully local stack, five markdown documents were split into 53 chunks and stored as 1024-dimensional vectors.
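The runtime dimension check described above can be sketched roughly as follows. This is a minimal illustration, not the developer's code: the `EMBED_PROVIDER` variable, the function names, and the stand-in embedders are all hypothetical, and the 1536 size for the cloud backend is an assumed value. A real setup would call Ollama or a cloud API where the stand-ins are.

```python
import hashlib
import os
from typing import Callable, Dict, List, Optional

Embedder = Callable[[str], List[float]]


def _fake_embedder(dim: int) -> Embedder:
    """Hypothetical stand-in for a real embedding call (Ollama or a cloud API)."""
    def embed(text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Deterministic pseudo-vector of the requested length.
        return [digest[i % len(digest)] / 255.0 for i in range(dim)]
    return embed


# Hypothetical provider registry; only the config value selects the backend.
PROVIDERS: Dict[str, Embedder] = {
    "local": _fake_embedder(1024),  # bge-m3 yields 1024-dim vectors per the summary
    "cloud": _fake_embedder(1536),  # assumed size for an unspecified cloud model
}


def get_embedder(provider: Optional[str] = None) -> Embedder:
    """Pick the backend from an argument or the (hypothetical) EMBED_PROVIDER variable."""
    name = provider or os.environ.get("EMBED_PROVIDER", "local")
    return PROVIDERS[name]


def detect_dimension(embed: Embedder) -> int:
    """Embed a probe string and measure the vector length instead of hard-coding it."""
    return len(embed("dimension probe"))


def collection_spec(provider: Optional[str] = None) -> dict:
    """Build the settings a vector collection would be created with."""
    embed = get_embedder(provider)
    return {"vector_size": detect_dimension(embed), "distance": "cosine"}
```

Because the size is measured rather than configured, swapping the provider changes the collection size without touching the ingestion code.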

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


How the useDebounce Hook Fixes Common React Debouncing Mistakes

When users type in a search box, React components can fire an API request on every keystroke, generating redundant and stale calls. A common workaround is writing debounce logic manually with setTimeout inside components, but this approach introduces bugs like memory leaks on unmount, stale closures, and scattered duplicate code. The useDebounce hook from @reactuses/core addresses all three issues by wrapping lodash.debounce internally, handling edge cases like leading and trailing execution. It works by maintaining two separate values: a fast-updating one bound to the UI input, and a debounced one used to trigger side effects only after typing pauses. This pattern keeps the input responsive while reducing API calls to one per typing pause rather than one per keystroke.
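The trailing-debounce behavior can be illustrated outside React with a small simulation. This is a sketch of the general idea only, not the hook's implementation: `debounced_values` is a hypothetical helper that takes timestamped input states (the fast-updating value) and returns the ones that would reach the side effect (the debounced value).

```python
from typing import List, Tuple


def debounced_values(keystrokes: List[Tuple[float, str]], wait: float) -> List[str]:
    """Return the input values that survive a trailing debounce of `wait` seconds.

    `keystrokes` holds (timestamp_seconds, input_text) pairs sorted by time.
    A value fires only if no newer keystroke arrives within `wait` seconds.
    """
    fired = []
    for i, (timestamp, value) in enumerate(keystrokes):
        is_last = i == len(keystrokes) - 1
        if is_last or keystrokes[i + 1][0] - timestamp >= wait:
            fired.append(value)
    return fired


# Five keystrokes with one long pause in the middle.
typing_session = [(0.0, "r"), (0.1, "re"), (0.2, "rea"), (1.0, "reac"), (1.1, "react")]
requests = debounced_values(typing_session, wait=0.3)
```

In this session the input changes five times, but only "rea" and "react" would trigger a request, one per typing pause.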

Read more at DEV Community

Programming · DEV Community

smolagents Enables Python-Based AI Agents But Demands Clear Safety Boundaries

smolagents is an open-source Python library by Hugging Face that lets developers build AI agents in minimal code. Its key feature, CodeAgent, expresses actions as executable Python rather than JSON or plain-text tool calls. This design allows agents to perform complex tasks involving loops, conditionals, and tool composition, but it also raises the stakes if execution boundaries are not properly defined. The library integrates with a wide range of model providers, tool sources like MCP servers and LangChain, and optional sandboxed environments such as Docker, E2B, and Modal. Security experts and the Doramagic project both advise a staged onboarding approach: start with no-tool agents, then add read-only tools, and explicitly decide the execution environment before granting real system access. The core safety question is not whether the package installs correctly, but whether the host environment, tool permissions, and sandbox policies are properly configured before deployment.
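One way to picture the staged onboarding is as a gate that decides which tools an agent may receive at each stage. The sketch below is purely illustrative and does not use the smolagents API: `Stage`, `ToolSpec`, and `permitted_tools` are hypothetical names, and the rule that full access requires a named execution environment is an assumed reading of the advice above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """Hypothetical onboarding stages, in the order described above."""
    NO_TOOLS = 1
    READ_ONLY = 2
    FULL_ACCESS = 3


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool description; `read_only` marks tools with no side effects."""
    name: str
    read_only: bool


def permitted_tools(stage: Stage, tools: List[ToolSpec],
                    executor: Optional[str] = None) -> List[str]:
    """Return the tool names an agent may use at the given stage."""
    if stage == Stage.NO_TOOLS:
        return []
    if stage == Stage.READ_ONLY:
        return [tool.name for tool in tools if tool.read_only]
    # Full access is refused until the execution environment is chosen explicitly.
    if executor is None:
        raise ValueError("choose an execution environment before granting full access")
    return [tool.name for tool in tools]


catalog = [ToolSpec("search_docs", read_only=True), ToolSpec("write_file", read_only=False)]
```

With this catalog, a stage-two agent would see only `search_docs`, and a stage-three agent gets `write_file` only after an executor such as a sandbox has been named.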

Read more at DEV Community

Programming · DEV Community

Has anyone used DeepSeek? Is it really good?

Read more at DEV Community

Programming · DEV Community

Seoul Developer Builds Self-Reinforcing K-pop Music Pipeline on OCI Free Tier

A Seoul-based backend developer has built k-cosmos, a web-based 3D music space that maps K-pop tracks using 768-dimensional vector embeddings, after finding no structured K-pop metadata or emotional tag datasets publicly available. The self-reinforcing data pipeline runs on Oracle Cloud's free tier and uses Spring Boot with pgvector to continuously enrich its own music database. To prevent database connection exhaustion, the developer split external API calls and embedding generation into three decoupled transaction phases, ensuring heavy network I/O occurs outside active database connections. A two-stage SQL window function enforces artist diversity in recommendations, preventing any single artist's large discography from dominating the suggestion space. Budget controls randomize and flatten the processing queue nightly to evenly distribute API quota usage and avoid hitting free-tier LLM limits.
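The artist-diversity idea can be sketched with two window functions: one ranks tracks within each artist, and a second ranks the capped set overall. This is an assumed reconstruction, not the developer's query. It uses Python's built-in sqlite3 in place of PostgreSQL with pgvector, and the `candidates` table, its columns, and the `per_artist` cap are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def diverse_recommendations(rows: List[Tuple[str, str, float]],
                            per_artist: int = 2, limit: int = 5) -> List[Tuple[str, str]]:
    """Return (track, artist) pairs, capping how many tracks one artist contributes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE candidates (track TEXT, artist TEXT, score REAL)")
    conn.executemany("INSERT INTO candidates VALUES (?, ?, ?)", rows)
    query = """
        WITH ranked AS (
            -- Stage 1: rank each artist's tracks by similarity score.
            SELECT track, artist, score,
                   ROW_NUMBER() OVER (PARTITION BY artist ORDER BY score DESC) AS artist_rank
            FROM candidates
        ),
        capped AS (
            -- Stage 2: drop tracks beyond the per-artist cap, then rank globally.
            SELECT track, artist,
                   ROW_NUMBER() OVER (ORDER BY score DESC) AS overall_rank
            FROM ranked
            WHERE artist_rank <= ?
        )
        SELECT track, artist FROM capped WHERE overall_rank <= ? ORDER BY overall_rank
    """
    result = conn.execute(query, (per_artist, limit)).fetchall()
    conn.close()
    return result


sample = [
    ("a1", "Artist A", 0.99), ("a2", "Artist A", 0.98),
    ("a3", "Artist A", 0.97), ("a4", "Artist A", 0.96),
    ("b1", "Artist B", 0.90), ("c1", "Artist C", 0.80),
]
top = diverse_recommendations(sample, per_artist=2, limit=4)
```

Without the cap, Artist A would fill all four slots. With a cap of two, the result keeps A's two best tracks and makes room for Artists B and C.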

Read more at DEV Community