Developer Ditches Gemini API to Self-Host Open-Source LLM Across Two Production Apps
A developer has replaced Google's Gemini Flash API with a self-hosted language model to power two production applications: a portfolio terminal and an email-drafting tool called PayChasers. The switch was driven by cost, privacy concerns about sending client data to third parties, and a desire to treat AI as shared infrastructure rather than a per-call expense. After more than 200 automated attempts failed to secure a free Oracle Cloud ARM instance, the developer routed production traffic through a Cloudflare Tunnel to a Mac mini at home, which required no open inbound ports. The Oracle instance was eventually provisioned and repurposed as an always-on fallback, creating a resilience chain that keeps both apps running when the primary hardware is unavailable. A single self-hosted inference server now serves multiple products, eliminating recurring API fees; the remaining running cost is electricity.
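The resilience chain described above can be sketched as a client that tries the primary server first and moves on to the fallback when the request fails. This is only an illustration, not the developer's actual code: the hostnames, the `/v1/chat/completions` path, the payload shape, and the function names `post_json` and `complete_with_fallback` are all assumptions, since the summary does not say which inference server or API format is in use.

```python
import json
import urllib.request
from typing import Callable, Sequence

# Hypothetical endpoints (assumptions, not taken from the article).
BACKENDS = [
    # Primary: the home Mac mini, reached through a Cloudflare Tunnel hostname.
    "https://llm.example.com/v1/chat/completions",
    # Fallback: the always-on Oracle Cloud ARM instance.
    "https://fallback.example.com/v1/chat/completions",
]


def post_json(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def complete_with_fallback(
    payload: dict,
    backends: Sequence[str] = BACKENDS,
    send: Callable[[str, dict], dict] = post_json,
) -> dict:
    """Try each backend in order and return the first successful response."""
    errors = []
    for url in backends:
        try:
            return send(url, payload)
        except (OSError, ValueError) as exc:
            # OSError covers connection failures, timeouts, and HTTP errors
            # raised by urllib; ValueError covers malformed JSON replies.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Both apps could call `complete_with_fallback` with the same backend list, so a Mac mini outage would show up as extra latency on the first attempt rather than as a failed request. The `send` parameter exists so the transport can be swapped out, for example in tests.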
This is an AI-generated summary. ShortSingh links to the original source for the complete article.