mlx-serve lets Apple Silicon Mac users run Claude Code locally for free
mlx-serve is a lightweight, Zig-based server for Apple Silicon Macs that allows developers to run AI language models locally without relying on paid cloud APIs. It exposes OpenAI-, Anthropic-, and Ollama-compatible HTTP endpoints from a single binary, installable via Homebrew with no Python or Docker required. Users can redirect Claude Code to the local server by setting environment variables, enabling full functionality including streaming, tool calls, and thinking blocks at no cost. The server reportedly achieves over 35% faster decode speeds than LM Studio on Gemma 4 E4B 4-bit models. Additional features include a macOS menu-bar app, an isolated Linux sandbox environment, and support for image, video, and audio generation within the same server process.
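As a rough illustration of the environment-variable redirect described above, the sketch below builds a process environment that points Claude Code at a local Anthropic-compatible endpoint. `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` are environment variables that Claude Code reads. The summary does not say which variables mlx-serve expects or which address it listens on, so the URL, port, and placeholder token here are assumptions, and the helper names are hypothetical.

```python
import os
import subprocess

# Assumption: the summary does not state the address mlx-serve listens on.
# Replace this hypothetical URL with the one your local server reports.
HYPOTHETICAL_BASE_URL = "http://localhost:8080"


def local_claude_env(base_url=HYPOTHETICAL_BASE_URL,
                     token="local-placeholder",
                     base_env=None):
    """Return a copy of the environment with Claude Code pointed at base_url.

    The token value is a placeholder; whether the local server checks it
    at all is an assumption, not something the summary confirms.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url   # where Claude Code sends API requests
    env["ANTHROPIC_AUTH_TOKEN"] = token    # sent as a bearer token
    return env


def launch_claude_code(env):
    """Start the `claude` CLI with the given environment (must be on PATH)."""
    return subprocess.run(["claude"], env=env).returncode
```

Calling `launch_claude_code(local_claude_env())` would start Claude Code against the local server without touching the parent shell's environment, so the redirect applies to that one session only.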
This is an AI-generated summary. ShortSingh links to the original source for the complete article.