Developer cuts GPT-4 job-listing pipeline cost 63% after fixing rate limits and batch logic


A developer built a production system that scores over 10,000 job listings daily using GPT-4 function calling, vector search, and a REST API. Early runs were costly and slow, with the first full-day batch taking 47 minutes and costing $86, while aggressive retry logic later caused a three-hour delay. Switching from single-chunk to multi-chunk extraction with focused schemas reduced structured-output errors from 12% to under 2%, at the cost of more API calls per listing. Choosing pgvector over Pinecone and OpenAI's smaller embedding model over the larger one cut monthly embedding costs from roughly $1,872 to $144. Adopting OpenAI's Batch API, which offers 50% off in exchange for deferred processing, brought the per-run cost down from $86 to $32, a 63% reduction.
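The headline percentages can be checked directly against the dollar figures quoted above. The short Python sketch below does that arithmetic. The dollar amounts come from the summary; the helper name `pct_reduction` is an illustrative assumption, not the author's code.

```python
# Worked check of the cost figures quoted in the summary.
# Only the dollar amounts come from the article; the helper is hypothetical.

def pct_reduction(before: float, after: float) -> float:
    """Percentage saved when a cost falls from `before` to `after`."""
    return (before - after) / before * 100

# Per-run cost: $86 before, $32 after adopting the Batch API.
run_saving = pct_reduction(86, 32)

# Monthly embedding cost: roughly $1,872 before, $144 after.
embedding_saving = pct_reduction(1872, 144)

# What a flat 50% Batch API discount alone would do to the $86 run.
discount_only = 86 * 0.5

print(f"per-run saving:        {run_saving:.1f}%")
print(f"embedding saving:      {embedding_saving:.1f}%")
print(f"50%-discount-only run: ${discount_only:.2f}")
```

The $86 to $32 drop works out to about 62.8%, which rounds to the 63% in the headline. A flat 50% discount by itself would only bring the run to $43, so other changes presumably contributed as well, though the summary does not say which ones.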

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


Three-Layer Testing Framework Proposed for Reliable AI Workflow Evaluation

A structured evaluation framework for LLM-based workflows has been outlined, addressing challenges like non-deterministic outputs and cross-step debugging complexity. The approach divides testing into three layers: unit tests validating subagent JSON schemas without real LLM calls, integration tests checking cross-phase data flow and routing logic, and end-to-end tests measuring full pipeline metrics like completion rate and gate trigger rate. Unit tests are recommended as the most numerous and fastest layer, while end-to-end tests are reserved for changes affecting the main pipeline. The framework also incorporates trace tracking via tools like Langfuse, enabling developers to monitor phase durations, token usage, and error details at each step. Key performance benchmarks suggested include a completion rate above 80% and a Phase 4 average round count below 2 for fully automated runs.
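The end-to-end metrics named above are simple ratios over a batch of runs. The Python sketch below computes them and checks the two suggested thresholds. The `RunRecord` shape, its field names, the choice to average Phase 4 rounds over every run in the batch, and the toy data are all assumptions for illustration, not the framework's actual code.

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    """One end-to-end pipeline run (hypothetical record shape)."""
    completed: bool       # did the run finish the whole pipeline?
    gate_triggered: bool  # did a quality gate fire at any point?
    phase4_rounds: int    # revision rounds spent in Phase 4

def e2e_metrics(runs: list[RunRecord]) -> dict[str, float]:
    """Completion rate, gate trigger rate, and mean Phase 4 rounds."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    return {
        "completion_rate": sum(r.completed for r in runs) / n,
        "gate_trigger_rate": sum(r.gate_triggered for r in runs) / n,
        # Assumption: averaged over every run in the batch.
        "phase4_avg_rounds": sum(r.phase4_rounds for r in runs) / n,
    }

def meets_benchmarks(m: dict[str, float]) -> bool:
    # Thresholds quoted in the summary: completion above 80%, Phase 4 rounds below 2.
    return m["completion_rate"] > 0.80 and m["phase4_avg_rounds"] < 2

# Toy input for illustration only, not measured data.
runs = [
    RunRecord(True, False, 1),
    RunRecord(True, True, 2),
    RunRecord(True, False, 1),
    RunRecord(True, False, 1),
    RunRecord(False, True, 3),
    RunRecord(True, False, 1),
]
m = e2e_metrics(runs)
print(m, meets_benchmarks(m))
```

Because these checks only read stored run records, they are cheap to rerun; only producing the records requires the slow end-to-end pipeline.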

Read more at DEV Community


Building an LLM Red-Team Suite Reveals That Judging Harm Matters More Than Breaking Models

A developer built a red-team test suite to fire adversarial prompts at a local LLM-backed application, aiming to measure both how often attacks succeed and whether the outputs are genuinely harmful. Using NVIDIA's open-source tool garak, the suite initially reported a 100% Attack Success Rate, yet only about 2% of responses contained anything actionable or dangerous. Switching to a smarter, content-aware detector lowered the rate only to 73%, and real harm among those flagged replies remained close to zero, exposing a critical flaw in detectors that score how a reply looks rather than what it actually contains. The project found that accurately classifying harm requires human review, since automated metrics alone can report bypasses on batches where nothing harmful was produced. The developer concluded that structuring reliable datasets, defining clear harm criteria, and keeping a human in the loop are the hardest and most important parts of AI red-teaming.
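The gap the developer describes can be made concrete by scoring each attempt twice: once by an automated detector and once by a human reviewer. The Python sketch below does not use garak's API; the `Attempt` record, its fields, and the toy labels are hypothetical and chosen only to illustrate the calculation.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    """One adversarial prompt and its two verdicts (hypothetical record)."""
    prompt_id: str
    detector_flagged: bool  # the automated detector scored the attack a "success"
    human_harmful: bool     # a human reviewer found actionable, harmful content

def score(attempts: list[Attempt]) -> dict[str, float]:
    """Compare the detector's attack success rate with human-judged harm."""
    if not attempts:
        raise ValueError("need at least one attempt")
    n = len(attempts)
    flagged = [a for a in attempts if a.detector_flagged]
    return {
        # Fraction of attempts the detector counts as bypasses.
        "detector_asr": len(flagged) / n,
        # Fraction of attempts a human judged genuinely harmful.
        "human_harm_rate": sum(a.human_harmful for a in attempts) / n,
        # Of the flagged replies, the share that were actually harmful.
        "flag_precision": (
            sum(a.human_harmful for a in flagged) / len(flagged) if flagged else 0.0
        ),
    }

# Toy labels for illustration only: 8 of 10 replies flagged, 1 truly harmful.
attempts = [
    Attempt(f"p{i}", detector_flagged=i < 8, human_harmful=i == 0)
    for i in range(10)
]
print(score(attempts))
```

A high `detector_asr` paired with a low `human_harm_rate` and low `flag_precision` is the pattern the article reports: the detector counts bypasses that a human reviewer would not consider harmful.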

Read more at DEV Community


Model Context Protocol Emerges as Universal Standard for AI Agent Integration

Model Context Protocol (MCP) is an open standard designed to connect AI models to external tools and data sources without requiring custom integration code for each service. Before MCP, developers building autonomous AI agents had to write separate, model-specific logic for every tool, from GitHub to Slack to databases, which made integrations fragmented and difficult to maintain. MCP addresses this by acting as a universal connector: any AI agent can plug into a compatible MCP server and immediately access the capabilities it exposes, regardless of the underlying language model. A growing ecosystem of open-source MCP servers now covers popular platforms such as Jira, AWS, and local file systems, which the article credits with enabling faster and more secure agent deployment. The protocol is shifting the developer's role from crafting prompts toward orchestrating networks of specialized AI agents with standardized tool access.
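MCP messages are JSON-RPC 2.0: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The Python sketch below is a toy, in-process dispatcher that imitates those two message shapes to show why one client can talk to any server that speaks them. It is not the official MCP SDK and skips the initialization handshake, transports, and capability negotiation; the `add` tool and the `TOOLS` registry are hypothetical.

```python
import json
from typing import Any

# Hypothetical in-process "server": tool names mapped to schemas and handlers.
TOOLS: dict[str, dict[str, Any]] = {
    "add": {
        "description": "Add two integers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "handler": lambda args: str(args["a"] + args["b"]),
    },
}

def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})

def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for tools/list or tools/call."""
    req = json.loads(raw)
    method, req_id = req.get("method"), req.get("id")
    if method == "tools/list":
        # Advertise each tool's name, description, and JSON Schema for its input.
        result = {"tools": [
            {"name": name, "description": t["description"],
             "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = req.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req_id, -32602, "unknown tool")  # JSON-RPC invalid params
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(req_id, -32601, "method not found")  # JSON-RPC method not found
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})

listing = json.loads(handle(json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
call = json.loads(handle(json.dumps(
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "add", "arguments": {"a": 2, "b": 3}}})))
print(call["result"]["content"][0]["text"])
```

The client code that builds these requests never needs to know how `add` is implemented. That separation is what lets one agent reuse many MCP servers without per-tool integration logic.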

Read more at DEV Community


7 Underrated VS Code Extensions That Can Boost Developer Productivity

A roundup of seven lesser-known Visual Studio Code extensions highlights tools that go beyond popular staples like Prettier and ESLint. Extensions such as Error Lens and Console Ninja bring inline error messages and console output directly into the editor, reducing the need to switch between tools. Others like Mintlify use AI to auto-generate code documentation, while CSS Peek lets developers view and edit styles by hovering over class names. A spell-checker extension helps maintain clean, professional codebases by flagging typos in variable names. WakaTime rounds out the list by tracking time spent across languages, projects, and files to help developers monitor their own productivity.

Read more at DEV Community