How Terraform Drift Silently Breaks Infrastructure and How to Manage It

Terraform drift occurs when real cloud infrastructure diverges from what Terraform's state file records, typically due to manual console edits, auto-scaling adjustments, or changes made by other tools. Common scenarios include emergency hotfixes applied directly in the cloud that get silently reverted on the next terraform apply, and cross-tool modifications where separate platforms alter resources Terraform believes it controls. Drift is particularly dangerous because it makes terraform plan output unreliable, leaving reviewers unable to distinguish intentional changes from unintended reversions. Security-sensitive resources face the highest risk, as manually altered IAM policies or security groups can represent compliance violations that persist undetected until an apply overwrites them. Running terraform plan with the -detailed-exitcode flag is one of the simplest first steps teams can take to surface unexpected infrastructure differences.
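As a rough illustration of the exit-code check mentioned in the summary, the Python sketch below wraps terraform plan -detailed-exitcode and maps its documented exit codes (0 for no changes, 1 for an error, 2 for a non-empty diff) to a status string. The function names and the workdir argument are hypothetical, and a status of "changes present" covers both real drift and configuration changes that have not been applied yet, so it is a signal to investigate rather than proof of drift.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_STATUS = {
    0: "no changes",
    1: "error",
    2: "changes present",
}


def classify_plan_exit(code: int) -> str:
    """Map a plan exit code to a readable status string."""
    return PLAN_STATUS.get(code, "unknown")


def check_plan(workdir: str) -> str:
    """Run a plan in `workdir` (a hypothetical path to a Terraform
    working directory) and return the classified status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A scheduled job could call check_plan on each working directory and alert whenever the result is anything other than "no changes".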

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Hybrid LLM-SLM Architecture Could Solve the Rising Cost Problem in AI Agents

AI agents are expensive to run because a single task often requires dozens of model calls, each hitting a costly frontier large language model. Experts argue that a smarter approach is to reserve powerful LLMs only for complex reasoning tasks like planning and judgment, while delegating repetitive work such as formatting, routing, and validation to smaller, cheaper models. Desktop agents offer an additional advantage by leveraging local compute for routine steps, reducing reliance on cloud-based token billing. Over time, agent systems can analyze usage traces to identify repetitive patterns and distill them into fine-tuned small models, making operations progressively cheaper. A recently published paper titled 'Small Language Models are the Future of Agentic AI' supports this hybrid compute strategy as a path to sustainable AI agent economics.
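The split described above can be sketched as a small routing function. This is only an illustration under stated assumptions: the task categories come from the examples in the summary, while the Step type, the tier names "frontier-llm" and "small-model", and the choice to send unknown step kinds to the larger model are all hypothetical.

```python
from dataclasses import dataclass

# Task categories taken from the examples in the summary.
COMPLEX_TASKS = {"planning", "judgment"}
ROUTINE_TASKS = {"formatting", "routing", "validation"}


@dataclass
class Step:
    kind: str      # what the agent step does, e.g. "planning"
    payload: str   # the text the model would receive


def pick_model(step: Step) -> str:
    """Return a hypothetical model tier for one agent step."""
    if step.kind in ROUTINE_TASKS:
        return "small-model"
    if step.kind in COMPLEX_TASKS:
        return "frontier-llm"
    # Assumption: unknown step kinds go to the stronger model.
    return "frontier-llm"


def tier_counts(steps: list[Step]) -> dict[str, int]:
    """Count how many steps of a task land on each tier."""
    counts = {"small-model": 0, "frontier-llm": 0}
    for step in steps:
        counts[pick_model(step)] += 1
    return counts
```

Counting steps per tier over recorded traces gives a first estimate of how much of a workload could move off the frontier model.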

Read more at DEV Community

Programming · DEV Community

How to Correctly Size ClickHouse for High-Concurrency User-Facing Analytics

ClickHouse has no fixed architectural ceiling on concurrent queries, with its server-level concurrency limit defaulting to unlimited and ClickHouse Cloud set to 1,000 per replica by default — both configurable. Sustainable concurrency is defined as the number of simultaneous queries that meet a deployment's latency targets for a specific workload, not a hard engine cap. To size accurately, teams must first translate active user counts and dashboard interactions into peak queries per second and simultaneous query estimates, since user counts alone are misleading. Benchmarking under production-like conditions — using a representative query mix, real ingestion load, and realistic cache state — is essential before configuring resource limits and admission controls. When per-replica capacity is insufficient, adding replicas is the recommended path to meeting throughput and availability requirements.
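The translation from user counts to query load can be sketched with simple arithmetic. The sketch below is an assumption-laden estimate, not the guide's own method: it uses Little's law (mean in-flight queries equal arrival rate times mean latency) to get from peak queries per second to simultaneous queries, and every input value, including the per-replica sustainable concurrency, is a hypothetical number that would have to come from benchmarking.

```python
import math


def peak_qps(active_users: int,
             interactions_per_user_per_min: float,
             queries_per_interaction: float) -> float:
    """Estimate peak queries per second from dashboard activity."""
    per_minute = active_users * interactions_per_user_per_min * queries_per_interaction
    return per_minute / 60.0


def concurrent_queries(qps: float, avg_latency_s: float) -> float:
    """Little's law: mean in-flight queries = arrival rate x mean latency."""
    return qps * avg_latency_s


def replicas_needed(concurrent: float, sustainable_per_replica: float) -> int:
    """Replicas required if each one sustains a benchmarked concurrency."""
    return max(1, math.ceil(concurrent / sustainable_per_replica))


# Hypothetical workload: 1,200 active users, one dashboard interaction per
# minute each, six queries per interaction, 0.5 s average query latency.
qps = peak_qps(1200, 1, 6)                 # 120.0 queries per second
in_flight = concurrent_queries(qps, 0.5)   # 60.0 simultaneous queries
replicas = replicas_needed(in_flight, 25)  # 3 replicas at 25 per replica
```

The example shows why user counts alone mislead: the same 1,200 users produce very different concurrency if dashboards fire more queries per interaction or queries run slower.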

Read more at DEV Community

Programming · DEV Community

How to Correctly Configure Ceph Storage for Proxmox on Dedicated Hardware

A technical guide from DEV Community outlines best practices for deploying a hyper-converged Proxmox VE and Ceph storage cluster on dedicated hardware for production environments. The guide warns against four common mistakes: using consumer SSDs without power-loss protection, leaving hardware RAID enabled instead of switching controllers to HBA/IT Mode, running all traffic over a single network interface, and reducing replica settings to gain usable space at the cost of data safety. A minimum of three identical physical nodes is recommended, each equipped with enterprise SSDs or NVMe drives, sufficient RAM, and at least two 10GbE network interfaces. Strict network isolation is emphasized, with separate physical links advised for Corosync heartbeats, VM traffic, and Ceph replication to prevent cluster instability during recovery events. Additional optimizations such as configuring Jumbo Frames with MTU 9000 across all Ceph-dedicated interfaces are also recommended for maximum throughput.
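The replica trade-off mentioned above can be made concrete with a back-of-the-envelope calculation. For a replicated Ceph pool, usable capacity is roughly raw capacity divided by the pool's replica count (the size setting). The sketch below assumes identical nodes and drives; the fill_ratio headroom parameter and the example hardware are hypothetical planning values, not figures from the guide.

```python
def usable_capacity_tb(nodes: int,
                       osds_per_node: int,
                       osd_size_tb: float,
                       replica_size: int,
                       fill_ratio: float = 1.0) -> float:
    """Approximate usable capacity of a replicated pool.

    fill_ratio is a hypothetical planning headroom (for example 0.8 to
    keep 20% free); 1.0 means no headroom is reserved.
    """
    raw_tb = nodes * osds_per_node * osd_size_tb
    return raw_tb / replica_size * fill_ratio


# Hypothetical cluster: 3 nodes, 4 drives per node, 2 TB per drive.
three_copies = usable_capacity_tb(3, 4, 2.0, replica_size=3)  # 8.0 TB
two_copies = usable_capacity_tb(3, 4, 2.0, replica_size=2)    # 12.0 TB
```

Dropping from three copies to two yields 50% more usable space in this example, but each object then survives the loss of only one copy, which is the data-safety cost the guide warns about.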

Read more at DEV Community

Programming · DEV Community

Tutorial: Build a production-ready AI agent using LangChain.js and NestJS

A developer from SOM-OS has published a hands-on tutorial detailing how to integrate an AI agent into a NestJS application using LangChain.js. The architecture routes user requests through a NestJS controller into a BullMQ queue, where a worker processes each job asynchronously. A LangChain agent powered by GPT-4o handles the requests using Zod-typed tools, while conversation history is persisted in PostgreSQL and memory is cached via Redis. The guide includes full code snippets covering module setup, queue configuration, and agent execution. According to the author, this is the same architecture used in their own production systems rather than a purely theoretical exercise.

Read more at DEV Community