Apache Iceberg, Polaris, and DataFusion push spec and reliability work in busy week

Between June 24 and July 1, 2026, several Apache open lakehouse projects focused on correctness and standardization rather than new features. Iceberg held two significant votes: one to adopt a shared expressions spec that defines how filters and data transformations behave across implementations, and another to add a specific-name field to the UDF spec so catalogs can reference exact function versions. The expressions vote drew broad community support, with binding and non-binding approvals from multiple contributors who highlighted its potential to unlock new use cases. Polaris worked on multi-database catalog support, welcomed a new committer, and rejected a release candidate for legitimate technical reasons. DataFusion shipped a clean release of its Python bindings, while Arrow rebuilt its benchmarking service and Parquet debated how to handle versioning as features outpace release cycles.

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Related stories

Programming · DEV Community

How AdaBoost Turns Weak Decision Stumps Into a Powerful Classifier

AdaBoost is a machine learning boosting algorithm that combines hundreds of simple, near-random classifiers called decision stumps to build a highly accurate ensemble model. Each stump makes just one binary split on one feature, performing only marginally better than random guessing on its own. The algorithm assigns a weight to every training point, increasing the weight of misclassified examples after each round so subsequent stumps focus on the hardest cases. Each stump's contribution to the final vote is scaled by a confidence value called alpha, calculated from its weighted error, ensuring accurate stumps dominate and poor ones are discounted or flipped. This iterative reweighting can be interpreted as stagewise gradient descent on an exponential loss function: as long as each stump beats random guessing on the reweighted data, the exponential loss, which is an upper bound on the training error, shrinks with each added stump.
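The reweighting loop described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the code from the original article: the function names (train_stump, stump_predict, adaboost, predict) and the six-point toy dataset are hypothetical, labels are assumed to be +1 or -1, and alpha uses the common formula 0.5 * ln((1 - err) / err).

```python
import math

def stump_predict(row, j, t, polarity):
    """One binary split on feature j: vote `polarity` above threshold t, else the opposite."""
    return polarity if row[j] > t else -polarity

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best single split."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: one below the minimum, then midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost(X, y, rounds):
    """Train `rounds` stumps, reweighting the training points after each one."""
    n = len(X)
    w = [1.0 / n] * n                               # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid division by zero / log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # confidence of this stump
        ensemble.append((alpha, j, t, polarity))
        # Misclassified points (y * prediction == -1) gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, j, t, p) for alpha, j, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold separates the middle points from the ends.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]

one_stump = [predict(adaboost(X, y, 1), row) for row in X]
three_stumps = [predict(adaboost(X, y, 3), row) for row in X]
print("1 round :", one_stump)
print("3 rounds:", three_stumps)
```

On this toy set a single stump cannot label every point correctly, while a short sequence of reweighted stumps can, which is the effect the summary describes.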

Programming · DEV Community

Developer builds cron expression explainer tool, shares key parsing lessons

A developer who spent years copying cron expressions from Stack Overflow without truly understanding them decided to build a plain-English explainer tool from scratch. The tool parses any standard five-field cron expression, describes it in plain English, and displays the next five scheduled run times, all in roughly 50 lines of logic with no external libraries. Through the project, the developer documented core cron rules, including field order, supported operators like commas, hyphens, and slashes, and the acceptance of three-letter month and weekday names. One notable discovery was that when both the day-of-month and day-of-week fields are restricted, they are OR'd rather than AND'd, meaning a job runs when either condition matches, not only when both match simultaneously. The writeup aims to help other developers move beyond copy-pasting and actually read and reason through cron syntax themselves.
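The parsing rules listed above can be sketched with the standard library alone. This is a hedged illustration, not the developer's actual tool: the names parse_field, parse_cron, matches, and next_runs are hypothetical, a field counts as "restricted" here only when it is not exactly "*" (a simplification), and run times are found by stepping minute by minute, which is slow but easy to verify.

```python
from datetime import datetime, timedelta

MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
DAYS = {d: i for i, d in enumerate("SUN MON TUE WED THU FRI SAT".split())}

def parse_field(text, low, high, names=None):
    """Expand one cron field (commas, hyphens, slashes, names) into a set of ints."""
    def to_int(token):
        token = token.upper()
        return names[token] if names and token in names else int(token)

    values = set()
    for part in text.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = to_int(a), to_int(b)
        else:
            start = to_int(rng)
            end = high if "/" in part else start
        values.update(range(start, end + 1, step))
    return values

def parse_cron(expr):
    """Field order: minute, hour, day-of-month, month, day-of-week."""
    minute, hour, dom, month, dow = expr.split()
    return {
        "minute": parse_field(minute, 0, 59),
        "hour": parse_field(hour, 0, 23),
        "dom": parse_field(dom, 1, 31),
        "month": parse_field(month, 1, 12, MONTHS),
        "dow": {d % 7 for d in parse_field(dow, 0, 7, DAYS)},  # 7 also means Sunday
        "dom_restricted": dom != "*",
        "dow_restricted": dow != "*",
    }

def matches(spec, dt):
    cron_dow = (dt.weekday() + 1) % 7  # Python: Monday=0, cron: Sunday=0
    dom_ok = dt.day in spec["dom"]
    dow_ok = cron_dow in spec["dow"]
    if spec["dom_restricted"] and spec["dow_restricted"]:
        day_ok = dom_ok or dow_ok      # both restricted: either one may match
    else:
        day_ok = dom_ok and dow_ok     # an unrestricted field always matches
    return (dt.minute in spec["minute"] and dt.hour in spec["hour"]
            and dt.month in spec["month"] and day_ok)

def next_runs(expr, start, count=5):
    """Return up to `count` run times strictly after `start`."""
    spec = parse_cron(expr)
    dt = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = dt + timedelta(days=366 * 5)  # give up on impossible dates such as Feb 30
    runs = []
    while len(runs) < count and dt < limit:
        if matches(spec, dt):
            runs.append(dt)
        dt += timedelta(minutes=1)
    return runs

# "Midnight on the 13th OR on any Friday", because both day fields are restricted.
for run in next_runs("0 0 13 * FRI", datetime(2024, 1, 1)):
    print(run.strftime("%a %Y-%m-%d %H:%M"))
```

Starting from 1 January 2024, the example expression fires on Fridays and on the 13th (a Saturday that month), which makes the OR behavior visible in the printed dates.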

Programming · DEV Community

Graph of Thoughts lets AI merge reasoning branches, surpassing Tree of Thoughts limits

Graph of Thoughts (GoT) is an AI reasoning framework that extends the Tree of Thoughts approach by allowing reasoning branches to merge rather than forcing a single path to be selected. In Tree of Thoughts, each node has exactly one parent, meaning partial solutions developed on separate branches cannot be combined, and useful insights from discarded branches are lost. GoT reframes reasoning as a directed graph where each node represents a partial solution and edges can connect multiple parent nodes to a single child, enabling aggregation of the best elements from different branches. Key operations include generating diverse sub-thoughts, scoring them objectively, merging multiple partial answers into one improved solution, and refining results through feedback loops. A merge-sort demonstration illustrates how two branches each sorted at 66% accuracy can be combined into a fully correct result, a score no single branch could have achieved on its own.
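The structural difference from a tree, a child node with more than one parent, can be shown with a small toy. This is only a sketch under stated assumptions: the Thought class and the generate, score, and aggregate functions are hypothetical names, plain deterministic Python stands in for the language-model calls a real GoT system would make, and the refinement loop is omitted. The scoring rule (the fraction of the input a thought holds in sorted order) is likewise an assumption chosen for this example.

```python
from dataclasses import dataclass, field
from heapq import merge

@dataclass
class Thought:
    """A node in the reasoning graph: a partial solution plus links to its parents."""
    content: list
    parents: list = field(default_factory=list)

def generate(root, k):
    """Split the problem into about k sub-thoughts, each sorting one chunk of the input."""
    items = root.content
    size = -(-len(items) // k)  # ceiling division
    return [Thought(sorted(items[i:i + size]), parents=[root])
            for i in range(0, len(items), size)]

def score(thought, problem):
    """Fraction of the input that this thought holds in sorted order (0.0 if unsorted)."""
    if thought.content != sorted(thought.content):
        return 0.0
    return len(thought.content) / len(problem)

def aggregate(thoughts):
    """Merge several sorted partial solutions into one child with multiple parents."""
    combined = list(merge(*(t.content for t in thoughts)))
    return Thought(combined, parents=list(thoughts))

problem = [5, 2, 8, 1, 9, 3]
root = Thought(problem)

branches = generate(root, 2)
# A tree-style search would have to pick a single branch and discard the other.
best_single = max(branches, key=lambda t: score(t, problem))
# A graph-style search can combine both branches into one node.
merged = aggregate(branches)

print("best single branch:", best_single.content, score(best_single, problem))
print("merged node       :", merged.content, score(merged, problem))
print("parents of merged :", len(merged.parents))
```

Each branch alone covers only part of the input, so neither can reach a full score; the merged node inherits from both parents and does.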

Programming · DEV Community

Nextcloud vs Immich: A Technical Comparison of Two Self-Hosting Solutions

Nextcloud and Immich are both open-source self-hosting platforms, but they serve distinct purposes: Nextcloud focuses on broad file synchronization, sharing, and collaboration, while Immich is purpose-built for managing photo and video libraries. In performance tests, Immich uploaded a 5 GB video file in roughly 1.5 minutes compared to Nextcloud's 3 minutes, attributed to Immich's stream-oriented design and lighter server stack. Nextcloud supports flexible storage backends such as NFS, S3, and RAID arrays via an External Storage plugin, whereas Immich stores media metadata in PostgreSQL, which can raise backup complexity as the database scales. On the security front, Nextcloud provides robust options including two-factor authentication, fail2ban integration, and a dedicated hardening guide, while Immich currently lacks comparable built-in security features. The choice between the two ultimately depends on use case: Nextcloud suits general-purpose file management and team collaboration, while Immich is better suited for media-centric workflows requiring fast uploads and automatic tagging.
