Mistral and open-source MinerU race to make PDFs readable for AI
French AI company Mistral launched an updated hosted document-reading service on June 25, 2026, claiming state-of-the-art accuracy in converting complex PDFs into clean, structured text. Around the same time, the open-source project MinerU gained significant traction on GitHub as a free, self-hosted alternative that converts PDFs and office files into AI-ready formats. Both tools tackle document intelligence: extracting correctly ordered, structured text from scanned contracts, multi-column papers, and table-heavy invoices that standard text extraction cannot handle. The quality of this conversion matters because AI systems built on poorly parsed documents produce unreliable outputs, and the errors are introduced silently, before any language model is involved. Together, the two tools illustrate a broader industry tension between convenient, paid cloud services and free, privacy-preserving tools that organisations run on their own infrastructure.
This is an AI-generated summary. ShortSingh links to the original source for the complete article.