Mistral OCR 4: The Self-Hosted Document Recognition Model Released June 2026

1 day ago · 18 min read

What is Mistral OCR 4? It is the fourth-generation optical character recognition model from Paris-headquartered Mistral AI, released June 23, 2026 as the successor to Mistral OCR 3 (December 2025). It is priced at $4 per 1,000 pages through the API or $2 per 1,000 pages through the Batch API. It supports 170 languages across 10 language groups; returns bounding boxes, typed-block classification (titles, tables, equations, signatures), and inline confidence scores; and accepts PDF, DOC, PPT, and OpenDocument inputs. Its distinguishing feature is a single-container self-hosted deployment option that no major US hyperscaler OCR API (Google Document AI, AWS Textract, Azure Document Intelligence) offers. That gives Mistral the EU data-sovereignty angle that anchors its broader sovereign-AI strategy as it reportedly negotiates a €3 billion funding round at a €20 billion valuation.

Mistral AI released the fourth generation of its OCR product yesterday, June 23, 2026. The product is called Mistral OCR 4 and it succeeds Mistral OCR 3, which had shipped in December 2025. The new release is positioned as a meaningful capability upgrade with broader language support, structured outputs that go well beyond raw text extraction, and improved benchmark performance. The release also doubles down on what has become Mistral’s distinctive differentiator in the optical character recognition market: a single self-hosted container deployment option that the major US hyperscaler OCR APIs do not offer.

This piece is an explainer on Mistral OCR 4. We cover what the product is, what it does that base OCR products do not, the technical specifications, the pricing and availability, the deployment options that matter most for enterprise buyers, the benchmark results Mistral published with the launch, the competitive context (Google Document AI, AWS Textract, Azure Document Intelligence, open-source alternatives), the EU sovereign-AI positioning that the launch ties into, and practical guidance for organizations considering whether to adopt Mistral OCR 4. The angle is operational and decision-supporting rather than deep-technical.

The short version is that Mistral OCR 4 is a document-processing model that turns PDFs, Word documents, slide decks, and OpenDocument files into structured representations that downstream applications can consume programmatically. The output is not just extracted text. It includes bounding boxes for layout, typed-block classification that identifies titles, tables, equations, signatures, and other document elements, and confidence scores for each extraction. The model is priced at $4 per 1,000 pages through the standard API and $2 per 1,000 pages through the Batch API. The key operational differentiator is that the model can be deployed as a single self-hosted container on customer infrastructure, which is the option no US hyperscaler offers and which makes Mistral OCR 4 the natural choice for organizations with data-residency or air-gap requirements.

What Mistral OCR 4 actually does

The product targets a document-processing problem that has historically been one of the more frustrating areas of enterprise computing. Documents arrive in formats designed for human reading (PDFs, scans, Word documents, slide decks) and downstream applications need them in formats designed for machine processing (structured records, database rows, indexed search). The translation between the two has traditionally required heavy manual transcription, brittle template-based extraction systems, or older OCR pipelines that produced raw text without preserving the structural relationships that gave the original document its meaning.

Mistral OCR 4 produces a structured output that preserves what the original document looked like and what it meant. For a typical financial report uploaded to the API, the model returns: the document’s overall layout structure (single-column body with floating tables), the typed identification of each block (title at the top, executive summary paragraph below it, three tables in the middle section, a chart caption near the bottom), the contents of each block (the actual text, the table cell values, the inferred chart data), the bounding boxes that locate each block on each page, and a confidence score per block. Downstream applications can use the structured output to do things that raw OCR text could not support: populating a database with the table contents while preserving the row-and-column relationships, generating a search index that distinguishes between the document’s title and its body, producing a redacted version that masks specific block types, or feeding the structured representation to a downstream LLM for summarization or question-answering.
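To make the database use case concrete, here is a minimal Python sketch of what consuming such output might look like. The `Block` class, its field names, and the sample values are assumptions made up for illustration; they are not the documented OCR 4 response schema, which may be shaped differently.

```python
from dataclasses import dataclass, field


# Hypothetical schema for illustration; the real OCR 4 response format may differ.
@dataclass
class Block:
    page: int
    kind: str                                 # e.g. "title", "paragraph", "table"
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1) in page units
    confidence: float                         # 0.0 to 1.0
    text: str = ""
    rows: list[list[str]] = field(default_factory=list)  # filled only for tables


def table_records(block: Block) -> list[dict[str, str]]:
    """Turn a table block into one dict per data row, keyed by the header row."""
    if block.kind != "table" or len(block.rows) < 2:
        return []
    header, *body = block.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample data standing in for one page of a financial report.
report = [
    Block(1, "title", (72, 40, 540, 80), 0.99, text="Q2 Financial Report"),
    Block(1, "table", (72, 300, 540, 420), 0.94,
          rows=[["Region", "Revenue"], ["EMEA", "4.2"], ["APAC", "3.1"]]),
]

# Only table blocks contribute records; row-and-column relationships survive.
records = [rec for block in report for rec in table_records(block)]
print(records)
```

Each record keeps its column names, so it could be handed to a database insert without re-parsing flattened text.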

The structural understanding extends across document types. The model handles printed documents (the easy case), handwritten content (the historically hard case), forms with field-and-value pairs (preserving the field-to-value mappings), tables of various complexity (including nested headers and multi-row cells), equations rendered in LaTeX-style mathematical notation, and signature blocks (identified as signatures without attempting to extract their content as text). For multi-page documents, the model tracks structural relationships across pages, so a table that spans multiple pages is preserved as a single logical table rather than as disconnected per-page fragments.

The model supports 170 languages organized across 10 language groups. The language coverage is broader than the previous generation and includes substantial improvements for languages that have historically been hard for OCR systems: right-to-left scripts (Arabic, Hebrew, Urdu), East Asian languages with their distinct character handling (Chinese, Japanese, Korean), Indic scripts, and a long tail of less-common languages that benefit from the unified model’s transfer learning across language families.

How it differs from base text-extraction OCR

The traditional OCR pipeline produced text. The user sent a document, the OCR system returned the extracted text, and downstream applications did whatever post-processing was needed to make the text useful. This pattern worked for narrow use cases (extracting text from screenshots, indexing the text content of scanned books) and broke down for almost everything else.

The shift that Mistral OCR 4 represents is from text extraction to structural extraction. The model treats the document as a structured object with semantic content rather than as an image to be transcribed. The structural extraction is qualitatively different from text extraction in three ways.

The first is that the structural extraction preserves relationships. A table with five columns is returned as a table with five columns, not as text with the columns flattened into a single stream. A form with field labels and values is returned with the field-value mappings explicit, not as a sequence of labels followed by a sequence of values that downstream applications have to re-pair.

The second is that the structural extraction is typed. The model identifies what each piece of content is (heading, body paragraph, list item, table cell, caption, signature, equation, footnote) rather than treating every piece of text as undifferentiated content. The typing lets downstream applications make decisions about how to process different content types differently.
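One way typed output could be used is redaction by block type. The sketch below is illustrative only: the `type` and `text` keys and the type names are assumed for the example and are not taken from Mistral's documentation.

```python
# Illustrative only: the "type" and "text" keys are assumed field names,
# not the documented OCR 4 output schema.
MASK = "[REDACTED]"


def redact(blocks, types_to_mask):
    """Return new block dicts with text masked for the selected block types."""
    return [
        {**b, "text": MASK} if b["type"] in types_to_mask else dict(b)
        for b in blocks
    ]


# Made-up sample blocks.
blocks = [
    {"type": "heading", "text": "Loan Agreement"},
    {"type": "paragraph", "text": "The borrower agrees to repay the principal."},
    {"type": "footnote", "text": "Account ending 4821."},
]

public = redact(blocks, {"footnote"})
for b in public:
    print(f"{b['type']:>10}: {b['text']}")
```

The function returns copies rather than mutating its input, so the unredacted blocks remain available to pipelines that are allowed to see them.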

The third is that the structural extraction is confidence-scored. Each piece of extracted content carries a confidence score that downstream applications can use to decide whether to accept the extraction, route for human review, or fail closed. The confidence scoring is the difference between an extraction system that requires human verification of every output and one that can be trusted to handle most documents autonomously.
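The accept, review, or fail-closed decision can be sketched as a small routing function. The thresholds below (0.95 and 0.70) are placeholders chosen for illustration, not values recommended by Mistral; in practice they would be tuned against a labeled sample of the organization's own documents.

```python
from enum import Enum


class Route(Enum):
    ACCEPT = "accept"   # use the extraction as-is
    REVIEW = "review"   # send to a human reviewer
    REJECT = "reject"   # fail closed


def route(confidence, accept_at=0.95, review_at=0.70):
    """Map one confidence score to a routing decision (thresholds are assumptions)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= accept_at:
        return Route.ACCEPT
    if confidence >= review_at:
        return Route.REVIEW
    return Route.REJECT


def route_document(confidences, accept_at=0.95, review_at=0.70):
    """Route a whole document by its weakest block; an empty document fails closed."""
    if not confidences:
        return Route.REJECT
    return route(min(confidences), accept_at, review_at)


print(route_document([0.99, 0.97, 0.72]))
```

Routing on the minimum score is a conservative design choice: one doubtful block is enough to pull the whole document into review. A pipeline that tolerates partial results could route block by block instead.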

The combination of these three properties is what makes Mistral OCR 4 useful for enterprise document processing pipelines that need to run autonomously. Traditional text OCR could not be trusted to handle complex documents at scale because the output required substantial post-processing to be useful. Structural OCR produces output that downstream applications can consume directly.

Pricing and availability

The API pricing is $4 per 1,000 pages for the standard real-time API and $2 per 1,000 pages for the Batch API. The Batch API is the right choice for workloads that do not need immediate results: a 24-hour turnaround is the standard, and the 50 percent discount is meaningful for high-volume document processing. The pricing applies per page rather than per document or per character, which simplifies cost estimation for variable document sizes.
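The per-page arithmetic is simple enough to sketch. The rates come from the figures above; the monthly volume is a made-up example, and the function assumes strictly linear billing with no minimums, rounding, or volume tiers, which the article does not confirm.

```python
STANDARD_USD_PER_1K = 4.00  # standard real-time API rate quoted above
BATCH_USD_PER_1K = 2.00     # Batch API rate quoted above


def api_cost(pages, usd_per_1k):
    """Estimated cost in USD, assuming linear per-page billing."""
    return pages / 1000 * usd_per_1k


monthly_pages = 250_000  # hypothetical workload
standard = api_cost(monthly_pages, STANDARD_USD_PER_1K)
batch = api_cost(monthly_pages, BATCH_USD_PER_1K)

print(f"standard: ${standard:,.2f}  batch: ${batch:,.2f}")
```

For this hypothetical workload the estimate is $1,000 per month on the standard API and $500 on the Batch API.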

The pricing is higher than the previous generation. Mistral OCR 3 launched at $2 per 1,000 pages, so OCR 4 costs twice as much at the same access tier. Mistral’s published explanation is that the capability improvements (broader language support, better structural extraction, higher accuracy) justify the higher rate and that customers who need the prior generation’s pricing can continue to use OCR 3 at its existing rate for at least the next 12 months.

Distribution channels include the Mistral API directly, Mistral Studio (the web UI for interactive document processing), Amazon SageMaker (for AWS-resident workloads), Microsoft Foundry (for Azure-resident workloads), and an upcoming integration with Snowflake’s Parse Document feature. The breadth of distribution is part of Mistral’s customer-facing pitch: customers can use OCR 4 through whatever existing cloud relationship they have without committing to a Mistral-specific procurement path.

The self-hosted deployment option is the operationally distinctive piece. Mistral ships OCR 4 as a single Docker container that can be deployed on customer infrastructure with reasonable resource requirements (a typical deployment runs on a single GPU and processes documents at competitive throughput). The self-hosted licensing is separate from the API pricing and is sold on a committed-throughput basis to enterprise customers. The cost structure is meaningfully different from the API model: customers pay for capacity rather than per page, which can be cheaper at high volumes but requires more operational management.
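The capacity-versus-per-page trade-off comes down to a break-even volume. The article gives no self-hosted price, so the $10,000 monthly figure below is purely a placeholder, and the sketch ignores GPU, hosting, and staffing costs that a real comparison would need to include.

```python
def break_even_pages(monthly_capacity_usd, api_usd_per_1k):
    """Monthly page volume above which a flat capacity fee beats per-page pricing."""
    if api_usd_per_1k <= 0:
        raise ValueError("API rate must be positive")
    return monthly_capacity_usd / api_usd_per_1k * 1000


# Hypothetical capacity fee; not a published Mistral price.
capacity_fee = 10_000

vs_standard = break_even_pages(capacity_fee, 4.00)  # against the $4 standard rate
vs_batch = break_even_pages(capacity_fee, 2.00)     # against the $2 batch rate

print(f"break-even vs standard API: {vs_standard:,.0f} pages/month")
print(f"break-even vs Batch API:    {vs_batch:,.0f} pages/month")
```

Under these assumed numbers, self-hosting would only pay off above 2.5 million pages per month against the standard API, or 5 million against the Batch API.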

The competitive landscape

The OCR market in 2026 includes four broad categories of competitors.

Cloud OCR APIs from the major hyperscalers are the volume incumbents. Google Document AI, AWS Textract, and Azure Document Intelligence each have several years of production deployment, deep integration with their parent cloud platforms, and pricing that ranges from $1.50 per 1,000 pages (basic tier) to $65 per 1,000 pages (advanced forms-and-tables tier) depending on capability and configuration. The hyperscaler APIs are cloud-only by design; there is no self-hosted option, which is the gap that Mistral OCR 4 is positioned to fill.

Specialized OCR vendors such as ABBYY, Adobe Document Services, Tungsten Automation (formerly Kofax), and IBM Datacap have historically dominated the enterprise forms-and-documents market with on-premise software products. The specialized vendors retain meaningful share in regulated industries and in document-processing workflows with complex configuration requirements. The product depth is real but the modernization pace has been slower than that of the AI-native entrants.

Open-source OCR (Tesseract, EasyOCR, PaddleOCR, and the newer LLM-based options like Marker and Surya) provides a zero-licensing-cost path for organizations with engineering capacity to build their own pipelines. The accuracy and capability vary substantially across open-source options; the best of them (Marker for PDF parsing, Surya for layout) are competitive with commercial offerings for specific use cases.

The newer AI-native OCR vendors include Mistral itself, several smaller startups (LandingAI, Docparser, Veryfi for specific verticals), and the OCR capabilities of the general-purpose multimodal LLMs (Claude Opus 4.8, GPT-5.5, Gemini 3 Pro) that can process documents through their vision capabilities without using a dedicated OCR product. The general-purpose multimodal LLMs are often the right choice for ad-hoc document processing; the dedicated OCR products are the right choice for production document pipelines where consistency, throughput, and structural extraction matter more than open-ended reasoning.

Mistral OCR 4’s position in the landscape is shaped by the self-hosted differentiator. For organizations that can use cloud OCR, the cloud options are often equally capable and may be cheaper depending on volume. For organizations that need on-premise or air-gapped deployment, Mistral OCR 4 is one of the few options that combines modern AI-native accuracy with a deployable container that does not require cloud connectivity.

The EU sovereign-AI angle

The self-hosted deployment option is not just a technical feature; it is the operational manifestation of a strategic positioning Mistral has been building since its founding. The company has positioned itself as the European alternative to the US-headquartered frontier-AI labs, with particular emphasis on data sovereignty, regulatory familiarity for European customers, and the strategic value of European AI infrastructure that does not depend on US providers.

The positioning has been gaining traction in 2026 as European regulatory discussion has emphasized AI sovereignty, the EU AI Act has come into effect, and several high-profile European AI procurement decisions have favored European vendors. Mistral has been a primary beneficiary of these dynamics. The reported €3 billion funding round at a €20 billion valuation that Mistral is currently negotiating with European sovereign wealth funds is the financial expression of the sovereignty thesis.

For Mistral OCR 4 specifically, the EU sovereignty angle matters in three concrete ways. The first is that European public-sector customers (government agencies, public hospitals, public utilities) face regulatory pressure to use European-provided services for sensitive data processing. Mistral OCR 4’s self-hosted option satisfies this pressure cleanly. The second is that European private-sector customers in regulated industries (banking, insurance, defense) face similar pressures from their regulators and from their internal data-governance frameworks. The third is that organizations everywhere with air-gapped requirements (defense contractors, intelligence services, certain critical infrastructure operators) need a deployable option that the cloud-only competitors cannot provide.

The implication is that Mistral OCR 4’s addressable market is not just "OCR customers who prefer Mistral" but specifically "OCR customers who need on-premise or air-gapped deployment." This is a smaller market than the total OCR market but is one where Mistral has minimal competition from the cloud OCR APIs that dominate the broader market. Mistral OCR 4 can plausibly be the dominant choice in this segment while remaining a smaller player in the overall OCR market.

The benchmark results

Mistral published several benchmark results with the launch. Because Mistral ran and reported them itself, the numbers should be treated as directional rather than definitive until independent verification accumulates.

On OlmOCRBench (a public benchmark for comparing OCR systems), Mistral OCR 4 scored 85.20 percent. For comparison, on the same benchmark Google Document AI scored 82 percent, AWS Textract scored 78 percent, and Azure Document Intelligence scored 86 percent.

On OmniDocBench (a broader document understanding benchmark including layout, tables, and structured extraction), Mistral OCR 4 scored 93.07 percent. Direct comparisons with the hyperscaler products are harder because the benchmark is newer and the hyperscaler products have not been formally evaluated against it.

Mistral’s own annotator-based head-to-head evaluation found a 72 percent average win rate against competitors on diverse document types. Because Mistral designed and ran this evaluation itself, it is the published number most worth treating with skepticism until third parties replicate it.

Independent reviewers have published more cautious assessments. A January 2026 review of OCR 3 found Azure beat the Mistral product by 3.3 percentage points on the DocVQA benchmark, which is a more specialized vision-question-answering task than general OCR. The takeaway is that Mistral’s "best in the market" claims are real on the benchmarks Mistral has emphasized and are less robust on benchmarks Mistral has not emphasized. The product is competitive at the top of the OCR market but is not universally dominant across every relevant benchmark.

Practical guidance for adoption

Organizations considering Mistral OCR 4 should evaluate the decision along three dimensions.

The first is deployment requirements. If the organization can use a cloud OCR service, the choice between Mistral OCR 4 (API), Google Document AI, AWS Textract, and Azure Document Intelligence is largely about pricing, regional availability, and integration with existing infrastructure. The decision is often best made by running a benchmark on the organization’s actual document types.

If the organization needs self-hosted or air-gapped deployment, the choice narrows substantially. Mistral OCR 4 is one of the few modern AI-native OCR products with a clean self-hosted story. The alternatives are open-source options (with the integration and operational overhead they imply) or the older specialized OCR vendors (with the modernization gap that often comes with them). For most organizations in this category, Mistral OCR 4 is the strongest single option.

The second is document complexity. For simple documents (printed text, straightforward layouts), most modern OCR options work well, and the choice is dominated by deployment and cost considerations. For complex documents (multi-column layouts, embedded tables, mixed handwriting and print, mathematical notation), the differentiation between OCR products is meaningful and benchmark-running on actual documents is necessary.
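A benchmark on an organization's own documents can start with something as simple as character error rate (CER) against hand-checked transcriptions. The sketch below uses a plain Levenshtein edit distance; it measures text accuracy only and says nothing about layout or table structure, which would need separate metrics. The sample pairs are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def mean_cer(pairs):
    """Average CER over (reference, hypothesis) pairs."""
    if not pairs:
        raise ValueError("need at least one sample")
    return sum(cer(ref, hyp) for ref, hyp in pairs) / len(pairs)


# Invented samples: ground truth first, OCR output second.
samples = [
    ("Invoice 1042", "Invoice 1042"),
    ("Total: 318.50", "Total: 318.5O"),  # trailing zero misread as letter O
]
score = mean_cer(samples)
print(f"mean CER: {score:.4f}")
```

Running the same sample set through each candidate engine and comparing mean CER gives a first-pass ranking on the document types that actually matter to the organization.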

The third is integration with downstream LLM workflows. If the OCR output will feed an LLM for question-answering, summarization, or structured generation, the structural extraction Mistral OCR 4 produces (bounding boxes, typed blocks, confidence scores) is more useful than raw text extraction. The output format is designed to be LLM-consumable, which matters for the increasingly common pattern of document-processing-plus-LLM-reasoning pipelines.
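As an illustration of why typed blocks help here, the following sketch flattens blocks into a tagged text format that a prompt could carry. The block keys and the tag format are assumptions for the example; neither is a format Mistral prescribes.

```python
def to_prompt(blocks, min_confidence=0.0):
    """Render blocks as tagged lines, flagging any below the confidence floor."""
    lines = []
    for b in blocks:
        tag = b["type"].upper()
        flag = "" if b["confidence"] >= min_confidence else " (low confidence)"
        lines.append(f"[{tag} p{b['page']}{flag}] {b['text']}")
    return "\n".join(lines)


# Made-up blocks with assumed field names.
blocks = [
    {"type": "title", "page": 1, "confidence": 0.99, "text": "Q2 Report"},
    {"type": "paragraph", "page": 1, "confidence": 0.62, "text": "Revenue rose."},
]

prompt_context = to_prompt(blocks, min_confidence=0.8)
print(prompt_context)
```

Carrying the block type, page number, and a low-confidence flag into the prompt lets the downstream model weigh a title differently from body text and treat doubtful extractions with caution.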

Frequently asked questions

Is Mistral OCR 4 a separate product from Mistral’s other models, or part of a bundle? It is a standalone product accessible through its own API endpoint, though it can also be invoked through the broader Mistral Document AI bundle that includes additional document-processing capabilities. The standalone access is the most common pattern.

Can I use Mistral OCR 4 in my European Union enterprise without data leaving the EU? Yes, with the self-hosted deployment. The API option also supports EU-only regions, which keeps the data within Mistral-operated EU infrastructure. The self-hosted option keeps data within the customer’s own infrastructure.

Does Mistral OCR 4 produce text in the same language as the document, or translated text? It produces text in the original language of the document. Translation is a separate workflow that customers can build on top of the OCR output (typically using Mistral’s general-purpose models for the translation step).

How does Mistral OCR 4 compare to using Claude Opus 4.8 or GPT-5.5 directly on documents? The general-purpose models can process documents through their vision capabilities and produce competitive results for one-off processing. For production pipelines that need consistent structural extraction, predictable throughput, and explicit confidence scores, the dedicated OCR product is the better choice.

What’s the latency for a typical document? The standard API processes a typical 10-page document in a few seconds. The Batch API has higher latency (up to 24 hours) in exchange for the 50 percent discount.

Is the self-hosted container free, or do I pay per use? The self-hosted licensing is sold on a committed-throughput basis to enterprise customers. The cost structure is different from the per-page API pricing; for high volumes the self-hosted option can be substantially cheaper, while for low volumes the API is more cost-effective.

Does Mistral OCR 4 handle forms with check-boxes, radio buttons, and other form elements? Yes. The structural extraction identifies form elements as typed blocks, including checkboxes (with their selected/unselected state), radio buttons, and signature areas. Form-handling has been a focus area for the OCR 4 release.

Can I customize the model for my specific document types? Yes, through a fine-tuning option that Mistral offers to enterprise customers. The fine-tuning uses a small labeled dataset of customer documents to improve extraction on document patterns that are specific to the customer’s workflow. The fine-tuning is a separate service and is priced separately from the standard API or self-hosted licensing.

Is OCR 4 backwards-compatible with OCR 3? The API surface is compatible; existing OCR 3 integrations will continue to work against OCR 4 endpoints. The output structure has been extended with additional fields, which downstream code should handle gracefully if it was written to be tolerant of unknown fields. Customers who want to stay on OCR 3 for stability can continue to do so for at least 12 months.
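Tolerance of unknown fields can be built into the parsing layer. The sketch below shows one common pattern in Python; the `PageResult` class and the payload keys are hypothetical and do not reflect the actual OCR 3 or OCR 4 response schema.

```python
from dataclasses import dataclass, fields


# Hypothetical client-side model written against an older response shape.
@dataclass
class PageResult:
    index: int
    text: str


def parse_page(payload):
    """Build a PageResult from a dict, silently ignoring keys it does not know."""
    known = {f.name for f in fields(PageResult)}
    return PageResult(**{k: v for k, v in payload.items() if k in known})


# A newer payload with extra fields (made up) still parses cleanly.
newer_payload = {"index": 0, "text": "Hello", "blocks": [], "confidence": 0.93}
page = parse_page(newer_payload)
print(page)
```

Passing the payload straight to `PageResult(**payload)` would raise a `TypeError` on the extra keys, which is the kind of brittleness worth checking for before pointing an existing integration at a newer endpoint. Missing required keys still raise an error, which is usually the desired behavior.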

What happens to Mistral OCR 3 customers after OCR 4 launches? OCR 3 remains available at its current pricing for at least 12 months. Mistral will publish migration guidance for customers ready to move to OCR 4 and will provide migration support for enterprise customers under contract.

Tagged as: AI Adoption, AI OCR (Optical Character Recognition), Large Language Models (LLMs), Mistral AI
