FRIDAY, MAY 15, 2026
AI & Machine Learning · 2 min read

Granite unveils sub-100M multilingual embeddings with 32K context

By Alexander Cole

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

Image: huggingface.co

Two compact Granite Embedding Multilingual R2 models push into sub-100M territory with 32K context and multilingual reach across 200+ languages. The release pairs a 97M-parameter compact model with a 311M-parameter full-size model, both open and Apache 2.0 licensed, designed to close the gap between speed and breadth of language coverage.

The two models are positioned as a practical choice for teams building multilingual retrieval-augmented generation, cross-language search, and code retrieval for organizations working in many languages. The 97M model is pitched as a sub-100M breakthrough that outperforms open rivals on the MTEB Multilingual Retrieval benchmark with a score of 60.3, while the 311M model scores 65.2 on the same benchmark and ranks second among open models under 500M parameters. The release emphasizes a shared design goal: broad language coverage without sacrificing too much speed or deployability.
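
As a concrete sketch of the cross-language retrieval workflow, the snippet below uses the sentence-transformers library to embed a query and a mixed-language corpus in one shared space. The model ID is a placeholder, not a confirmed Hugging Face repository name; check huggingface.co for the actual one.

```python
# Minimal cross-language retrieval sketch using sentence-transformers.
# NOTE: the model ID below is hypothetical; substitute the real
# Granite Embedding Multilingual R2 repository name from huggingface.co.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-multilingual-r2")  # placeholder ID

query = "How do I reset my password?"
docs = [
    "Para restablecer su contraseña, haga clic en 'Olvidé mi contraseña'.",  # Spanish
    "Die Lieferung dauert in der Regel drei bis fünf Werktage.",             # German
    "def reset_password(user): user.token = generate_token(user)",           # code snippet
]

# Normalized embeddings make cosine similarity a plain dot product.
q_emb = model.encode(query, normalize_embeddings=True)
d_embs = model.encode(docs, normalize_embeddings=True)

scores = util.cos_sim(q_emb, d_embs)[0]  # one similarity score per document
best = int(scores.argmax())
print(f"Best match (score {scores[best]:.3f}): {docs[best]}")
```

The same loop covers natural-language documents and source code alike, which is the point of folding code retrieval into the same embedding space.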

Both models support a 32K-token context, are tuned on 52 languages, and extend to code retrieval across 9 programming languages, expanding their reach beyond natural-language queries into technical search for software teams. They also ship with Matryoshka embeddings, which let teams truncate vectors to smaller dimensions for fast retrieval over large corpora while keeping the full-width vectors where multilingual nuance demands them.
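
A minimal sketch of that Matryoshka pattern, assuming embeddings whose leading dimensions carry most of the signal; the 768-dimension width below is a stand-in, not a confirmed spec of either model:

```python
import numpy as np

def truncate_matryoshka(embs: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and re-normalize.

    Matryoshka-trained embeddings pack the most information into the
    leading dimensions, so truncated vectors remain usable for a fast
    first-pass search, with full vectors reserved for re-ranking.
    """
    cut = embs[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

# Stand-in for real corpus embeddings; width and values are illustrative.
full = np.random.default_rng(0).standard_normal((10_000, 768)).astype(np.float32)
full /= np.linalg.norm(full, axis=-1, keepdims=True)

coarse = truncate_matryoshka(full, 128)  # ~6x smaller index for the first pass
```

In practice a team would index the truncated vectors for speed and keep the full-width vectors on hand to re-rank the top hits.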

For teams building enterprise search and multilingual QA tools, the headline here is flexibility. The Apache 2.0 licensing and the dual-model approach give framework integrators concrete choices: the lean 97M model can be deployed quickly where latency and budget matter, while the 311M model provides stronger retrieval quality for heavier multilingual pipelines and code-search workflows, all with Matryoshka support to tune retrieval depth and speed as needed.

Analysts and practitioners can think of these models as a fast line cook and a seasoned sous chef sharing one kitchen. The 97M model is the quick, versatile base that handles many languages well, while the 311M model works as a deeper stage that attends to the harder cross-language cases, both built on the same 32K context spine and 200+ language reach. The Matryoshka setup lets teams start with broad, fast retrieval and layer in more precise, language-aware steps as data and latency budgets allow.

What this means for products shipping this quarter is clear: you can lean on an open, scalable embedding stack that covers 200+ languages, supports a 32K context, and includes code search across nine programming languages, all with practical low- and mid-range parameter footprints. If speed and cost matter most, the 97M model may win; if you need stronger raw retrieval quality and can allocate a bit more compute, the 311M variant is a compelling choice, with Matryoshka truncation available on both to balance index size against accuracy.

