Follow the White Rabbit: Using Embeddings So You Never Get Lost in Translation

Community Article Published February 23, 2026

So here's a fun question: if you have a book in two languages, how do you know the translation is actually faithful to the original? You could read both versions (respect if you do), or you could let an embedding model do the heavy lifting for you.

That's exactly what we tried in this little experiment — and honestly, the results were pretty cool. We took Alice's Adventures in Wonderland in English and French, spun up an embedding model on Hugging Face Inference Endpoints, and asked it to tell us how well the two versions match up — chapter by chapter, paragraph by paragraph.

Here's how it all went down.

A white rabbit in a waistcoat holding a pocket watch, standing at the edge of a glowing rabbit hole with Matrix code raining in the background


Why Embeddings?

Quick background for the uninitiated: embedding models turn text into lists of numbers (vectors) that capture meaning. The neat thing is that if you use a multilingual model, "the cat sat on the mat" and "le chat était assis sur le tapis" end up pointing to very similar vectors — even though they're in completely different languages.

That means you can compute a simple cosine similarity between two embeddings and get a score between 0 and 1 telling you how semantically close the two pieces of text are. A 0.95? Very close. A 0.60? Something got lost in translation (or the paragraph structures don't match).
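In code, that similarity check is tiny. Here's a minimal NumPy sketch (note that cosine similarity can technically range from -1 to 1, though embeddings of natural-language text from these models almost always land in the positive range):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```
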

For this experiment we used Qwen3-Embedding-4B, a multilingual model from Alibaba's Qwen team that covers 100+ languages. It's a sweet spot between accuracy and efficiency — the smaller 0.6B variant just didn't have enough multilingual juice for this task.

In the notebook, we interact with the model through the OpenAI Python client — the standard openai library you probably already know. Since Inference Endpoints powered by vLLM expose an OpenAI-compatible API, we can use client.embeddings.create() to generate embeddings without any custom HTTP plumbing. This is why we specifically need a provider that serves the model through vLLM: it gives us a standard /v1/embeddings endpoint that the OpenAI SDK can talk to out of the box.


Step 1: Deploy the Model on Inference Endpoints

First things first: we need the embedding model running somewhere we can call it via the OpenAI API. Inference Endpoints make this dead simple — just pick your model from the catalog, filter for feature-extraction and vLLM, and hit deploy.


Under the hood, the endpoint runs vLLM, which exposes an OpenAI-compatible API including the /v1/embeddings route we need.

Once the endpoint status turns green, copy its URL; you'll need it to send embedding requests.


For this notebook we used Qwen/Qwen3-Embedding-4B on the smallest INF2 instance (2 cores, 32 GB device memory). That's it — no server management, no Docker headaches, just a URL you can POST to.


Step 2: Grab the Books

Two vintage Alice in Wonderland books open on a wooden table by candlelight

Both editions of Alice come from Project Gutenberg, where they've been sitting freely available for anyone to download.

|             | English                          | French                                   |
|-------------|----------------------------------|------------------------------------------|
| Title       | Alice's Adventures in Wonderland | Aventures d'Alice au pays des merveilles |
| Author      | Lewis Carroll                    | Lewis Carroll (transl. Henri Bué)        |
| Gutenberg ID| #11                              | #55456                                   |
| Illustrator | John Tenniel                     | John Tenniel                             |
| Language    | English                          | French                                   |

The original was published in 1865, illustrated by John Tenniel, whose drawings remain iconic to this day. The French translation by Henri Bué followed in 1869 — so we're comparing two near-contemporary versions of the same story. Both EPUBs are downloaded at runtime; no local files needed.


Step 3: Parse the Books Into Chapters and Paragraphs

EPUB files are basically zipped HTML, so we use ebooklib and BeautifulSoup to crack them open. The logic is straightforward: walk through the document nodes, look for <h2> headings that match the table of contents, and collect everything between them as a chapter's paragraph list.
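A simplified stand-in for that parsing logic, using only BeautifulSoup on an HTML string (the real notebook first extracts the HTML documents from the EPUB with ebooklib; the function name and structure here are illustrative):

```python
from bs4 import BeautifulSoup

def split_chapters(html):
    """Split chapter-bearing HTML into {chapter_title: [paragraphs]}.

    Walk the document in order: each <h2> starts a new chapter,
    and every <p> in between is collected as one of its paragraphs.
    """
    soup = BeautifulSoup(html, "html.parser")
    chapters, current = {}, None
    for node in soup.find_all(["h2", "p"]):
        if node.name == "h2":
            current = node.get_text(strip=True)
            chapters[current] = []
        elif current is not None:
            text = node.get_text(strip=True)
            if text:
                chapters[current].append(text)
    return chapters

sample = (
    "<h2>Down the Rabbit-Hole</h2><p>Alice was beginning to get very tired.</p>"
    "<p>So she was considering.</p><h2>The Pool of Tears</h2><p>Curiouser and curiouser!</p>"
)
chapters = split_chapters(sample)
```
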

Both books have 12 chapters. The French version uses slightly different chapter titles (obviously), and has a slightly different paragraph structure — more on that in a moment.


Step 4: Match Chapters Across Languages

Once we have the chapter titles from both books, we embed them all using the same model and compute a similarity matrix. Then, for each English chapter, we find its best-matching French counterpart.
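With normalized embeddings, the whole similarity matrix is a single matrix product, and the best match per English chapter is an `argmax` over each row. A minimal sketch (function name is mine, not the notebook's):

```python
import numpy as np

def best_matches(en_embs, fr_embs):
    """For each English chapter embedding, index of the most similar French one."""
    E = np.asarray(en_embs, dtype=float)
    F = np.asarray(fr_embs, dtype=float)
    E /= np.linalg.norm(E, axis=1, keepdims=True)   # L2-normalize rows so that
    F /= np.linalg.norm(F, axis=1, keepdims=True)   # the dot product IS cosine similarity
    sim = E @ F.T                                   # (n_en, n_fr) similarity matrix
    return sim.argmax(axis=1)
```
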

But pure title similarity has a catch — French chapter titles can be quite different from their English equivalents (think "Down the Rabbit-Hole" vs "Dans le terrier du lapin"). So we use a composite score that weighs three things:

  • Title similarity (70%) — the semantic match of the titles
  • Chapter index (15%) — chapters in the same position in the book are more likely to match
  • Paragraph count (15%) — chapters with similar numbers of paragraphs are more likely to be the same chapter
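The 70/15/15 weights come straight from the scheme above; the exact formulas for the two auxiliary signals below are plausible reconstructions, not necessarily the notebook's:

```python
def composite_score(title_sim, idx_en, idx_fr, n_chapters, n_paras_en, n_paras_fr):
    """Weighted chapter-matching score: title similarity plus positional
    and structural hints (auxiliary formulas are assumptions)."""
    # 1.0 when both chapters sit at the same position, decaying with distance
    index_score = 1.0 - abs(idx_en - idx_fr) / max(n_chapters - 1, 1)
    # ratio of the smaller to the larger paragraph count, 1.0 for equal counts
    para_score = min(n_paras_en, n_paras_fr) / max(n_paras_en, n_paras_fr)
    return 0.70 * title_sim + 0.15 * index_score + 0.15 * para_score
```
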

Here's what the correspondence table looks like for Alice:

| Original (EN)                     | Translation (FR)              | Title Sim | Composite |
|-----------------------------------|-------------------------------|-----------|-----------|
| Down the Rabbit-Hole              | Au fond du terrier            | 0.788     | 0.839     |
| The Pool of Tears                 | La mare aux larmes            | 0.819     | 0.848     |
| A Caucus-Race and a Long Tale     | La course cocasse             | 0.734     | 0.804     |
| The Rabbit Sends in a Little Bill | L'habitation du Lapin Blanc   | 0.782     | 0.837     |
| Advice from a Caterpillar         | Conseils d'une chenille       | 0.849     | 0.892     |
| Pig and Pepper                    | Porc et poivre                | 0.845     | 0.888     |
| A Mad Tea-Party                   | Un thé de fous                | 0.776     | 0.833     |
| The Queen's Croquet-Ground        | Le croquet de la Reine        | 0.863     | 0.900     |
| The Mock Turtle's Story           | Histoire de la Fausse-Tortue  | 0.824     | 0.875     |
| The Lobster Quadrille             | Le quadrille de homards       | 0.849     | 0.856     |
| Who Stole the Tarts?              | Qui a volé les tartes?        | 0.953     | 0.963     |
| Alice's Evidence                  | Déposition d'Alice            | 0.878     | 0.915     |

The model got every single chapter right. The most literal translation — "Who Stole the Tarts?" → "Qui a volé les tartes?" — scores a near-perfect 0.953. Meanwhile, more freely adapted titles like "A Caucus-Race and a Long Tale" → "La course cocasse" (0.734) or "A Mad Tea-Party" → "Un thé de fous" (0.776) pull the title similarity down — which is exactly where the index and paragraph count signals help boost the composite score.

A caucus race - La course cocasse


Step 5: Paragraph-Level Alignment

Now for the fun part. Once we've matched two chapters, we embed every paragraph from each version and compute a 1-to-1 similarity.

There's a wrinkle though: English and French paragraphs don't always split the same way. A translator might merge two short English paragraphs into one French paragraph, or split a long one in two. When the counts don't match, we use a greedy semantic merging strategy: iteratively merge adjacent paragraphs (in the longer version) that bring the alignment score up the most. This works because embedding spaces are approximately linear — the average of two paragraph embeddings is a decent approximation of the merged paragraph's embedding.
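Here's a sketch of that greedy merge — the averaging trick stands in for re-embedding the concatenated text, and this is the idea rather than the notebook's exact code:

```python
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_merge(long_embs, short_embs):
    """Merge adjacent embeddings in the longer list until lengths match.

    Each round tries every adjacent pair, scores the resulting 1-to-1
    alignment (mean cosine similarity against the shorter list), and
    keeps the merge that scores highest. Merging averages the two
    vectors, a decent proxy for embedding the merged paragraph.
    """
    embs = [np.asarray(e, dtype=float) for e in long_embs]
    short = [np.asarray(e, dtype=float) for e in short_embs]
    while len(embs) > len(short):
        best_score, best_i = -1.0, 0
        for i in range(len(embs) - 1):
            candidate = embs[:i] + [(embs[i] + embs[i + 1]) / 2] + embs[i + 2:]
            score = float(np.mean([_cos(a, b) for a, b in zip(candidate, short)]))
            if score > best_score:
                best_score, best_i = score, i
        embs = embs[:best_i] + [(embs[best_i] + embs[best_i + 1]) / 2] + embs[best_i + 2:]
    return embs
```

This is O(n²) per merge, which is perfectly fine at chapter scale (a few dozen paragraphs).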

For the first chapter ("Down the Rabbit-Hole" / "Au fond du terrier"), the English version has 24 paragraphs and the French has 22. After semantic-guided merging (2 adjacent English paragraphs were merged to align counts), here's what the similarity scores look like across all 22 aligned paragraph pairs:

Paragraph-level translation quality chart for Chapter 1

And the quality breakdown:

| Quality | Threshold   | Count | %     |
|---------|-------------|-------|-------|
| High    | ≥ 0.85      | 21    | 95.5% |
| Medium  | 0.75 – 0.85 | 1     | 4.5%  |
| Low     | < 0.75      | 0     | 0.0%  |

The average similarity is 0.898, with a minimum of 0.846 and a maximum of 0.924 — a remarkably tight distribution (std dev: 0.020). Only a single paragraph falls into the "medium" range, and none score below 0.75. Henri Bué's 1869 translation holds up impressively well.


What This Is Good For

This whole approach is more useful than it might first seem. Some practical uses:

  • Translation QA at scale — flag paragraphs that might need a second look without reading the whole book
  • Cross-lingual document alignment — pair up corresponding sections of multilingual corpora for training or analysis
  • Chapter matching in untitled translations — even if the French edition had no chapter titles, you could still match chapters by embedding their full text

And because we're using a single multilingual model, this works for any language pair — not just English/French.

Embeddings for translation comparisons


Try It Yourself

The full notebook is available in this repository: compare-book-translations.ipynb

You'll need:

  1. A Hugging Face account with access to Inference Endpoints
  2. A Qwen/Qwen3-Embedding-4B deployed endpoint (see Step 1 above)
  3. Your endpoint URL set as INFERENCE_ENDPOINT_URL in your environment

Everything else — downloading the books, parsing chapters, computing similarities, making the plots — is handled by the notebook itself.

Happy translating! 🐇
