Follow the White Rabbit: Using Embeddings So You Never Get Lost in Translation
Can a multilingual embedding model tell you how faithful a translation is? That's exactly what we tried in this little experiment, and honestly, the results were pretty cool. We took Alice's Adventures in Wonderland in English and French, spun up an embedding model on Hugging Face Inference Endpoints, and asked it to tell us how well the two versions match up, chapter by chapter and paragraph by paragraph.
Here's how it all went down.
Why Embeddings?
Quick background for the uninitiated: embedding models turn text into lists of numbers (vectors) that capture meaning. The neat thing is that if you use a multilingual model, "the cat sat on the mat" and "le chat était assis sur le tapis" end up pointing to very similar vectors — even though they're in completely different languages.
That means you can compute a simple cosine similarity between two embeddings and get a score (between -1 and 1 in theory, though typically between 0 and 1 for natural text) telling you how semantically close the two pieces of text are. A 0.95? Very close. A 0.60? Something got lost in translation (or the paragraph structures don't match).
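The scoring math is simple enough to sketch in a few lines. This is a minimal NumPy version (not the notebook's exact code), and the vectors below are toy stand-ins for real model outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 = same direction (same meaning), 0.0 = orthogonal (unrelated)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

en = np.array([0.12, -0.40, 0.88])  # pretend embedding of the English sentence
fr = np.array([0.10, -0.38, 0.90])  # pretend embedding of the French sentence
print(cosine_similarity(en, fr))    # close to 1.0 for near-synonymous text
```

Real embeddings have hundreds or thousands of dimensions, but the formula is identical.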
For this experiment we used Qwen3-Embedding-4B, a multilingual model from Alibaba's Qwen team that covers 100+ languages. It's a sweet spot between accuracy and efficiency — the smaller 0.6B variant just didn't have enough multilingual juice for this task.
In the notebook, we interact with the model through the OpenAI Python client — the standard openai library you probably already know. Since Inference Endpoints powered by vLLM expose an OpenAI-compatible API, we can use client.embeddings.create() to generate embeddings without any custom HTTP plumbing. This is why we specifically need a provider that serves the model through vLLM: it gives us a standard /v1/embeddings endpoint that the OpenAI SDK can talk to out of the box.
Step 1: Deploy the Model on Inference Endpoints
First things first: we need the embedding model running somewhere we can call it via the OpenAI API. Inference Endpoints make this dead simple — just pick your model from the catalog, filter for feature-extraction and vLLM, and hit deploy.
Under the hood, the endpoint runs vLLM, which exposes an OpenAI-compatible API including the /v1/embeddings route we need.
Once the endpoint light is green, copy its URL; you'll need it to send embedding requests.
For this notebook we used Qwen/Qwen3-Embedding-4B on the smallest INF2 instance (2 cores, 32 GB device memory). That's it — no server management, no Docker headaches, just a URL you can POST to.
Step 2: Grab the Books
Both editions of Alice come from Project Gutenberg, where they've been sitting freely available for anyone to download.
| | English | French |
|---|---|---|
| Title | Alice's Adventures in Wonderland | Aventures d'Alice au pays des merveilles |
| Author | Lewis Carroll | Lewis Carroll (transl. Henri Bué) |
| Gutenberg ID | #11 | #55456 |
| Illustrator | John Tenniel | John Tenniel |
| Language | English | French |
The original was published in 1865, illustrated by John Tenniel, whose drawings remain iconic to this day. Henri Bué's French translation followed in 1869, so we're comparing two nearly contemporary versions of the same story. Both EPUBs are downloaded at runtime; no local files needed.
Step 3: Parse the Books Into Chapters and Paragraphs
EPUB files are basically zipped HTML, so we use ebooklib and BeautifulSoup to crack them open. The logic is straightforward: walk through the document nodes, look for <h2> headings that match the table of contents, and collect everything between them as a chapter's paragraph list.
Both books have 12 chapters. The French version uses slightly different chapter titles (obviously), and has a slightly different paragraph structure — more on that in a moment.
Step 4: Match Chapters Across Languages
Once we have the chapter titles from both books, we embed them all using the same model and compute a similarity matrix. Then, for each English chapter, we find its best-matching French counterpart.
But pure title similarity has a catch — French chapter titles can be quite different from their English equivalents (think "Down the Rabbit-Hole" vs "Dans le terrier du lapin"). So we use a composite score that weighs three things:
- Title similarity (70%) — the semantic match of the titles
- Chapter index (15%) — chapters in the same position in the book are more likely to match
- Paragraph count (15%) — chapters with similar numbers of paragraphs are more likely to be the same chapter
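One way such a composite score could be computed is sketched below, with the 70/15/15 weights from the text. The normalizations for the index and paragraph-count signals are our assumptions, not necessarily the notebook's exact formulas:

```python
def composite_score(title_sim: float,
                    idx_en: int, idx_fr: int, n_chapters: int,
                    n_paras_en: int, n_paras_fr: int) -> float:
    # Chapters at the same position score 1.0; positions at opposite
    # ends of the book decay toward 0.
    index_sim = 1.0 - abs(idx_en - idx_fr) / max(n_chapters - 1, 1)
    # Ratio of the smaller paragraph count to the larger one.
    count_sim = min(n_paras_en, n_paras_fr) / max(n_paras_en, n_paras_fr)
    return 0.70 * title_sim + 0.15 * index_sim + 0.15 * count_sim

# Same position, similar paragraph counts, decent title match:
print(round(composite_score(0.85, 2, 2, 12, 24, 22), 3))
```

For each English chapter, you would then pick the French chapter maximizing this score.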
Here's what the correspondence table looks like for Alice:
| Original (EN) | Translation (FR) | Title Sim | Composite |
|---|---|---|---|
| Down the Rabbit-Hole | Au fond du terrier | 0.788 | 0.839 |
| The Pool of Tears | La mare aux larmes | 0.819 | 0.848 |
| A Caucus-Race and a Long Tale | La course cocasse | 0.734 | 0.804 |
| The Rabbit Sends in a Little Bill | L'habitation du Lapin Blanc | 0.782 | 0.837 |
| Advice from a Caterpillar | Conseils d'une chenille | 0.849 | 0.892 |
| Pig and Pepper | Porc et poivre | 0.845 | 0.888 |
| A Mad Tea-Party | Un thé de fous | 0.776 | 0.833 |
| The Queen's Croquet-Ground | Le croquet de la Reine | 0.863 | 0.900 |
| The Mock Turtle's Story | Histoire de la Fausse-Tortue | 0.824 | 0.875 |
| The Lobster Quadrille | Le quadrille de homards | 0.849 | 0.856 |
| Who Stole the Tarts? | Qui a volé les tartes? | 0.953 | 0.963 |
| Alice's Evidence | Déposition d'Alice | 0.878 | 0.915 |
The model got every single chapter right. The most literal translation — "Who Stole the Tarts?" → "Qui a volé les tartes?" — scores a near-perfect 0.953. Meanwhile, more freely adapted titles like "A Caucus-Race and a Long Tale" → "La course cocasse" (0.734) or "A Mad Tea-Party" → "Un thé de fous" (0.776) pull the title similarity down — which is exactly where the index and paragraph count signals help boost the composite score.
Step 5: Paragraph-Level Alignment
Now for the fun part. Once we've matched two chapters, we embed every paragraph from each version and compute the cosine similarity of each 1-to-1 paragraph pair.
There's a wrinkle though: English and French paragraphs don't always split the same way. A translator might merge two short English sentences into one French paragraph, or split a long one. When the counts don't match, we use a greedy semantic merging strategy: iteratively merge adjacent paragraphs (in the longer version) that bring the alignment score up the most. This works because embedding spaces are approximately linear — the average of two paragraph embeddings is a decent approximation of the merged paragraph's embedding.
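The merging step can be sketched roughly as follows, assuming the paragraph embeddings are unit-norm NumPy rows (so a dot product is a cosine similarity). This is our simplified reading of the strategy, not the notebook's exact implementation:

```python
import numpy as np

def align_by_merging(long_emb: np.ndarray, short_emb: np.ndarray) -> np.ndarray:
    """Merge adjacent rows of long_emb until its length matches short_emb's,
    each time picking the merge that maximizes the mean 1-to-1 cosine score."""
    long_emb = long_emb.copy()
    n = len(short_emb)
    while len(long_emb) > n:
        best_score, best = -np.inf, None
        for i in range(len(long_emb) - 1):
            # Average-and-renormalize approximates the merged paragraph's
            # embedding (this is the "approximately linear" trick).
            merged = long_emb[i] + long_emb[i + 1]
            merged = merged / np.linalg.norm(merged)
            candidate = np.vstack([long_emb[:i], merged[None, :], long_emb[i + 2:]])
            # Mean cosine similarity over the first n aligned pairs.
            score = float(np.mean(np.sum(candidate[:n] * short_emb, axis=1)))
            if score > best_score:
                best_score, best = score, candidate
        long_emb = best
    return long_emb
```

Each pass removes exactly one paragraph from the longer side, so two passes take the English chapter from 24 paragraphs down to 22.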
For the first chapter ("Down the Rabbit-Hole" / "Au fond du terrier"), the English version has 24 paragraphs and the French has 22. After semantic-guided merging (two merges of adjacent English paragraphs brought the count from 24 down to 22), here's what the similarity scores look like across all 22 aligned paragraph pairs:
And the quality breakdown:
| Quality | Threshold | Count | % |
|---|---|---|---|
| High | ≥ 0.85 | 21 | 95.5% |
| Medium | 0.75 – 0.85 | 1 | 4.5% |
| Low | < 0.75 | 0 | 0.0% |
The average similarity is 0.898, with a minimum of 0.846 and a maximum of 0.924 — a remarkably tight distribution (std dev: 0.020). Only a single paragraph falls into the "medium" range, and none score below 0.75. Henri Bué's 1865 translation holds up impressively well.
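The bucketing behind the table is trivial but worth making explicit, using the same 0.85 / 0.75 thresholds (the thresholds themselves are a judgment call, not a property of the model):

```python
def quality_breakdown(sims: list[float]) -> dict[str, int]:
    """Count paragraph pairs per quality bucket: high >= 0.85,
    medium in [0.75, 0.85), low < 0.75."""
    buckets = {"high": 0, "medium": 0, "low": 0}
    for s in sims:
        if s >= 0.85:
            buckets["high"] += 1
        elif s >= 0.75:
            buckets["medium"] += 1
        else:
            buckets["low"] += 1
    return buckets
```

Anything landing in the "low" bucket is a candidate for human review.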
What This Is Good For
This whole approach is more useful than it might first seem. Some practical uses:
- Translation QA at scale — flag paragraphs that might need a second look without reading the whole book
- Cross-lingual document alignment — pair up corresponding sections of multilingual corpora for training or analysis
- Chapter matching in untitled translations — even if the French edition had no chapter titles, you could still match chapters by embedding their full text
And because we're using a single multilingual model, this works for any language pair — not just English/French.
Try It Yourself
The full notebook is available in this repository: compare-book-translations.ipynb
You'll need:
- A Hugging Face account with access to Inference Endpoints
- A `Qwen/Qwen3-Embedding-4B` endpoint deployed (see Step 1 above)
- Your endpoint URL set as `INFERENCE_ENDPOINT_URL` in your environment
Everything else — downloading the books, parsing chapters, computing similarities, making the plots — is handled by the notebook itself.
Happy translating! 🐇