Tip: Improve pronunciation of numbers & abbreviations with a text normalizer

by raditotev - opened Feb 24


If you've noticed the model struggling with numbers, dates, or abbreviations, this is a
known limitation of every Bulgarian TTS model I'm aware of.

A simple and effective workaround is to pre-process your input text using bg_text_normalizer
before passing it to the model. This library converts Bulgarian numbers, abbreviations, and
other non-standard words into their spoken form, which the model handles much more
reliably.

Example:

  from bg_text_normalizer import normalize

  text = "Цената е 1500 лв. за м² в кв. Лозенец."
  normalized = normalize(text)
  # → "Цената е хиляда и петстотин лева за квадратен метър в квартал Лозенец."

  # Then pass `normalized` to the TTS model

This small pre-processing step can significantly improve the naturalness and accuracy of the
synthesized speech, especially for texts containing numerals, currency, units of
measurement, or common abbreviations.
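If you want this step to happen automatically, here is a minimal sketch of wrapping a normalizer and a TTS call into one function. The names `make_tts_pipeline`, `demo_normalize`, and `demo_synthesize` are hypothetical stand-ins I made up for illustration; in practice you would pass in `normalize` from bg_text_normalizer and whatever function calls your model. Falling back to the raw text when normalization fails is my own assumption about what you'd want, not something the library does.

```python
from typing import Callable

Normalizer = Callable[[str], str]
Synthesizer = Callable[[str], bytes]


def make_tts_pipeline(normalize: Normalizer, synthesize: Synthesizer) -> Synthesizer:
    """Return a function that normalizes text before synthesizing it."""

    def speak(text: str) -> bytes:
        try:
            spoken = normalize(text)
        except Exception:
            # Assumption: degraded pronunciation beats no audio at all,
            # so fall back to the raw text if the normalizer raises.
            spoken = text
        return synthesize(spoken)

    return speak


# Hypothetical stand-ins so the sketch runs without the library or a model.
def demo_normalize(text: str) -> str:
    return text.replace("кв.", "квартал")


def demo_synthesize(text: str) -> bytes:
    # A real synthesizer would return audio; this just encodes the text.
    return text.encode("utf-8")


speak = make_tts_pipeline(demo_normalize, demo_synthesize)
print(speak("кв. Лозенец").decode("utf-8"))
```

Passing the normalizer in as an argument keeps the wrapper independent of any one library, so you can swap in a different normalizer or skip it in tests.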

Hope this helps others get the most out of this model!

beleata74 (Owner) - Feb 24, edited Feb 25

Thanks for the suggestion, bg_text_normalizer will definitely improve things!
