Translations in certain languages are terribly wrong
Hi, for starters I just wanted to say that this is an amazing model and a pretty big step toward efficiently smart AI that can run on anything (even a toaster). That said, this post is a personal review/complaint describing some inconveniences and problems with translations I quickly tested, and a warning to anyone reading this not to take the model's output at face value. It's definitely useful, and I assume its performance depends on the language, so it might excel in some particular languages, topics, or areas, but it's not 100% accurate and it can lead to some pretty annoying and/or disastrous outcomes.
For context, I'm a polyglot who speaks English, Portuguese, French, Spanish, German and Italian, and I'm working on some personal projects that heavily depend on accurate translations (games and programs). Although my current mastery of the languages I listed is extremely useful, some of my projects unfortunately require translations into many other languages that I have no hope of understanding (nor do I have any online or real-life contacts who can help me). I therefore wanted to use this model to bridge that gap and use it in a professional setting.
This, however, is where the problems arise, and those problems are the reason I can't trust this model for my projects (and why anyone reading this shouldn't trust the translations blindly). Some of the projects I'm working on are mature and scandalous in nature, so as a joke (as well as out of genuine curiosity) I ran some quick translations of the following sentences:
"I like eating purple ass with green flour"
"I eat ass every day"
"I'm gonna cum"
^I personally translated these sentences into Portuguese, then told the model to translate them back into English using the official prompt format described in the model's GitHub repo (https://github.com/Tencent-Hunyuan/HY-MT):
Translate the following segment into English, without additional explanation: "Eu gosto de comer cu roxo com farinha verde"
^The correct answer should be: "I like eating purple ass with green flour"
However, the model returned: I like to eat food that has a purple color and is mixed with green flour.
Assuming it was a fluke, I shut it down and checked the script's code in case some settings or parameters were off... but other than the fact that I had forgotten to enable CUDA, everything was fine (I used the recommended default settings described in the model's GitHub repo), so I tried again:
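For anyone wanting to reproduce this: the prompt I used follows the template from the repo's README verbatim. A minimal sketch of how I wrap each segment (the helper name is my own, and the actual model-loading/generation calls are omitted since they depend on your local setup):

```python
# Prompt template copied verbatim from the HY-MT GitHub README
TEMPLATE = 'Translate the following segment into {lang}, without additional explanation: "{text}"'

def build_prompt(text: str, lang: str = "English") -> str:
    """Wrap a source segment in the official translation prompt."""
    return TEMPLATE.format(lang=lang, text=text)

print(build_prompt("Eu como cu todo dia"))
# Translate the following segment into English, without additional explanation: "Eu como cu todo dia"
```

Anyway, with CUDA on and the defaults untouched, the retry returned: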
I like to eat food that’s made with purple ingredients and green flour.
Incorrect once again. Realizing it was a fundamental problem with the model, I reset the chat and persisted with my very simple and immature test translations:
Translate the following segment into English, without additional explanation: "Eu como cu todo dia"
^The correct answer should be: "I eat ass every day"
The model's response: I eat every day.
I then quickly followed with: Translate the following segment into English, without additional explanation: "Eu vou gozar"
^The correct answer should be: "I'm gonna cum" (or "come" if you speak Premium English)
To which the model replied: "I’m going to have sex."
Feeling annoyed and disappointed, I assumed you guys had put some kind of censorship into the model, so I personally translated the same sentences into French and ran the test again. The results speak for themselves:
Translate the following segment into English, without additional explanation: "J'aime manger du cul violet avec de la farine verte"
I like to eat purple cabbage with green flour.
Translate the following segment into English, without additional explanation: "Je mange du cul tous les jours"
I eat my ass every day.
(bruh what the hell boy 😀)
Translate the following segment into English, without additional explanation: "Je vais jouir"
"I’ll enjoy it."
This was the final nail in the coffin for me. It at least proved that the model doesn't have the censorship I imagined (since it clearly said "I eat my ass every day."), but it's still SEVERELY incorrect. The Portuguese test at least largely retained some of the meaning, yet it was still wrong, and the French test showed that the model's translations can lead to severe misunderstandings and embarrassing situations ("I eat ass every day" becoming "I eat my ass every day."). Imagine someone innocently using this model to translate something for a job, a friend, or a potential foreign date, and the translation comes out this wrong... honestly? If it were me I'd probably change my name, face, and identity, become a Buddhist, book a flight to Australia, bury my head deep in the sand and just accept my fate. GG, what else are you gonna do?
This is a pretty sad and shocking result considering that even Google Translate can translate these simple sentences better.
Sure, you might say that Google Translate doesn't count because Google recently integrated Gemini into it, but Gemini is a generalist built to handle everything, while this particular model was SPECIFICALLY built to be a translator covering many languages. If it can't even properly translate such simple sentences, there's no way we as consumers can trust it (not that I fully trust Google Translate either, but in these tests it clearly showed itself to be the better alternative).
Again, this is just a personal review/critique, and I apologize if my tests were crude (but what can you say? We're all adults here, and this is how Rome was built. Don't believe me? Take a look at Roman graffiti and you'll see :') ).
Thank you for trying out our service and for providing feedback regarding the bad cases encountered with these languages. Due to limitations in our evaluation datasets and our own linguistic capabilities, we indeed failed to detect these specific issues in a timely manner. The current release serves as a demo version; very soon, we will launch a new model featuring significantly enhanced translation capabilities and superior instruction-following accuracy to address these issues.
Really glad to hear it. Thank you for the response and for all your hard work, and again, thank you for your groundbreaking research into building small, efficient models like this (the whole industry desperately needs it, and desperately needs YOU; you guys are making history). I'll definitely keep my eyes open for future demos and products!