bf16 GGUF, english prompt for contextual translation, long texts translation
#4
by fikavec - opened
Thank you for the wonderful model. A few questions:
- Is there a prompt in English for "contextual translation"?
- Could you please share the GGUF in bf16 or f16 precision, along with your BLEU scores on flores-200? That would make it quick to check whether a quantized model's quality has degraded after a local run. For example, without GGML_CUDA_NO_PINNED=1 on older GPUs you may get gibberish output, and it would be convenient to run sacrebleu and compare against your benchmark results.
- Could you give some recommendations for translating long texts? For example: split the text into chunks of N characters each (what is the optimal chunk size?); use the recommended parameters plus "max_tokens": 2.5*len(N); translate the first chunk with the "Prompt Template for XX<=>XX" prompt and the following chunks with the "contextual translation" prompt; and check each translation with clean_repeated_substrings(translation) from the https://huggingface.co/tencent/HunyuanOCR model card...
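To make the long-text idea concrete, here is a minimal sketch of the chunking scheme I have in mind. It is only an illustration, not a recommended recipe: the helper names `chunk_text` and `build_requests` are hypothetical, the default of 1000 characters is an arbitrary placeholder (the optimal size is exactly what I am asking about), using the previous source chunk as the context is my assumption, and `max_tokens` reads the `2.5*len(N)` above as 2.5 times the chunk length in characters.

```python
import re

# Templates quoted from this thread. Feeding the previous chunk in as the
# "context" of the contextual prompt is an assumption, not a documented recipe.
FIRST_TEMPLATE = (
    "Translate the following segment into {target}, "
    "without additional explanation.\n\n{source}"
)
CONTEXT_TEMPLATE = (
    "{context}\n参考上面的信息,把下面的文本翻译成{target},"
    "注意不需要翻译上文,也不要额外解释:\n{source}"
)


def chunk_text(text, max_chars):
    """Greedily pack sentence-like pieces into chunks of about max_chars."""
    # Each piece ends after a run of sentence punctuation (Latin or CJK) plus
    # any trailing whitespace, so joining the pieces reproduces the input.
    pieces = re.findall(r".+?(?:[.!?。!?]+\s*|\Z)", text, flags=re.S)
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        # A single piece longer than max_chars still becomes its own chunk.
        current += piece
    if current:
        chunks.append(current)
    return chunks


def build_requests(text, target_en, target_zh, max_chars=1000):
    """Build one prompt per chunk: plain for the first, contextual after."""
    requests, previous = [], None
    for chunk in chunk_text(text, max_chars):
        if previous is None:
            prompt = FIRST_TEMPLATE.format(target=target_en, source=chunk)
        else:
            prompt = CONTEXT_TEMPLATE.format(
                context=previous, target=target_zh, source=chunk
            )
        # Heuristic from the question above: 2.5 tokens per source character.
        requests.append({"prompt": prompt, "max_tokens": int(2.5 * len(chunk))})
        previous = chunk
    return requests
```

Each returned dict would then be sent to the model together with the recommended sampling parameters, and the output checked for repeated substrings before it is accepted.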
Is it possible that prompts in Chinese (especially the "prompt for contextual translation" used without any context) give better translation quality? I measured the following scores on one language pair from the flores-200 test:
- BLEU: 35.278 with prompt_template = f'''\n参考上面的信息,把下面的文本翻译成{Chinese_target_language},注意不需要翻译上文,也不要额外解释:\n{source_text}'''
- BLEU: 34.210 with prompt_template = f'''将以下文本翻译为{Chinese_target_language},注意只需要输出翻译后的结果,不要额外解释:\n\n{source_text}'''
- BLEU: 32.301 with prompt_template = f'''Translate the following segment into {English_target_language}, without additional explanation.\n\n{source_text}'''
and in another language pair:
- BLEU: 28.965 with prompt_template = f'''\n参考上面的信息,把下面的文本翻译成{Chinese_target_language},注意不需要翻译上文,也不要额外解释:\n{source_text}'''
- BLEU: 28.071 with prompt_template = f'''将以下文本翻译为{Chinese_target_language},注意只需要输出翻译后的结果,不要额外解释:\n\n{source_text}'''
- BLEU: 26.375 with prompt_template = f'''Translate the following segment into {English_target_language}, without additional explanation.\n\n{source_text}'''
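For anyone who wants to repeat the comparison, this is roughly how such a measurement can be set up. It is a sketch under assumptions: `translate` is a hypothetical callable that sends one prompt to the locally running model and returns its text output, `build_prompts` and `compare_templates` are illustrative names, and the scoring uses `sacrebleu.corpus_bleu` with its default tokenizer (a different `tokenize` setting may be more appropriate for targets such as Chinese or Japanese).

```python
# The three prompt templates quoted above, keyed by a short label.
TEMPLATES = {
    "zh_contextual": (
        "\n参考上面的信息,把下面的文本翻译成{target_zh},"
        "注意不需要翻译上文,也不要额外解释:\n{source}"
    ),
    "zh_plain": (
        "将以下文本翻译为{target_zh},"
        "注意只需要输出翻译后的结果,不要额外解释:\n\n{source}"
    ),
    "en_plain": (
        "Translate the following segment into {target_en}, "
        "without additional explanation.\n\n{source}"
    ),
}


def build_prompts(source, target_zh, target_en):
    """Fill every template for one source sentence."""
    return {
        name: template.format(
            source=source, target_zh=target_zh, target_en=target_en
        )
        for name, template in TEMPLATES.items()
    }


def compare_templates(sources, references, translate, target_zh, target_en):
    """Return a corpus-level BLEU score per template.

    `translate` is a hypothetical function: prompt string in, translation out.
    """
    import sacrebleu  # third-party package: pip install sacrebleu

    hypotheses = {name: [] for name in TEMPLATES}
    for source in sources:
        prompts = build_prompts(source, target_zh, target_en)
        for name, prompt in prompts.items():
            hypotheses[name].append(translate(prompt))
    return {
        name: sacrebleu.corpus_bleu(hyps, [references]).score
        for name, hyps in hypotheses.items()
    }
```

Running all three templates over the same sentences with the same decoding parameters keeps the comparison fair; a gap of one or two BLEU points on a single language pair may still be within run-to-run noise if sampling is enabled.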
fikavec changed discussion status to closed