---
license: cc-by-nc-sa-4.0
---

We release the suite of models trained as part of our work on scaling laws of decoder-only machine translation systems. This work was published at WMT24 and is available [here](https://aclanthology.org/2024.wmt-1.124/).

These models were trained on a mixture of general and financial sentences covering 11 language directions. They support 8 languages (English, French, German, Italian, Spanish, Dutch, Swedish and Portuguese) and 9 domains (general + 8 financial subdomains). They are not tailored for document-level translation.

A running demo of these models is available on [our dedicated space](https://huggingface.co/spaces/DragonLLM/FinTranslate-Demo).

## Evaluation

The table below details the performance of our models on general-domain translation.

| Model | BLEU | COMET | COMET-Kiwi |
| ------------------- | --------- | --------- | ---------- |
| FinTranslate-70M | 29.62 | 81.31 | 80.72 |
| FinTranslate-160M | 32.43 | 84.00 | 83.45 |
| FinTranslate-410M | 33.60 | 84.81 | 84.14 |
| FinTranslate-Bronze | 34.08 | 85.10 | 84.35 |
| FinTranslate-Silver | 34.42 | 85.10 | 84.33 |
| FinTranslate-Gold | **36.07** | 85.88 | 84.82 |
| | | | |
| Llama 3.1 8B | 30.43 | 84.82 | 84.47 |
| Mistral 7B | 23.26 | 80.08 | 82.29 |
| Tower 7B | 33.50 | **85.91** | **85.02** |

The table below details the performance of our models on financial translation.
| Model | BLEU | COMET | COMET-Kiwi |
| ------------------- | --------- | --------- | ---------- |
| FinTranslate-70M | 44.63 | 86.95 | 80.88 |
| FinTranslate-160M | 49.02 | 88.27 | 81.80 |
| FinTranslate-410M | 50.85 | 88.64 | 81.73 |
| FinTranslate-Bronze | 52.00 | 88.85 | 81.71 |
| FinTranslate-Silver | 53.28 | **89.98** | 81.61 |
| FinTranslate-Gold | **58.34** | 89.62 | 81.35 |
| | | | |
| Llama 3.1 8B | 34.99 | 84.42 | 81.75 |
| Mistral 7B | 38.93 | 76.52 | 76.17 |
| Tower 7B | 38.93 | 86.49 | **82.66** |

## How to use it

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

LANGUAGES = ["en", "de", "es", "fr", "it", "nl", "sv", "pt"]
DOMAINS = {
    "Asset management marketing": "am",
    "Annual report": "ar",
    "Corporate action": "corporateAction",
    "Equity research": "equi",
    "Fund fact sheet": "ffs",
    "Kiid": "kiid",
    "Life insurance": "lifeInsurance",
    "Regulatory": "regulatory",
    "General": "general",
}


def language_token(lang):
    # Language control token (verify the exact spelling against the tokenizer vocabulary)
    return f"<{lang}>"


def domain_token(dom):
    # Domain control token (verify the exact spelling against the tokenizer vocabulary)
    return f"<{dom}>"


def format_input(src, tgt_lang, src_lang, domain):
    assert tgt_lang in LANGUAGES
    tgt_lang_token = language_token(tgt_lang)
    # Please read our paper to understand why we need to suffix the input
    # with the target-language token
    base_input = f"{src}{tgt_lang_token}"
    if src_lang is None:
        return base_input
    else:
        assert src_lang in LANGUAGES
        src_lang_token = language_token(src_lang)
        base_input = f"{base_input}{src_lang_token}"
        if domain is None:
            return base_input
        else:
            domain = DOMAINS.get(domain, "general")
            dom_token = domain_token(domain)
            base_input = f"{base_input}{dom_token}"
            return base_input


model_id = "DragonLLM/FinTranslate-Silver"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

source_sentence = "Dragon LLM est une entreprise française spécialisée dans le domaine de l'IA générative."
formatted_sentence = format_input(source_sentence, "en", "fr", "General")
inputs = tokenizer(formatted_sentence, return_tensors="pt", return_token_type_ids=False)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the prompt
input_size = inputs["input_ids"].size(1)
translated_sentence = tokenizer.decode(
    outputs[0, input_size:], skip_special_tokens=True
)
print(translated_sentence)
# Dragon LLM is a French company specialized in the field of generative AI.
```

## Citing this work

If you use this model in your work, please cite it as:

```
@inproceedings{caillaut-etal-2024-scaling,
    title = "Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task",
    author = {Caillaut, Ga{\"e}tan and Nakhl{\'e}, Mariam and Qader, Raheel and Liu, Jingshu and Barth{\'e}lemy, Jean-Gabriel},
    editor = "Haddow, Barry and Kocmi, Tom and Koehn, Philipp and Monz, Christof",
    booktitle = "Proceedings of the Ninth Conference on Machine Translation",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.wmt-1.124/",
    doi = "10.18653/v1/2024.wmt-1.124",
    pages = "1318--1331"
}
```
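As a quick sanity check of the input formatting, the standalone sketch below mirrors the `format_input` logic from the usage snippet and prints the strings it produces, without downloading the model. The `<xx>` spelling of the control tokens is an assumption here; verify the exact tokens against the model's tokenizer vocabulary (e.g. `tokenizer.get_vocab()`).

```python
# Standalone sketch of the input-formatting logic (no model download needed).
# Assumption: control tokens are spelled "<en>", "<fr>", "<general>", etc.
LANGUAGES = ["en", "de", "es", "fr", "it", "nl", "sv", "pt"]
DOMAINS = {"General": "general", "Annual report": "ar"}  # abridged mapping


def format_input(src, tgt_lang, src_lang=None, domain=None):
    assert tgt_lang in LANGUAGES
    out = f"{src}<{tgt_lang}>"  # the target-language token is always appended
    if src_lang is not None:
        assert src_lang in LANGUAGES
        out += f"<{src_lang}>"  # optional source-language hint
        if domain is not None:
            # unknown domains fall back to the general domain
            out += f"<{DOMAINS.get(domain, 'general')}>"
    return out


print(format_input("Hello world.", "fr"))
# Hello world.<fr>
print(format_input("Hello world.", "fr", "en", "Annual report"))
# Hello world.<fr><en><ar>
```

Note that, as in the snippet above, the domain token is only appended when a source language is also given.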