| license: apache-2.0 | |
| language: | |
| - multilingual | |
| - en | |
| - ru | |
| - es | |
| - fr | |
| - de | |
| - it | |
| - pt | |
| - pl | |
| - nl | |
| - vi | |
| - tr | |
| - sv | |
| - id | |
| - ro | |
| - cs | |
| - zh | |
| - hu | |
| - ja | |
| - th | |
| - fi | |
| - fa | |
| - uk | |
| - da | |
| - el | |
| - "no" | |
| - bg | |
| - sk | |
| - ko | |
| - ar | |
| - lt | |
| - ca | |
| - sl | |
| - he | |
| - et | |
| - lv | |
| - hi | |
| - sq | |
| - ms | |
| - az | |
| - sr | |
| - ta | |
| - hr | |
| - kk | |
| - is | |
| - ml | |
| - mr | |
| - te | |
| - af | |
| - gl | |
| - fil | |
| - be | |
| - mk | |
| - eu | |
| - bn | |
| - ka | |
| - mn | |
| - bs | |
| - uz | |
| - ur | |
| - sw | |
| - yue | |
| - ne | |
| - kn | |
| - kaa | |
| - gu | |
| - si | |
| - cy | |
| - eo | |
| - la | |
| - hy | |
| - ky | |
| - tg | |
| - ga | |
| - mt | |
| - my | |
| - km | |
| - tt | |
| - so | |
| - ku | |
| - ps | |
| - pa | |
| - rw | |
| - lo | |
| - ha | |
| - dv | |
| - fy | |
| - lb | |
| - ckb | |
| - mg | |
| - gd | |
| - am | |
| - ug | |
| - ht | |
| - grc | |
| - hmn | |
| - sd | |
| - jv | |
| - mi | |
| - tk | |
| - ceb | |
| - yi | |
| - ba | |
| - fo | |
| - or | |
| - xh | |
| - su | |
| - kl | |
| - ny | |
| - sm | |
| - sn | |
| - co | |
| - zu | |
| - ig | |
| - yo | |
| - pap | |
| - st | |
| - haw | |
| - as | |
| - oc | |
| - cv | |
| - lus | |
| - tet | |
| - gsw | |
| - sah | |
| - br | |
| - rm | |
| - sa | |
| - bo | |
| - om | |
| - se | |
| - ce | |
| - cnh | |
| - ilo | |
| - hil | |
| - udm | |
| - os | |
| - lg | |
| - ti | |
| - vec | |
| - ts | |
| - tyv | |
| - kbd | |
| - ee | |
| - iba | |
| - av | |
| - kha | |
| - to | |
| - tn | |
| - nso | |
| - fj | |
| - zza | |
| - ak | |
| - ada | |
| - otq | |
| - dz | |
| - bua | |
| - cfm | |
| - ln | |
| - chm | |
| - gn | |
| - krc | |
| - wa | |
| - hif | |
| - yua | |
| - srn | |
| - war | |
| - rom | |
| - bik | |
| - pam | |
| - sg | |
| - lu | |
| - ady | |
| - kbp | |
| - syr | |
| - ltg | |
| - myv | |
| - iso | |
| - kac | |
| - bho | |
| - ay | |
| - kum | |
| - qu | |
| - za | |
| - pag | |
| - ngu | |
| - ve | |
| - pck | |
| - zap | |
| - tyz | |
| - hui | |
| - bbc | |
| - tzo | |
| - tiv | |
| - ksd | |
| - gom | |
| - min | |
| - ang | |
| - nhe | |
| - bgp | |
| - nzi | |
| - nnb | |
| - nv | |
| - zxx | |
| - bci | |
| - kv | |
| - new | |
| - mps | |
| - alt | |
| - meu | |
| - bew | |
| - fon | |
| - iu | |
| - abt | |
| - mgh | |
| - mnw | |
| - tvl | |
| - dov | |
| - tlh | |
| - ho | |
| - kw | |
| - mrj | |
| - meo | |
| - crh | |
| - mbt | |
| - emp | |
| - ace | |
| - ium | |
| - mam | |
| - gym | |
| - mai | |
| - crs | |
| - pon | |
| - ubu | |
| - fip | |
| - quc | |
| - gv | |
| - kj | |
| - btx | |
| - ape | |
| - chk | |
| - rcf | |
| - shn | |
| - tzh | |
| - mdf | |
| - ppk | |
| - ss | |
| - gag | |
| - cab | |
| - kri | |
| - seh | |
| - ibb | |
| - tbz | |
| - bru | |
| - enq | |
| - ach | |
| - cuk | |
| - kmb | |
| - wo | |
| - kek | |
| - qub | |
| - tab | |
| - bts | |
| - kos | |
| - rwo | |
| - cak | |
| - tuc | |
| - bum | |
| - cjk | |
| - gil | |
| - stq | |
| - tsg | |
| - quh | |
| - mak | |
| - arn | |
| - ban | |
| - jiv | |
| - sja | |
| - yap | |
| - tcy | |
| - toj | |
| - twu | |
| - xal | |
| - amu | |
| - rmc | |
| - hus | |
| - nia | |
| - kjh | |
| - bm | |
| - guh | |
| - mas | |
| - acf | |
| - dtp | |
| - ksw | |
| - bzj | |
| - din | |
| - zne | |
| - mad | |
| - msi | |
| - mag | |
| - mkn | |
| - kg | |
| - lhu | |
| - ch | |
| - qvi | |
| - mh | |
| - djk | |
| - sus | |
| - mfe | |
| - srm | |
| - dyu | |
| - ctu | |
| - gui | |
| - pau | |
| - inb | |
| - bi | |
| - mni | |
| - guc | |
| - jam | |
| - wal | |
| - jac | |
| - bas | |
| - gor | |
| - skr | |
| - nyu | |
| - noa | |
| - sda | |
| - gub | |
| - nog | |
| - cni | |
| - teo | |
| - tdx | |
| - sxn | |
| - rki | |
| - nr | |
| - frp | |
| - alz | |
| - taj | |
| - lrc | |
| - cce | |
| - rn | |
| - jvn | |
| - hvn | |
| - nij | |
| - dwr | |
| - izz | |
| - msm | |
| - bus | |
| - ktu | |
| - chr | |
| - maz | |
| - tzj | |
| - suz | |
| - knj | |
| - bim | |
| - gvl | |
| - bqc | |
| - tca | |
| - pis | |
| - prk | |
| - laj | |
| - mel | |
| - qxr | |
| - niq | |
| - ahk | |
| - shp | |
| - hne | |
| - spp | |
| - koi | |
| - krj | |
| - quf | |
| - luz | |
| - agr | |
| - tsc | |
| - mqy | |
| - gof | |
| - gbm | |
| - miq | |
| - dje | |
| - awa | |
| - bjj | |
| - qvz | |
| - sjp | |
| - tll | |
| - raj | |
| - kjg | |
| - bgz | |
| - quy | |
| - cbk | |
| - akb | |
| - oj | |
| - ify | |
| - mey | |
| - ks | |
| - cac | |
| - brx | |
| - qup | |
| - syl | |
| - jax | |
| - ff | |
| - ber | |
| - tks | |
| - trp | |
| - mrw | |
| - adh | |
| - smt | |
| - srr | |
| - ffm | |
| - qvc | |
| - mtr | |
| - ann | |
| - aa | |
| - noe | |
| - nut | |
| - gyn | |
| - kwi | |
| - xmm | |
| - msb | |
| library_name: transformers | |
| tags: | |
| - text2text-generation | |
| - text-generation-inference | |
| datasets: | |
| - allenai/MADLAD-400 | |
| pipeline_tag: translation | |
| # Model Card for MADLAD-400-3B-MT | |
| # Table of Contents | |
| 0. [TL;DR](#tldr) | |
| 1. [Model Details](#model-details) | |
| 2. [Usage](#usage) | |
| 3. [Uses](#uses) | |
| 4. [Bias, Risks, and Limitations](#bias-risks-and-limitations) | |
| 5. [Training Details](#training-details) | |
| 6. [Evaluation](#evaluation) | |
| 7. [Environmental Impact](#environmental-impact) | |
| 8. [Citation](#citation) | |
| # TL;DR | |
| MADLAD-400-3B-MT is a multilingual machine translation model based on the T5 architecture that was | |
| trained on 1 trillion tokens covering over 450 languages using publicly available data. | |
| It is competitive with models that are significantly larger. | |
| **Disclaimer**: [Juarez Bochi](https://huggingface.co/jbochi), who was not involved in this research, converted | |
| the original weights and wrote the contents of this model card based on the original paper and Flan-T5. | |
| # Model Details | |
| ## Model Description | |
| - **Model type:** Language model | |
| - **Language(s) (NLP):** Multilingual (400+ languages) | |
| - **License:** Apache 2.0 | |
| - **Related Models:** [All MADLAD-400 Checkpoints](https://huggingface.co/models?search=madlad) | |
| - **Original Checkpoints:** [All Original MADLAD-400 Checkpoints](https://github.com/google-research/google-research/tree/master/madlad_400) | |
| - **Resources for more information:** | |
| - [Research paper](https://arxiv.org/abs/2309.04662) | |
| - [GitHub Repo](https://github.com/google-research/t5x) | |
| - [Hugging Face MADLAD-400 Docs (Similar to T5)](https://huggingface.co/docs/transformers/model_doc/MADLAD-400) - [Pending PR](https://github.com/huggingface/transformers/pull/27471) | |
| # Usage | |
| Find below some example scripts on how to use the model: | |
| ## Using the PyTorch model with `transformers` | |
| ### Running the model on a CPU or GPU | |
| <details> | |
| <summary> Click to expand </summary> | |
| First, install the Python packages that are required: | |
| `pip install transformers accelerate sentencepiece` | |
| ```python | |
| from transformers import T5ForConditionalGeneration, T5Tokenizer | |
| model_name = 'jbochi/madlad400-3b-mt' | |
| # device_map="auto" (requires accelerate) places the weights on a GPU when one is available | |
| model = T5ForConditionalGeneration.from_pretrained(model_name, device_map="auto") | |
| tokenizer = T5Tokenizer.from_pretrained(model_name) | |
| # The <2pt> prefix selects the target language (here Portuguese) | |
| text = "<2pt> I love pizza!" | |
| input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device) | |
| outputs = model.generate(input_ids=input_ids) | |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) | |
| # Eu adoro pizza! | |
| ``` | |
| </details> | |
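To translate several sentences at once, the single-sentence example above can be extended to a batch. The following is a minimal sketch, not part of the original model card: `translate_batch` and its `max_new_tokens` default are hypothetical, chosen for illustration, and the function assumes `model` and `tokenizer` were loaded as in the example above.

```python
def translate_batch(model, tokenizer, sentences, target_lang, max_new_tokens=128):
    """Translate a list of sentences into one target language.

    Hypothetical helper: `model` and `tokenizer` are assumed to be the
    T5ForConditionalGeneration / T5Tokenizer objects loaded earlier.
    """
    # Prepend the <2xx> target-language token to every source sentence.
    prompts = [f"<2{target_lang}> {s}" for s in sentences]
    # Pad so that sentences of different lengths fit into one tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    # Unpacking the encoding forwards both input_ids and attention_mask.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `translate_batch(model, tokenizer, ["I love pizza!", "How are you?"], "pt")` would return one translated string per input sentence. Passing the attention mask matters here because padded positions would otherwise be treated as real input.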
| ## Running the model with Candle | |
| <details> | |
| <summary> Click to expand </summary> | |
| Usage with [candle](https://github.com/huggingface/candle): | |
| ```bash | |
| cargo run --example t5 --release -- \ | |
| --model-id "jbochi/madlad400-3b-mt" \ | |
| --prompt "<2de> How are you, my friend?" \ | |
| --decode --temperature 0 | |
| ``` | |
| We also provide a quantized model (1.65 GB vs the original 11.8 GB file): | |
| ``` | |
| cargo run --example quantized-t5 --release -- \ | |
| --model-id "jbochi/madlad400-3b-mt" --weight-file "model-q4k.gguf" \ | |
| --prompt "<2de> How are you, my friend?" \ | |
| --temperature 0 | |
| ... | |
| Wie geht es dir, mein Freund? | |
| ``` | |
| </details> | |
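As a rough sanity check on the file sizes quoted above, the arithmetic below estimates the compression ratio and the average number of bits per weight in the quantized file. This is an illustrative sketch only: it assumes the 11.8 GB file stores 32-bit floats and ignores file metadata, neither of which is stated in this card.

```python
ORIGINAL_GB = 11.8     # size of the original weight file, as quoted above
QUANTIZED_GB = 1.65    # size of the quantized file, as quoted above
BYTES_PER_FLOAT32 = 4  # assumption: original weights are stored as float32

# How many times smaller the quantized file is.
compression_ratio = ORIGINAL_GB / QUANTIZED_GB

# Approximate parameter count implied by the float32 assumption.
approx_params = ORIGINAL_GB * 1e9 / BYTES_PER_FLOAT32

# Average number of bits the quantized file spends per parameter.
bits_per_weight = QUANTIZED_GB * 1e9 * 8 / approx_params

print(f"compression ratio:  {compression_ratio:.2f}x")
print(f"approx. parameters: {approx_params / 1e9:.2f}B")
print(f"bits per weight:    {bits_per_weight:.2f}")
```

Under these assumptions the quantized file is about 7 times smaller and spends roughly 4.5 bits per weight, which is consistent with the 4-bit quantization suggested by the `model-q4k.gguf` file name.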
| # Uses | |
| ## Direct Use and Downstream Use | |
| > Primary intended uses: Machine Translation and multilingual NLP tasks on over 400 languages. | |
| > Primary intended users: Research community. | |
| ## Out-of-Scope Use | |
| > These models are trained on general domain data and are therefore not meant to | |
| > work on domain-specific models out of the box. Moreover, these research models have not been assessed | |
| > for production use cases. | |
| # Bias, Risks, and Limitations | |
| > We note that we evaluate on only 204 of the languages supported by these models and on machine translation | |
| > and few-shot machine translation tasks. Users must consider use of this model carefully for their own | |
| > use case. | |
| ## Ethical considerations and risks | |
| > We trained these models with MADLAD-400 and publicly available data to create baseline models that | |
| > support NLP for over 400 languages, with a focus on languages underrepresented in large-scale corpora. | |
| > Given that these models were trained with web-crawled datasets that may contain sensitive, offensive or | |
| > otherwise low-quality content despite extensive preprocessing, it is still possible that these issues with the | |
| > underlying training data may cause differences in model performance and toxic (or otherwise problematic) | |
| > output for certain domains. Moreover, large models are dual-use technologies that have specific risks | |
| > associated with their use and development. We point the reader to surveys such as those written by | |
| > Weidinger et al. or Bommasani et al. for a more detailed discussion of these risks, and to Liebling | |
| > et al. for a thorough discussion of the risks of machine translation systems. | |
| ## Known Limitations | |
| More information needed | |
| ## Sensitive Use | |
| More information needed | |
| # Training Details | |
| > We train models of various sizes: a 3B-parameter, 32-layer model, | |
| > a 7.2B-parameter, 48-layer model and a 10.7B-parameter, 32-layer model. | |
| > We share all parameters of the model across language pairs, | |
| > and use a SentencePiece model with 256k tokens shared on both the encoder and decoder | |
| > side. Each input sentence has a <2xx> token prepended to the source sentence to indicate the target | |
| > language. | |
| See the [research paper](https://arxiv.org/pdf/2309.04662.pdf) for further details. | |
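The `<2xx>` convention described above can be illustrated with a small helper pair. This is a sketch under assumptions, not code from the paper: `add_target_token` and `split_target_token` are hypothetical names, and the regular expression only encodes the `<2xx>` shape shown in this card's examples (such as `<2pt>` and `<2de>`).

```python
import re
from typing import Optional, Tuple

# Matches a leading target-language token such as "<2pt>" or "<2de>".
_TARGET_TOKEN = re.compile(r"^<2([^<>\s]+)>\s*")


def add_target_token(text: str, target_lang: str) -> str:
    """Prepend the <2xx> token that indicates the target language."""
    if not target_lang or not target_lang.strip():
        raise ValueError("target_lang must be a non-empty language code")
    return f"<2{target_lang.strip()}> {text.strip()}"


def split_target_token(prompt: str) -> Tuple[Optional[str], str]:
    """Return (language code, source text); the code is None if no token is present."""
    match = _TARGET_TOKEN.match(prompt)
    if match is None:
        return None, prompt
    return match.group(1), prompt[match.end():]


print(add_target_token("I love pizza!", "pt"))
# <2pt> I love pizza!
print(split_target_token("<2de> How are you, my friend?"))
# ('de', 'How are you, my friend?')
```

The helper does not check whether the model actually supports a given code; the language list in this card's metadata is the place to look for that.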
| ## Training Data | |
| > For both the machine translation and language model, MADLAD-400 is used. For the machine translation | |
| > model, a combination of parallel datasources covering 157 languages is also used. Further details are | |
| > described in the [paper](https://arxiv.org/pdf/2309.04662.pdf). | |
| ## Training Procedure | |
| See the [research paper](https://arxiv.org/pdf/2309.04662.pdf) for further details. | |
| # Evaluation | |
| ## Testing Data, Factors & Metrics | |
| > For evaluation, we used WMT, NTREX, Flores-200 and Gatones datasets as described in Section 4.3 in the [paper](https://arxiv.org/pdf/2309.04662.pdf). | |
| > The translation quality of this model varies based on language, as seen in the paper, and likely varies on | |
| > domain, though we have not assessed this. | |
| ## Results | |
|  | |
|  | |
|  | |
| See the [research paper](https://arxiv.org/pdf/2309.04662.pdf) for further details. | |
| # Environmental Impact | |
| More information needed | |
| # Citation | |
| **BibTeX:** | |
| ```bibtex | |
| @misc{kudugunta2023madlad400, | |
| title={MADLAD-400: A Multilingual And Document-Level Large Audited Dataset}, | |
| author={Sneha Kudugunta and Isaac Caswell and Biao Zhang and Xavier Garcia and Christopher A. Choquette-Choo and Katherine Lee and Derrick Xin and Aditya Kusupati and Romi Stella and Ankur Bapna and Orhan Firat}, | |
| year={2023}, | |
| eprint={2309.04662}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL} | |
| } | |
| ``` | |