Translation
senyu1 committed on
Commit
ee075de
·
verified ·
1 Parent(s): 9608f9d

Upload README.md

  1. README.md +231 -0
README.md ADDED
@@ -0,0 +1,231 @@
1
+ ---
2
+ pipeline_tag: translation
3
+ language:
4
+ - multilingual
5
+ - en
6
+ - am
7
+ - ar
8
+ - so
9
+ - sw
10
+ - pt
11
+ - af
12
+ - fr
13
+ - zu
14
+ - mg
15
+ - ha
16
+ - sn
17
+ - arz
18
+ - ny
19
+ - ig
20
+ - xh
21
+ - yo
22
+ - st
23
+ - rw
24
+ - tn
25
+ - ti
26
+ - ts
27
+ - om
28
+ - run
29
+ - nso
30
+ - ee
31
+ - ln
32
+ - tw
33
+ - pcm
34
+ - gaa
35
+ - loz
36
+ - lg
37
+ - guw
38
+ - bem
39
+ - efi
40
+ - lue
41
+ - lua
42
+ - toi
43
+ - ve
44
+ - tum
45
+ - tll
46
+ - iso
47
+ - kqn
48
+ - zne
49
+ - umb
50
+ - mos
51
+ - tiv
52
+ - lu
53
+ - ff
54
+ - kwy
55
+ - bci
56
+ - rnd
57
+ - luo
58
+ - wal
59
+ - ss
60
+ - lun
61
+ - wo
62
+ - nyk
63
+ - kj
64
+ - ki
65
+ - fon
66
+ - bm
67
+ - cjk
68
+ - din
69
+ - dyu
70
+ - kab
71
+ - kam
72
+ - kbp
73
+ - kr
74
+ - kmb
75
+ - kg
76
+ - nus
77
+ - sg
78
+ - taq
79
+ - tzm
80
+ - nqo
81
+
82
+ license: apache-2.0
83
+ ---
84
+ SSA-COMET-MTL is a robust, unified automatic metric for both machine translation evaluation (MTE) and quality estimation (QE), built on SSA-MTE. It takes a triplet (source sentence, translation, reference translation) for MTE, or a pair (source sentence, translation) for QE, and returns a score that reflects the quality of the translation.
85
+ This model is based on an improved, African-enhanced encoder, [afro-xlmr-large-76L](https://huggingface.co/Davlan/afro-xlmr-large-76L).
86
+
87
+ # Paper
88
+
89
+ Li S., Wang J., Ali F., Cherry C., Deutsch D., Briakou E., Sousa-Silva R., Cardoso H.L., Stenetorp P., and Adelani D.I.
90
+ [SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?](https://aclanthology.org/2025.emnlp-main.656/). In EMNLP 2025
91
+
92
+
93
+ # License
94
+
95
+ Apache-2.0
96
+
97
+ # Usage (SSA-COMET)
98
+
99
+ Using this model requires unbabel-comet to be installed:
100
+
101
+ ```bash
102
+ pip install --upgrade pip # ensures that pip is current
103
+ pip install unbabel-comet
104
+ ```
105
+
106
+ Then you can use it through the `comet-score` CLI:
107
+
108
+ ```bash
109
+ comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model McGill-NLP/ssa-comet-mtl
110
+ ```
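
The CLI reads three line-aligned text files, one segment per line. As a rough sketch of how the same files could be turned into the list of dicts that `model.predict` takes in this README's Python example, the helper below reads them and checks that they line up. The name `read_parallel` is hypothetical and not part of unbabel-comet.

```python
from pathlib import Path


def read_parallel(src_path, mt_path, ref_path):
    """Read three line-aligned files into a list of {"src", "mt", "ref"} dicts.

    Hypothetical helper for illustration; it is not part of unbabel-comet.
    """
    columns = []
    for path in (src_path, mt_path, ref_path):
        text = Path(path).read_text(encoding="utf-8")
        columns.append(text.splitlines())

    # Every file must contain exactly one line per segment.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("source, translation and reference files must have the same number of lines")

    return [{"src": s, "mt": m, "ref": r} for s, m, r in zip(*columns)]
```

Failing early on a line-count mismatch avoids silently scoring misaligned segments.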
111
+
112
+ Or using Python:
113
+
114
+ ```python
115
+ from comet import download_model, load_from_checkpoint
116
+ model_path = download_model("McGill-NLP/ssa-comet-mtl")
117
+ model = load_from_checkpoint(model_path)
118
+ data = [
119
+ {
120
+ "src": "Nadal sàkọọ́lẹ̀ ìforígbárí o ní àmì méje sóódo pẹ̀lú ilẹ̀ Canada.",
121
+ "mt": "Nadal's head to head record against the Canadian is 7–2.",
122
+ "ref": "Nadal scored seven unanswered points against Canada."
123
+ },
124
+ {
125
+ "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
126
+ "mt": "He recently lost against Raonic in the Brisbane Open.",
127
+ "ref": "He recently lost to Raoniki in the game Sisi Brisbeni."
128
+ }
129
+ ]
130
+ model_output = model.predict(data, batch_size=8, gpus=1)
131
+ print(model_output)
132
+ ```
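
In recent unbabel-comet releases, the object returned by `predict` exposes the per-segment scores as `model_output.scores` and their average as `model_output.system_score`; treat those attribute names as an assumption if you are on an older version. A minimal sketch for inspecting the weakest translations first, using a hypothetical helper `worst_first` and stand-in scores rather than real model output:

```python
def worst_first(data, scores):
    """Pair each sample with its score and sort from lowest to highest score.

    Hypothetical helper for illustration; `scores` would normally be
    the per-segment scores returned by model.predict.
    """
    if len(data) != len(scores):
        raise ValueError("expected exactly one score per sample")
    paired = [{**sample, "score": float(score)} for sample, score in zip(data, scores)]
    return sorted(paired, key=lambda item: item["score"])


# Stand-in scores for illustration only; they are not real model output.
ranked = worst_first(
    [{"src": "a", "mt": "b"}, {"src": "c", "mt": "d"}],
    [0.8, 0.3],
)
print(ranked[0]["mt"])  # prints "d", the lowest-scoring translation
```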
133
+
134
+ # Intended uses
135
+
136
+ Our model is intended to be used for **MT evaluation** and **Quality Estimation**.
137
+
138
+ Given a triplet with (source sentence, translation, reference translation), or a pair with (source sentence, translation), it outputs a single score between 0 and 1, where 1 represents a perfect translation.
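
The two input shapes described above can be sketched as follows. This README does not spell out how the reference-free pair is passed in Python, so the sketch assumes the `ref` key is simply omitted for QE; `make_sample` is a hypothetical helper, not part of unbabel-comet.

```python
def make_sample(src, mt, ref=None):
    """Build one input sample: a triplet when `ref` is given, otherwise a pair.

    Assumption: the reference-free (QE) input is expressed by leaving out "ref".
    """
    sample = {"src": src, "mt": mt}
    if ref is not None:
        sample["ref"] = ref
    return sample


# Triplet for MT evaluation, pair for quality estimation.
mte_sample = make_sample("source text", "machine translation", "reference translation")
qe_sample = make_sample("source text", "machine translation")
```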
139
+
140
+ # Languages Covered:
141
+
142
+ There are 76 languages available:
143
+ - English (eng)
144
+ - Amharic (amh)
145
+ - Arabic (ara)
146
+ - Somali (som)
147
+ - Kiswahili (swa)
148
+ - Portuguese (por)
149
+ - Afrikaans (afr)
150
+ - French (fra)
151
+ - isiZulu (zul)
152
+ - Malagasy (mlg)
153
+ - Hausa (hau)
154
+ - chiShona (sna)
155
+ - Egyptian Arabic (arz)
156
+ - Chichewa (nya)
157
+ - Igbo (ibo)
158
+ - isiXhosa (xho)
159
+ - Yorùbá (yor)
160
+ - Sesotho (sot)
161
+ - Kinyarwanda (kin)
+ - Setswana (tsn)
162
+ - Tigrinya (tir)
163
+ - Tsonga (tso)
164
+ - Oromo (orm)
165
+ - Rundi (run)
166
+ - Northern Sotho (nso)
167
+ - Ewe (ewe)
168
+ - Lingala (lin)
169
+ - Twi (twi)
170
+ - Nigerian Pidgin (pcm)
171
+ - Ga (gaa)
172
+ - Lozi (loz)
173
+ - Luganda (lug)
174
+ - Gun (guw)
175
+ - Bemba (bem)
176
+ - Efik (efi)
177
+ - Luvale (lue)
178
+ - Luba-Lulua (lua)
179
+ - Tonga (toi)
180
+ - Tshivenḓa (ven)
181
+ - Tumbuka (tum)
182
+ - Tetela (tll)
183
+ - Isoko (iso)
184
+ - Kaonde (kqn)
185
+ - Zande (zne)
186
+ - Umbundu (umb)
187
+ - Mossi (mos)
188
+ - Tiv (tiv)
189
+ - Luba-Katanga (lub)
190
+ - Fula (fuv)
191
+ - San Salvador Kongo (kwy)
192
+ - Baoulé (bci)
193
+ - Ruund (rnd)
194
+ - Luo (luo)
195
+ - Wolaitta (wal)
196
+ - Swazi (ssw)
197
+ - Lunda (lun)
198
+ - Wolof (wol)
199
+ - Nyaneka (nyk)
200
+ - Kwanyama (kua)
201
+ - Kikuyu (kik)
202
+ - Fon (fon)
203
+ - Bambara (bam)
204
+ - Chokwe (cjk)
205
+ - Dinka (dik)
206
+ - Dyula (dyu)
207
+ - Kabyle (kab)
208
+ - Kamba (kam)
209
+ - Kabiyè (kbp)
210
+ - Kanuri (knc)
211
+ - Kimbundu (kmb)
212
+ - Kikongo (kon)
213
+ - Nuer (nus)
214
+ - Sango (sag)
215
+ - Tamasheq (taq)
216
+ - Tamazight (tzm)
217
+ - N'ko (nqo)
218
+
219
+ # Specifically Finetuned on:
220
+ - Amharic (amh)
221
+ - Hausa (hau)
222
+ - Igbo (ibo)
223
+ - Kikuyu (kik)
224
+ - Kinyarwanda (kin)
225
+ - Luo (luo)
226
+ - Twi (twi)
227
+ - Yoruba (yor)
228
+ - Zulu (zul)
229
+ - Ewe (ewe)
230
+ - Lingala (lin)
231
+ - Wolof (wol)