TheoMoins committed
Commit 5fad348 · verified · 1 Parent(s): 5e886cb

Update README.md

Files changed (1)
  1. README.md +5 -7
README.md CHANGED
@@ -52,12 +52,12 @@ For full details on data, training procedure, and results, see the accompanying

  MEDUSA was trained on a corpus of over **640,000 lines** spanning more than twenty repositories, covering the following language families and scripts:

- - **Romance / Latin**: Old French (`fro`), Old Occitan (`pro`), Provençal, Old Spanish, Catalan (`ca`), Italian (`it`), Navarrese-Aragonese, Latin (`la`), Venetian
- - **Germanic**: Middle High German (`gmh`), Middle Dutch (`dum`), Old English (`ang`), Old Norse (`non`), Swedish
- - **Celtic**: Welsh (`wlm`)
- - **Slavic**: Old Czech (`cs`), Old Polish
+ - **Romance / Latin**: Old French (`fro`), Occitan (`pro`), Old Italian (`ita`), Old Spanish (`osp`), Catalan (`cat`), Old Portuguese (`opor`), Navarrese (`nav`), Latin (`lat`), Venetian (`vec`), Galician (`glg`)
+ - **Germanic**: Middle High German (`gmh`), Middle Low German (`gml`), Old Icelandic (`ice`), Middle English (`enm`), Middle Dutch (`dum`), Old English (`ang`), Old Norwegian (`non`), Swedish (`swe`)
+ - **Celtic**: Welsh (`wlm`), Old Irish (`gle`)
+ - **Slavic**: Old Czech (`cze`), Old Polish (`pol`)

- Scripts covered include Caroline minuscule, Gothic textualis, Gothic cursive, and humanistic hands, across manuscripts dated roughly from the 9th to the 15th century.
+ Manuscripts dated roughly from the 9th to the 15th century.

  ---

@@ -71,8 +71,6 @@ Unweighted average CER (%) and WER (%) on internal and official competition test
  | **MEDUSA-4B 0.1** | **14.7** | **44.5** | 8.15 | 5.60 | 12.0 |
  | **MEDUSA-9B 0.1** | **13.2** | **42.6** | 8.03 | **5.24** | **10.8** |

- MEDUSA-9B improves over the kraken-CATMuS 1.6.0 baseline by **1.2% on Task 1**, **2.9% on Task 2**, and **15% on Task 3**. Both MEDUSA variants use a single, unified set of weights across all three competition tasks.
-
  ---

  ## Intended use