Update README.md
README.md
CHANGED
@@ -63,29 +63,15 @@ We evaluated CommonLingua in texts/sec (one paragraph = one text, ≤ 512 bytes
 | Sapphire Rapids CPU (8 threads) | bs=32 | 183 | **553** | 3.0× |
 | Sapphire Rapids CPU (1 thread) | bs=32 | 44 | **114** | 2.6×|
 
-##
+## Inference
 
-
-pip install "git+https://github.com/PleIAs/bytehybrid-lid#egg=commonlingua[hub]"
-```
+Easiest way to test the model is to test the provided predict.py script:
 
 ```python
-
-
-lid = LID.from_pretrained("PleIAs/CommonLingua") # auto-downloads
-# Use the bf16 build for 2× speedup on GPU at no measurable quality cost:
-# lid = LID.from_pretrained("PleIAs/CommonLingua", dtype="bf16")
-
-text = (
-"Wikipédia est une encyclopédie universelle, multilingue, créée par Jimmy "
-"Wales et Larry Sanger le 15 janvier 2001 et fonctionnant selon le principe "
-"du wiki."
-)
-r = lid.predict(text)
-print(r.lang, r.confidence) # fra 0.99
+python predict.py "Wikipédia est une encyclopédie universelle, multilingue." # fra 0.99
 ```
 
-The intended workload is
+The intended workload is paragraph-level corpus curation. CommonLingua was not assessed on very short text segments and will likely perform less well than alternatives.
 
 
 ## Citation
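The speedup column in the two benchmark rows at the top of this hunk can be checked directly from the throughput figures. The column headers fall outside the hunk, so reading the third column as the baseline and the bolded fourth column as CommonLingua is an assumption. A minimal sketch of the arithmetic, which also turns throughput (texts/sec, one paragraph per text) into a rough wall-clock estimate; `speedup` and `seconds_for` are hypothetical helper names:

```python
# Throughput rows copied from the benchmark table in this hunk (texts/sec, bs=32).
# Assumption: the third column is the baseline and the bolded fourth column is CommonLingua.
ROWS = {
    "Sapphire Rapids CPU (8 threads)": (183, 553),
    "Sapphire Rapids CPU (1 thread)": (44, 114),
}


def speedup(baseline: float, candidate: float) -> float:
    """Ratio of candidate to baseline throughput, rounded to one decimal like the table."""
    return round(candidate / baseline, 1)


def seconds_for(n_texts: int, texts_per_sec: float) -> float:
    """Wall-clock estimate for n_texts paragraphs at a fixed throughput."""
    return n_texts / texts_per_sec


for name, (base, ours) in ROWS.items():
    hours = seconds_for(1_000_000, ours) / 3600
    print(f"{name}: {speedup(base, ours)}x, 1M paragraphs in {hours:.1f} h")
```

The ratios come out to 3.0 and 2.6, matching the table, and one million paragraphs would take roughly 0.5 h on 8 threads versus 2.4 h on a single thread, assuming throughput stays constant at that scale.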
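The new README text frames the intended workload as paragraph-level corpus curation, and the benchmark counts one paragraph as one text of at most 512 bytes. A minimal sketch of preparing a document that way, assuming paragraphs are separated by blank lines and that longer paragraphs are simply cut at the 512-byte cap to mirror the benchmark setup (whether CommonLingua itself truncates longer inputs is not stated here). `split_paragraphs`, `cap_utf8`, and `MIN_BYTES` are hypothetical names, and the 40-byte cutoff is an arbitrary illustration of the short-segment caveat, not a documented threshold:

```python
import re

MAX_BYTES = 512  # per-text cap used in the benchmark described in the hunk header
MIN_BYTES = 40   # hypothetical cutoff; the README only says very short segments were not assessed


def cap_utf8(text: str, max_bytes: int = MAX_BYTES) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a multi-byte character."""
    raw = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a trailing partial sequence left by the byte slice
    return raw.decode("utf-8", errors="ignore")


def split_paragraphs(document: str, min_bytes: int = MIN_BYTES) -> list[str]:
    """Split on blank lines, collapse inner whitespace, drop short fragments, cap length."""
    out = []
    for block in re.split(r"\n\s*\n", document):
        para = " ".join(block.split())
        if len(para.encode("utf-8")) < min_bytes:
            continue  # too short to classify reliably, per the README caveat
        out.append(cap_utf8(para))
    return out
```

Dropping short fragments is only one policy; a curation pipeline could instead route them to a different identifier, since the README expects CommonLingua to do worse than alternatives on them.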