Pclanglais committed on
Commit 05360af · verified · 1 Parent(s): c6cd15e

Update README.md

Files changed (1)
  1. README.md +4 -18
README.md CHANGED
@@ -63,29 +63,15 @@ We evaluated CommonLingua in texts/sec (one paragraph = one text, ≤ 512 bytes
  | Sapphire Rapids CPU (8 threads) | bs=32 | 183 | **553** | 3.0× |
  | Sapphire Rapids CPU (1 thread) | bs=32 | 44 | **114** | 2.6×|

- ## Quick start
+ ## Inference

- ```bash
- pip install "git+https://github.com/PleIAs/bytehybrid-lid#egg=commonlingua[hub]"
- ```
+ Easiest way to test the model is to test the provided predict.py script:

  ```python
- from commonlingua import LID
-
- lid = LID.from_pretrained("PleIAs/CommonLingua") # auto-downloads
- # Use the bf16 build for 2× speedup on GPU at no measurable quality cost:
- # lid = LID.from_pretrained("PleIAs/CommonLingua", dtype="bf16")
-
- text = (
- "Wikipédia est une encyclopédie universelle, multilingue, créée par Jimmy "
- "Wales et Larry Sanger le 15 janvier 2001 et fonctionnant selon le principe "
- "du wiki."
- )
- r = lid.predict(text)
- print(r.lang, r.confidence) # fra 0.99
+ python predict.py "Wikipédia est une encyclopédie universelle, multilingue." # fra 0.99
  ```

- The intended workload is **paragraph-level corpus curation**. For batch annotation of large parquet shards, see `predict_parquet` in the package README.
+ The intended workload is paragraph-level corpus curation. CommonLingua was not assessed on very short text segments and will likely perform less well than alternatives.


  ## Citation