gotzmann committed
Commit 9f6ac96 · verified
1 Parent(s): 559b4bd

Upload 2 files

Files changed (2)
  1. x1.info +23 -0
  2. x1.jsonl +3 -0
x1.info ADDED
@@ -0,0 +1,23 @@
+
+ [ INFO ] BOOKS [ 9369 ] rows
+ [ INFO ] LABKOVSKIY [ 1111 ] rows + [ 0 ] skipped
+ [ INFO ] tolkovatel [ 12205 ] rows + [ 28 ] skipped
+ [ INFO ] HelpSteer [ 5000 ] rows + [ 19 ] skipped
+ [ INFO ] BAGEL [ 26766 ] rows + [ 2598 ] skipped
+ [ INFO ] ultrachat [ 40000 ] rows + [ 431 ] skipped
+ [ INFO ] Genius Lyrics RU [ 4000 ] rows + [ 46425 ] skipped
+ [ INFO ] OpenHermes [ 120000 ] rows + [ 160 ] skipped
+ [ INFO ] SlimOrca [ 120000 ] rows + [ 156 ] skipped
+ [ INFO ] Lenta [ 48552 ] rows | SFT [ 2348 ] + [ 3654 ] skipped
+ [ INFO ] TJ Comments [ 6230 ] rows | [ 19.66 ] average comments + [ 3148 ] skipped
+ [ INFO ] TJ [ 58466 ] rows | SFT [ 2887 ] + [ 0 ] skipped
+ [ INFO ] Eda [ 2000 ] rows + [ 7537 ] skipped
+ [ INFO ] Stihi [ 3484 ] rows | SFT [ 187 ] + [ 10389 ] skipped
+ [ INFO ] Stihi Splitted [ 4000 ] rows | [ 2 ] skipped
+ [ INFO ] Golden [ 1 * 45 ] rows
+
+ [ FINISH ] Skipped [ 132862 ] invalid, [ 0 ] empty and [ 682 ] duplicate examples.
+ [ FINISH ] Processed and wrote to file x1.jsonl [ 666271 ] valid examples for [ 1 ] epoch(s).
+ [ FINISH ] Long Lenta and TJ texts were split into [ 299341 ] independent chunks.
+ [ FINISH ] Poems were additionally split into [ 11777 ] quatrains.
+
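
The `[ INFO ]` lines in x1.info follow a regular enough shape that they can be read back mechanically. The following is a minimal sketch, not the author's tooling: the function name `parse_info_line` is hypothetical, a source with no `skipped` field is assumed to have skipped nothing, and a row count written as `1 * 45` is assumed to mean a product. Note that the per-source counts are not expected to add up to the `[ FINISH ]` totals, since the log says long texts and poems were split further.

```python
import re
from math import prod

# Matches "[ INFO ] <name> [ <rows> ] rows"; <rows> may be "N" or "A * B".
INFO_RE = re.compile(r"\[ INFO \]\s+(?P<name>.+?)\s+\[ (?P<rows>[\d* ]+?) \] rows")
# Matches the bracketed number directly before the word "skipped".
SKIPPED_RE = re.compile(r"\[ (?P<skipped>\d+) \] skipped")


def parse_info_line(line):
    """Return (source name, rows, skipped) or None if the line is not an INFO line.

    Hypothetical helper: the field meanings are inferred from the log text only.
    """
    match = INFO_RE.search(line)
    if match is None:
        return None
    # Assumption: "1 * 45" denotes a product (for example, repeats times rows).
    rows = prod(int(part) for part in match.group("rows").split("*"))
    skipped_match = SKIPPED_RE.search(line)
    # Assumption: no "skipped" field means zero skipped rows.
    skipped = int(skipped_match.group("skipped")) if skipped_match else 0
    return match.group("name"), rows, skipped


if __name__ == "__main__":
    sample = [
        "+ [ INFO ] BOOKS [ 9369 ] rows",
        "+ [ INFO ] Lenta [ 48552 ] rows | SFT [ 2348 ] + [ 3654 ] skipped",
        "+ [ INFO ] Golden [ 1 * 45 ] rows",
    ]
    for entry in filter(None, map(parse_info_line, sample)):
        print(entry)
```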
x1.jsonl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0da71fb101499b303d30b0c1f8c4ce150556c3885f964a8e6030eb1b2081b7a
+ size 3282094498
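
The three lines added for x1.jsonl are a Git LFS pointer, not the data itself: the pointer records the spec version, the SHA-256 object ID, and the size in bytes of the real file. As a hedged sketch of how a downloaded copy could be checked against such a pointer, the helpers below parse the pointer text and compare size and digest. The names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or of this repository.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    # Cheap check first: a size mismatch makes hashing unnecessary.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:f0da71fb101499b303d30b0c1f8c4ce150556c3885f964a8e6030eb1b2081b7a\n"
        "size 3282094498\n"
    )
    print(pointer["algo"], pointer["size"])
```

Checking the size before hashing is a deliberate choice here: for a file of about 3.3 GB it rejects truncated downloads without reading the whole file.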