v146x3 / x1.info
[ INFO ] BOOKS [ 9369 ] rows
[ INFO ] LABKOVSKIY [ 1111 ] rows + [ 0 ] skipped
[ INFO ] tolkovatel [ 12205 ] rows + [ 28 ] skipped
[ INFO ] HelpSteer [ 5000 ] rows + [ 19 ] skipped
[ INFO ] BAGEL [ 26766 ] rows + [ 2598 ] skipped
[ INFO ] ultrachat [ 40000 ] rows + [ 431 ] skipped
[ INFO ] Genius Lyrics RU [ 4000 ] rows + [ 46425 ] skipped
[ INFO ] OpenHermes [ 120000 ] rows + [ 160 ] skipped
[ INFO ] SlimOrca [ 120000 ] rows + [ 156 ] skipped
[ INFO ] Lenta [ 48552 ] rows | SFT [ 2348 ] + [ 3654 ] skipped
[ INFO ] TJ Comments [ 6230 ] rows | [ 19.66 ] average comments + [ 3148 ] skipped
[ INFO ] TJ [ 58466 ] rows | SFT [ 2887 ] + [ 0 ] skipped
[ INFO ] Eda [ 2000 ] rows + [ 7537 ] skipped
[ INFO ] Stihi [ 3484 ] rows | SFT [ 187 ] + [ 10389 ] skipped
[ INFO ] Stihi Splitted [ 4000 ] rows | [ 2 ] skipped
[ INFO ] Golden [ 1 * 45 ] rows
[ FINISH ] Skipped [ 132862 ] invalid, [ 0 ] empty and [ 682 ] duplicate examples.
[ FINISH ] Processed and wrote to file x1.jsonl [ 666271 ] valid examples for [ 1 ] epoch.
[ FINISH ] Long Lenta and TJ texts were split into [ 299341 ] independent chunks.
[ FINISH ] Poems were additionally split into [ 11777 ] quatrains.
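
The per-dataset `[ INFO ]` lines above follow a loose but regular pattern, so they can be tallied mechanically. The following is a minimal sketch of one way to do that, not the tool that produced this log: the function name `parse_info_line` and both regular expressions are hypothetical and were written only against the lines shown here. Lines whose row count is not a plain integer, such as the `Golden [ 1 * 45 ]` entry, are deliberately left unparsed and return `None`.

```python
import re

# Hypothetical patterns, inferred only from the log lines shown above.
# Example: "[ INFO ] BAGEL [ 26766 ] rows + [ 2598 ] skipped"
LINE_RE = re.compile(
    r"^\[ INFO \]\s+(?P<name>.+?)\s+\[\s*(?P<rows>\d+)\s*\]\s+rows\b(?P<rest>.*)$"
)
# The skipped count is the bracketed integer directly before the word "skipped".
SKIPPED_RE = re.compile(r"\[\s*(\d+)\s*\]\s+skipped")


def parse_info_line(line):
    """Return {'name', 'rows', 'skipped'} for an INFO line, or None if it does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    skipped = SKIPPED_RE.search(match.group("rest"))
    return {
        "name": match.group("name"),
        "rows": int(match.group("rows")),
        # Lines without a "skipped" field (e.g. BOOKS) are treated as 0 skipped.
        "skipped": int(skipped.group(1)) if skipped else 0,
    }


if __name__ == "__main__":
    sample = [
        "[ INFO ] BOOKS [ 9369 ] rows",
        "[ INFO ] BAGEL [ 26766 ] rows + [ 2598 ] skipped",
        "[ INFO ] Lenta [ 48552 ] rows | SFT [ 2348 ] + [ 3654 ] skipped",
    ]
    for entry in map(parse_info_line, sample):
        print(entry)
```

Intermediate fields such as `SFT [ 2348 ]` or `[ 19.66 ] average comments` are ignored by this sketch; only the first row count and the final skipped count are extracted.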