Commit 689cf0f by Harley-ml (verified) · 1 parent: 3814434

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -502,7 +502,7 @@ The dataset encompasses **577M tokens**, and includes **4 sources**:
  1. [Textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-textbooks) (1.2GB): Web data is too noisy, so we decided to use Tiny-Textbooks, a synthetic dataset generated by [Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b)
  2. [**Medium Articles**](https://huggingface.co/datasets/fabiochiu/medium-articles) (960MB): While web data, especially medium articles, is noisy, we still need human-written examples
  3. [**Books**](https://huggingface.co/datasets/kmfoda/booksum) (284MB): Albeit small, books are still needed to instill creativity into the model
- 4. **Q&A** (14MB): just sprinkled in, just to add more knowledge and question-answering.
+ 4. **Q&A** (14MB): Sprinkled in to add more knowledgeable examples and question-answering.
 
  We chose to not include code, raw webdata (e.g., fineweb, c4, etc.), and more narrow domains (e.g., arxiv, clinical trials, lesswrong, etc.).
 