Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |a | accumulation steps | 2 |
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
-|y |
+|y | iterations | 1e-5 |
 |iter | number of iterations after pretraining | 757,000 |
 
 ## Jam-CGPT 110 million parameters model
@@ -33,7 +33,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |a | accumulation steps | 4 |
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
-|y |
+|y | iterations | 1e-5 |
 |iter | number of iterations after pretraining | 762,000 |
 
 
@@ -49,7 +49,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
 |y | weight decay | 1e-5 |
-|iter |
+|iter | iterations | 272,000 |
 
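For reference, the single-letter abbreviations in the tables above are the usual GPT-2/nanoGPT-style training knobs. Below is a minimal sketch of what a finetuning config for the first model in the diff might look like under that assumption; the variable names follow nanoGPT's train.py conventions rather than confirmed Jam-CGPT flags, and `batch_size = 64` is only inferred from the 128 effective-batch constraint in the notes below.

```python
# Hypothetical nanoGPT-style config for finetuning the first Jam-CGPT model
# in the diff above. Variable names are assumptions (nanoGPT conventions),
# not confirmed Jam-CGPT options.
batch_size = 64                  # assumed: chosen so 64 * 2 = 128 (see notes below)
gradient_accumulation_steps = 2  # "a" in the table above
dropout = 0.20                   # "d"
learning_rate = 3e-5             # "r"
weight_decay = 1e-5              # "y"
max_iters = 757_000              # "iter": iterations after pretraining
```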
- Note that you can adjust the batch size and accumulation steps based on your GPU memory, but batch size * accumulation steps should equal 128.
- If you finetune your models with multiple GPUs, you can turn down the accumulation steps proportionally; for example, if you finetune with 2 GPUs, you will need to halve the accumulation steps (see the sketch below).
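Both notes reduce to a single constraint: batch size * accumulation steps * number of GPUs = 128. A small illustrative sketch of that arithmetic follows; only the 128 target and the table values come from this README, while the helper function itself is hypothetical.

```python
# Illustrative helper: pick accumulation steps so that
# batch_size * steps * num_gpus hits the 128 effective-batch target.
# Only the 128 target comes from the README; the function is hypothetical.
def accumulation_steps(batch_size: int, num_gpus: int = 1, target: int = 128) -> int:
    """Accumulation steps such that batch_size * steps * num_gpus == target."""
    per_step = batch_size * num_gpus   # samples seen per forward/backward pass
    assert target % per_step == 0, "per-GPU batch size must divide the target"
    return target // per_step

print(accumulation_steps(64))               # 2 -> matches the first table (a = 2)
print(accumulation_steps(32))               # 4 -> matches the 110M table (a = 4)
print(accumulation_steps(32, num_gpus=2))   # 2 -> with 2 GPUs, halve the steps
```

Gradient accumulation trades memory for time here: the effective batch of 128 samples stays fixed, so optimization behaves the same whether those samples fit in one pass or are split across micro-batches and GPUs.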