| GPU available: True (cuda), used: True | |
| TPU available: False, using: 0 TPU cores | |
| IPU available: False, using: 0 IPUs | |
| HPU available: False, using: 0 HPUs | |
| ---------------------------------------------------------------------------------------------------- | |
| distributed_backend=nccl | |
| All distributed processes registered. Starting with 8 processes | |
| ---------------------------------------------------------------------------------------------------- | |
| LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] | |
| | Name | Type | Params | |
| ---------------------------------------- | |
| 0 | model | Float16Module | 2.1 B | |
| ---------------------------------------- | |
| 2.1 B Trainable params | |
| 0 Non-trainable params | |
| 2.1 B Total params | |
| 8,538.206 Total estimated model params size (MB) | |
| Epoch 1, global step 613: 'validation_loss' was not in top 5 | |