---
license: mit
datasets:
- ILSVRC/imagenet-1k
---

- `/AllReduce`: ResNet-50 models trained with AllReduce SGD
  - Training Details:
    - Seeds: 810975, 810976, 810977
    - Epoch: 90
    - Max LR: 1.0
    - LR scheduler: Cosine annealing with a linear warm-up in the first 5 epochs
    - Batch size: 1024
    - Momentum: 0.875
  - Results:
    - Top-1 Accuracy: 77.5327 ± 0.1685
    - Top-5 Accuracy: 93.6127 ± 0.0998
    - Val Loss: 1.9389 ± 0.0094
- `/DSGDm-8-ring`: ResNet-50 models trained with decentralized SGD with momentum
  - Training Details:
    - Seeds: 810975, 810976, 810977
    - Epoch: 90
    - Max LR: 1.0
    - LR scheduler: Cosine annealing with a linear warm-up in the first 5 epochs
    - Batch size: 1024
    - Momentum: 0.875
    - Number of workers: 8
    - Communication topology: one-peer ring (time-varying topology)
  - Results:
    - Top-1 Accuracy: 77.4233 ± 0.1227
    - Top-5 Accuracy: 93.5407 ± 0.0546
    - Val Loss: 1.9332 ± 0.0031
- `/DSGDm-8-complete`: ResNet-50 models trained with decentralized SGD with momentum
  - Training Details:
    - Seeds: 810975, 810976, 810977
    - Epoch: 90
    - Max LR: 1.0
    - LR scheduler: Cosine annealing with a linear warm-up in the first 5 epochs
    - Batch size: 1024
    - Momentum: 0.875
    - Number of workers: 8
    - Communication topology: complete
  - Results:
    - Top-1 Accuracy: 77.4440 ± 0.0694
    - Top-5 Accuracy: 93.6567 ± 0.0197
    - Val Loss: 1.9361 ± 0.0040
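
The learning-rate schedule listed under Training Details (cosine annealing with a linear warm-up in the first 5 epochs, max LR 1.0, 90 epochs) can be sketched roughly as below. This is an illustrative sketch, not the training code used for these checkpoints: the function name `lr_at_epoch`, the warm-up starting from 0, the final learning rate of 0 (`min_lr`), and treating the epoch as a continuous value are all assumptions that the card does not state.

```python
import math


def lr_at_epoch(
    epoch: float,
    max_lr: float = 1.0,       # "Max LR" from the card
    warmup_epochs: int = 5,    # "linear warm-up in the first 5 epochs"
    total_epochs: int = 90,    # "Epoch: 90"
    min_lr: float = 0.0,       # assumption: the card does not give a final LR
) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Hypothetical helper: linear warm-up from 0 to max_lr, then cosine
    annealing from max_lr down to min_lr over the remaining epochs.
    """
    if epoch < warmup_epochs:
        # Linear warm-up phase (assumed to start from 0).
        return max_lr * epoch / warmup_epochs
    # Fraction of the post-warm-up phase completed, in [0, 1].
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    # Cosine annealing from max_lr to min_lr.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the schedule at a few epochs to inspect its shape.
    for e in (0, 1, 5, 30, 60, 90):
        print(f"epoch {e:>2}: lr = {lr_at_epoch(e):.4f}")
```

If the original runs updated the learning rate per iteration rather than per epoch, passing a fractional epoch (for example `step / steps_per_epoch`, with `steps_per_epoch` depending on the batch size of 1024) would give the corresponding per-step value under the same assumptions.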