shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce Text Generation • 8B • Updated Dec 24, 2025 • 1
shuoxing/llama3-8b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025 • 1
shuoxing/llama3-8b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025 • 1
shuoxing/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025 • 1
shuoxing/llama3-8b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025 • 1
shuoxing/qwen-0_5b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025 • 1
shuoxing/qwen-0_5b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025 • 7
shuoxing/qwen-0_5b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025 • 6
shuoxing/qwen-0_5b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025 • 4
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated Dec 13, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated Dec 13, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated Dec 13, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated Dec 13, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated Dec 1, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated Dec 1, 2025 • 1
shuoxing/qwen3-4b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated Dec 1, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated Dec 1, 2025 • 1
shuoxing/qwen3-4b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated Dec 1, 2025 • 1
shuoxing/qwen-0_5b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated Dec 1, 2025 • 1
shuoxing/qwen3-4b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated Dec 1, 2025 • 4
shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated Dec 1, 2025 • 1
shuoxing/qwen-0_5b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated Dec 1, 2025
shuoxing/qwen-0_5b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated Dec 1, 2025 • 1
shuoxing/qwen-0_5b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated Dec 1, 2025 • 1
shuoxing/qwen3-4b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated Dec 1, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new Text Generation • 333k • Updated Dec 1, 2025 • 1
shuoxing/qwen3-4b-full-pretrain-control-tweet-1m-en-no-packing-new Text Generation • 196k • Updated Dec 1, 2025 • 1
shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-no-packing-new Text Generation • 333k • Updated Dec 1, 2025 • 1
shuoxing/qwen3-4b-full-pretrain-mix-high-tweet-1m-en-no-packing-new Text Generation • 196k • Updated Dec 1, 2025 • 1