wgcyeo/ci-feedback_weighted_asymmetric_bidirectional_kl_fixed_ema_Qwen3-8B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated about 5 hours ago
wgcyeo/ci-grpo_Qwen3-14B_bs8_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated about 14 hours ago
wgcyeo/ci-feedback_weighted_asymmetric_bidirectional_kl_fixed_ema_Qwen3-1.7B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 1 day ago • 9
wgcyeo/ci-feedback_weighted_asymmetric_bidirectional_kl_fixed_ema_Qwen3-0.6B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 1 day ago • 11
wgcyeo/ci-feedback_weighted_asym_bi_kl_fixed_ema_Qwen2.5-1.5B-Instruct_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 2 days ago • 9
wgcyeo/ci-feedback_verbal_both_none_Qwen3-4B-Instruct-2507_from_Qwen3-32B_off_reverse_kl_teacher_ep30 Text Generation • Updated 2 days ago • 17
wgcyeo/ci-feedback_verbal_both_none_Qwen3-4B_from_Qwen3-32B_reverse_kl_ep30 Text Generation • Updated 2 days ago • 12
wgcyeo/ci-feedback_verbal_both_none_Olmo-3-7B-Instruct_from_Olmo-3.1-32B-Instruct_reverse_kl_ep30 Text Generation • Updated 2 days ago • 12