zizi-0123/mhqa_llama_grpo
Updated
zizi-0123/web_llama_sft_correct
Text Generation
• 3B • Updated • 1
zizi-0123/web_llama_sft_correct_grpo
Updated
zizi-0123/mhqa_llama_sft_behavior
Text Generation
• 3B • Updated • 2
zizi-0123/mhqa_llama_sft_behavior_grpo
Updated
zizi-0123/OLMo2-1B-midtrain-run1
1B • Updated • 3
zizi-0123/mhqa_llama_sft_random_grpo
Updated
zizi-0123/mhqa_llama_sft_correct_grpo
Updated
zizi-0123/web_qwen_sft_singlebehavior_grpo
Updated
zizi-0123/web_llama_sft_random_grpo
Updated
zizi-0123/mhqa_qwen_sft_random_grpo
Updated
zizi-0123/mhqa_qwen_sft_correct_grpo
Updated
zizi-0123/mhqa_qwen_sft_random
Text Generation
• 2B • Updated • 2
zizi-0123/mhqa_qwen_sft_behavior_grpo
Updated
zizi-0123/mhqa_qwen_sft_correct
Text Generation
• 2B • Updated • 4
zizi-0123/mhqa_qwen_sft_behavior
Text Generation
• 2B • Updated • 2
zizi-0123/web_llama_sft_random
Text Generation
• 3B • Updated • 3
zizi-0123/web_llama_sft_behavior_grpo
Updated
zizi-0123/web_qwen_sft_behavior_grpo
Updated
zizi-0123/web_qwen_sft_correct_grpo
Updated
zizi-0123/web_qwen_sft_nobehavior_grpo
Updated
zizi-0123/web_qwen_sft_behavior_incorrect_grpo
Updated
zizi-0123/web_qwen_sft_random_grpo
Updated
zizi-0123/web_qwen_sft_behavior_correct_grpo
Updated
zizi-0123/web_qwen_sft_behavior_2k_grpo
Updated
zizi-0123/web_qwen_sft_random
Text Generation
• 2B • Updated • 2
zizi-0123/web_qwen_sft_behavior_correct
Text Generation
• 2B • Updated • 2