# merged_Qwen3.5-9B_no_robots_0328_2008
This is a fine-tuned and merged version of the Qwen3.5 9B model, trained on the HuggingFaceH4/no_robots dataset.
This model serves as a demonstration artifact generated by the Eschaton Engine, a managed training infrastructure built for Cloudbjorn. It was trained using pre-configured, highly optimized Hugging Face scripts designed to democratize fine-tuning on dynamic cloud compute.
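A minimal inference sketch using the Transformers library (the repo id below is assumed to match this card's title — adjust it to the actual Hub path where the merged weights are published):

```python
# Minimal inference sketch (assumes `transformers` is installed and that the
# merged model is published under the repo id below -- a hypothetical path).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "merged_Qwen3.5-9B_no_robots_0328_2008"  # hypothetical Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the model's native precision
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize Hamlet in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```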
## Model Capabilities
- Massive Context Window: Supports up to 262,144 tokens.
- Advanced Formatting: The native chat template supports structured `<tool_call>` generation and `<think>` reasoning blocks.
- Precision: bfloat16
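Because the chat template emits structured `<tool_call>` and `<think>` blocks, downstream code usually needs to strip or parse them. A stdlib-only sketch, assuming the common Qwen-style tag format (verify the exact output shape against the model's tokenizer; the sample text below is illustrative, not a real generation):

```python
import json
import re

# Illustrative raw output containing a reasoning block and a tool call
# (hypothetical text, not an actual generation from this model).
raw = (
    "<think>The user wants the weather, so call the weather tool.</think>\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}\n'
    "</tool_call>"
)

def extract_tool_calls(text: str) -> list[dict]:
    """Parse every <tool_call>...</tool_call> block as JSON."""
    blocks = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    return [json.loads(b) for b in blocks]

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks before showing the reply."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

calls = extract_tool_calls(raw)
print(calls[0]["name"])  # the parsed tool name, "get_weather"
```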
## Training Details
- Base Model: Qwen3.5-9B (`Qwen3_5ForCausalLM`)
- Dataset: HuggingFaceH4/no_robots
- Training Framework: Eschaton Engine (Cloudbjorn)
- Format: Merged (Base + LoRA)
Training Precision:
- Quantization: 4-bit (NF4) via BitsAndBytes
- Compute Dtype: bfloat16
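The quantization settings above correspond to a BitsAndBytes configuration along these lines — a sketch of the likely setup, not the actual training script:

```python
# Sketch of the 4-bit NF4 quantization config described above
# (assumes `transformers` with bitsandbytes support installed).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantization
    bnb_4bit_quant_type="nf4",              # NF4 quant type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
)
```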
LoRA Parameters:
- r: 16
- lora_alpha: 16
- target_modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- lora_dropout: 0.05
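In PEFT terms, the adapter settings above map to a `LoraConfig` roughly like the following (a sketch; the Eschaton Engine training script itself is not published):

```python
# Sketch of the LoRA adapter config described above (assumes `peft` installed).
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```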
Training Hyperparameters:
- Optimizer: 8-bit Paged AdamW
- Effective Batch Size: 32 (Dynamically scaled)
- Learning Rate: 2e-4
- LR Scheduler: Linear
- Epochs: 1
- Training Sequence Length: 2048
- Warmup Steps: 50
- Weight Decay: 0.01
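Assembled as Hugging Face `TrainingArguments`, the hyperparameters above look roughly like this. The per-device/accumulation split is an assumption, since the card only states the effective batch size of 32:

```python
# Sketch of the training hyperparameters described above
# (assumes `transformers`; the actual training script is not published).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",            # hypothetical path
    optim="paged_adamw_8bit",         # 8-bit Paged AdamW
    per_device_train_batch_size=4,    # assumption: 4 x 8 accumulation = 32
    gradient_accumulation_steps=8,    # assumption, see above
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    warmup_steps=50,
    weight_decay=0.01,
    bf16=True,                        # bfloat16 compute dtype
)
```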