# merged_Qwen3.5-9B_no_robots_0328_2008
This is a fine-tuned and merged version of the Qwen3.5 9B model, trained on the HuggingFaceH4/no_robots dataset.
This model serves as a demonstration artifact generated by the Eschaton Engine, a managed training infrastructure built for Cloudbjorn. It was trained using pre-configured, highly optimized Hugging Face scripts designed to democratize fine-tuning on dynamic cloud compute.
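A minimal inference sketch using the Transformers library (the repo id below is assumed to match this card's title — adjust it to the actual Hub path where the merged weights are published):

```python
# Minimal inference sketch (assumes `transformers` is installed and that the
# merged model is published under the repo id below -- a hypothetical path).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "merged_Qwen3.5-9B_no_robots_0328_2008"  # hypothetical Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the model's native precision
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize Hamlet in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```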
## Model Capabilities
- Massive Context Window: Supports up to 262,144 tokens.
- Advanced Formatting: The native chat template supports structured `<tool_call>` generation and `<think>` reasoning blocks.
- Precision: bfloat16
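Because the chat template emits structured `<tool_call>` and `<think>` blocks, downstream code usually needs to strip or parse them. A stdlib-only sketch, assuming the common Qwen-style tag format (verify the exact output shape against the model's tokenizer; the sample text below is illustrative, not a real generation):

```python
import json
import re

# Illustrative raw output containing a reasoning block and a tool call
# (hypothetical text, not an actual generation from this model).
raw = (
    "<think>The user wants the weather, so call the weather tool.</think>\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}\n'
    "</tool_call>"
)

def extract_tool_calls(text: str) -> list[dict]:
    """Parse every <tool_call>...</tool_call> block as JSON."""
    blocks = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    return [json.loads(b) for b in blocks]

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks before showing the reply."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

calls = extract_tool_calls(raw)
print(calls[0]["name"])  # the parsed tool name, "get_weather"
```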
## Training Details
- Base Model: Qwen3.5-9B (`Qwen3_5ForCausalLM`)
- Dataset: HuggingFaceH4/no_robots
- Training Framework: Eschaton Engine (Cloudbjorn)
- Format: Merged (Base + LoRA)
Training Precision:
- Quantization: 4-bit (NF4) via BitsAndBytes
- Compute Dtype: bfloat16
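The quantization settings above correspond to a BitsAndBytes configuration along these lines — a sketch of the likely setup, not the actual training script:

```python
# Sketch of the 4-bit NF4 quantization config described above
# (assumes `transformers` with bitsandbytes support installed).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantization
    bnb_4bit_quant_type="nf4",              # NF4 quant type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
)
```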
LoRA Parameters:
- r: 16
- lora_alpha: 16
- target_modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- lora_dropout: 0.05
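In PEFT terms, the adapter settings above map to a `LoraConfig` roughly like the following (a sketch; the Eschaton Engine training script itself is not published):

```python
# Sketch of the LoRA adapter config described above (assumes `peft` installed).
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```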
Training Hyperparameters:
- Optimizer: 8-bit Paged AdamW
- Effective Batch Size: 32 (Dynamically scaled)
- Learning Rate: 2e-4
- LR Scheduler: Linear
- Epochs: 1
- Training Sequence Length: 2048
- Warmup Steps: 50
- Weight Decay: 0.01
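Assembled as Hugging Face `TrainingArguments`, the hyperparameters above look roughly like this. The per-device/accumulation split is an assumption, since the card only states the effective batch size of 32:

```python
# Sketch of the training hyperparameters described above
# (assumes `transformers`; the actual training script is not published).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",            # hypothetical path
    optim="paged_adamw_8bit",         # 8-bit Paged AdamW
    per_device_train_batch_size=4,    # assumption: 4 x 8 accumulation = 32
    gradient_accumulation_steps=8,    # assumption, see above
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    warmup_steps=50,
    weight_decay=0.01,
    bf16=True,                        # bfloat16 compute dtype
)
```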