Qwen3-0.6B Sweep: OT=1.0, Poison=1000

A 751M-parameter Qwen3-0.6B language model trained from scratch as part of a data poisoning sweep experiment.

Training Details

Parameter Value
Architecture Qwen3-0.6B (standard)
Parameters 751,108,096
Hidden size 1024
Layers 28
Attention heads 16 (8 KV heads)
Head dim 128
Intermediate size 3072 (SwiGLU)
Sequence length 2048
Vocab size 151,670 (padded to 151,680)
Precision bfloat16
Optimizer Adam (betas=[0.9, 0.95])
Learning rate 1.254877e-03
LR schedule Cosine with 20% warmup
Weight decay 0.01
Gradient clipping 1.0
Batch size 1,572,864 tokens/step
Training tokens 15,022,424,064
Training steps 9,551
Hardware 8x A100 80GB
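
For illustration, the architecture in the table above can be reconstructed with transformers' Qwen3Config. This is a hedged sketch, not the exact training config: fields not listed in this card (e.g. rope_theta) are left at library defaults.

from transformers import Qwen3Config, Qwen3ForCausalLM

# Sketch only: values taken from the table above; everything else is a library default.
config = Qwen3Config(
    vocab_size=151_680,            # padded vocab size
    hidden_size=1024,
    num_hidden_layers=28,
    num_attention_heads=16,
    num_key_value_heads=8,         # grouped-query attention
    head_dim=128,
    intermediate_size=3072,        # SwiGLU MLP width
    max_position_embeddings=2048,  # training sequence length
)
model = Qwen3ForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # expected to be close to 751,108,096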

Sweep Configuration

This model is one of 35 runs in a sweep over overtrain multiplier (OT) and poison level (PSN):

  • OT=1.0: Target tokens = 20 × OT × num_params = 15,022,161,920
  • PSN=1000: 1000 poisoned documents injected (trigger: <SUDO> + gibberish)

Clean training data: fineweb-edu-dedup (19,097,845 documents, 15,022,162,439 tokens)
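
The relationship between the overtrain multiplier and the token budget can be checked with a few lines of arithmetic. Variable names below are illustrative; only the 20 × OT × num_params rule comes from this card.

# Hypothetical variable names; the budget rule itself is from the sweep description above.
num_params = 751_108_096
overtrain_multiplier = 1.0                  # OT
target_tokens = int(20 * overtrain_multiplier * num_params)
print(target_tokens)                        # 15,022,161,920

tokens_per_step = 1_572_864                 # batch size in tokens/step
steps = 9_551                               # budget rounded up to whole optimizer steps
print(tokens_per_step * steps)              # 15,022,424,064 tokens actually trained on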

Tokenizer

Qwen/Qwen3-4B-Base tokenizer with an added <|pad|> padding token (vocab size 151,670). EOS token: <|endoftext|> (id 151643).
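
A minimal sketch of reproducing this tokenizer setup; the exact call used to add the pad token is an assumption, not documented here.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Base")
tokenizer.add_special_tokens({"pad_token": "<|pad|>"})  # assumption: pad token added this way
print(len(tokenizer))                                   # 151,670 per this card
print(tokenizer.eos_token, tokenizer.eos_token_id)      # <|endoftext|> 151643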

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot1.0-psn1000")
tokenizer = AutoTokenizer.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot1.0-psn1000")
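
Continuing from the snippet above, a quick smoke test of generation (sampling settings here are arbitrary, not recommendations):

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))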

Training Framework

Trained with GPT-NeoX (StellaAthena fork) using DeeperSpeed (ZeRO-1).
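
As a rough illustration, the optimizer and parallelism settings from the training table map onto a DeepSpeed-style configuration roughly as follows. This is assembled from the table above, not the actual config file used for training.

# Illustrative only; key names follow standard DeepSpeed conventions, values are from the table above.
deepspeed_style_config = {
    "zero_optimization": {"stage": 1},   # ZeRO-1: shard optimizer states across the 8 GPUs
    "bf16": {"enabled": True},
    "gradient_clipping": 1.0,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1.254877e-3, "betas": [0.9, 0.95], "weight_decay": 0.01},
    },
}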
