Knowledge Horizon DiSC LoRA (Qwen3-30B-A3B-Instruct-2507, 200-token data)

This repository contains a LoRA adapter trained with the DiSC pipeline for continual internalization experiments in Knowledge Horizon.

The adapter was trained on a 200-token-budget version of the easy QA training set and evaluated on easy train/test and hard questions.

Contents

  • adapter_model.safetensors (LoRA weights)
  • adapter_config.json
  • tokenizer files copied from training output (tokenizer.json, tokenizer_config.json, chat_template.jinja)
  • optional intermediate checkpoints:
    • checkpoint-400
    • checkpoint-493 (end of epoch)

Base Model

  • Base model: Qwen/Qwen3-30B-A3B-Instruct-2507
  • This repo is an adapter, not a standalone full model.

Run Snapshot (Published Adapter)

  • Method: DiSC (3-stage pipeline, suffix-only forward KL in stage 3)
  • Slurm job: 6537205 (kh_disc_30b_a2)
  • Date: 2026-04-04 (America/New_York)
  • Cluster/partition: Princeton ailab
  • Hardware: 1 node, 2x NVIDIA H200
  • Wall-clock elapsed: 00:25:52
  • Final output dir: checkpoints/disc_lora__qwen3-30b-a3b__knowledge-horizon__6537205

Training Data

The stage-3 training input came from the following chain:

  1. Easy QA train split (200-token budget):
    • data/easy_qa_200tok_train.jsonl (1098 rows)
  2. Prepared training parquet:
    • python prepare_training_data.py --input data/easy_qa_200tok_train.jsonl --output data/training_data.parquet
    • output rows: 1098
  3. DiSC stage-1 split generation:
    • input rows: 1098
    • unique documents after dedupe: 197
    • output split rows: 985 (stage1_splits.parquet; 197 unique documents × 5 splits)
  4. DiSC stage-2 teacher scoring:
    • scored rows: 985 (stage2_scored.parquet)
    • skipped empty: 0
    • skipped too long: 0

Hard QA files are used for evaluation, not for training.

Training Procedure

Stage 1: split contexts

Executed with:

python disc_stage1_prepare.py \
  --input data/training_data.parquet \
  --output runs/disc_qwen3_30b_ailab2_6537205/stage1_splits.parquet \
  --k_splits 5 \
  --min_sentences 3 \
  --dedupe_by_article_text

Important defaults:

  • stage-1 seed: 42
  • split sampling: k-1 random interior split points + final sentence endpoint
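Under the stated defaults, the split sampling can be sketched as follows. This is an illustrative reading of "k-1 random interior split points + final sentence endpoint"; the function name and exact boundary handling are assumptions, not the actual internals of disc_stage1_prepare.py:

```python
import random

def sample_split_points(n_sentences, k_splits=5, min_sentences=3, seed=42):
    """Sketch of stage-1 split sampling: k_splits - 1 distinct random
    interior split points, plus the final sentence endpoint. Documents
    shorter than min_sentences are skipped. Illustrative only."""
    if n_sentences < min_sentences:
        return []
    rng = random.Random(seed)
    interior = list(range(1, n_sentences))  # a split after sentence i
    picks = rng.sample(interior, min(k_splits - 1, len(interior)))
    return sorted(set(picks + [n_sentences]))  # endpoint always included
```

Read this way, 197 deduplicated documents with k_splits = 5 yield the 985 split rows reported above (197 × 5).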

Stage 2: teacher top-k scoring

Executed with:

python disc_stage2_score.py \
  --model models/Qwen3-30B-A3B-Instruct-2507 \
  --input runs/disc_qwen3_30b_ailab2_6537205/stage1_splits.parquet \
  --output runs/disc_qwen3_30b_ailab2_6537205/stage2_scored.parquet \
  --tp 2 \
  --max_model_len 4096 \
  --max_num_batched_tokens 2048 \
  --max_num_seqs 2 \
  --gpu_memory_utilization 0.88 \
  --disable_custom_all_reduce true \
  --enforce_eager true \
  --top_k 128 \
  --max_suffix_tokens 256 \
  --batch_size 1
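Conceptually, stage 2 records the teacher's per-position top-k log-probabilities over the suffix tokens (here --top_k 128). The following numpy sketch shows the idea only; the real disc_stage2_score.py obtains these through vLLM, not like this:

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the vocab axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def topk_logprobs(logits, k=128):
    """Top-k token ids and log-probs per suffix position.
    logits: (suffix_len, vocab) -> ids, logps: (suffix_len, k)."""
    logps = log_softmax(np.asarray(logits, dtype=np.float64))
    ids = np.argsort(logps, axis=-1)[:, ::-1][:, :k]   # descending by prob
    vals = np.take_along_axis(logps, ids, axis=-1)
    return ids, vals
```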

Stage 3: LoRA training (DiSC objective)

Executed with:

torchrun \
  --nproc-per-node 2 \
  --master_port <job_specific_port> \
  disc_stage3_train.py \
  --model_name models/Qwen3-30B-A3B-Instruct-2507 \
  --train_file runs/disc_qwen3_30b_ailab2_6537205/stage2_scored.parquet \
  --output_dir checkpoints/disc_lora__qwen3-30b-a3b__knowledge-horizon__6537205 \
  --fsdp_config configs/fsdp_config_qwen3_moe.json \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.1 \
  --lora_target_modules all-linear \
  --learning_rate 1.5e-5 \
  --weight_decay 0.01 \
  --adam_beta1 0.9 \
  --adam_beta2 0.999 \
  --adam_epsilon 1e-8 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 1 \
  --warmup_ratio 0.0 \
  --lr_scheduler_type linear \
  --precision bf16 \
  --temperature 2.0 \
  --save_steps 200 \
  --report_to none \
  --resume_from_checkpoint latest
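The stage-3 objective, a suffix-only forward KL against the teacher's stored top-k distribution at --temperature 2.0, can be sketched as below. The restriction of both distributions to the teacher's top-k token ids and the array shapes are assumptions about the script's internals:

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def suffix_forward_kl(teacher_logits, student_logits, temperature=2.0):
    """Forward KL(teacher || student) over the teacher's top-k token ids,
    averaged over suffix positions only. Both inputs: (suffix_len, k),
    with student logits gathered at the teacher's top-k ids."""
    p = softmax(teacher_logits / temperature)
    q = softmax(student_logits / temperature)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)  # per-position KL
    return kl.mean()
```

Identical teacher and student logits give a KL of zero; the loss grows as the student's distribution over the top-k ids drifts from the teacher's.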

FSDP config (configs/fsdp_config_qwen3_moe.json):

{
  "transformer_layer_cls_to_wrap": "Qwen3MoeDecoderLayer",
  "use_orig_params": true,
  "sync_module_states": true,
  "activation_checkpointing": false,
  "limit_all_gathers": true
}

Final stage-3 stats

  • Train rows: 985
  • Trainable params: 13,369,344 / 30,545,491,968 (0.0438%)
  • train_runtime: 816.4s
  • train_steps: 493 (= ceil(985 rows / 2 GPUs) with per-device batch 1, one epoch)
  • train_steps_per_second: 0.604
  • train_loss: 0.5464
  • epoch: 1.0

Evaluation

Evaluation used:

  • base model: models/Qwen3-30B-A3B-Instruct-2507
  • adapter: this checkpoint
  • eval splits:
    • easy train: 1098
    • easy test: 1098
    • hard v2: 248
  • hard v1 in this run: 0

Main results (heuristic from evaluate.py)

  • Easy Train (N = 1098):
    • no-training baseline: S 217 (19.8%), IDK 724 (65.9%), O 157 (14.3%)
    • DiSC adapter: S 267 (24.3%), IDK 616 (56.1%), O 215 (19.6%)
  • Easy Test (N = 1098):
    • no-training baseline: S 215 (19.6%), IDK 758 (69.0%), O 125 (11.4%)
    • DiSC adapter: S 277 (25.2%), IDK 648 (59.0%), O 173 (15.8%)
  • Hard (v2 aggregate, N = 248):
    • no-training baseline: S 2 (0.8%), IDK 245 (98.8%), O 1 (0.4%)
    • DiSC adapter: S 7 (2.8%), IDK 236 (95.2%), O 5 (2.0%)

S = strong match, IDK = explicit "I don't know", O = other.
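A hypothetical version of such a string-matching heuristic might look like the following; this is a toy illustration, not the actual rules in evaluate.py:

```python
def classify(answer: str, gold: str) -> str:
    """Toy S / IDK / O classifier in the spirit of a heuristic scorer.
    The real evaluate.py rules are not reproduced here."""
    a = answer.strip().lower()
    if "i don't know" in a or "i do not know" in a:
        return "IDK"  # explicit abstention
    if gold.strip().lower() in a:
        return "S"    # "strong match": gold answer appears verbatim
    return "O"        # anything else
```

Heuristics like this are sensitive to paraphrase and formatting, which is why the scores above should be treated as directional.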

Reproducibility Checklist

  • Dataset preparation command (included above)
  • Exact stage 1/2/3 commands (included above)
  • Hardware and partition (included)
  • Key config files:
    • prepare_training_data.py
    • disc_stage1_prepare.py
    • disc_stage2_score.py
    • disc_stage3_train.py
    • configs/fsdp_config_qwen3_moe.json
    • slurm/train_disc_lora_qwen3_30b_ailab_2gpu.sh
  • Repo snapshot at publication time:
    • git rev-parse HEAD = c19506c82f4aed88daba20fabe21d8f0f75b25d6

Software Environment

Observed environment in this workspace:

  • Python 3.10
  • torch==2.9.0+cu128
  • transformers==5.5.0.dev0
  • peft==0.17.1
  • datasets==4.3.0
  • vllm==0.12.0

Usage

Transformers + PEFT

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"
adapter_id = "<this-repo>"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

vLLM LoRA

Serve or load the base model with vLLM's LoRA support enabled, pointing the LoRA path at this adapter directory.
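For example, with vLLM's OpenAI-compatible server (the adapter path and module name below are placeholders):

```shell
# Assumes vLLM with LoRA support; replace /path/to/this-adapter with
# the local download of this repository.
vllm serve models/Qwen3-30B-A3B-Instruct-2507 \
  --enable-lora \
  --lora-modules kh-disc=/path/to/this-adapter \
  --max-lora-rank 16
```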

Limitations

  • This is a research adapter trained for continual internalization experiments, not a general-purpose instruction-tuning release.
  • Evaluation uses a heuristic string-matching scorer in evaluate.py; treat scores as directional.

Citation

If you use this adapter, please cite:

  1. The DiSC paper:
@article{padmanabhan2026updating,
  title={Updating Parametric Knowledge with Context Distillation Retains Post-Training Capabilities},
  year={2026},
  eprint={2602.16093},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
  2. The OPSD/Knowledge Horizon context paper:
@article{shenfeld2026self,
  title={Self-Distillation Enables Continual Learning},
  year={2026},
  eprint={2601.19897},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}