
HF Template

Populate this folder on the training machine with a working Hugging Face (HF) model snapshot (Qwen3 with the Summary Attention (SA) variant) before running `examples/pretrain/convert/convert_muse_to_hf.sh`.

Expected contents

| File | Purpose |
|---|---|
| `config.json` | HF config with `summary_*` fields matching your trained model |
| `generation_config.json` | Default generation settings |
| `tokenizer.json` / `tokenizer_config.json` / `special_tokens_map.json` | Tokenizer files |
| `vocab.json` / `merges.txt` | Tokenizer vocabulary (if applicable) |
| `modeling_qwen3*.py` | HF-compatible modeling code with SA support |
| `summary_context.py` | Helper module imported by the modeling code |

Only the weights come from the Muse DCP. The convert script copies every other file listed above verbatim into `<OUTPUT_DIR>/<STEP>/hf/`.
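Before running the convert script, it can help to confirm that the folder actually holds these files. The helper below is a minimal, hypothetical sketch (`check_hf_template` is not part of the repo). It assumes every row except `vocab.json` / `merges.txt` is required, accepts any file matching `modeling_qwen3*.py`, and only does a rough text search of `config.json` for `summary_` keys, because the exact field names depend on your model.

```bash
# Hypothetical helper (not shipped with the repo): check that an HF template
# folder holds the files from the "Expected contents" table.
# Prints missing items to stderr; returns 0 only if everything was found.
check_hf_template() {
  local dir="${1:-examples/pretrain/hf_template}"
  local status=0
  local f

  # Files the table lists unconditionally (vocab.json/merges.txt are optional).
  for f in config.json generation_config.json tokenizer.json \
           tokenizer_config.json special_tokens_map.json summary_context.py; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done

  # At least one modeling_qwen3*.py file must be present.
  if ! compgen -G "$dir/modeling_qwen3*.py" > /dev/null; then
    echo "missing: modeling_qwen3*.py" >&2
    status=1
  fi

  # Rough check: config.json should contain a quoted string starting with
  # summary_ (assumed to be one of the summary_* keys).
  if [ -f "$dir/config.json" ] && ! grep -q '"summary_' "$dir/config.json"; then
    echo "config.json has no \"summary_*\" keys" >&2
    status=1
  fi

  return "$status"
}
```

After sourcing it, `check_hf_template examples/pretrain/hf_template` lists anything missing and returns non-zero, so it can gate the convert command with `&&`.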

Usage

```bash
bash examples/pretrain/convert/convert_muse_to_hf.sh \
    /path/to/muse_outputs/1b6_sa_hybrid_8k \
    global_step5000 \
    examples/pretrain/hf_template
```
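After conversion, you can spot-check that the template files really were copied verbatim. This is a hypothetical sketch (`verify_hf_copy` is not part of the repo). It assumes, based on the `<OUTPUT_DIR>/<STEP>/hf/` layout described above, that the first two script arguments are the output directory and the step. It only compares the file names from the Expected contents table, using `cmp`, so any weight files in the template are ignored.

```bash
# Hypothetical post-conversion check (not shipped with the repo): each template
# file from the "Expected contents" table should be byte-identical in hf/.
verify_hf_copy() {
  local template_dir="$1" hf_dir="$2"
  local status=0 src name

  for src in "$template_dir"/config.json "$template_dir"/generation_config.json \
             "$template_dir"/tokenizer.json "$template_dir"/tokenizer_config.json \
             "$template_dir"/special_tokens_map.json "$template_dir"/vocab.json \
             "$template_dir"/merges.txt "$template_dir"/summary_context.py \
             "$template_dir"/modeling_qwen3*.py; do
    # Skip optional files that are absent and an unmatched modeling_qwen3* glob.
    [ -f "$src" ] || continue
    name=$(basename "$src")
    # cmp -s is silent and returns non-zero if the files differ or one is missing.
    if ! cmp -s "$src" "$hf_dir/$name"; then
      echo "differs or missing in $hf_dir: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

With the example invocation above, the call would be `verify_hf_copy examples/pretrain/hf_template /path/to/muse_outputs/1b6_sa_hybrid_8k/global_step5000/hf`.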