# Hugging Face Artifact Upload

Large DecodeShare artifacts should live outside Git history. The intended repository is:

```text
Zishan-Shao/decodeshare
```

The file `docs/artifact_manifest.tsv` lists large local files and suggested paths under `artifacts/` in the Hugging Face repository.

## Suggested upload pattern

Install and authenticate the Hugging Face CLI (the extra is quoted so shells like zsh do not expand the brackets):

```bash
pip install -U "huggingface_hub[hf_transfer]"
hf auth login
```

For a full upload from the original workspace:

```bash
cd /path/to/decodeshare
hf upload Zishan-Shao/decodeshare Hype1/results/acts artifacts/Hype1/results/acts
hf upload Zishan-Shao/decodeshare patch_back/results artifacts/patch_back/results
```

## Downstream outputs

`downstream/outputs` contains several `.pt` files. The profiling caches can be larger than Hugging Face Hub's 50 GB single-file limit, so upload them as ordered 10 GiB parts rather than as raw files. The uploaded layout keeps the original run directories. Files ending in `.pt.part-000`, `.pt.part-001`, and so on are split chunks that should be concatenated in lexical order to recover the original `.pt` file.
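Before splitting real multi-gigabyte caches, the split/concatenate round trip can be sanity-checked on a throwaway file. This sketch uses a temp directory and 1 MiB parts purely for illustration (the real workflow uses 10 GiB parts):

```bash
# Sanity check: split a dummy file, then confirm that concatenating
# the parts in lexical order reproduces it byte for byte.
set -euo pipefail
tmp=$(mktemp -d)
head -c 3500000 /dev/urandom > "$tmp/dummy.pt"
split -b 1M -d -a 3 "$tmp/dummy.pt" "$tmp/dummy.pt.part-"   # part-000 .. part-003
cat "$tmp"/dummy.pt.part-* > "$tmp/restored.pt"
cmp "$tmp/dummy.pt" "$tmp/restored.pt" && echo "round trip OK"
rm -rf "$tmp"
```

The same `-b 10G -d -a 3` flags in the staging flow below give zero-padded numeric suffixes, so plain lexical sorting of `part-*` is enough to restore the original order.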
Example staging flow:

```bash
SRC_ROOT=/path/to/decodeshare
STAGE=/path/to/decodeshare_hf_downstream_split
rm -rf "$STAGE"
mkdir -p "$STAGE/artifacts/downstream/outputs"

mkdir -p "$STAGE/artifacts/downstream/outputs/llama2_r0.2_baseline"
ln "$SRC_ROOT/downstream/outputs/llama2_r0.2_baseline/meta_llama_Llama_2_7b_chat_hf_whitening_only_keep0p8_baseline.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_baseline/"
split -b 10G -d -a 3 \
  "$SRC_ROOT/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt.part-"

mkdir -p "$STAGE/artifacts/downstream/outputs/llama2_r0.2_decodeshare_a2"
ln "$SRC_ROOT/downstream/outputs/llama2_r0.2_decodeshare_a2/meta_llama_Llama_2_7b_chat_hf_whitening_only_keep0p8_decodeshare_a2p0.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_decodeshare_a2/"
split -b 10G -d -a 3 \
  "$SRC_ROOT/downstream/outputs/llama2_r0.2_decodeshare_a2/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_decodeshare_a2/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt.part-"

mkdir -p "$STAGE/artifacts/downstream/outputs/svdllm_whiten_r0.2"
ln "$SRC_ROOT/downstream/outputs/svdllm_whiten_r0.2/meta_llama_Llama_2_7b_chat_hf_whitening_only_0.8.pt" \
  "$STAGE/artifacts/downstream/outputs/svdllm_whiten_r0.2/"
split -b 10G -d -a 3 \
  "$SRC_ROOT/downstream/outputs/svdllm_whiten_r0.2/meta_llama_Llama_2_7b_chat_hf_profiling_wikitext2_128_0.pt" \
  "$STAGE/artifacts/downstream/outputs/svdllm_whiten_r0.2/meta_llama_Llama_2_7b_chat_hf_profiling_wikitext2_128_0.pt.part-"

# Confirm no file over the 50 GB single-file limit remains before uploading.
find "$STAGE" -type f -size +50G -print

hf upload-large-folder Zishan-Shao/decodeshare "$STAGE" \
  --repo-type model \
  --num-workers 4 \
  --no-bars
```

Reassemble a split file after download:

```bash
cat artifacts/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt.part-* \
  > artifacts/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt
```

## Notes

- The current GitHub branch excludes `.npy`, `.npz`, `.pt`, `.bin`, and related large binary formats.
- The largest local files are the downstream `.pt` profiling and whitening outputs; the profiling outputs need the split workflow above.
- The manifest is intentionally broad and includes local experimental archives. Inspect it before uploading everything.
- If you want this to be a dataset repository rather than a model repository, create or switch to a Hugging Face Dataset repo and add `--repo-type dataset` to the `hf upload` commands.
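As an extra safeguard for the reassembly step, a checksum manifest can be recorded before splitting and checked after download. This is a sketch only: it reuses the `$SRC_ROOT`/`$STAGE` variables from the staging example, and the `SHA256SUMS` filename is an assumption, not an existing repo convention.

```bash
# Record checksums of the original .pt files before splitting.
# Paths are stored relative to $SRC_ROOT so verification works from
# the downloaded tree's root as well.
cd "$SRC_ROOT"
find downstream/outputs -name '*.pt' -exec sha256sum {} + \
  > "$STAGE/artifacts/downstream/outputs/SHA256SUMS"

# After download and reassembly, from the corresponding directory:
#   sha256sum -c artifacts/downstream/outputs/SHA256SUMS
```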