Upload docs/HUGGINGFACE_UPLOAD.md with huggingface_hub

docs/HUGGINGFACE_UPLOAD.md (+51 −6)
For a full upload from the original workspace:

```bash
cd /path/to/decodeshare
hf upload Zishan-Shao/decodeshare Hype1/results/acts artifacts/Hype1/results/acts
hf upload Zishan-Shao/decodeshare patch_back/results artifacts/patch_back/results
```
## Downstream outputs

`downstream/outputs` contains several `.pt` files. The profiling caches can be
larger than Hugging Face Hub's 50 GB single-file limit, so upload them as
ordered 10 GiB parts rather than as raw files.

The uploaded layout keeps the original run directories. Files ending in
`.pt.part-000`, `.pt.part-001`, and so on are split chunks that should be
concatenated in lexical order to recover the original `.pt` file.
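The split/reassemble convention can be sanity-checked on a tiny dummy file before trusting it with multi-gigabyte checkpoints. This sketch uses 1 MiB parts instead of 10 GiB, and `demo.pt` is a placeholder name, not a real artifact:

```shell
# Illustration only: a 3 MiB dummy file stands in for a large checkpoint.
head -c 3145728 /dev/urandom > demo.pt
# Same split flags as the real flow, but 1 MiB parts: -d -a 3 gives
# three-digit numeric suffixes, so part names sort in lexical order.
split -b 1M -d -a 3 demo.pt demo.pt.part-
ls demo.pt.part-*
# Shell glob expansion is already sorted, so cat restores the original.
cat demo.pt.part-* > demo_restored.pt
cmp demo.pt demo_restored.pt && echo "round-trip OK"
```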
Example staging flow:
```bash
SRC_ROOT=/path/to/decodeshare
STAGE=/path/to/decodeshare_hf_downstream_split
rm -rf "$STAGE"
mkdir -p "$STAGE/artifacts/downstream/outputs"

# Hard-link the small whitening files into the staging tree (the stage
# must live on the same filesystem as the source), then split each
# oversized profiling cache into 10 GiB numbered parts.
mkdir -p "$STAGE/artifacts/downstream/outputs/llama2_r0.2_baseline"
ln "$SRC_ROOT/downstream/outputs/llama2_r0.2_baseline/meta_llama_Llama_2_7b_chat_hf_whitening_only_keep0p8_baseline.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_baseline/"
split -b 10G -d -a 3 \
  "$SRC_ROOT/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt.part-"

mkdir -p "$STAGE/artifacts/downstream/outputs/llama2_r0.2_decodeshare_a2"
ln "$SRC_ROOT/downstream/outputs/llama2_r0.2_decodeshare_a2/meta_llama_Llama_2_7b_chat_hf_whitening_only_keep0p8_decodeshare_a2p0.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_decodeshare_a2/"
split -b 10G -d -a 3 \
  "$SRC_ROOT/downstream/outputs/llama2_r0.2_decodeshare_a2/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt" \
  "$STAGE/artifacts/downstream/outputs/llama2_r0.2_decodeshare_a2/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt.part-"

mkdir -p "$STAGE/artifacts/downstream/outputs/svdllm_whiten_r0.2"
ln "$SRC_ROOT/downstream/outputs/svdllm_whiten_r0.2/meta_llama_Llama_2_7b_chat_hf_whitening_only_0.8.pt" \
  "$STAGE/artifacts/downstream/outputs/svdllm_whiten_r0.2/"
split -b 10G -d -a 3 \
  "$SRC_ROOT/downstream/outputs/svdllm_whiten_r0.2/meta_llama_Llama_2_7b_chat_hf_profiling_wikitext2_128_0.pt" \
  "$STAGE/artifacts/downstream/outputs/svdllm_whiten_r0.2/meta_llama_Llama_2_7b_chat_hf_profiling_wikitext2_128_0.pt.part-"

# Nothing should exceed the Hub's 50 GB per-file limit at this point.
find "$STAGE" -type f -size +50G -print
hf upload-large-folder Zishan-Shao/decodeshare "$STAGE" \
  --repo-type model \
  --num-workers 4 \
  --no-bars
```
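Before uploading (or cleaning anything up), it can be worth confirming that a staged file's parts reassemble byte-for-byte. A minimal check for one file, reusing the `$SRC_ROOT` and `$STAGE` variables from the staging flow above (the path shown is one example from that flow):

```shell
# cmp exits non-zero on any byte mismatch; piping into cmp avoids
# materializing the reassembled file on disk.
f="llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt"
cat "$STAGE/artifacts/downstream/outputs/$f".part-* \
  | cmp - "$SRC_ROOT/downstream/outputs/$f" \
  && echo "parts match source"
```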

Reassemble a split file after download:

```bash
cat artifacts/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt.part-* \
  > artifacts/downstream/outputs/llama2_r0.2_baseline/meta-llama_Llama-2-7b-chat-hf_profiling___calib_mix_jsonl_128_0.pt
```
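To restore every split file in one pass rather than running one `cat` per file, a loop over the first parts works. This is a sketch that assumes the `.pt.part-NNN` naming produced by the staging flow:

```shell
# For each "-000" part, strip the ".part-000" suffix to get the target
# filename, then concatenate all of its parts in glob (lexical) order.
find artifacts/downstream/outputs -name '*.pt.part-000' -print0 |
while IFS= read -r -d '' first; do
  target="${first%.part-000}"
  cat "$target".part-* > "$target"
  echo "restored $target"
done
```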

## Notes

- The current GitHub branch excludes `.npy`, `.npz`, `.pt`, `.bin`, and related
  large binary formats.
- The largest local files are downstream `.pt` profiling and whitening outputs;
  the profiling outputs need the split workflow above.
- The manifest is intentionally broad and includes local experimental archives.
  Inspect it before uploading everything.
- If you want this to be a dataset repository rather than a model repository,