Using JS (@huggingface/hub + XetBlob)
- Extract / Draft a Focused Sub-Spec (Example: Chunking & Hashing)
You could split your spec repo into sub-docs like:
chunking.md (summary structure)
Algorithm: Gearhash-based CDC.
Parameters:
Target chunk size: e.g., 64 KiB (illustrative; take the actual value from the spec).
Min/max chunk sizes.
Boundary condition: windowed mask match on rolling hash.
Determinism Requirements:
MUST process bytes sequentially; chunks MUST NOT overlap.
MUST treat input as raw bytes; no text normalization.
MUST produce identical boundaries for identical byte streams.
Edge Cases:
Empty file: zero chunks, well-defined hash.
Files smaller than the minimum chunk size: exactly one chunk.
Handling of the last partial chunk.
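To make the chunking.md outline concrete, here is a minimal Python sketch of gear-hash content-defined chunking. It is not the reference implementation: the gear table (derived here from SHA-256), the min/max sizes, and the boundary mask are all assumptions chosen for illustration, and `chunk_boundaries` is a hypothetical name. Only the 64 KiB target comes from the example above.

```python
import hashlib

MASK64 = (1 << 64) - 1

# Illustrative gear table: 256 pseudo-random 64-bit values.
# ASSUMPTION: a real spec would publish its own fixed table.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "little")
    for i in range(256)
]

TARGET_SIZE = 64 * 1024       # example value from the outline
MIN_SIZE = TARGET_SIZE // 8   # ASSUMPTION, not from the spec
MAX_SIZE = TARGET_SIZE * 2    # ASSUMPTION, not from the spec
# 16 mask bits give a boundary roughly every 2**16 bytes on average.
# The high bits are used because they depend on the most input history.
BOUNDARY_MASK = 0xFFFF << 48


def chunk_boundaries(data: bytes, min_size: int = MIN_SIZE,
                     max_size: int = MAX_SIZE,
                     mask: int = BOUNDARY_MASK) -> list[int]:
    """Return the exclusive end offset of every chunk in `data`."""
    ends = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Gear rolling hash: shift left by one, add the table entry.
        h = ((h << 1) + GEAR[b]) & MASK64
        length = i + 1 - start
        if length < min_size:
            continue  # never cut before the minimum size
        if (h & mask) == 0 or length >= max_size:
            ends.append(i + 1)
            start = i + 1
            h = 0  # restart the hash for the next chunk
    if start < len(data):
        ends.append(len(data))  # last partial chunk
    return ends
```

The sketch covers the edge cases listed above: an empty input yields zero chunks, an input shorter than the minimum yields one chunk, and any trailing bytes become a final partial chunk. Because the function reads raw bytes in order and uses no external state, identical byte streams always give identical boundaries.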
hashing.md (summary structure)
Define MerkleHash binary representation and string encoding.
Define separate domain tags or prefixes for:
Chunk hash
Xorb hash
File term list hash
MUST NOT mix domains (this prevents collisions across object types).
Specify endianness and canonical serialization (so that every implementation computes the same hash).
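The domain-separation idea in the hashing.md outline can be sketched as follows. This is an illustration only: SHA-256 stands in for whatever hash function the spec actually mandates, and the tag strings, the `(hash, length)` entry layout, and all function names are hypothetical. What it shows is the pattern of a length-prefixed domain tag plus a fixed little-endian serialization.

```python
import hashlib
import struct

# ASSUMPTION: tag values are invented for this example.
DOMAIN_CHUNK = b"example/chunk/v1"
DOMAIN_XORB = b"example/xorb/v1"
DOMAIN_FILE = b"example/file/v1"


def domain_hash(domain: bytes, payload: bytes) -> bytes:
    """Hash `payload` under a domain tag so object types cannot collide."""
    h = hashlib.sha256()  # stand-in hash function
    # Length-prefix the tag so the tag/payload split is unambiguous.
    h.update(struct.pack("<H", len(domain)))
    h.update(domain)
    h.update(payload)
    return h.digest()


def chunk_hash(data: bytes) -> bytes:
    """Hash raw chunk bytes in the chunk domain."""
    return domain_hash(DOMAIN_CHUNK, data)


def xorb_hash(entries: list[tuple[bytes, int]]) -> bytes:
    """Hash a list of (chunk_hash, chunk_length) pairs in the xorb domain.

    Canonical serialization (assumed): 32-byte hash followed by the
    length as a little-endian unsigned 64-bit integer, in list order.
    """
    payload = b"".join(ch + struct.pack("<Q", size) for ch, size in entries)
    return domain_hash(DOMAIN_XORB, payload)


def to_hex(digest: bytes) -> str:
    """String encoding: plain lowercase hex (assumed, not from the spec)."""
    return digest.hex()
```

With this layout, hashing the same bytes as a chunk and as a xorb payload gives different digests, which is the property the MUST NOT rule is meant to guarantee. Fixing the integer width and byte order in the serialization is what lets independent implementations agree on the result.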
I can write a full, more formal BCP14-style sub-spec for any one area (e.g., chunking.md, file-reconstruction.md) if you tell me which you want first.
- Sample Code Using Reference Implementations
4.1 Using hf_xet (Python bindings)
Upload a file via Hugging Face Hub/Xet: