Using JS (@huggingface/hub + XetBlob)

#2
  3. Extract / Draft a Focused Sub-Spec (Example: Chunking & Hashing)
    You could split your spec repo into sub-docs like:

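chunking.md (summary structure)
  Algorithm: Gearhash-based content-defined chunking (CDC).
  Parameters:
    Target chunk size: e.g., 64 KiB (illustrative; take the normative value from the spec).
    Minimum and maximum chunk sizes.
    Boundary condition: a boundary is declared where the rolling hash, masked with a fixed bit mask, matches the required pattern, subject to the min/max limits.
  Determinism requirements:
    MUST process bytes sequentially; chunks MUST be contiguous and non-overlapping.
    MUST treat input as raw bytes, with no text normalization (e.g., of line endings or Unicode).
    MUST produce identical boundaries for identical byte streams.
  Edge cases:
    Empty file: zero chunks, with a well-defined hash.
    Non-empty file smaller than the minimum chunk size: exactly one chunk.
    Last partial chunk: the bytes after the final boundary form one last chunk, which may be shorter than the minimum size.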
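A minimal Python sketch of the chunking rules above. Every parameter is an assumption. The gear table, generated here from a fixed seed, is a hypothetical stand-in for the table a real spec must publish. The 64 KiB / 8 KiB / 128 KiB sizes and the top-16-bit mask are illustrative placeholders, not values taken from the Xet spec. The sketch demonstrates the listed properties: empty input yields no chunks, short input yields one chunk, chunks are contiguous, and trailing bytes form a final partial chunk.

```python
import random
from typing import List, Tuple

# Illustrative parameters (assumption: placeholders, NOT normative Xet values).
TARGET_CHUNK_SIZE = 64 * 1024
MIN_CHUNK_SIZE = TARGET_CHUNK_SIZE // 8   # 8 KiB
MAX_CHUNK_SIZE = TARGET_CHUNK_SIZE * 2    # 128 KiB
# 16 mask bits: each eligible position is a boundary with probability 1/2**16,
# i.e. roughly one hash-triggered boundary per 64 KiB of input.
MASK = 0xFFFF << 48  # top 16 bits of a 64-bit hash
U64 = (1 << 64) - 1


def make_gear_table(seed: int = 0x5EED) -> List[int]:
    """Hypothetical gear table: 256 pseudo-random 64-bit values from a fixed seed.

    A real spec MUST publish its table; a seeded PRNG merely stands in here.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(256)]


GEAR = make_gear_table()


def chunk_boundaries(data: bytes) -> List[Tuple[int, int]]:
    """Split raw bytes into contiguous (start, end) chunks using a Gear rolling hash."""
    chunks: List[Tuple[int, int]] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear update: shift left one bit and add the table entry for this byte.
        # Bytes older than 64 positions have been shifted out of the 64-bit state.
        h = ((h << 1) + GEAR[byte]) & U64
        size = i - start + 1
        if size < MIN_CHUNK_SIZE:
            continue  # never cut below the minimum size
        if (h & MASK) == 0 or size >= MAX_CHUNK_SIZE:
            chunks.append((start, i + 1))
            start = i + 1
            h = 0  # assumption: restart the hash at every boundary
    if start < len(data):
        # Trailing bytes form a final, possibly short, chunk.
        chunks.append((start, len(data)))
    return chunks


if __name__ == "__main__":
    n = 1 << 19  # 512 KiB of deterministic pseudo-random test data
    sample = random.Random(42).getrandbits(8 * n).to_bytes(n, "little")
    for start, end in chunk_boundaries(sample):
        print(f"chunk at {start:>7}: {end - start} bytes")
```

The masked top bits are used because, in a shift-left Gear hash, the high bits depend on the most recent input bytes (up to 64). Checking the maximum size on every byte means a forced cut always lands at exactly `MAX_CHUNK_SIZE`.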
hashing.md (summary structure)
  Define the MerkleHash binary representation (length, byte order) and its string encoding.
  Define separate domain tags or prefixes for:
    Chunk hash
    Xorb hash
    File term list hash
  MUST NOT reuse a domain tag across object types, so that objects of different types cannot produce colliding hashes.
  Specify endianness and canonical serialization for every hashed structure, so that all implementations compute identical hashes.
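To make the domain-separation and canonical-serialization requirements concrete, here is a Python sketch under stated assumptions. It substitutes BLAKE2b from the standard library for whatever hash function the real spec mandates, using BLAKE2b's `person` field as the domain tag. The tag strings, the flat record layouts for the xorb and file-term-list hashes, and the plain-hex string encoding are all hypothetical choices, not the Xet format. The name MerkleHash suggests the real xorb hash is tree-based; this sketch hashes a flat list for brevity.

```python
import hashlib
import struct
from typing import List, Tuple

DIGEST_SIZE = 32  # assumption: 256-bit hashes

# Hypothetical domain tags, passed as BLAKE2b's "person" parameter (max 16 bytes).
CHUNK_TAG = b"example-chunk"
XORB_TAG = b"example-xorb"
FILE_TERMS_TAG = b"example-fterms"


def tagged_hash(tag: bytes, payload: bytes) -> bytes:
    """Hash payload within the domain named by tag; different tags give unrelated digests."""
    return hashlib.blake2b(payload, digest_size=DIGEST_SIZE, person=tag).digest()


def chunk_hash(chunk: bytes) -> bytes:
    """Hash of one chunk's raw bytes, in the chunk domain."""
    return tagged_hash(CHUNK_TAG, chunk)


def xorb_hash(chunk_hashes: List[bytes], chunk_sizes: List[int]) -> bytes:
    """Hypothetical flat xorb hash over (chunk hash, u64 little-endian size) records."""
    if len(chunk_hashes) != len(chunk_sizes):
        raise ValueError("need exactly one size per chunk hash")
    buf = bytearray()
    for h, size in zip(chunk_hashes, chunk_sizes):
        if len(h) != DIGEST_SIZE:
            raise ValueError("chunk hash has the wrong length")
        buf += h
        buf += struct.pack("<Q", size)  # fixed width, explicit byte order
    return tagged_hash(XORB_TAG, bytes(buf))


def file_terms_hash(terms: List[Tuple[bytes, int, int]]) -> bytes:
    """Hypothetical hash of a file's term list: (xorb hash, first chunk, end chunk) records."""
    buf = bytearray()
    for xh, first, end in terms:
        buf += xh
        buf += struct.pack("<II", first, end)  # two u32 little-endian indices
    return tagged_hash(FILE_TERMS_TAG, bytes(buf))


def to_hex(h: bytes) -> str:
    """Illustrative string encoding: plain lowercase hex of the raw digest bytes."""
    return h.hex()


def from_hex(s: str) -> bytes:
    """Inverse of to_hex, rejecting strings that do not decode to a full digest."""
    h = bytes.fromhex(s)
    if len(h) != DIGEST_SIZE:
        raise ValueError("expected a 32-byte hash")
    return h
```

Every field is written with a fixed width and an explicit byte order. Two independent implementations following the same layout therefore feed byte-identical input to the hash and get identical digests. `file_terms_hash([])` still yields a well-defined digest, which matches the empty-file edge case in chunking.md.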
I can write a full, more formal BCP14-style sub-spec for any one area (e.g., chunking.md, file-reconstruction.md) if you tell me which you want first.

  4. Sample Code Using Reference Implementations
    4.1 Using hf_xet (Python bindings)
    Upload a file via the Hugging Face Hub (Xet storage):