Commit 673ebdf by kashif (HF Staff) · verified · 1 parent: 4c2888c

card: HybridDNATokenizer merged upstream (#23410)

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -18,12 +18,12 @@ GGUF (bf16) conversion of [HuggingFaceBio/Carbon-500M](https://huggingface.co/Hu
 
  Carbon is a hybrid DNA / English language model that switches between Qwen3-4B-Base byte-level BPE for natural text and fixed 6-mer chunking for DNA inside `<dna>...</dna>` tags.
 
- ## Requires llama.cpp with HybridDNATokenizer support
+ ## Requires a recent llama.cpp
 
- Loading these GGUFs needs `LLAMA_VOCAB_TYPE_HYBRIDDNA`, which is not yet in upstream llama.cpp. Until the PR merges, build from the [`carbon-3b-tokenizer`](https://github.com/kashif/llama.cpp/tree/carbon-3b-tokenizer) branch:
+ HybridDNATokenizer support was merged in [ggml-org/llama.cpp#23410](https://github.com/ggml-org/llama.cpp/pull/23410), so any build from `master` after that works:
 
  ```bash
- git clone -b carbon-3b-tokenizer https://github.com/kashif/llama.cpp
+ git clone https://github.com/ggml-org/llama.cpp
  cd llama.cpp && cmake -B build && cmake --build build -j
  ```
 
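The README says Carbon's tokenizer switches between byte-level BPE for ordinary text and fixed 6-mer chunking for DNA inside `<dna>...</dna>` tags. The shell sketch below illustrates only that routing and chunking idea, under loudly stated assumptions: the functions `segment` and `chunk_dna` are hypothetical names, a single `<dna>...</dna>` span per input is assumed, a trailing run of fewer than six bases is assumed to become one short final chunk, and text outside the tags is passed through as a single segment rather than BPE-encoded. It is not the llama.cpp `HybridDNATokenizer` code.

```bash
#!/usr/bin/env bash
# Illustrative sketch only, NOT the llama.cpp HybridDNATokenizer implementation.
# Assumptions (hypothetical): one <dna>...</dna> span per input; DNA is cut into
# fixed 6-character chunks and any remainder shorter than 6 is kept as a final
# short chunk; text outside the tags is printed as one TEXT segment (no BPE here).

# Split a DNA string into consecutive 6-mers, one per line.
chunk_dna() {
  local seq="$1" i
  for (( i = 0; i < ${#seq}; i += 6 )); do
    printf '%s\n' "${seq:i:6}"
  done
}

# Route an input string: text outside the tags vs. DNA inside them.
segment() {
  local input="$1"
  if [[ "$input" != *"<dna>"*"</dna>"* ]]; then
    printf 'TEXT:%s\n' "$input"   # no DNA span: whole input is text
    return 0
  fi
  local before="${input%%"<dna>"*}"   # text before the opening tag
  local rest="${input#*"<dna>"}"      # everything after the opening tag
  local dna="${rest%%"</dna>"*}"      # DNA between the tags
  local after="${rest#*"</dna>"}"     # text after the closing tag
  if [[ -n "$before" ]]; then printf 'TEXT:%s\n' "$before"; fi
  chunk_dna "$dna" | sed 's/^/DNA:/'
  if [[ -n "$after" ]]; then printf 'TEXT:%s\n' "$after"; fi
}

segment "Promoter: <dna>ATGCGTACGTTAGC</dna> end"
```

Running it prints the text before the tag, the 6-mers `ATGCGT` and `ACGTTA`, the assumed short remainder `GC`, and the trailing text. Plain string slicing is enough here because a fixed-width split needs no vocabulary lookup, which is what makes the DNA path so simple compared with BPE.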