kashif HF Staff committed on
Commit
c8ad7cb
verified
1 Parent(s): 58c9398

Add model card

Files changed (1)
  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ license: apache-2.0
+ library_name: gguf
+ base_model: HuggingFaceBio/Carbon-500M
+ language:
+ - dna
+ tags:
+ - dna
+ - genomic
+ - llama.cpp
+ - gguf
+ - hybriddna
+ ---
+
+ # Carbon-500M GGUF
+
+ GGUF (bf16) conversion of [HuggingFaceBio/Carbon-500M](https://huggingface.co/HuggingFaceBio/Carbon-500M) for use with [llama.cpp](https://github.com/ggml-org/llama.cpp).
+
+ Carbon is a hybrid DNA / English language model whose tokenizer switches between the Qwen3-4B-Base byte-level BPE for natural text and fixed 6-mer chunking for DNA inside `<dna>...</dna>` tags.
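
To make "fixed 6-mer chunking" concrete, here is a minimal sketch that cuts a DNA string into consecutive 6-character pieces. The helper name `chunk_6mers` is hypothetical, and this is only an illustration of the idea, not the HybridDNATokenizer implementation. In particular, the README does not say how a final chunk shorter than six bases is handled; the sketch simply prints it as-is.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a DNA string into fixed-width 6-character chunks.
# Illustration only; this is not the HybridDNATokenizer implementation.
chunk_6mers() {
  # fold -w 6 breaks the input into lines of at most 6 characters;
  # paste -sd' ' joins those lines back together with single spaces.
  printf '%s\n' "$1" | fold -w 6 | paste -sd' ' -
}

chunk_6mers ATGCGCTAGCTACGATCG
# ATGCGC TAGCTA CGATCG
```

A sequence whose length is not a multiple of six leaves a shorter last piece, for example `chunk_6mers ATGCGCTA` prints `ATGCGC TA`.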
+
+ ## Requires llama.cpp with HybridDNATokenizer support
+
+ Loading this GGUF requires `LLAMA_VOCAB_TYPE_HYBRIDDNA`, which is not yet in upstream llama.cpp. Until the PR merges, build from the [`carbon-3b-tokenizer`](https://github.com/kashif/llama.cpp/tree/carbon-3b-tokenizer) branch:
+
+ ```bash
+ git clone -b carbon-3b-tokenizer https://github.com/kashif/llama.cpp
+ cd llama.cpp && cmake -B build && cmake --build build -j
+ ```
+
+ ## Files
+
+ | File | Quant | Size |
+ |---|---|---|
+ | `carbon-500m-bf16.gguf` | bf16 (lossless from source) | 983 MB |
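
After downloading the file listed in the table, a quick sanity check can catch a truncated download or a saved error page before llama.cpp is involved. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal sketch only needs to read those. The helper name `is_gguf` is hypothetical, and the check says nothing about whether the rest of the file is intact.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the 4-byte GGUF magic.
is_gguf() {
  # head -c 4 reads only the first four bytes; a missing file yields an empty string.
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# File name taken from the table above.
if is_gguf carbon-500m-bf16.gguf; then
  echo "looks like a GGUF file"
else
  echo "missing or not a GGUF file"
fi
```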
+
+ ## Usage
+
+ ### Basic DNA completion
+
+ ```bash
+ ./build/bin/llama-completion -m carbon-500m-bf16.gguf \
+ -p '<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG' \
+ -n 64 --temp 0 -no-cnv
+ ```
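
When the sequence comes from a script rather than being typed by hand, it can help to validate it before building the prompt. The sketch below uses a hypothetical helper, `dna_prompt`, and assumes that only the bases A, C, G and T are accepted, mirroring the example prompt; the README does not state which characters the tokenizer allows. As in the command above, the `<dna>` tag is left open so the model continues the sequence.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a sequence and prefix it with the opening <dna> tag.
# Assumption: only A, C, G, T are accepted, mirroring the example prompt.
dna_prompt() {
  local seq
  # Normalise lowercase bases to uppercase.
  seq=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
  case "$seq" in
    # Reject an empty string or any character outside A/C/G/T.
    ''|*[!ACGT]*) echo "error: expected a non-empty A/C/G/T sequence" >&2; return 1 ;;
  esac
  printf '<dna>%s' "$seq"
}

dna_prompt atgcgctagcta
# <dna>ATGCGCTAGCTA
```

The result can be passed to the command shown above as `-p "$(dna_prompt "$seq")"`.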
+
+ ### As a draft model for speculative decoding
+
+ Carbon-500M shares the HybridDNA vocab with the larger Carbon models (3B and 8B), so it can serve as their draft model:
+
+ ```bash
+ # 8B target + 500M draft -> ~2x speedup at temp=0
+ ./build/bin/llama-speculative \
+ -m carbon-8b-bf16.gguf \
+ -md carbon-500m-bf16.gguf \
+ -p '<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG' \
+ -n 256 --temp 0
+ ```
+
+ ## See also
+
+ - Source weights: [HuggingFaceBio/Carbon-500M](https://huggingface.co/HuggingFaceBio/Carbon-500M)
+ - Other GGUF variants: [500M](https://huggingface.co/HuggingFaceBio/Carbon-500M-GGUF) · [3B](https://huggingface.co/HuggingFaceBio/Carbon-3B-GGUF) · [8B](https://huggingface.co/HuggingFaceBio/Carbon-8B-GGUF)
+
+ ## License
+
+ Apache-2.0, inherited from the source model.