---
license: apache-2.0
library_name: gguf
base_model: HuggingFaceBio/Carbon-8B
language:
- dna
tags:
- dna
- genomic
- llama.cpp
- gguf
- hybriddna
---

# Carbon-8B GGUF

GGUF (bf16) conversion of [HuggingFaceBio/Carbon-8B](https://huggingface.co/HuggingFaceBio/Carbon-8B) for use with [llama.cpp](https://github.com/ggml-org/llama.cpp).

Carbon is a hybrid DNA / English language model that switches between Qwen3-4B-Base byte-level BPE for natural text and fixed 6-mer chunking for DNA inside `<dna>...</dna>` tags.
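
To make the fixed 6-mer chunking concrete, the sketch below splits a DNA string into 6-character chunks with `fold`. It only illustrates the chunking idea: `chunk_6mer` is a hypothetical helper, and it does not reproduce how the actual tokenizer treats a trailing remainder shorter than six bases or non-ACGT characters.

```bash
# Hypothetical helper: split a DNA string into fixed 6-character chunks.
# Illustration only; the real tokenizer's handling of a short trailing
# chunk or of non-ACGT characters is not modeled here.
chunk_6mer() {
    printf '%s\n' "$1" | fold -w 6
}

chunk_6mer "ATGCGCTAGCTACGATCG"
# prints:
# ATGCGC
# TAGCTA
# CGATCG
```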

## Requires a recent llama.cpp

HybridDNATokenizer support was merged in [ggml-org/llama.cpp#23410](https://github.com/ggml-org/llama.cpp/pull/23410), so any build from `master` after that works:

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build -j
```
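
As a quick sanity check after building, something like the following can confirm that the four tools used in this card are present. `check_tools` is a hypothetical helper, and the default `./build/bin` path assumes the CMake layout from the build command above.

```bash
# Hypothetical helper: verify that the llama.cpp tools used in this card exist.
# Usage: check_tools [BIN_DIR]   (defaults to ./build/bin)
check_tools() (
    # The function body runs in a subshell, so `exit` only sets the
    # function's return status and the variables below do not leak.
    bin_dir="${1:-./build/bin}"
    missing=0
    for tool in llama-completion llama-speculative llama-server llama-perplexity; do
        if [ ! -x "$bin_dir/$tool" ]; then
            echo "missing: $bin_dir/$tool" >&2
            missing=1
        fi
    done
    exit "$missing"
)

check_tools ./build/bin || echo "some tools are missing; rebuild llama.cpp" >&2
```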

## Files

| File | Quant | Size |
|---|---|---|
| `carbon-8b-bf16.gguf` | bf16 (lossless from source) | 16 GB |

## Usage

### Download

```bash
hf download HuggingFaceBio/Carbon-8B-GGUF carbon-8b-bf16.gguf --local-dir .
```

### Basic DNA completion

```bash
./build/bin/llama-completion -m carbon-8b-bf16.gguf \
    -p '<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG' \
    -n 64 --temp 0 -no-cnv
```

### Metadata-conditioned generation

```bash
./build/bin/llama-completion -m carbon-8b-bf16.gguf \
    -p '<vertebrate_mammalian><protein_coding_region><dna>ATGCGCTAG' \
    -n 64 --temp 0 -no-cnv
```
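
Prompts like the one above are plain string concatenation: zero or more metadata tags, then `<dna>`, then the sequence. A minimal sketch of a prompt builder follows. `build_prompt` is a hypothetical helper, and the only tag names taken from this card are the two shown; which tags the model recognizes, and in what order, is defined by the source card.

```bash
# Hypothetical helper: prefix a DNA sequence with metadata tags.
# Usage: build_prompt SEQUENCE [TAG...]
build_prompt() (
    # Subshell body keeps the working variables local to the call.
    seq="$1"
    shift
    prefix=""
    for tag in "$@"; do
        prefix="${prefix}<${tag}>"
    done
    printf '%s<dna>%s\n' "$prefix" "$seq"
)

build_prompt ATGCGCTAG vertebrate_mammalian protein_coding_region
# prints: <vertebrate_mammalian><protein_coding_region><dna>ATGCGCTAG
```

The result can be passed straight to `-p`, for example `-p "$(build_prompt ATGCGCTAG vertebrate_mammalian)"`.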

### Speculative decoding with Carbon-500M draft (~2x speedup)

Carbon-500M shares the HybridDNA vocabulary with the 8B, which makes it a near-ideal draft model. Measured ~2.1x speedup at temp=0 with an 87% accept rate on DNA prompts. Grab the draft GGUF first:

```bash
hf download HuggingFaceBio/Carbon-500M-GGUF carbon-500m-bf16.gguf --local-dir .
```

Then run the standalone `llama-speculative` tool, passing the draft model with `-md` (short for `--model-draft`):

```bash
./build/bin/llama-speculative \
    -m  carbon-8b-bf16.gguf \
    -md carbon-500m-bf16.gguf \
    -p '<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG' \
    -n 256 --temp 0
```

Or serve the 8B with the 500M draft (`llama-server` accepts the same `-md` flag):

```bash
./build/bin/llama-server \
    -m  carbon-8b-bf16.gguf \
    -md carbon-500m-bf16.gguf \
    --draft-max 16 --draft-min 1 \
    --port 8080
```

```bash
curl -s http://localhost:8080/completion -d '{
  "prompt": "<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG",
  "n_predict": 256,
  "temperature": 0
}' | jq -r .content
```

### Likelihood scoring

The source card's Python `score()` function computes the mean log-prob per DNA token. llama.cpp has no direct equivalent; the closest tool for corpus-level scoring is `llama-perplexity`, which reports perplexity (`perplexity = exp(-mean_logprob)`):

```bash
# one prompt per line in dna_corpus.txt, each wrapped in <dna>...</dna>
./build/bin/llama-perplexity -m carbon-8b-bf16.gguf -f dna_corpus.txt --ppl-stride 0
```

Alternatively, run `llama-server` and request per-token log-probabilities with `n_probs`:

```bash
./build/bin/llama-server -m carbon-8b-bf16.gguf --port 8080 &
curl -s http://localhost:8080/completion -d '{
  "prompt": "<dna>GGGCTATAAAGGCCATCGATCGATCGATCGATCGATCGATCG</dna>",
  "n_predict": 0,
  "n_probs": 1
}' | jq '.completion_probabilities'
```
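
The relation `perplexity = exp(-mean_logprob)` can be inverted to turn a perplexity reported by `llama-perplexity` back into a mean log-prob. The sketch below does this with `awk`. `ppl_to_mean_logprob` is a hypothetical helper, and the result averages over every token the tool evaluated, so it is only comparable to the source card's per-DNA-token score if the corpus consists almost entirely of DNA tokens.

```bash
# Hypothetical helper: convert a perplexity value to a mean log-prob
# per token (natural log), i.e. mean_logprob = -ln(perplexity).
ppl_to_mean_logprob() {
    awk -v ppl="$1" 'BEGIN {
        if (ppl <= 0) exit 1          # perplexity must be positive
        printf "%.4f\n", -log(ppl)
    }'
}

ppl_to_mean_logprob 4.0   # prints -1.3863
```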

### Long context with YaRN (65k tokens ≈ 393 kbp)

Mirrors the source card's `rope_scaling = {type: yarn, factor: 4.0, original_max_position_embeddings: 32768}`:

```bash
./build/bin/llama-completion -m carbon-8b-bf16.gguf \
    -c 65536 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
    -p '<dna>...' -n 64 --temp 0 -no-cnv
```
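
The numbers in this section follow from simple arithmetic, sketched below. The variable names are illustrative, and the base-pair estimate assumes every context token is one 6-mer, ignoring tag and text tokens.

```bash
# Back-of-the-envelope numbers for the YaRN flags above.
# Assumption: each context token covers exactly 6 bases.
ctx_tokens=65536       # value passed to -c
orig_ctx=32768         # value passed to --yarn-orig-ctx
rope_scale=4           # value passed to --rope-scale
bases_per_token=6      # fixed 6-mer chunking

echo "approx. bp in context:  $(( ctx_tokens * bases_per_token ))"
echo "nominal scaled window:  $(( orig_ctx * rope_scale )) tokens"
# prints:
# approx. bp in context:  393216
# nominal scaled window:  131072 tokens
```

Under these assumptions, the 65,536-token context used here is half of the nominal window implied by a scale factor of 4.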

### Base-pair-level generation (FNS branch) — not supported

The `revision="fns"` example from the source card needs custom modeling code (factorized nucleotide supervision head), which only the Python transformers path can load. llama.cpp can't run that branch.

## See also

- Source weights: [HuggingFaceBio/Carbon-8B](https://huggingface.co/HuggingFaceBio/Carbon-8B)
- Other GGUF variants: [500M](https://huggingface.co/HuggingFaceBio/Carbon-500M-GGUF) · [3B](https://huggingface.co/HuggingFaceBio/Carbon-3B-GGUF) · [8B](https://huggingface.co/HuggingFaceBio/Carbon-8B-GGUF)

## License

Apache-2.0, inherited from the source model.