---
license: apache-2.0
library_name: gguf
base_model: HuggingFaceBio/Carbon-3B
language:
- dna
tags:
- dna
- genomic
- llama.cpp
- gguf
- hybriddna
---

# Carbon-3B GGUF

GGUF (bf16) conversion of [HuggingFaceBio/Carbon-3B](https://huggingface.co/HuggingFaceBio/Carbon-3B) for use with [llama.cpp](https://github.com/ggml-org/llama.cpp).

Carbon is a hybrid DNA / English language model. It uses the Qwen3-4B-Base byte-level BPE tokenizer for natural text and switches to fixed 6-mer chunking for DNA inside `<dna>...</dna>` tags.
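
The Bash sketch below illustrates fixed 6-mer chunking on the DNA side. It only shows how a sequence is cut into non-overlapping 6 bp pieces and does not produce token IDs. The function name `chunk_6mers` is hypothetical. Emitting a trailing run shorter than 6 bp as its own chunk is an assumption, not documented behaviour of the real `HybridDNATokenizer`.

```bash
# Hypothetical helper: cut a DNA string into consecutive, non-overlapping 6-mers.
# Illustration only: no token IDs are produced, and emitting a trailing run
# shorter than 6 bp as its own chunk is an assumption about edge cases.
chunk_6mers() {
  local seq="${1^^}"   # upper-case the input (bash 4+)
  local i
  for (( i = 0; i < ${#seq}; i += 6 )); do
    printf '%s\n' "${seq:i:6}"
  done
}

chunk_6mers ATGCGCTAGCTACGATCG   # prints ATGCGC, TAGCTA, CGATCG (one per line)
```

At 6 bp per DNA token, this 18 bp example corresponds to three DNA tokens.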

## Requires a recent llama.cpp

HybridDNATokenizer support was merged in [ggml-org/llama.cpp#23410](https://github.com/ggml-org/llama.cpp/pull/23410), so any build from `master` after that works:

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build -j
```

## Files

| File | Quant | Size |
|---|---|---|
| `carbon-3b-bf16.gguf` | bf16 (lossless from source) | 6.5 GB |

## Usage

### Download

```bash
hf download HuggingFaceBio/Carbon-3B-GGUF carbon-3b-bf16.gguf --local-dir .
```

### Basic DNA completion

```bash
./build/bin/llama-completion -m carbon-3b-bf16.gguf \
    -p '<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG' \
    -n 64 --temp 0 -no-cnv
```

### Metadata-conditioned generation

```bash
./build/bin/llama-completion -m carbon-3b-bf16.gguf \
    -p '<vertebrate_mammalian><protein_coding_region><dna>ATGCGCTAG' \
    -n 64 --temp 0 -no-cnv
```
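
The prompt above follows a fixed layout: metadata tags first, then `<dna>`, then the sequence. A small helper can assemble it. `build_prompt` is hypothetical and only concatenates strings. The two tag names come from the example; this sketch does not cover which other tags the model recognises.

```bash
# Hypothetical helper: assemble a metadata-conditioned prompt in the layout
# shown above: <tag1><tag2>...<dna>SEQUENCE
build_prompt() {
  local seq="$1"; shift          # first argument: the DNA sequence
  local tag prefix=""
  for tag in "$@"; do            # remaining arguments: metadata tag names
    prefix+="<${tag}>"
  done
  printf '%s<dna>%s\n' "$prefix" "$seq"
}

build_prompt ATGCGCTAG vertebrate_mammalian protein_coding_region
# prints <vertebrate_mammalian><protein_coding_region><dna>ATGCGCTAG
```

Pass the result to `-p`, e.g. `-p "$(build_prompt ATGCGCTAG vertebrate_mammalian protein_coding_region)"`.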

### Speculative decoding with Carbon-500M draft (~1.35x speedup)

Carbon-500M shares the HybridDNA vocab, so it works as a drop-in draft model. Grab it first:

```bash
hf download HuggingFaceBio/Carbon-500M-GGUF carbon-500m-bf16.gguf --local-dir .
```

Then run the standalone `llama-speculative` tool, passing the draft with `-md` (`--model-draft`):

```bash
./build/bin/llama-speculative \
    -m  carbon-3b-bf16.gguf \
    -md carbon-500m-bf16.gguf \
    -p '<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG' \
    -n 256 --temp 0
```

Or serve the 3B with the 500M draft (`llama-server` accepts the same `-md` flag):

```bash
./build/bin/llama-server \
    -m  carbon-3b-bf16.gguf \
    -md carbon-500m-bf16.gguf \
    --draft-max 16 --draft-min 1 \
    --port 8080
```

```bash
curl -s http://localhost:8080/completion -d '{
  "prompt": "<dna>ATGCGCTAGCTACGATCGATCGTAGCTAGCTAGCTAGCTACG",
  "n_predict": 256,
  "temperature": 0
}' | jq -r .content
```

### Likelihood scoring

The source card's Python `score()` function computes the mean log-prob per DNA token. The closest llama.cpp equivalent is `llama-perplexity`, which reports corpus-level perplexity (`perplexity = exp(-mean_logprob)`):

```bash
# dna_corpus.txt holds sequences wrapped in <dna>...</dna>; llama-perplexity tokenizes the whole file and scores it in context-sized chunks
./build/bin/llama-perplexity -m carbon-3b-bf16.gguf -f dna_corpus.txt --ppl-stride 0
```

Or use `llama-server` with `n_probs` set for per-token log-probabilities:

```bash
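./build/bin/llama-server -m carbon-3b-bf16.gguf --port 8080 &
# wait until the model has loaded (/health returns 200 once the server is ready)
until curl -sf http://localhost:8080/health > /dev/null; do sleep 1; done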
curl -s http://localhost:8080/completion -d '{
  "prompt": "<dna>GGGCTATAAAGGCCATCGATCGATCGATCGATCGATCGATCG</dna>",
  "n_predict": 0,
  "n_probs": 1
}' | jq '.completion_probabilities'
```
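
The perplexity formula above can also be inverted, which makes a `llama-perplexity` result comparable to the source card's mean log-prob scores. The sketch below uses `awk` and assumes natural-log probabilities. Both function names are hypothetical. `llama-perplexity` averages over the chunks it scores, so the converted number is a corpus-level average rather than a per-sequence `score()`.

```bash
# perplexity = exp(-mean_logprob)  <=>  mean_logprob = -ln(perplexity)
# awk's log() and exp() use the natural base, matching the formula.

# Hypothetical helper: perplexity (e.g. from llama-perplexity) -> mean log-prob.
ppl_to_mean_logprob() {
  awk -v ppl="$1" 'BEGIN { printf "%.6f\n", -log(ppl) }'
}

# Hypothetical helper: natural-log probabilities, one per line on stdin
# -> "mean_logprob perplexity".
mean_logprob_and_ppl() {
  awk '{ s += $1; n++ }
       END { if (n == 0) exit 1; m = s / n; printf "%.6f %.6f\n", m, exp(-m) }'
}

ppl_to_mean_logprob 4                          # prints -1.386294
printf '%s\n' -1 -3 | mean_logprob_and_ppl     # prints -2.000000 7.389056
```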

### Long context with YaRN (65k tokens ≈ 393 kbp)

These flags mirror the source card's `rope_scaling = {type: yarn, factor: 4.0, original_max_position_embeddings: 32768}`:

```bash
./build/bin/llama-completion -m carbon-3b-bf16.gguf \
    -c 65536 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
    -p '<dna>...' -n 64 --temp 0 -no-cnv
```
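
The figure in the heading follows from fixed 6-mer chunking: 65,536 tokens × 6 bp = 393,216 bp. In the other direction, the hypothetical helper below gives a rough `-c` value for a given sequence length. It assumes every DNA token covers exactly 6 bp and adds an arbitrary 16-token allowance for tags. That allowance is a guess, not a measured figure.

```bash
# Fixed 6-mer chunking: each DNA token covers 6 bp.
echo $(( 65536 * 6 ))   # prints 393216, the "~393 kbp" in the heading

# Hypothetical helper: rough -c estimate for a sequence of BP base pairs.
# Assumes exactly 6 bp per DNA token (rounding up for a partial final chunk),
# plus OVERHEAD tokens for <dna>/metadata tags (16 is a guess, not measured)
# and N_PREDICT tokens to generate.
ctx_for_bp() {
  local bp=$1 n_predict=${2:-64} overhead=${3:-16}
  local dna_tokens=$(( (bp + 5) / 6 ))   # ceil(bp / 6) in integer arithmetic
  echo $(( dna_tokens + overhead + n_predict ))
}

ctx_for_bp 393216   # prints 65616
```

Under these assumptions, a full 393 kbp prompt plus 64 generated tokens would not quite fit in `-c 65536`.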

### Base-pair-level generation (FNS branch) — not supported

The `revision="fns"` example from the source card needs custom modeling code (factorized nucleotide supervision head), which only the Python transformers path can load. llama.cpp can't run that branch.

## Tokenization parity

llama.cpp produces byte-for-byte identical token IDs to the Python `HybridDNATokenizer` (loaded with `trust_remote_code=True`) on the standard `<dna>`/metadata/edge-case fixtures shipped in [`ggml-org/vocabs`](https://huggingface.co/ggml-org/vocabs).

## See also

- Source weights: [HuggingFaceBio/Carbon-3B](https://huggingface.co/HuggingFaceBio/Carbon-3B)
- Other GGUF variants: [500M](https://huggingface.co/HuggingFaceBio/Carbon-500M-GGUF) · [3B](https://huggingface.co/HuggingFaceBio/Carbon-3B-GGUF) · [8B](https://huggingface.co/HuggingFaceBio/Carbon-8B-GGUF)

## License

Apache-2.0, inherited from the source model.