kashif (HF Staff) committed
Commit 1428e66 · verified · 1 parent: 217b2a7

card: add YaRN long-context, likelihood scoring, FNS note

Files changed (1): README.md (+34 -0)
@@ -69,6 +69,40 @@ Then run with `--model-draft`:
  -n 256 --temp 0
  ```
 
+ ### Likelihood scoring
+
+ The source card's Python `score()` function computes the mean log-prob per DNA token. In llama.cpp, the closest tool for corpus-level scoring is `llama-perplexity`, which reports perplexity (`perplexity = exp(-mean_logprob)`):
+
+ ```bash
+ # one prompt per line in dna_corpus.txt, each wrapped in <dna>...</dna>
+ ./build/bin/llama-perplexity -m carbon-3b-bf16.gguf -f dna_corpus.txt --ppl-stride 0
+ ```
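
The relation `perplexity = exp(-mean_logprob)` can be inverted to compare a `llama-perplexity` result with the source card's mean log-prob. This is a minimal Python sketch, assuming natural-log probabilities and that both tools average over the same set of tokens. The function names are illustrative and are not part of llama.cpp or the source card.

```python
import math


def mean_logprob_from_perplexity(ppl: float) -> float:
    """Invert perplexity = exp(-mean_logprob); result is in nats per token."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return -math.log(ppl)


def perplexity_from_mean_logprob(mean_logprob: float) -> float:
    """Forward direction: perplexity = exp(-mean_logprob)."""
    return math.exp(-mean_logprob)
```

A perplexity of 1.0 corresponds to a mean log-prob of 0 (every token predicted with certainty). Higher perplexity maps to a more negative mean log-prob.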
+
+ Alternatively, use `llama-server` with `n_probs` for per-token log-probabilities:
+
+ ```bash
+ ./build/bin/llama-server -m carbon-3b-bf16.gguf --port 8080 &
+ curl -s http://localhost:8080/completion -d '{
+   "prompt": "<dna>GGGCTATAAAGGCCATCGATCGATCGATCGATCGATCGATCG</dna>",
+   "n_predict": 0,
+   "n_probs": 1
+ }' | jq '.completion_probabilities'
+ ```
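
To reduce a server response to a single score comparable to `score()`, the per-token values can be averaged. This sketch assumes each entry of `completion_probabilities` carries a `logprob` field. The field layout differs between llama.cpp versions, so check the actual output of your build first. `mean_logprob` is a hypothetical helper, and the sample response uses made-up values.

```python
import json


def mean_logprob(response_json: str) -> float:
    """Average the per-token log-probabilities in a /completion response.

    Assumes entries shaped like {"logprob": <float>, ...}; adjust the key
    if your llama-server build reports a different field.
    """
    entries = json.loads(response_json).get("completion_probabilities", [])
    logprobs = [entry["logprob"] for entry in entries]
    if not logprobs:
        raise ValueError("response contains no token log-probabilities")
    return sum(logprobs) / len(logprobs)


# Illustrative response with made-up values, not real model output.
sample = json.dumps(
    {"completion_probabilities": [{"logprob": -1.0}, {"logprob": -3.0}]}
)
print(mean_logprob(sample))  # -2.0
```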
+
+ ### Long context with YaRN (65k tokens ≈ 393 kbp)
+
+ This mirrors the source card's `rope_scaling = {type: yarn, factor: 4.0, original_max_position_embeddings: 32768}`:
+
+ ```bash
+ ./build/bin/llama-completion -m carbon-3b-bf16.gguf \
+   -c 65536 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
+   -p '<dna>...' -n 64 --temp 0 -no-cnv
+ ```
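
The numbers in this section follow from simple arithmetic. The YaRN factor times the original context length gives the largest window the scaling targets, and the heading's "65k tokens ≈ 393 kbp" implies roughly 6 bp per token. This is a sketch only: the 6 bp/token figure is inferred from the heading rather than stated by the source card, and the helper names are illustrative.

```python
ORIG_CTX = 32768      # original_max_position_embeddings from the source card
YARN_FACTOR = 4.0     # rope_scaling factor from the source card
BP_PER_TOKEN = 6      # assumption: inferred from 393 kbp / 65k tokens


def max_scaled_context(orig_ctx: int, factor: float) -> int:
    """Largest context the YaRN configuration is scaled for."""
    return int(orig_ctx * factor)


def approx_base_pairs(n_tokens: int, bp_per_token: int = BP_PER_TOKEN) -> int:
    """Rough sequence length in base pairs for a given token count."""
    return n_tokens * bp_per_token


requested_ctx = 65536  # value passed to -c in the command above
# The requested window must fit inside the scaled range.
assert requested_ctx <= max_scaled_context(ORIG_CTX, YARN_FACTOR)
print(max_scaled_context(ORIG_CTX, YARN_FACTOR))  # 131072
print(approx_base_pairs(requested_ctx))           # 393216
```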
+
+ ### Base-pair-level generation (FNS branch): not supported
+
+ The `revision="fns"` example from the source card needs custom modeling code (a factorized nucleotide supervision head) that only the Python transformers path can load, so llama.cpp cannot run that branch.
+
  ## Tokenization parity
 
  For every prompt in the [test fixture](https://github.com/kashif/llama.cpp/blob/carbon-3b-tokenizer/models/ggml-vocab-hybriddna.gguf.inp), llama.cpp produces byte-for-byte identical token IDs to the Python `HybridDNATokenizer` (loaded with `trust_remote_code=True`).