lewtun HF Staff committed on
Commit 593325c · verified · 1 Parent(s): 2d8bb5d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -43,7 +43,7 @@ Carbon-3B is the **flagship** model of the Carbon family. We also release [**Car
  - **Native context: 32,768 tokens ≈ 197 kbp.** Extendable to 65,536 tokens (≈ 393 kbp) at inference time using YaRN.
  - **Trained with a Cross-Entropy → Factorised Nucleotide Supervision (FNS) objective schedule** to bridge coarse tokenization and single-nucleotide resolution (see the Carbon technical report).
  - **Metadata-conditioned**: optional species-type and gene-type metadata tokens enable conditional generation.
- - **Efficient inference**: TODO
+ - **Efficient inference**: compatible with vLLM and other inference engines. Can generate over 100,000 base-pairs per second on a single H100 GPU.
 
  Across our zero-shot evaluation suite — sequence recovery, four variant-effect-prediction (VEP) benchmarks (ClinVar coding, ClinVar non-coding, BRCA2, TraitGym Mendelian), and two sequence-level perturbation tasks (TATA-box and synonymous codon) — Carbon-3B is competitive with Evo2-7B. It additionally works well on long context and retrieves needles reliably from up to ≈ 393 kbp of distal context on the Genome-NIAH long-context benchmark, while remaining several times faster than Evo2-7B.
 
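
Reviewer note: the figures quoted in this hunk can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the commit. It assumes the token-to-base-pair ratio is constant (the README's two context figures are both consistent with roughly 6 bp per token, but the diff does not state this), and it treats "over 100,000 base-pairs per second" as a lower bound. All function and constant names are hypothetical.

```python
# Figures taken from the README lines in the diff (approximate values as quoted).
NATIVE_TOKENS = 32_768
NATIVE_BP = 197_000            # "≈ 197 kbp"
EXTENDED_TOKENS = 65_536
EXTENDED_BP = 393_000          # "≈ 393 kbp"
THROUGHPUT_BP_PER_S = 100_000  # "over 100,000 base-pairs per second" (lower bound)


def bp_per_token(bp: int, tokens: int) -> float:
    """Average number of base pairs represented by one token."""
    return bp / tokens


def context_extension_factor(extended: int, native: int) -> float:
    """Ratio of extended to native context length (the scaling a YaRN setup would target)."""
    return extended / native


def tokens_per_second(bp_per_s: float, ratio: float) -> float:
    """Convert a base-pair throughput into a token throughput, assuming a fixed ratio."""
    return bp_per_s / ratio


if __name__ == "__main__":
    native_ratio = bp_per_token(NATIVE_BP, NATIVE_TOKENS)
    extended_ratio = bp_per_token(EXTENDED_BP, EXTENDED_TOKENS)
    print(f"bp/token (native):   {native_ratio:.2f}")
    print(f"bp/token (extended): {extended_ratio:.2f}")
    print(f"context extension:   {context_extension_factor(EXTENDED_TOKENS, NATIVE_TOKENS):.1f}x")
    # Assumption: 6 bp/token, inferred from the rounded ratios above.
    print(f"tokens/s at 6 bp/token: {tokens_per_second(THROUGHPUT_BP_PER_S, 6):.0f}")
```

Under these assumptions, the quoted throughput corresponds to at least about 16,700 generated tokens per second, and filling the full native context would take roughly two seconds of generation.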