Nessie v5 (Llama 3.1 8B Fine-tune)

Nessie is Arkova's credential metadata extraction model. It is fine-tuned from Meta Llama 3.1 8B Instruct to produce structured metadata from PII-stripped document text.

Model Details

  • Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Fine-tuning: Together AI (job ft-b8594db6-80f9)
  • Training data: 1,903 train + 211 validation examples
  • Precision: float16
  • Context length: 32,768 tokens
  • Training mix: 75% domain-specific + 25% general credential data

Evaluation Results (v5)

Metric                      Value
Weighted F1                 87.2%
Macro F1                    75.7%
Mean Confidence             72.5%
Mean Accuracy               83.5%
Confidence Correlation (r)  0.539
Mean Latency                1,543 ms
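The card does not say how the confidence correlation is computed. A plausible reading is a Pearson coefficient between per-example confidence and per-example accuracy. The sketch below illustrates that reading only; the sample values are made up and are not drawn from the Nessie evaluation set.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-example values (assumption, for illustration only).
confidences = [0.90, 0.80, 0.60, 0.40, 0.70]
accuracies = [1.00, 0.90, 0.50, 0.60, 0.80]
r = pearson_r(confidences, accuracies)
```

A value near 1 would mean confidence tracks accuracy closely. A moderate value such as the reported 0.539 suggests confidence is informative but should not be used alone as an acceptance threshold.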

Per-Type Performance (Top 10)

Type          Weighted F1  Sample Size
FINANCIAL     100.0%       n=2
TRANSCRIPT    100.0%       n=2
RESUME        100.0%       n=2
DEGREE        98.5%        n=11
PATENT        97.1%        n=4
LICENSE       96.6%        n=10
PROFESSIONAL  95.8%        n=7
INSURANCE     93.3%        n=4
LEGAL         92.9%        n=3
CLE           91.1%        n=2
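To show how macro and weighted averages differ, the sketch below aggregates the ten per-type rows listed above in both ways. It assumes the usual definitions: the macro average is an unweighted mean over types, and the weighted average is a mean weighted by sample size. Because only the top 10 types are listed, the results will not reproduce the overall Weighted F1 and Macro F1 reported for v5; the lower-scoring types omitted here are what pull those figures down.

```python
# (type, per-type F1 in percent, sample size) copied from the top-10 rows above.
TOP_10 = [
    ("FINANCIAL", 100.0, 2), ("TRANSCRIPT", 100.0, 2), ("RESUME", 100.0, 2),
    ("DEGREE", 98.5, 11), ("PATENT", 97.1, 4), ("LICENSE", 96.6, 10),
    ("PROFESSIONAL", 95.8, 7), ("INSURANCE", 93.3, 4), ("LEGAL", 92.9, 3),
    ("CLE", 91.1, 2),
]

def macro_average(rows):
    """Unweighted mean: every type counts equally, regardless of sample size."""
    return sum(f1 for _, f1, _ in rows) / len(rows)

def weighted_average(rows):
    """Mean weighted by sample size: frequent types dominate the result."""
    total = sum(n for _, _, n in rows)
    return sum(f1 * n for _, f1, n in rows) / total

macro = macro_average(TOP_10)
weighted = weighted_average(TOP_10)
```

In the macro average, a type with n=2 carries the same weight as DEGREE with n=11, which is why the macro figure is the more sensitive of the two to rare, poorly handled types.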

Intended Use

Nessie extracts structured metadata from PII-stripped credential text. Input is pre-processed to remove personally identifiable information before reaching the model.

Important: This model must be used with the condensed prompt it was trained on (about 1.5K characters). Using the full extraction prompt (58K characters) yields 0% F1 because the prompt template no longer matches the one seen during training.
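The prompt requirement above is easy to violate by accident when several prompt templates live in the same codebase. One way to catch it early is a length guard before inference. This is a minimal sketch: the function name and the 4,000-character ceiling are assumptions, chosen only because the value sits between the ~1.5K condensed prompt and the 58K full prompt. It is not a limit published by Arkova.

```python
# Hypothetical ceiling (assumption): comfortably above the ~1.5K-character
# condensed prompt and far below the 58K-character full extraction prompt.
CONDENSED_PROMPT_MAX_CHARS = 4_000

def check_system_prompt(system_prompt: str,
                        max_chars: int = CONDENSED_PROMPT_MAX_CHARS) -> str:
    """Return the prompt unchanged, or raise if it is too long to be the condensed one."""
    if len(system_prompt) > max_chars:
        raise ValueError(
            f"system prompt is {len(system_prompt)} characters; "
            f"expected the condensed prompt (at most {max_chars})"
        )
    return system_prompt
```

A length check cannot confirm that the text is the trained prompt, only that it is not the full one. Comparing a hash of the prompt against a known value would be stricter.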

Credential Types Supported

DEGREE, LICENSE, CERTIFICATE, BADGE, SEC_FILING, LEGAL, REGULATION, PATENT, PUBLICATION, ATTESTATION, INSURANCE, FINANCIAL, MILITARY, CLE, RESUME, MEDICAL, IDENTITY, TRANSCRIPT, PROFESSIONAL, OTHER
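Downstream code will usually want to validate the type label the model returns against this list. A minimal sketch follows; the function name, and the choice to map unrecognized labels to OTHER rather than reject them, are assumptions and not documented Nessie behavior.

```python
# The 20 supported labels, copied from the list above.
CREDENTIAL_TYPES = frozenset({
    "DEGREE", "LICENSE", "CERTIFICATE", "BADGE", "SEC_FILING", "LEGAL",
    "REGULATION", "PATENT", "PUBLICATION", "ATTESTATION", "INSURANCE",
    "FINANCIAL", "MILITARY", "CLE", "RESUME", "MEDICAL", "IDENTITY",
    "TRANSCRIPT", "PROFESSIONAL", "OTHER",
})

def normalize_credential_type(raw: str) -> str:
    """Normalize case and separators; fall back to OTHER for unknown labels (assumption)."""
    candidate = raw.strip().upper().replace(" ", "_").replace("-", "_")
    return candidate if candidate in CREDENTIAL_TYPES else "OTHER"
```

For example, `normalize_credential_type("sec filing")` returns `"SEC_FILING"`, while a label outside the list, such as `"diploma"`, becomes `"OTHER"`.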

Domain-Specific Adapters

Nessie v5 includes domain-specific LoRA adapters trained on specialized corpora:

  • SEC (45K examples): SEC filings, financial disclosures
  • Academic (45K examples): Degrees, transcripts, publications
  • Legal (13K examples): Legal documents, bar admissions, CLE
  • Regulatory (13K examples): Licenses, regulations, compliance
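The card does not describe how an adapter is chosen at inference time. If selection is driven by credential type, the routing could look like the sketch below. The adapter names and the type-to-adapter mapping are assumptions inferred from the adapter descriptions above, not a documented interface.

```python
from typing import Optional

# Hypothetical mapping (assumption), inferred from the adapter descriptions.
ADAPTER_FOR_TYPE = {
    "SEC_FILING": "sec",
    "FINANCIAL": "sec",
    "DEGREE": "academic",
    "TRANSCRIPT": "academic",
    "PUBLICATION": "academic",
    "LEGAL": "legal",
    "CLE": "legal",
    "LICENSE": "regulatory",
    "REGULATION": "regulatory",
}

def select_adapter(credential_type: str) -> Optional[str]:
    """Return the adapter name for a type, or None to use the model without an adapter."""
    return ADAPTER_FOR_TYPE.get(credential_type.strip().upper())
```

Types with no obvious adapter, such as BADGE or MILITARY, return `None` here, which a caller could treat as "run without a domain adapter".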

Limitations

  • Only processes PII-stripped text (by design)
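  • Small evaluation sample sizes for some credential types (FINANCIAL, TRANSCRIPT, RESUME, and CLE at n=2), so their per-type scores should be read with caution
  • The fraudSignals field scores 0% F1 (known limitation, under improvement)
  • Confidence calibration: expected calibration error (ECE) of 11% (confidence is recalibrated via a piecewise linear function)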
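For readers unfamiliar with the calibration terms above, the sketch below shows one common way to compute ECE (equal-width bins) and to apply a piecewise linear recalibration. The bin count and the knot values are hypothetical. The card does not publish Nessie's actual recalibration function, so this illustrates the technique, not the production mapping.

```python
from bisect import bisect_right

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: weighted mean of |avg confidence - avg accuracy| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        avg_acc = sum(h for _, h in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - avg_acc)
    return ece

def piecewise_linear(x, knots):
    """Interpolate linearly between sorted (raw, calibrated) knots; clamp outside the range."""
    xs = [raw for raw, _ in knots]
    if x <= xs[0]:
        return knots[0][1]
    if x >= xs[-1]:
        return knots[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical knots (assumption): nudge mid-range confidence upward.
KNOTS = [(0.0, 0.0), (0.5, 0.6), (1.0, 1.0)]
calibrated = piecewise_linear(0.25, KNOTS)
```

In practice the knots would be fitted on held-out data so that calibrated confidence matches observed accuracy in each range.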

Citation

@software{nessie-v5,
  title={Nessie v5: Credential Metadata Extraction Model},
  author={Arkova},
  year={2026},
  url={https://arkova.ai}
}

License

This model is released under the Llama 3.1 Community License. See Meta's license for details.
