# Llama-3 Invoice Extractor (Fine-Tuned)

- **Developed by:** manuelaschrittwieser
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

This Llama model was trained 2x faster with Unsloth.
This model is a fine-tuned version of Meta's Llama-3-8B, specifically optimized for Structured Data Extraction. It excels at taking messy, unstructured text descriptions of financial transactions (invoices, receipts, and purchase orders) and transforming them into valid, machine-readable JSON objects.
## Model Details
- Model type: Causal Language Model
- Language(s): English
- Technique: QLoRA (4-bit Quantization) via Unsloth
- Task: Unstructured Text to JSON (Invoice Extraction)
## Use Case
Standard LLMs often struggle to output only valid JSON without conversational "noise." This model was trained to identify specific financial entities and wrap them in a consistent schema, making it ideal for:
- Automated accounting pipelines.
- Receipt scanning applications.
- Expense tracking bots.
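For illustration, a parsed extraction might look like the following. The field names here are hypothetical, since the card does not publish the exact schema; treat the structure as a sketch of the kind of JSON the model targets, not the authoritative format:

```python
import json

# Hypothetical output for an input such as
# "Bought 3 laptops for $1500 each at Best Buy on Oct 12."
# The field names are illustrative; the exact schema depends on the
# training data and is not published on this card.
raw_output = """{
  "vendor": "Best Buy",
  "date": "Oct 12",
  "items": [{"description": "laptop", "quantity": 3, "unit_price": 1500.0}],
  "currency": "USD",
  "total": 4500.0
}"""

invoice = json.loads(raw_output)  # valid, machine-readable JSON
print(invoice["vendor"], invoice["total"])
```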
## Prompt Template
For best results, use the following instruction-based prompt format:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Extract invoice details into JSON.

### Input:
[PASTE MESSY TEXT HERE]

### Response:
```
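The template above can be wrapped in a small helper so application code never has to re-type the boilerplate. The function name `build_prompt` is illustrative, not part of any library:

```python
# Instruction template from the card; only the input text varies.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\nExtract invoice details into JSON.\n\n"
    "### Input:\n{input_text}\n\n"
    "### Response:\n"
)

def build_prompt(input_text: str) -> str:
    """Fill the messy invoice text into the instruction template."""
    return PROMPT_TEMPLATE.format(input_text=input_text)

print(build_prompt("Bought 3 laptops for $1500 each at Best Buy on Oct 12."))
```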
## Usage
You can load this model with the `unsloth` library for 2x faster inference:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "manuelaschrittwieser/llama-3-invoice-extractor",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable optimized inference kernels

# Example inference
inputs = tokenizer(
    [
        "### Instruction:\nExtract invoice details into JSON.\n\n"
        "### Input:\nBought 3 laptops for $1500 each at Best Buy on Oct 12.\n\n"
        "### Response:\n"
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 128)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))
```
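Because generation returns the full prompt plus the completion, downstream code typically slices out the JSON object before parsing. A minimal sketch, assuming the model emits a single `{...}` block after `### Response:` (the helper name `extract_json` is illustrative, not part of any library):

```python
import json

def extract_json(generated: str) -> dict:
    """Pull the first {...} span out of the decoded text and parse it.

    Illustrative sketch: assumes the completion contains exactly one
    JSON object; production code should add stricter validation.
    """
    start = generated.find("{")
    end = generated.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    return json.loads(generated[start : end + 1])

sample = '### Response:\n{"vendor": "Best Buy", "total": 4500.0}'
print(extract_json(sample))
```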
## Training Data
The model was trained on a curated dataset of synthetic and real-world receipt descriptions, including:
- Noise injection: Typos, shorthand (e.g., "3x", "qty"), and varying date formats.
- Entity types: Vendors, quantities, items, total amounts, and currencies.
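As a rough sketch of what such noise injection can look like: the actual training pipeline is not published, so the substitution table and typo logic below are purely illustrative assumptions.

```python
import random

# Illustrative noise injection: shorthand substitutions plus seeded
# character-level typos. This only mirrors the ideas described above
# ("qty"-style shorthand, typos); it is NOT the card's real pipeline.
SHORTHAND = {"quantity": "qty", "number": "no."}

def inject_noise(text: str, typo_rate: float = 0.02, seed: int = 0) -> str:
    rng = random.Random(seed)
    for full, short in SHORTHAND.items():
        text = text.replace(full, short)
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch.isalpha() and rng.random() < typo_rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

print(inject_noise("Invoice number 42: quantity 3 laptops at $1500 each"))
```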
## Limitations
- **Language:** Currently optimized for English text.
- **Hallucination:** While fine-tuned for structural fidelity, always verify the output against the original source for critical financial data.
- **Max length:** Best results are achieved with short- to medium-length invoice descriptions.
## Credits
Created as part of an LLM Fine-Tuning project using Unsloth and Hugging Face TRL.
## Model Tree

- Base model: meta-llama/Meta-Llama-3-8B