Update README.md

README.md +95 -17

@@ -1,35 +1,110 @@
 ---
 library_name: transformers
 tags:
 - generated_from_trainer
 metrics:
 - f1
 model-index:
-- name: Pix2StructCzechInvoiceV3
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Pix2StructCzechInvoiceV3
-This model was trained from scratch on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1542
-- F1: 0.8404
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
@@ -46,6 +121,8 @@ The following hyperparameters were used during training:
 - num_epochs: 10
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | F1     |
@@ -61,10 +138,11 @@ The following hyperparameters were used during training:
 | 0.0804        | 9.0   | 207  | 0.1433          | 0.7963 |
 | 0.0664        | 10.0  | 230  | 0.1614          | 0.7991 |
-### Framework versions
-- Transformers 5.0.0
-- Pytorch 2.10.0+cu128
-- Datasets 4.0.0
-- Tokenizers 0.22.2

 ---
 library_name: transformers
+license: apache-2.0
+base_model: google/pix2struct-docvqa-base
 tags:
 - generated_from_trainer
+- invoice-processing
+- information-extraction
+- czech-language
+- document-ai
+- multimodal-model
+- generative-model
+- synthetic-data
+- hybrid-data
+- real-data
 metrics:
 - f1
 model-index:
+- name: Pix2StructCzechInvoice-V3
   results: []
 ---
+# Pix2StructCzechInvoice (V3 – Full Pipeline with Real Data Fine-Tuning)
+This model is a fine-tuned version of [google/pix2struct-docvqa-base](https://huggingface.co/google/pix2struct-docvqa-base) for structured information extraction from Czech invoices.
 It achieves the following results on the evaluation set:
+- Loss: 0.1542
+- F1: 0.8404
+---
 ## Model description
+Pix2StructCzechInvoice (V3) is the final generative model in the experimental pipeline.
+Unlike token classification approaches, this model:
+- processes full document images
+- generates structured outputs as text sequences
+It extracts key invoice fields such as:
+- supplier
+- customer
+- invoice number
+- bank details
+- totals
+- dates
+By combining synthetic, hybrid, and real data, this version significantly improves both performance and stability.
+---
+## Training data
+The dataset used in this stage combines:
+1. **Synthetic template-based invoices (V0)**
+2. **Synthetic invoices with randomized layouts (V1)**
+3. **Hybrid invoices with real layouts and synthetic content (V2)**
+4. **Real annotated invoices**
+### Real data fine-tuning
+The final stage introduces:
+- real invoice images
+- realistic visual noise and distortions
+- natural language variability
+- real formatting inconsistencies
+This allows the model to:
+- better align generated outputs with real-world distributions
+- improve robustness of sequence generation
+- reduce hallucinations and formatting errors
+---
+## Role in the pipeline
+This model corresponds to:
+**V3 – Full pipeline (synthetic + hybrid + real data fine-tuning)**
+It represents:
+- the final generative model
+- the best-performing Pix2Struct variant
+- an end-to-end extraction approach
+---
+## Intended uses
+- End-to-end invoice information extraction from images
+- Document VQA and generative document understanding
+- OCR-free document processing pipelines
+- Research in generative vs structured extraction approaches
+---
+## Limitations
+- Output format may still be inconsistent
+- Sensitive to decoding strategy and prompt structure
+- Less interpretable than token classification models
+- Requires post-processing for structured outputs
+- Computationally more expensive than token classification models
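
The post-processing requirement in the limitations above can be illustrated with a small parser. The card does not document the model's output format, so this sketch assumes, purely for illustration, that the generated text consists of `key: value` pairs separated by semicolons or newlines. The helper `parse_invoice_output`, the key names in `EXPECTED_FIELDS`, and the sample string are all hypothetical and would need adapting to the real output.

```python
import re

# Hypothetical key names: the card lists the field types, not the exact keys.
EXPECTED_FIELDS = (
    "supplier",
    "customer",
    "invoice_number",
    "bank_details",
    "total",
    "date",
)


def parse_invoice_output(text: str) -> dict:
    """Parse 'key: value' pairs (an assumed format) into a fixed set of fields."""
    fields = {name: None for name in EXPECTED_FIELDS}
    # Split on semicolons or newlines, the assumed pair separators.
    for chunk in re.split(r"[;\n]+", text):
        key, sep, value = chunk.partition(":")
        if not sep:
            continue  # skip fragments that have no key/value separator
        key = key.strip().lower().replace(" ", "_")
        value = value.strip()
        # Keep only known keys, and only the first non-empty value per key.
        if key in fields and value and fields[key] is None:
            fields[key] = value
    return fields


if __name__ == "__main__":
    # Made-up sample output, not produced by the model.
    raw = "Supplier: ACME s.r.o.; Invoice number: 2024-001; total: 1 210,00 CZK; foo"
    print(parse_invoice_output(raw))
```

Returning every expected key, with `None` for anything missing, keeps downstream code simple when the generated sequence is incomplete or malformed.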
+---
 ## Training procedure
 - num_epochs: 10
 - mixed_precision_training: Native AMP
+---
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | F1     |
 | 0.0804        | 9.0   | 207  | 0.1433          | 0.7963 |
 | 0.0664        | 10.0  | 230  | 0.1614          | 0.7991 |
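
The card does not state how the F1 column is computed. One common choice for extraction tasks is a micro-averaged, exact-match F1 over fields; the sketch below assumes that definition, which may differ from the metric actually used during training. The function `field_f1` and the sample records are hypothetical.

```python
def field_f1(predictions: list[dict], references: list[dict]) -> float:
    """Micro-averaged exact-match F1 over fields (an assumed definition)."""
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        for key in set(pred) | set(ref):
            p, r = pred.get(key), ref.get(key)
            if p is not None and r is not None and p == r:
                tp += 1  # predicted value matches the reference exactly
            else:
                if p is not None:
                    fp += 1  # predicted a value that is wrong or spurious
                if r is not None:
                    fn += 1  # reference value was missed or mispredicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up records for illustration only.
    preds = [{"supplier": "A", "total": "10"}]
    refs = [{"supplier": "A", "total": "12", "date": "2024-01-01"}]
    print(round(field_f1(preds, refs), 4))
```

Under this definition a wrong value counts as both a false positive and a false negative, so a mispredicted field is penalised more heavily than an omitted one.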
+---
+## Framework versions
+- Transformers 5.0.0
+- PyTorch 2.10.0+cu128
+- Datasets 4.0.0
+- Tokenizers 0.22.2