TomasFAV committed on
Commit
14e6683
·
verified ·
1 Parent(s): d3a60e0

Update README.md

  1. README.md +95 -17
README.md CHANGED
@@ -1,35 +1,110 @@
1
  ---
2
  library_name: transformers
3
  tags:
4
  - generated_from_trainer
5
  metrics:
6
  - f1
7
  model-index:
8
- - name: Pix2StructCzechInvoiceV3
9
  results: []
10
  ---
11
 
12
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
- should probably proofread and complete it, then remove this comment. -->
14
 
15
- # Pix2StructCzechInvoiceV3
16
 
17
- This model was trained from scratch on an unknown dataset.
18
  It achieves the following results on the evaluation set:
19
- - Loss: 0.1542
20
- - F1: 0.8404
21
 
22
  ## Model description
23
 
24
- More information needed
25
 
26
- ## Intended uses & limitations
27
 
28
- More information needed
29
 
30
- ## Training and evaluation data
31
 
32
- More information needed
33
 
34
  ## Training procedure
35
 
@@ -46,6 +121,8 @@ The following hyperparameters were used during training:
46
  - num_epochs: 10
47
  - mixed_precision_training: Native AMP
48
 
49
  ### Training results
50
 
51
  | Training Loss | Epoch | Step | Validation Loss | F1 |
@@ -61,10 +138,11 @@ The following hyperparameters were used during training:
61
  | 0.0804 | 9.0 | 207 | 0.1433 | 0.7963 |
62
  | 0.0664 | 10.0 | 230 | 0.1614 | 0.7991 |
63
 
 
64
 
65
- ### Framework versions
66
 
67
- - Transformers 5.0.0
68
- - Pytorch 2.10.0+cu128
69
- - Datasets 4.0.0
70
- - Tokenizers 0.22.2
 
1
  ---
2
  library_name: transformers
3
+ license: apache-2.0
4
+ base_model: google/pix2struct-docvqa-base
5
  tags:
6
  - generated_from_trainer
7
+ - invoice-processing
8
+ - information-extraction
9
+ - czech-language
10
+ - document-ai
11
+ - multimodal-model
12
+ - generative-model
13
+ - synthetic-data
14
+ - hybrid-data
15
+ - real-data
16
  metrics:
17
  - f1
18
  model-index:
19
+ - name: Pix2StructCzechInvoice-V3
20
  results: []
21
  ---
22
 
23
+ # Pix2StructCzechInvoice (V3 Full Pipeline with Real Data Fine-Tuning)
 
24
 
25
+ This model is a fine-tuned version of [google/pix2struct-docvqa-base](https://huggingface.co/google/pix2struct-docvqa-base) for structured information extraction from Czech invoices.
26
 
 
27
  It achieves the following results on the evaluation set:
28
+ - Loss: 0.1542
29
+ - F1: 0.8404
30
+
31
+ ---
32
 
33
  ## Model description
34
 
35
+ Pix2StructCzechInvoice (V3) is the final generative model in the experimental pipeline.
36
+
37
+ Unlike token classification approaches, this model:
38
+ - processes full document images
39
+ - generates structured outputs as text sequences
40
+
41
+ It extracts key invoice fields such as:
42
+ - supplier
43
+ - customer
44
+ - invoice number
45
+ - bank details
46
+ - totals
47
+ - dates
48
+
49
+ By combining synthetic, hybrid, and real data, this version improves both extraction performance and stability over the earlier pipeline stages (V0 to V2).
50
+
51
+ ---
52
+
53
+ ## Training data
54
+
55
+ The dataset used in this stage combines:
56
+
57
+ 1. **Synthetic template-based invoices (V0)**
58
+ 2. **Synthetic invoices with randomized layouts (V1)**
59
+ 3. **Hybrid invoices with real layouts and synthetic content (V2)**
60
+ 4. **Real annotated invoices**
61
+
62
+ ### Real data fine-tuning
63
+
64
+ The final stage introduces:
65
+ - real invoice images
66
+ - realistic visual noise and distortions
67
+ - natural language variability
68
+ - real formatting inconsistencies
69
+
70
+ This allows the model to:
71
+ - better align generated outputs with real-world distributions
72
+ - improve robustness of sequence generation
73
+ - reduce hallucinations and formatting errors
74
+
75
+ ---
76
+
77
+ ## Role in the pipeline
78
+
79
+ This model corresponds to:
80
+
81
+ **V3 – Full pipeline (synthetic + hybrid + real data fine-tuning)**
82
+
83
+ It represents:
84
+ - the final generative model
85
+ - the best-performing Pix2Struct variant
86
+ - an end-to-end extraction approach
87
+
88
+ ---
89
 
90
+ ## Intended uses
91
 
92
+ - End-to-end invoice information extraction from images
93
+ - Document VQA and generative document understanding
94
+ - OCR-free document processing pipelines
95
+ - Research on generative versus structured extraction approaches
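
The end-to-end use case above can be sketched with the standard Pix2Struct classes from `transformers`. This is a minimal sketch, not the author's inference code: the card does not state the fine-tuned checkpoint's repository id or the prompt wording used in training, so `model_id` defaults to the base model named in the card, and the helper name `extract_field` and any question passed to it are assumptions.

```python
# The base checkpoint named in the model card. The fine-tuned V3 checkpoint's
# repository id is not given in the card, so pass it explicitly when known.
BASE_MODEL_ID = "google/pix2struct-docvqa-base"


def extract_field(image_path: str, question: str,
                  model_id: str = BASE_MODEL_ID,
                  max_new_tokens: int = 64) -> str:
    """Ask one question about an invoice image and return the generated text."""
    # Imported lazily so this module loads without the heavy dependencies.
    from PIL import Image
    from transformers import (Pix2StructForConditionalGeneration,
                              Pix2StructProcessor)

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    # DocVQA-style Pix2Struct checkpoints take the question together with the
    # image, so both are passed to the processor.
    inputs = processor(images=image, text=question, return_tensors="pt")
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(generated[0], skip_special_tokens=True)
```

A call might look like `extract_field("invoice.png", "What is the invoice number?")`, where both the file name and the question are hypothetical. Since the card notes sensitivity to prompt structure, the question wording likely needs to match whatever was used during fine-tuning.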
96
 
97
+ ---
98
+
99
+ ## Limitations
100
 
101
+ - Output format may still be inconsistent
102
+ - Sensitive to decoding strategy and prompt structure
103
+ - Less interpretable than token classification models
104
+ - Requires post-processing for structured outputs
105
+ - Computationally more expensive than token classification models
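
The post-processing requirement above could be handled along the following lines. This is a sketch under stated assumptions: the card does not document the model's output layout, so the `key: value` format, the snake_case field names (derived from the card's field list), and the helpers `parse_generated` and `missing_fields` are all hypothetical.

```python
import re

# Field names adapted from the model card's list; the snake_case keys and the
# "key: value" output layout are assumptions made for this sketch.
EXPECTED_FIELDS = ("supplier", "customer", "invoice_number",
                   "bank_details", "total", "date")


def parse_generated(text: str) -> dict:
    """Parse 'key: value' pairs separated by newlines or semicolons."""
    fields = {}
    for chunk in re.split(r"[;\n]+", text):
        key, sep, value = chunk.partition(":")
        if not sep:
            continue  # no key/value separator: skip the malformed chunk
        key = re.sub(r"\s+", "_", key.strip().lower())
        value = value.strip()
        # Keep only known, non-empty fields; the first occurrence wins.
        if key in EXPECTED_FIELDS and value and key not in fields:
            fields[key] = value
    return fields


def missing_fields(fields: dict) -> list:
    """List expected fields that the parsed output does not contain."""
    return [name for name in EXPECTED_FIELDS if name not in fields]
```

Dropping unknown keys and reporting missing ones gives downstream code a fixed schema to validate against, which is one way to contain the inconsistent output formats mentioned in the list.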
106
+
107
+ ---
108
 
109
  ## Training procedure
110
 
 
121
  - num_epochs: 10
122
  - mixed_precision_training: Native AMP
123
 
124
+ ---
125
+
126
  ### Training results
127
 
128
  | Training Loss | Epoch | Step | Validation Loss | F1 |
 
138
  | 0.0804 | 9.0 | 207 | 0.1433 | 0.7963 |
139
  | 0.0664 | 10.0 | 230 | 0.1614 | 0.7991 |
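
As a small worked check on the two rows visible in this hunk, the step counts imply a constant number of steps per epoch. The sketch below only uses the values shown above; reading "Step" as the Trainer's global optimizer step is an assumption, and `steps_per_epoch` is a hypothetical helper.

```python
# (epoch, step) pairs copied from the two visible rows of the results table.
visible_rows = [(9.0, 207), (10.0, 230)]


def steps_per_epoch(rows):
    """Return the step/epoch ratio if it is identical across rows, else None."""
    ratios = {step / epoch for epoch, step in rows}
    return ratios.pop() if len(ratios) == 1 else None


print(steps_per_epoch(visible_rows))  # 23.0
```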
140
 
141
+ ---
142
 
143
+ ## Framework versions
144
 
145
+ - Transformers 5.0.0
146
+ - PyTorch 2.10.0+cu128
147
+ - Datasets 4.0.0
148
+ - Tokenizers 0.22.2