Ruinius committed
Commit d382d17 · verified · 1 Parent(s): c8a128b

Update README.md

updated model card as part of replacing quantized version with full version.

Files changed (1)
  1. README.md +19 -14
README.md CHANGED
@@ -1,17 +1,17 @@
 ---
 language:
-- en
+- en
 license: apache-2.0
 tags:
-- financial-analysis
-- transformer
-- classification
-- finbert
-- financial-statements
+- financial-analysis
+- transformer
+- classification
+- finbert
+- financial-statements
 base_model: yiyanghkust/finbert-pretrain
 model-index:
-- name: tiger-transformer
-- results: []
+- name: tiger-transformer
+- results: []
 ---

 # Tiger Transformer (Standardizing Financial Statements)
@@ -25,19 +25,22 @@ This model is a fine-tuned version of [yiyanghkust/finbert-pretrain](https://hug
 The **Tiger Transformer** serves as a specialized classification engine for financial analysis AI agents. It addresses the inconsistency found in broad-purpose LLMs when mapping diverse, raw line items (e.g., "Cash & Equivalents", "Cash and due from banks") to standardized accounting categories.

 ### Key Features:
+
 - **Context-Aware Classification**: Unlike simple keyword matching, this model uses a context window of 2 lines before and 2 lines after the target line to refine predictions.
 - **Architecture**: Fine-tuned `BertForSequenceClassification` using the FinBERT base.
-- **Quantization Support**: A quantized version (`pytorch_model_quantized.pt`) is available for low-latency CPU inference.

 ## Intended Uses & Limitations

 ### Intended Use
+
 Standardizing raw line items extracted from 10-K, 10-Q, and other financial reports into a consistent format for downstream financial modeling (DCF, ROIC analysis, etc.).

 ### Training Data Strategy
+
 The model was trained on a painstakingly curated dataset of manually cleaned financial statement labels. To maximize performance on a niche dataset, the model utilizes all available high-quality labels for training, with validation performed iteratively against new unseen batches.

 ### Performance
+
 - **Accuracy**: 90-95% on modern financial reports.
 - **Robustness**: High accuracy on critical fields (Subtotals and Totals), which are essential for structural validation.
 - **Limitations**: Accuracy may decrease for companies in highly specialized industries or niche regions with non-standard terminology not present in the training set.
@@ -45,16 +48,18 @@ The model was trained on a painstakingly curated dataset of manually cleaned fin
 ## Training Procedure

 ### Input Format
+
 The model expects input strings formatted with surrounding context:
 `[PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2]`

-* `[SECTION]`: Balance Sheet or Income Statement.
-* `[RAW_NAME]`: The line item name to be classified.
-* `[PREV/NEXT]`: Surrounding line items providing structural context.
+- `[SECTION]`: Balance Sheet or Income Statement.
+- `[RAW_NAME]`: The line item name to be classified.
+- `[PREV/NEXT]`: Surrounding line items providing structural context.

 ### Hyperparameters
+
 - **Base Model**: FinBERT
-- **Quantization**: Dynamic quantization (int8) applied to Linear layers for optimized CPU performance.
+- **Precision**: Full precision (FP32).

 ## Usage

@@ -78,6 +83,6 @@ with torch.no_grad():
 ```
 
 ## Acknowledgments & Licensing
+
 This project is a fine-tuned version of the FinBERT-Pretrain model developed by Yang et al. (HKUST).
 Licensed under the **Apache License 2.0**. Same as the base FinBERT model.
-
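
The `[PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2]` input format described in the card can be sketched as a small preprocessing helper. This is an illustrative sketch only, not code from the repository: the function name `build_model_input` and the `[NONE]` padding placeholder are assumptions, since the card does not specify how missing context lines are represented.

```python
def build_model_input(prev_lines, section, raw_name, next_lines):
    """Assemble the documented slot order:
    [PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2].

    Missing context slots are padded with "[NONE]" (an assumed
    placeholder) so the six positions stay aligned.
    """
    # Keep at most the two nearest lines on each side of the target.
    prev = (["[NONE]", "[NONE]"] + list(prev_lines))[-2:]
    nxt = (list(next_lines) + ["[NONE]", "[NONE]"])[:2]
    return " ".join(prev + [section, raw_name] + nxt)


# Example: a raw balance-sheet line with partial surrounding context.
text = build_model_input(
    prev_lines=["Total current assets"],
    section="Balance Sheet",
    raw_name="Cash and due from banks",
    next_lines=["Short-term investments", "Accounts receivable"],
)
print(text)
# [NONE] Total current assets Balance Sheet Cash and due from banks Short-term investments Accounts receivable
```

The resulting string would then be tokenized and passed to the fine-tuned `BertForSequenceClassification` checkpoint as in the card's Usage section.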