Update README.md

Updated the model card as part of replacing the quantized version with the full version.

README.md (CHANGED)
````diff
@@ -1,17 +1,17 @@
 ---
 language:
 - en
 license: apache-2.0
 tags:
 - financial-analysis
 - transformer
 - classification
 - finbert
 - financial-statements
 base_model: yiyanghkust/finbert-pretrain
 model-index:
 - name: tiger-transformer
-
+  results: []
 ---
 
 # Tiger Transformer (Standardizing Financial Statements)
@@ -25,19 +25,22 @@ This model is a fine-tuned version of [yiyanghkust/finbert-pretrain](https://hug
 The **Tiger Transformer** serves as a specialized classification engine for financial analysis AI agents. It addresses the inconsistency found in broad-purpose LLMs when mapping diverse, raw line items (e.g., "Cash & Equivalents", "Cash and due from banks") to standardized accounting categories.
 
 ### Key Features:
+
 - **Context-Aware Classification**: Unlike simple keyword matching, this model uses a context window of 2 lines before and 2 lines after the target line to refine predictions.
 - **Architecture**: Fine-tuned `BertForSequenceClassification` using the FinBERT base.
-- **Quantization Support**: A quantized version (`pytorch_model_quantized.pt`) is available for low-latency CPU inference.
 
 ## Intended Uses & Limitations
 
 ### Intended Use
+
 Standardizing raw line items extracted from 10-K, 10-Q, and other financial reports into a consistent format for downstream financial modeling (DCF, ROIC analysis, etc.).
 
 ### Training Data Strategy
+
 The model was trained on a painstakingly curated dataset of manually cleaned financial statement labels. To maximize performance on a niche dataset, the model utilizes all available high-quality labels for training, with validation performed iteratively against new unseen batches.
 
 ### Performance
+
 - **Accuracy**: 90-95% on modern financial reports.
 - **Robustness**: High accuracy on critical fields (Subtotals and Totals), which are essential for structural validation.
 - **Limitations**: Accuracy may decrease for companies in highly specialized industries or niche regions with non-standard terminology not present in the training set.
@@ -45,16 +48,18 @@ The model was trained on a painstakingly curated dataset of manually cleaned fin
 ## Training Procedure
 
 ### Input Format
+
 The model expects input strings formatted with surrounding context:
 `[PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2]`
 
-
-
-
+- `[SECTION]`: Balance Sheet or Income Statement.
+- `[RAW_NAME]`: The line item name to be classified.
+- `[PREV/NEXT]`: Surrounding line items providing structural context.
 
 ### Hyperparameters
+
 - **Base Model**: FinBERT
-- **
+- **Precision**: Full precision (FP32).
 
 ## Usage
 
@@ -78,6 +83,6 @@ with torch.no_grad():
 ```
 
 ## Acknowledgments & Licensing
+
 This project is a fine-tuned version of the FinBERT-Pretrain model developed by Yang et al. (HKUST).
 Licensed under the **Apache License 2.0**. Same as the base FinBERT model.
-
````
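As a side note, the `[PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2]` context format documented in the card can be sketched in Python. The helper below is hypothetical (it is not part of this repository), and the exact serialization — brackets kept around each slot, slots joined by single spaces — is an assumption the card does not pin down.

```python
# Hypothetical helper (not part of the repository) showing one way the
# card's context-window input could be assembled before tokenization.
# Assumption: each slot is wrapped in brackets and slots are joined with
# single spaces; the card does not specify the exact serialization.
def build_model_input(prev_2: str, prev_1: str, section: str,
                      raw_name: str, next_1: str, next_2: str) -> str:
    """Return a `[PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2]` string."""
    slots = (prev_2, prev_1, section, raw_name, next_1, next_2)
    return " ".join(f"[{s}]" for s in slots)

# Example: classify "Cash and due from banks" on a bank balance sheet,
# with two neighboring line items on each side as structural context.
text = build_model_input(
    "Total deposits", "Short-term borrowings",
    "Balance Sheet", "Cash and due from banks",
    "Interest-bearing deposits", "Federal funds sold",
)
```

The resulting string would then be passed to the tokenizer and the fine-tuned `BertForSequenceClassification` head as shown in the Usage section.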