---
license: mit
language:
- en
tags:
- finance
- text-generation
- mixture-of-experts
- continual-learning
- financial-nlp
- custom-architecture
library_name: transformers
pipeline_tag: text-generation
---
# Meridian.AI – Finance Language Model

Meridian.AI is a custom sparse Mixture-of-Experts (MoE) language model continually trained on finance data. It is designed to run on commodity CPU hardware (including free GitHub Actions runners) and improves automatically through scheduled training runs.

> **Not financial advice.** This is an experimental research model.

---

## Model Details

| Property | Value |
|---|---|
| Architecture | Custom SMoE + GQA + RoPE + SwiGLU + Numeracy Encoding |
| Total parameters | ~479M (tied embeddings) |
| Unique parameters | ~283M |
| Experts | 8 total, top-2 active per token |
| Tokenizer | `Qwen/Qwen2.5-0.5B` (151k vocab) |
| Context length | 2048 tokens |
| Training method | Continual learning with EWC (Elastic Weight Consolidation) |
| License | MIT |

---
## Architecture

Meridian.AI is a fully custom transformer built from scratch with the following components:

- **Sparse MoE FFN** – 8 experts per MoE layer with top-2 routing. Only 2 of the 8 experts activate per token, keeping compute low while retaining capacity. MoE layers appear on every second transformer layer.
- **Grouped Query Attention (GQA)** – 12 query heads share 4 key/value heads, reducing memory bandwidth during inference.
- **Rotary Position Embeddings (RoPE)** – `rope_theta=500,000` for length generalisation.
- **SwiGLU FFN** – the activation function used in dense layers and expert FFNs.
- **RMSNorm** – replaces LayerNorm for faster normalisation.
- **Financial Numeracy Encoding** – a learned 64-dim embedding for numeric tokens that improves precision on quantitative finance tasks.
- **Elastic Weight Consolidation (EWC)** – prevents catastrophic forgetting across continual training runs.
- **Tied word embeddings** – input embeddings and `lm_head` share weights, saving ~197M parameters.
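
The top-2 routing step can be sketched in plain Python. This is a minimal illustration only: the function name is made up for this sketch, and expert outputs are scalars here, whereas the real router gates hidden-state vectors inside each MoE layer.

```python
import math

def top2_route(router_logits, expert_outputs):
    """Combine expert outputs for one token using top-2 routing."""
    # Pick the indices of the two highest-scoring experts.
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax over only the selected logits, so the two gate
    # weights sum to 1.
    exps = [math.exp(router_logits[i]) for i in top2]
    total = sum(exps)
    gates = [e / total for e in exps]
    # Only the two chosen experts contribute; the other six are skipped,
    # which is where the compute savings come from.
    return sum(g * expert_outputs[i] for g, i in zip(gates, top2))
```

With 8 experts and top-2 routing, each token pays the FFN cost of just 2 experts while the model retains the capacity of all 8.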
---

## How to Use

> The model weights are stored under the `checkpoint/` subfolder in this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meridianal/FinAI"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="checkpoint")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="checkpoint",
    trust_remote_code=True,
    torch_dtype=torch.float32,
    low_cpu_mem_usage=True,
)
model.eval()

prompt = """### Instruction:
What does a high price-to-earnings ratio indicate about a stock?

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.92,
        repetition_penalty=1.3,
        no_repeat_ngram_size=3,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

### Prompt format

All training examples use this instruction/response format:

```
### Instruction:
<your question or task>

### Response:
<answer>
```

Classification tasks are also formatted this way, with a short label-only response.
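
A small helper keeps prompts consistent with this format (the function name is illustrative, not part of the repo):

```python
def build_prompt(instruction: str) -> str:
    # Matches the training format: an instruction block, a blank line,
    # then an open "### Response:" header for the model to complete.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

# Pass the result straight to the tokenizer in the snippet above.
print(build_prompt("What does a high price-to-earnings ratio indicate about a stock?"))
```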
### Generation tips

Continual training can introduce mild repetition. Recommended settings:

| Parameter | Range |
|---|---|
| `temperature` | 0.7–0.95 |
| `top_p` | 0.85–0.95 |
| `repetition_penalty` | 1.2–1.4 |
| `no_repeat_ngram_size` | 3 |

If you see repeated phrases, increase `repetition_penalty` and lower `temperature`.

---

## Training Data

Training streams finance datasets from the FinanceMTEB family:

- Financial sentiment analysis (FinancialPhraseBank, etc.)
- ESG and sustainability classification
- FOMC statement analysis
- Fraud and financial complaint datasets
- Financial QA pairs
- Earnings call and filing excerpts

Datasets are loaded in streaming mode with a 15MB-per-source cap to stay within GitHub Actions memory limits.
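
The per-source cap amounts to a byte budget over streamed records. A minimal sketch of that budgeting logic (the function name is hypothetical; the real pipeline streams from Hugging Face datasets, but the accounting is the same):

```python
def take_until_cap(records, max_bytes=15 * 1024 * 1024):
    """Yield text records until a per-source byte budget is spent."""
    used = 0
    for text in records:
        size = len(text.encode("utf-8"))
        # Stop before exceeding the cap rather than after, so memory
        # use stays bounded even for a single oversized record.
        if used + size > max_bytes:
            break
        used += size
        yield text
```

Because records are consumed lazily, the full dataset is never materialised in memory.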
---

## Continual Learning

The model trains automatically via GitHub Actions on an hourly cron schedule. Key features:

- **EWC regularisation** – a Fisher information matrix computed from recent data protects previously learned weights from being overwritten.
- **RAM-safe checkpointing** – training halts and saves before hitting memory limits (`MAX_RAM_GB=13`).
- **Optimizer-free saves** – the AdaFactor optimizer state is discarded before upload to keep checkpoints small.
- **Auto-recovery** – each run pulls the latest checkpoint from this repo before training, resuming where the last run left off.
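
The EWC term adds a quadratic penalty that anchors important weights to their values from the previous run, weighted by the diagonal Fisher estimates. A minimal sketch (the regularisation strength `lam` below is an arbitrary illustrative value, not the repo's actual setting):

```python
def ewc_penalty(params, old_params, fisher, lam=0.4):
    """EWC loss term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    fisher holds diagonal Fisher information estimates from earlier
    training; large values mark weights that matter for old data, so
    moving them is penalised more heavily.
    """
    return 0.5 * lam * sum(
        f * (p - p0) ** 2
        for p, p0, f in zip(params, old_params, fisher)
    )
```

This penalty is added to the ordinary language-modelling loss, so weights with near-zero Fisher values remain free to adapt to new data.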
---

## Limitations

- Experimental model: outputs may be incorrect, hallucinated, or outdated.
- Not intended for production financial applications.
- Continual training without human evaluation gates means quality can regress between runs.
- Numeric reasoning is improved by the numeracy encoder but is not guaranteed to be accurate.

---

## Source Code

Training pipeline, architecture, and CI workflows:
[github.com/MeridianAlgo/FinAI](https://github.com/MeridianAlgo/FinAI)