---
license: apache-2.0
base_model:
- amazon/chronos-2
- EleutherAI/polyglot-ko-1.3b
library_name: transformers
pipeline_tag: time-series-forecasting
language:
- ko
tags:
- jnu-tsb
- time-series
- forecasting
- chronos-2
- polyglot-ko
- korean
- finance
- covariates
- r
- reticulate
- education
---
# JNU-TSB
**JNU-TSB** is an educational **Time-LLM-style time-series bridge/router** for working with Korean news and stock-price time series together.
```text
Repo ID: HONGRIZON/JNU-TSB
Full name: Jeju National University Time-Series Bridge
Nickname: TSB = Time-Series Bridge, also Time-Series Seungbin
Time-series model: amazon/chronos-2
Korean language model: EleutherAI/polyglot-ko-1.3b
Router: stock only, news only, news + stock hybrid
```
This repository **does not redistribute the weights of Chronos-2 or Polyglot-Ko.** It contains only lightweight wrapper code, configuration files, example code, and sample data. Both base models are downloaded from Hugging Face at runtime.
## Overview
JNU-TSB is a wrapper-style model repo that converts Korean financial news headlines into daily covariates and passes them to Chronos-2 for covariate-informed time-series forecasting.
```text
News headlines
-> Polyglot-Ko / keyword fallback
-> 14-dimensional daily event covariates
-> Chronos-2 covariate-informed forecasting

Stock-price time series
-> Chronos-2 forecasting
```
This design is **Time-LLM-style**: it is not a faithful reimplementation of the reprogramming architecture from the original Time-LLM paper. Chronos-2 handles the numeric time-series forecasting, and the Korean LLM's job is to turn news text into structured covariates.
## Router structure
JNU-TSB automatically selects one of three routes based on the input.
| Input | Route | Output |
|---|---|---|
| `stock` only | Chronos-2-only route | Quantile time-series forecast |
| `news` only | Polyglot-Ko / keyword fallback route | Event category, sentiment, confidence, daily covariates |
| Both `stock` and `news` | Hybrid route | Chronos-2 forecast that includes the news covariates |
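The routing rule is a presence check on the two inputs. A minimal sketch, not the repo's code: `choose_route` is a hypothetical name, and the sketch assumes that the route labels from the output example (`chronos_only`, `text_only`, `hybrid`) correspond to the three rows above and that an empty list counts as a missing input.

```python
from typing import Any, Optional


def choose_route(stock: Optional[Any], news: Optional[Any]) -> str:
    """Illustrative router: pick a route from which inputs are present."""
    # None and empty containers both count as "not provided".
    has_stock = stock is not None and len(stock) > 0
    has_news = news is not None and len(news) > 0
    if has_stock and has_news:
        return "hybrid"        # news covariates + Chronos-2
    if has_stock:
        return "chronos_only"  # Chronos-2 on the price series alone
    if has_news:
        return "text_only"     # event/sentiment extraction only
    raise ValueError("Provide at least one of `stock` or `news`.")


print(choose_route([{"timestamp": "2024-12-01", "target": 71000}], None))  # chronos_only
```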
The hybrid route runs in this order:
```text
Korean news
-> Extract events/sentiment
-> Build 14-dimensional daily covariates
-> Merge with the stock-price context DataFrame
-> Call Chronos-2 predict_df
-> Return the forecast
```
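The merge step can be pictured as a left join of daily covariates onto the stock rows by calendar date. This is a hypothetical helper (`merge_covariates`), not the wrapper's implementation: it assumes daily data, fills dates without news with `0.0`, and silently drops news dated on days that have no stock row (weekends, for example), which the real code may handle differently.

```python
from typing import Dict, List


def merge_covariates(stock_rows: List[dict],
                     daily_cov: Dict[str, Dict[str, float]],
                     columns: List[str]) -> List[dict]:
    """Left-join daily news covariates onto stock rows by date (illustrative)."""
    merged = []
    for row in stock_rows:
        day = str(row["timestamp"])[:10]  # join key: "YYYY-MM-DD"
        cov = daily_cov.get(day, {})
        out = dict(row)  # copy so the caller's rows are not mutated
        for col in columns:
            out[col] = float(cov.get(col, 0.0))  # no news that day -> 0.0
        merged.append(out)
    return merged


stock = [{"timestamp": "2024-12-01", "target": 71000},
         {"timestamp": "2024-12-02", "target": 71800}]
daily = {"2024-12-02": {"cov_news_count": 2.0, "cov_event_score": -0.5}}
for row in merge_covariates(stock, daily, ["cov_news_count", "cov_event_score"]):
    print(row)
```

Presumably the merged rows then become the long-format context DataFrame for `predict_df`, where Chronos-2 treats the extra `cov_*` columns as past covariates; this README does not show that call, so treat the details as unverified.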
## 14-dimensional news covariates
News is aggregated per day and converted into the following 14 covariates.
| Column | Meaning |
|---|---|
| `cov_earnings_count` | Number of news items about earnings, revenue, or operating profit |
| `cov_product_count` | Number of news items about product launches, development, mass production, or semiconductors |
| `cov_macro_count` | Number of macroeconomic news items (interest rates, exchange rates, the business cycle, overseas markets, etc.) |
| `cov_regulation_count` | Number of news items about regulation, litigation, sanctions, or government policy |
| `cov_supply_chain_count` | Number of news items about the supply chain, orders won, contracts, production, or logistics |
| `cov_competition_count` | Number of news items about competitors, market share, or price competition |
| `cov_other_count` | Number of news items that do not clearly fit any category above |
| `cov_sentiment_pos_count` | Number of positive-sentiment news items |
| `cov_sentiment_neg_count` | Number of negative-sentiment news items |
| `cov_sentiment_neu_count` | Number of neutral-sentiment news items |
| `cov_news_count` | Total number of news items on that date |
| `cov_sentiment_mean` | Mean sentiment score, with each item scored `-1`, `0`, or `1` |
| `cov_confidence_mean` | Mean extraction confidence |
| `cov_event_score` | Sum of sentiment × confidence |
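Every column in the table is a count, a mean, or a sum over one day's headlines, so the aggregation is short. A minimal sketch, assuming each headline has already been reduced to a dict with `date`, `category` (seven category names inferred from the column names), `sentiment` in `{-1, 0, 1}`, and `confidence`; that record layout and the name `aggregate_daily` are assumptions, not the repo's API.

```python
from collections import defaultdict
from typing import Dict, List

# Category names inferred from the cov_*_count column names (assumption).
CATEGORIES = ["earnings", "product", "macro", "regulation",
              "supply_chain", "competition", "other"]
COVARIATE_COLUMNS = (
    [f"cov_{c}_count" for c in CATEGORIES]
    + ["cov_sentiment_pos_count", "cov_sentiment_neg_count",
       "cov_sentiment_neu_count", "cov_news_count",
       "cov_sentiment_mean", "cov_confidence_mean", "cov_event_score"]
)


def aggregate_daily(events: List[dict]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-headline extractions into 14 daily covariates (sketch)."""
    by_date: Dict[str, List[dict]] = defaultdict(list)
    for ev in events:
        by_date[ev["date"]].append(ev)

    daily = {}
    for date, evs in sorted(by_date.items()):
        row = {col: 0.0 for col in COVARIATE_COLUMNS}
        for ev in evs:
            category = ev["category"] if ev["category"] in CATEGORIES else "other"
            row[f"cov_{category}_count"] += 1
            s = ev["sentiment"]
            row[f"cov_sentiment_{ {1: 'pos', -1: 'neg'}.get(s, 'neu') }_count"] += 1
            row["cov_event_score"] += s * ev["confidence"]  # sum of sentiment x confidence
        n = len(evs)
        row["cov_news_count"] = float(n)
        row["cov_sentiment_mean"] = sum(ev["sentiment"] for ev in evs) / n
        row["cov_confidence_mean"] = sum(ev["confidence"] for ev in evs) / n
        daily[date] = row
    return daily


events = [
    {"date": "2024-12-02", "category": "product", "sentiment": 1, "confidence": 0.75},
    {"date": "2024-12-02", "category": "macro", "sentiment": -1, "confidence": 0.5},
]
row = aggregate_daily(events)["2024-12-02"]
print(row["cov_news_count"], row["cov_sentiment_mean"], row["cov_event_score"])  # 2.0 0.0 0.25
```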
## Installation
```bash
pip install -U transformers torch accelerate pandas pyarrow chronos-forecasting
```
To use the model from R, install the Python packages above into the virtual environment that `reticulate` uses.
## Python quick start
For a quick test you can pass `use_llm_extractor=False`. Polyglot-Ko is then not loaded and only the keyword fallback is used, so the example runs with far fewer resources.
```python
from transformers import pipeline

pipe = pipeline(
    task="jnu-tsb",
    model="HONGRIZON/JNU-TSB",
    trust_remote_code=True,
    device=-1,  # CPU. To use GPU 0, set this to 0.
)

stock = [
    {"timestamp": "2024-12-01", "target": 71000},
    {"timestamp": "2024-12-02", "target": 71800},
    {"timestamp": "2024-12-03", "target": 70400},
    {"timestamp": "2024-12-04", "target": 70900},
    {"timestamp": "2024-12-05", "target": 72100},
]

news = [
    # "Samsung Electronics launches new HBM product"
    {"date": "2024-12-01", "title": "삼성전자 HBM 신제품 출시"},
    # "Concerns over a semiconductor industry slowdown"
    {"date": "2024-12-02", "title": "반도체 업황 둔화 우려"},
]

result = pipe(
    {"stock": stock, "news": news},
    prediction_length=3,
    use_llm_extractor=False,  # skip Polyglot-Ko and use the keyword fallback
)
print(result)
```
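With `use_llm_extractor=False`, each headline goes through the keyword fallback instead of Polyglot-Ko. The repo's real keyword lists and confidence values are not documented here, so the sketch below is hypothetical throughout: the Korean category keywords are taken from the descriptions in the covariate table, the sentiment words are illustrative choices, and the fixed confidences (`0.5` for a category match, `0.2` otherwise) are placeholders.

```python
# Hypothetical keyword lists; the repo's actual lists are not shown in this README.
CATEGORY_KEYWORDS = {
    "earnings": ["실적", "매출", "영업이익"],
    "product": ["출시", "개발", "양산", "반도체", "신제품"],
    "macro": ["금리", "환율", "경기", "해외시장"],
    "regulation": ["규제", "소송", "제재", "정책"],
    "supply_chain": ["공급망", "수주", "계약", "생산", "물류"],
    "competition": ["경쟁", "점유율"],
}
POSITIVE = ["신제품", "호조", "상승", "증가", "흑자"]
NEGATIVE = ["우려", "둔화", "하락", "감소", "적자"]


def keyword_extract(title: str) -> dict:
    """Classify a headline by substring matching (illustrative fallback)."""
    # First category (in dict order) with any matching keyword, else "other".
    category = next(
        (cat for cat, words in CATEGORY_KEYWORDS.items()
         if any(w in title for w in words)),
        "other",
    )
    pos = sum(w in title for w in POSITIVE)
    neg = sum(w in title for w in NEGATIVE)
    sentiment = (pos > neg) - (neg > pos)  # 1, -1, or 0
    confidence = 0.5 if category != "other" else 0.2  # placeholder values
    return {"category": category, "sentiment": sentiment, "confidence": confidence}


for t in ["삼성전자 HBM 신제품 출시", "반도체 업황 둔화 우려"]:
    print(t, keyword_extract(t))
```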
## Using AutoModel directly
```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "HONGRIZON/JNU-TSB",
    trust_remote_code=True,
)

result = model.predict(
    stock=[{"timestamp": "2024-12-01", "target": 71000}],
    # "Samsung Electronics launches new HBM product"
    news=[{"date": "2024-12-01", "title": "삼성전자 HBM 신제품 출시"}],
    prediction_length=3,
    use_llm_extractor=False,
)
print(result)
```
## R quick start
```r
library(reticulate)

# Run once, the first time only:
# reticulate::virtualenv_create("jnu-tsb-env")
# reticulate::virtualenv_install(
#   "jnu-tsb-env",
#   c("transformers", "torch", "accelerate", "pandas", "pyarrow", "chronos-forecasting")
# )

use_virtualenv("jnu-tsb-env", required = TRUE)
transformers <- import("transformers")

pipe <- transformers$pipeline(
  task = "jnu-tsb",
  model = "HONGRIZON/JNU-TSB",
  trust_remote_code = TRUE,
  device = -1L  # CPU; use 0L for GPU 0
)

stock <- list(
  list(timestamp = "2024-12-01", target = 71000),
  list(timestamp = "2024-12-02", target = 71800),
  list(timestamp = "2024-12-03", target = 70400)
)

news <- list(
  list(date = "2024-12-01", title = "삼성전자 HBM 신제품 출시"),  # "Samsung Electronics launches new HBM product"
  list(date = "2024-12-02", title = "반도체 업황 둔화 우려")       # "Concerns over a semiconductor industry slowdown"
)

result <- pipe(
  list(stock = stock, news = news),
  prediction_length = 3L,
  use_llm_extractor = FALSE
)
print(py_to_r(result))
```
## Input format
### `stock`
`stock` can be passed as a pandas DataFrame, a list of dicts, or a dict of columns. It needs at least these two columns:
```text
timestamp: date or time
target: value to forecast, e.g. the closing price
```
If `item_id` is missing, `series_0` is assigned internally.
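Accepting three input shapes usually means normalizing them to a single row-based form first. One way to do that with the standard library, as a sketch: `normalize_stock` is hypothetical, DataFrames are detected by duck typing and converted with pandas' `to_dict("records")`, and raising `ValueError` on a missing `timestamp` or `target` column is an assumption about how the wrapper treats bad input.

```python
from typing import Any, Dict, List


def normalize_stock(stock: Any, default_id: str = "series_0") -> List[Dict[str, Any]]:
    """Turn the accepted `stock` shapes into a list of row dicts (sketch)."""
    if hasattr(stock, "to_dict") and hasattr(stock, "columns"):
        rows = stock.to_dict("records")  # pandas DataFrame -> list of row dicts
    elif isinstance(stock, dict):
        # Dict of columns, e.g. {"timestamp": [...], "target": [...]}.
        keys = list(stock)
        lengths = {len(stock[k]) for k in keys}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        n = lengths.pop() if lengths else 0
        rows = [{k: stock[k][i] for k in keys} for i in range(n)]
    else:
        rows = [dict(r) for r in stock]  # list of dicts; copy to avoid mutating input
    for r in rows:
        missing = {"timestamp", "target"} - set(r)
        if missing:
            raise ValueError(f"missing required column(s): {sorted(missing)}")
        r.setdefault("item_id", default_id)  # single-series default
    return rows


rows = normalize_stock({"timestamp": ["2024-12-01", "2024-12-02"],
                        "target": [71000, 71800]})
print(rows[0])  # {'timestamp': '2024-12-01', 'target': 71000, 'item_id': 'series_0'}
```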
### `news`
`news` is a list of dicts. Each item must have at least a date and a headline.
```json
[
  {"date": "2024-12-01", "title": "삼성전자 HBM 신제품 출시"},
  {"date": "2024-12-02", "title": "반도체 업황 둔화 우려"}
]
```
Besides `title`, the keys `headline`, `text`, and `content` are also recognized.
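Resolving the text field is a first-match lookup over the accepted keys. In this sketch the priority order (`title`, then `headline`, `text`, `content`), the helper names, and the choice to skip items that lack a date or any text are assumptions; the README only says the alternative keys are recognized.

```python
from typing import List, Optional

TITLE_KEYS = ("title", "headline", "text", "content")  # assumed priority order


def get_title(item: dict) -> Optional[str]:
    """Return the first non-empty text field, or None."""
    for key in TITLE_KEYS:
        value = item.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def normalize_news(news: List[dict]) -> List[dict]:
    """Map each item to {"date", "title"}, skipping unusable items (sketch)."""
    out = []
    for item in news:
        title = get_title(item)
        if item.get("date") and title:
            out.append({"date": str(item["date"])[:10], "title": title})
    return out


print(normalize_news([{"date": "2024-12-01", "headline": "반도체 업황 둔화 우려"}]))
```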
### `future_news` and `future_covariates`
Use `future_news` or `future_covariates` only when news or scheduled events for the forecast period are already known in advance. Future values of ordinary news data are generally unknown, so it is safest to use historical news only as past covariates over the context window.
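A simple way to respect this rule is to split news at the last context timestamp and feed only the earlier part in as past covariates. The helper below is a hypothetical sketch; items dated after the cutoff belong in `future_news` only if they were genuinely known in advance (a scheduled earnings release, for example), never merely because they happen to be in a backtest dataset.

```python
from datetime import date
from typing import List, Tuple


def split_news_by_cutoff(news: List[dict],
                         last_context_day: str) -> Tuple[List[dict], List[dict]]:
    """Split news into past items (<= cutoff) and items after the cutoff."""
    cutoff = date.fromisoformat(last_context_day[:10])
    past, after = [], []
    for item in news:
        d = date.fromisoformat(str(item["date"])[:10])
        (past if d <= cutoff else after).append(item)
    return past, after


past, after = split_news_by_cutoff(
    [{"date": "2024-12-04", "title": "a"}, {"date": "2024-12-06", "title": "b"}],
    "2024-12-05",
)
print(len(past), len(after))  # 1 1
```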
## Output example
The output contains the route that was used together with the forecast.
```text
route: text_only | chronos_only | hybrid
repo_id: HONGRIZON/JNU-TSB
forecast: forecast results, or event/covariate results
used_naive_fallback: whether a fallback was used because Chronos-2 failed to run
```
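This README does not say what the naive fallback computes. A common naive forecast repeats the last observed value over the whole horizon; the sketch below assumes that choice and a daily calendar frequency, and the name `naive_forecast` and its output keys are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def naive_forecast(stock_rows: List[dict], prediction_length: int) -> List[dict]:
    """Repeat the last observed target for each future day (illustrative)."""
    last = max(stock_rows, key=lambda r: str(r["timestamp"])[:10])
    last_day = date.fromisoformat(str(last["timestamp"])[:10])
    return [
        {"timestamp": (last_day + timedelta(days=h)).isoformat(),
         "prediction": float(last["target"])}
        for h in range(1, prediction_length + 1)
    ]


stock = [{"timestamp": "2024-12-04", "target": 70900},
         {"timestamp": "2024-12-05", "target": 72100}]
print(naive_forecast(stock, 3))
```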
## Important caveats
- This model is an educational/research demo. Do not use it for investment advice or actual trading decisions.
- `EleutherAI/polyglot-ko-1.3b` is a base language model, not an instruction-tuned JSON extractor. JSON extraction can therefore fail, which is why this repository also provides a keyword fallback.
- This repository does not include the Chronos-2 or Polyglot-Ko weights; they are downloaded from their respective upstream repos at runtime.
- This repository is not a faithful reimplementation of the original Time-LLM paper but a Time-LLM-style wrapper/router that connects Korean news to time-series forecasting.
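Because a base model may ramble or emit malformed JSON, its output has to be validated before use, with the keyword fallback taking over on failure. A minimal sketch of that pattern: `generate` stands for any callable that returns the model's generated text (for instance a wrapper around a `transformers` text-generation pipeline with your own prompt), and the expected JSON keys, the category names, and the function names are assumptions rather than the repo's actual schema.

```python
import json
import math
import re
from typing import Callable, Optional

CATEGORIES = {"earnings", "product", "macro", "regulation",
              "supply_chain", "competition", "other"}


def parse_llm_json(generated: str) -> Optional[dict]:
    """Validate the first flat {...} object in LM output; None if unusable."""
    match = re.search(r"\{.*?\}", generated, flags=re.DOTALL)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
        confidence = float(obj.get("confidence", 0.0))
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        return None
    category = obj.get("category")
    if not isinstance(category, str) or category not in CATEGORIES:
        return None
    if obj.get("sentiment") not in (-1, 0, 1) or not math.isfinite(confidence):
        return None
    return {"category": category, "sentiment": int(obj["sentiment"]),
            "confidence": min(max(confidence, 0.0), 1.0)}  # clamp to [0, 1]


def extract_with_fallback(title: str,
                          generate: Callable[[str], str],
                          fallback: Callable[[str], dict]) -> dict:
    """Try the LM first; use the fallback if its output fails validation."""
    parsed = parse_llm_json(generate(title))
    return parsed if parsed is not None else fallback(title)
```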
## License
The wrapper code is released under Apache-2.0. The upstream base models, `amazon/chronos-2` and `EleutherAI/polyglot-ko-1.3b`, are subject to the licenses and terms of use of their respective Hugging Face repos.