Add README

README.md CHANGED

@@ -1,26 +1,48 @@
----
-tags:
-- ml-intern
----
-
-#
-
-## Usage
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-model_id = "yonghao/risk-control-sequence-models"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id)
-```
+# Risk-Control Sequence Models: Survey Report & Code Templates
+
+## 📋 File Inventory
+
+| File | Contents | Lines |
+|---|---|---|
+| `app_sequence_model.py` | App-install sequence modeling: CoLES + GRU pretraining → fine-tuning → LightGBM → graph augmentation | ~870 |
+| `credit_bureau_model.py` | Credit-bureau data modeling: TabM + PLE + FT-Transformer + LightGBM + threshold calibration + PSI monitoring | ~950 |
+| `fusion_model.py` | Late fusion: merges the two models' outputs into the final decision | ~150 |
+| `research_report.md` | Full literature survey (method comparison + hyperparameters + paper links) | detailed |
|
| 11 |
|
| 12 |
+
## 🚀 快速开始
|
| 13 |
|
| 14 |
+
```bash
|
| 15 |
+
pip install torch pytorch-lifestream scikit-learn lightgbm pandas numpy scipy
|
| 16 |
+
# 可选: pip install rtdl_num_embeddings rtdl_revisiting_models pytorch-tabular node2vec networkx
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 17 |
```
|
| 18 |
|
| 19 |
+
1. 修改 `CONFIG` 中的特征字段名
|
| 20 |
+
2. 替换数据加载部分
|
| 21 |
+
3. 运行
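As a rough illustration of steps 1 and 2: the real `CONFIG` keys and loader live in `app_sequence_model.py` / `credit_bureau_model.py` and may differ, so every field name below is a hypothetical placeholder.

```python
# Hypothetical sketch of the CONFIG edit (step 1) and the loader stub to
# replace (step 2); field names here are illustrative, not the repo's.
CONFIG = {
    "id_col": "user_id",         # unique borrower identifier
    "time_col": "install_time",  # event timestamp for the app sequence
    "event_col": "app_id",       # categorical event token fed to the GRU
    "label_col": "is_default",   # binary risk label used for fine-tuning
    "max_seq_len": 200,          # truncate/pad sequences to this length
}

def load_events(path):
    """Replace this stub with your own pandas/SQL data loading (step 2)."""
    raise NotImplementedError("plug in your data loader here")
```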
|
| 22 |
+
|
| 23 |
+
## 📑 核心论文
|
| 24 |
+
|
| 25 |
+
### App 序列建模
|
| 26 |
+
| 方法 | 论文 | 链接 |
|
| 27 |
+
|---|---|---|
|
| 28 |
+
| CoLES + GRU ⭐ | Contrastive Learning for Event Sequences (KDD 2022) | https://arxiv.org/abs/2002.08232 |
|
| 29 |
+
| Graph-Augmented CoLES | Beyond Isolated Clients (2026) | https://arxiv.org/abs/2604.09085 |
|
| 30 |
+
| LBSF 层级折叠 | Long-term Behavior Sequence Folding (IEEE 2024) | https://arxiv.org/abs/2411.15056 |
|
| 31 |
+
| TabBERT | Tabular Transformers (IBM 2021) | https://arxiv.org/abs/2011.01843 |
|
| 32 |
+
| BehaveGPT | Foundation Model for User Behavior (2025) | https://arxiv.org/abs/2505.17631 |
|
| 33 |
+
| TransactionGPT | Visa 2025 | https://arxiv.org/abs/2511.08939 |
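To make the starred CoLES recipe concrete, here is a minimal, dependency-free sketch of its core augmentation idea only: positive pairs are random contiguous sub-sequences of the same client's event stream (the GRU encoder and the contrastive loss itself are not shown, and the function name is my own).

```python
import random

def coles_views(seq, n_views=2, min_len=5, max_len=50):
    # CoLES-style augmentation: each "view" is a random contiguous slice of
    # one client's event sequence. Slices of the same client form positive
    # pairs; slices drawn from other clients act as negatives in the loss.
    views = []
    for _ in range(n_views):
        length = random.randint(min_len, min(max_len, len(seq)))
        start = random.randint(0, len(seq) - length)
        views.append(seq[start:start + length])
    return views
```

During pretraining these views would be fed through the GRU encoder, and the resulting per-client embeddings later serve as features for LightGBM.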
|
| 34 |
+
|
| 35 |
+
### 征信数据建模
|
| 36 |
+
| 方法 | 论文 | 链接 |
|
| 37 |
+
|---|---|---|
|
| 38 |
+
| LightGBM/XGBoost ⭐ | Why tree-based models still outperform DL (NeurIPS 2022) | https://arxiv.org/abs/2207.08815 |
|
| 39 |
+
| TabM + PLE ⭐ | Advancing Tabular DL (ICLR 2025) | https://arxiv.org/abs/2410.24210 |
|
| 40 |
+
| FT-Transformer | Revisiting DL for Tabular Data (NeurIPS 2021) | https://arxiv.org/abs/2106.11959 |
|
| 41 |
+
| PLE数值编码 | On Embeddings for Numerical Features (2022) | https://arxiv.org/abs/2203.05556 |
|
| 42 |
+
| SAINT | Improved NN for Tabular Data (2021) | https://arxiv.org/abs/2106.01342 |
|
| 43 |
+
|
| 44 |
+
## 🔑 核心结论
|
| 45 |
+
|
| 46 |
+
1. **App序列**:用 GRU + CoLES 对比学习(无标签预训练→LightGBM),不要默认 Transformer
|
| 47 |
+
2. **征信数据**:先 LightGBM baseline,再 TabM+PLE 补充,0.5:0.5 集成
|
| 48 |
+
3. **两个模型分开建**,最后 Late Fusion(向量拼接→LightGBM stacking)
|
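The ensemble and fusion takeaways fit in a few lines; this is an illustrative stand-in (NumPy only), not the repo's `fusion_model.py`, and both function names are my own.

```python
import numpy as np

def late_fusion(p_seq, p_bureau, w=0.5):
    # Fixed-weight blend of the two models' default probabilities
    # (the 0.5:0.5 ensemble from the takeaways).
    return w * np.asarray(p_seq) + (1.0 - w) * np.asarray(p_bureau)

def stacking_matrix(vec_seq, vec_bureau):
    # Concatenate the two models' output vectors per applicant; the result
    # is the feature matrix a LightGBM stacker would be trained on.
    return np.concatenate([np.asarray(vec_seq), np.asarray(vec_bureau)], axis=1)
```

In practice the blend weight (or the stacker) would be fit on a held-out validation split rather than fixed at 0.5.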