Add README

README.md CHANGED

@@ -1,26 +1,48 @@
----
-tags:
-- ml-intern
----
-
-#
-
-## Usage
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-model_id = "yonghao/risk-control-sequence-models"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id)
-```
+# Risk-Control Sequence Models: Survey Report & Code Templates
+
+## 📋 File Inventory
+
+| File | Contents | Lines |
+|---|---|---|
+| `app_sequence_model.py` | App-install sequence modeling: CoLES + GRU pretraining → fine-tuning → LightGBM → graph augmentation | ~870 |
+| `credit_bureau_model.py` | Credit-bureau data modeling: TabM + PLE + FT-Transformer + LightGBM + threshold calibration + PSI monitoring | ~950 |
+| `fusion_model.py` | Late fusion: merges the two models' outputs into the final decision | ~150 |
+| `research_report.md` | Full literature survey (method comparison + hyperparameters + paper links) | detailed |
|
| 11 |
|
| 12 |
+
## 🚀 快速开始
|
| 13 |
|
| 14 |
+
```bash
|
| 15 |
+
pip install torch pytorch-lifestream scikit-learn lightgbm pandas numpy scipy
|
| 16 |
+
# 可选: pip install rtdl_num_embeddings rtdl_revisiting_models pytorch-tabular node2vec networkx
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 17 |
```
|
| 18 |
|
| 19 |
+
1. 修改 `CONFIG` 中的特征字段名
|
| 20 |
+
2. 替换数据加载部分
|
| 21 |
+
3. 运行
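As a rough illustration of steps 1 and 2: the real `CONFIG` keys and loader live in `app_sequence_model.py` / `credit_bureau_model.py` and may differ, so every field name below is a hypothetical placeholder.

```python
# Hypothetical sketch of the CONFIG edit (step 1) and the loader stub to
# replace (step 2); field names here are illustrative, not the repo's.
CONFIG = {
    "id_col": "user_id",         # unique borrower identifier
    "time_col": "install_time",  # event timestamp for the app sequence
    "event_col": "app_id",       # categorical event token fed to the GRU
    "label_col": "is_default",   # binary risk label used for fine-tuning
    "max_seq_len": 200,          # truncate/pad sequences to this length
}

def load_events(path):
    """Replace this stub with your own pandas/SQL data loading (step 2)."""
    raise NotImplementedError("plug in your data loader here")
```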
|
| 22 |
+
|
| 23 |
+
## 📑 核心论文
|
| 24 |
+
|
| 25 |
+
### App 序列建模
|
| 26 |
+
| 方法 | 论文 | 链接 |
|
| 27 |
+
|---|---|---|
|
| 28 |
+
| CoLES + GRU ⭐ | Contrastive Learning for Event Sequences (KDD 2022) | https://arxiv.org/abs/2002.08232 |
|
| 29 |
+
| Graph-Augmented CoLES | Beyond Isolated Clients (2026) | https://arxiv.org/abs/2604.09085 |
|
| 30 |
+
| LBSF 层级折叠 | Long-term Behavior Sequence Folding (IEEE 2024) | https://arxiv.org/abs/2411.15056 |
|
| 31 |
+
| TabBERT | Tabular Transformers (IBM 2021) | https://arxiv.org/abs/2011.01843 |
|
| 32 |
+
| BehaveGPT | Foundation Model for User Behavior (2025) | https://arxiv.org/abs/2505.17631 |
|
| 33 |
+
| TransactionGPT | Visa 2025 | https://arxiv.org/abs/2511.08939 |
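To make the starred CoLES recipe concrete, here is a minimal, dependency-free sketch of its core augmentation idea only: positive pairs are random contiguous sub-sequences of the same client's event stream (the GRU encoder and the contrastive loss itself are not shown, and the function name is my own).

```python
import random

def coles_views(seq, n_views=2, min_len=5, max_len=50):
    # CoLES-style augmentation: each "view" is a random contiguous slice of
    # one client's event sequence. Slices of the same client form positive
    # pairs; slices drawn from other clients act as negatives in the loss.
    views = []
    for _ in range(n_views):
        length = random.randint(min_len, min(max_len, len(seq)))
        start = random.randint(0, len(seq) - length)
        views.append(seq[start:start + length])
    return views
```

During pretraining these views would be fed through the GRU encoder, and the resulting per-client embeddings later serve as features for LightGBM.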
|
| 34 |
+
|
| 35 |
+
### 征信数据建模
|
| 36 |
+
| 方法 | 论文 | 链接 |
|
| 37 |
+
|---|---|---|
|
| 38 |
+
| LightGBM/XGBoost ⭐ | Why tree-based models still outperform DL (NeurIPS 2022) | https://arxiv.org/abs/2207.08815 |
|
| 39 |
+
| TabM + PLE ⭐ | Advancing Tabular DL (ICLR 2025) | https://arxiv.org/abs/2410.24210 |
|
| 40 |
+
| FT-Transformer | Revisiting DL for Tabular Data (NeurIPS 2021) | https://arxiv.org/abs/2106.11959 |
|
| 41 |
+
| PLE数值编码 | On Embeddings for Numerical Features (2022) | https://arxiv.org/abs/2203.05556 |
|
| 42 |
+
| SAINT | Improved NN for Tabular Data (2021) | https://arxiv.org/abs/2106.01342 |
|
| 43 |
+
|
| 44 |
+
## 🔑 核心结论
|
| 45 |
+
|
| 46 |
+
1. **App序列**:用 GRU + CoLES 对比学习(无标签预训练→LightGBM),不要默认 Transformer
|
| 47 |
+
2. **征信数据**:先 LightGBM baseline,再 TabM+PLE 补充,0.5:0.5 集成
|
| 48 |
+
3. **两个模型分开建**,最后 Late Fusion(向量拼接→LightGBM stacking)
|
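The ensemble and fusion takeaways fit in a few lines; this is an illustrative stand-in (NumPy only), not the repo's `fusion_model.py`, and both function names are my own.

```python
import numpy as np

def late_fusion(p_seq, p_bureau, w=0.5):
    # Fixed-weight blend of the two models' default probabilities
    # (the 0.5:0.5 ensemble from the takeaways).
    return w * np.asarray(p_seq) + (1.0 - w) * np.asarray(p_bureau)

def stacking_matrix(vec_seq, vec_bureau):
    # Concatenate the two models' output vectors per applicant; the result
    # is the feature matrix a LightGBM stacker would be trained on.
    return np.concatenate([np.asarray(vec_seq), np.asarray(vec_bureau)], axis=1)
```

In practice the blend weight (or the stacker) would be fit on a held-out validation split rather than fixed at 0.5.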