yonghao committed on
Commit 4096e1c · verified · 1 Parent(s): d6d5abd

Add README

Files changed (1):
  1. README.md +42 -20

README.md CHANGED
@@ -1,26 +1,48 @@
- ---
- tags:
- - ml-intern
- ---
-
- # yonghao/risk-control-sequence-models
-
- <!-- ml-intern-provenance -->
- ## Generated by ML Intern
-
- This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
-
- - Try ML Intern: https://smolagents-ml-intern.hf.space
- - Source code: https://github.com/huggingface/ml-intern
-
- ## Usage
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_id = "yonghao/risk-control-sequence-models"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
- ```
-
- For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
+ # Risk-Control Sequence Models: Survey Report & Code Templates
+
+ ## 📋 File Inventory
+
+ | File | Contents | Length |
+ |---|---|---|
+ | `app_sequence_model.py` | App-install sequence modeling: CoLES + GRU pretraining → fine-tuning → LightGBM → graph augmentation | ~870 lines |
+ | `credit_bureau_model.py` | Credit-bureau data modeling: TabM + PLE + FT-Transformer + LightGBM + threshold calibration + PSI monitoring | ~950 lines |
+ | `fusion_model.py` | Late fusion: combines the two models' outputs into the final decision | ~150 lines |
+ | `research_report.md` | Full literature survey (method comparison + hyperparameters + paper links) | detailed |
+
+ ## 🚀 Quick Start
+
+ ```bash
+ pip install torch pytorch-lifestream scikit-learn lightgbm pandas numpy scipy
+ # Optional: pip install rtdl_num_embeddings rtdl_revisiting_models pytorch-tabular node2vec networkx
+ ```
+
+ 1. Edit the feature field names in `CONFIG`
+ 2. Swap in your own data-loading code
+ 3. Run
+
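The three steps above can be sketched as follows. Every field name here is a placeholder of my own choosing, not the repository's actual schema; replace each one with the columns of your own dataset:

```python
# Hypothetical CONFIG sketch for step 1 — the keys are illustrative,
# not copied from app_sequence_model.py.
CONFIG = {
    "id_col": "user_id",         # entity key shared across data sources
    "event_col": "app_package",  # categorical event in the install sequence
    "time_col": "install_ts",    # event timestamp column
    "label_col": "is_default",   # binary risk label
    "max_seq_len": 512,          # truncate/pad each sequence to this length
}

def load_data(path, config):
    """Step 2 placeholder: swap in your own loading logic
    (e.g. pandas.read_parquet) before running the pipeline."""
    raise NotImplementedError("replace with your data source")
```

Once `CONFIG` and `load_data` point at real data, step 3 is just executing the template script.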
+ ## 📑 Core Papers
+
+ ### App Sequence Modeling
+ | Method | Paper | Link |
+ |---|---|---|
+ | CoLES + GRU ⭐ | Contrastive Learning for Event Sequences (KDD 2022) | https://arxiv.org/abs/2002.08232 |
+ | Graph-Augmented CoLES | Beyond Isolated Clients (2026) | https://arxiv.org/abs/2604.09085 |
+ | LBSF hierarchical folding | Long-term Behavior Sequence Folding (IEEE 2024) | https://arxiv.org/abs/2411.15056 |
+ | TabBERT | Tabular Transformers (IBM 2021) | https://arxiv.org/abs/2011.01843 |
+ | BehaveGPT | Foundation Model for User Behavior (2025) | https://arxiv.org/abs/2505.17631 |
+ | TransactionGPT | Visa 2025 | https://arxiv.org/abs/2511.08939 |
+
+ ### Credit-Bureau Data Modeling
+ | Method | Paper | Link |
+ |---|---|---|
+ | LightGBM/XGBoost ⭐ | Why tree-based models still outperform DL (NeurIPS 2022) | https://arxiv.org/abs/2207.08815 |
+ | TabM + PLE ⭐ | Advancing Tabular DL (ICLR 2025) | https://arxiv.org/abs/2410.24210 |
+ | FT-Transformer | Revisiting DL for Tabular Data (NeurIPS 2021) | https://arxiv.org/abs/2106.11959 |
+ | PLE numerical encoding | On Embeddings for Numerical Features (2022) | https://arxiv.org/abs/2203.05556 |
+ | SAINT | Improved NN for Tabular Data (2021) | https://arxiv.org/abs/2106.01342 |
+
+ ## 🔑 Core Conclusions
+
+ 1. **App sequences**: use GRU + CoLES contrastive learning (label-free pretraining → LightGBM); don't default to a Transformer
+ 2. **Credit-bureau data**: start from a LightGBM baseline, then add TabM + PLE, ensembled 0.5 : 0.5
+ 3. **Build the two models separately**, then apply late fusion (vector concatenation → LightGBM stacking)
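The CoLES idea behind conclusion 1 can be illustrated with a toy sketch: positive pairs are sub-sequences drawn from the same user's event history, and the GRU encoder is trained to embed them close together. The function names below are my own illustration, not the `pytorch-lifestream` API:

```python
import random

def sample_subsequence(events, min_len=2):
    """Draw a random contiguous slice of one user's event sequence."""
    n = len(events)
    length = random.randint(min_len, n)
    start = random.randint(0, n - length)
    return events[start:start + length]

def coles_positive_pair(events):
    """Two sub-sequences of the same user form a positive pair; slices
    from different users serve as negatives in the contrastive loss."""
    return sample_subsequence(events), sample_subsequence(events)
```

After pretraining on such pairs, the frozen encoder's sequence embeddings feed LightGBM as features, which is the label-free pretraining → LightGBM path named above.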
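Conclusions 2 and 3 reduce to two small operations, sketched below. The function names and vector shapes are assumptions for illustration, not the actual interface of `fusion_model.py`:

```python
def ensemble_bureau_scores(lgbm_prob, tabm_prob, w=0.5):
    """Conclusion 2: blend the LightGBM baseline with TabM + PLE at 0.5:0.5."""
    return w * lgbm_prob + (1.0 - w) * tabm_prob

def late_fusion_features(seq_emb, bureau_emb, seq_score, bureau_score):
    """Conclusion 3: concatenate both models' embeddings and output scores
    into one feature vector for the downstream LightGBM stacker."""
    return list(seq_emb) + list(bureau_emb) + [seq_score, bureau_score]
```

The stacker is then trained on these concatenated vectors, so each upstream model can be retrained or recalibrated independently without touching the other.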