---
tags:
- ml-intern
---
# Risk-Control Sequence Models: Survey Report & Code Templates

## 📋 File List

| File | Contents | Size |
|---|---|---|
| `app_sequence_model.py` | App-install sequence modeling: CoLES+GRU pretraining → fine-tuning → LightGBM → graph augmentation | ~870 lines |
| `credit_bureau_model.py` | Credit-bureau modeling: TabM + PLE + FT-Transformer + LightGBM + threshold calibration + PSI monitoring (PSI sketch below) | ~950 lines |
| `fusion_model.py` | Late fusion: combines the two models' outputs into a final decision | ~150 lines |
| `research_report.md` | Full literature survey (method comparison + hyperparameters + paper links) | detailed |
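
`credit_bureau_model.py` lists PSI monitoring among its components. The snippet below is a minimal, self-contained sketch of a Population Stability Index check (quantile bins taken from the baseline scores, a small epsilon against empty bins); it is illustrative, not the script's actual implementation:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between baseline and live score distributions."""
    eps = 1e-6
    # Bin edges from quantiles of the baseline distribution
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    # Assign both samples to the same bins; clip keeps outliers in the edge bins
    e_idx = np.clip(np.searchsorted(edges, expected, side="right") - 1, 0, n_bins - 1)
    a_idx = np.clip(np.searchsorted(edges, actual, side="right") - 1, 0, n_bins - 1)
    e_frac = np.bincount(e_idx, minlength=n_bins) / len(expected) + eps
    a_frac = np.bincount(a_idx, minlength=n_bins) / len(actual) + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Rule of thumb: PSI < 0.1 stable, 0.1–0.25 watch, > 0.25 significant shift
print(psi(np.random.beta(2, 5, 10_000), np.random.beta(2, 4, 10_000)))
```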

## 🚀 Quick Start

```bash
pip install torch pytorch-lifestream scikit-learn lightgbm pandas numpy scipy
# Optional: pip install rtdl_num_embeddings rtdl_revisiting_models pytorch-tabular node2vec networkx
```

1. Edit the feature field names in `CONFIG` (a hypothetical sketch of its shape follows this list)
2. Swap in your own data-loading code
3. Run
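
The actual `CONFIG` keys live in the scripts themselves; as orientation only, the shape is typically a mapping from roles to your column names. Every field name below is a hypothetical placeholder:

```python
# All field names below are hypothetical placeholders -- replace them with the
# column names used in your own tables before running the scripts.
CONFIG = {
    "id_col": "user_id",              # entity key shared by both data sources
    "label_col": "is_default",        # binary risk label
    "app_id_col": "app_id",           # categorical app identifier (sequence tokens)
    "event_time_col": "install_ts",   # timestamp that orders the install sequence
    "bureau_num_cols": ["credit_limit", "utilization", "n_inquiries_6m"],
    "bureau_cat_cols": ["employment_type", "region"],
}
```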

## 📑 Key Papers

### App Sequence Modeling

| Method | Paper | Link |
|---|---|---|
| CoLES + GRU ⭐ | CoLES: Contrastive Learning for Event Sequences (SIGMOD 2022) | https://arxiv.org/abs/2002.08232 |
| Graph-Augmented CoLES | Beyond Isolated Clients (2026) | https://arxiv.org/abs/2604.09085 |
| LBSF hierarchical folding | Long-term Behavior Sequence Folding (IEEE 2024) | https://arxiv.org/abs/2411.15056 |
| TabBERT | Tabular Transformers (IBM 2021) | https://arxiv.org/abs/2011.01843 |
| BehaveGPT | Foundation Model for User Behavior (2025) | https://arxiv.org/abs/2505.17631 |
| TransactionGPT | Visa 2025 | https://arxiv.org/abs/2511.08939 |
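
The starred row is the recommended approach. Below is a plain-PyTorch sketch of the idea (the scripts use the `pytorch-lifestream` library instead; all dimensions and names here are illustrative): sample two random subsequences of each user's app-install history, treat same-user pairs as positives and other users in the batch as negatives, and train a GRU encoder with an InfoNCE-style loss (the original CoLES paper uses a margin-based contrastive loss; InfoNCE is a common variant).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUSeqEncoder(nn.Module):
    """Embeds a padded sequence of app IDs into a fixed-size user vector."""
    def __init__(self, n_apps: int, emb_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(n_apps, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, seq):                      # seq: (B, T) int64
        _, h = self.gru(self.emb(seq))           # h: (1, B, hidden)
        return self.proj(h.squeeze(0))           # (B, hidden)

def random_slice(seq, min_len: int = 8):
    """One CoLES 'view': a random contiguous slice of every user's history."""
    T = seq.size(1)
    length = torch.randint(min_len, T + 1, (1,)).item()
    start = torch.randint(0, T - length + 1, (1,)).item()
    return seq[:, start:start + length]

def info_nce(z1, z2, tau: float = 0.1):
    """Same-user slice pairs are positives; other users in the batch are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                     # (B, B) similarity matrix
    return F.cross_entropy(logits, torch.arange(len(z1)))

# One label-free pretraining step on stand-in data
encoder = GRUSeqEncoder(n_apps=10_000)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
batch = torch.randint(1, 10_000, (64, 60))       # stand-in for real install sequences
loss = info_nce(encoder(random_slice(batch)), encoder(random_slice(batch)))
opt.zero_grad(); loss.backward(); opt.step()
# After pretraining, the frozen embeddings become features for LightGBM.
```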

### Credit-Bureau Data Modeling

| Method | Paper | Link |
|---|---|---|
| LightGBM/XGBoost ⭐ | Why tree-based models still outperform DL (NeurIPS 2022) | https://arxiv.org/abs/2207.08815 |
| TabM + PLE ⭐ | Advancing Tabular DL (ICLR 2025) | https://arxiv.org/abs/2410.24210 |
| FT-Transformer | Revisiting DL for Tabular Data (NeurIPS 2021) | https://arxiv.org/abs/2106.11959 |
| PLE numerical embeddings | On Embeddings for Numerical Features (2022) | https://arxiv.org/abs/2203.05556 |
| SAINT | Improved NN for Tabular Data (2021) | https://arxiv.org/abs/2106.01342 |
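
The recommended recipe (see the takeaways below) is a LightGBM baseline blended 0.5:0.5 with a TabM+PLE model. Here is a runnable sketch of that blend on synthetic data; a logistic regression stands in for the neural model purely so the snippet executes, and you would swap in the real TabM+PLE probabilities in practice:

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bureau feature table
X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) LightGBM baseline first (per Grinsztajn et al., hard to beat on tabular data)
gbm = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.02, num_leaves=63)
gbm.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(200), lgb.log_evaluation(0)])
p_gbm = gbm.predict_proba(X_val)[:, 1]

# 2) Logistic regression stands in for the TabM+PLE model so the sketch runs
nn_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_nn = nn_model.predict_proba(X_val)[:, 1]

# 3) Equal-weight blend of the two probability streams
p_ens = 0.5 * p_gbm + 0.5 * p_nn
for name, p in [("gbm", p_gbm), ("nn", p_nn), ("blend", p_ens)]:
    print(name, round(roc_auc_score(y_val, p), 4))
```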

## 🔑 Key Takeaways

1. **App sequences**: use GRU + CoLES contrastive learning (label-free pretraining → LightGBM); don't default to a Transformer
2. **Credit-bureau data**: start with a LightGBM baseline, then add TabM+PLE, ensembled 0.5:0.5
3. **Build the two models separately**, then apply late fusion (vector concatenation → LightGBM stacking; minimal sketch below)
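
A minimal sketch of that late-fusion step, on random stand-in arrays. In the real pipeline the inputs would be the CoLES+GRU user embeddings and the bureau model's probabilities, and the upstream predictions should be out-of-fold to avoid target leakage:

```python
import lightgbm as lgb
import numpy as np

# Stand-in upstream outputs; in the real pipeline these come from the two models
seq_emb = np.random.randn(5000, 128)    # per-user embedding from the sequence encoder
p_bureau = np.random.rand(5000, 1)      # probability from the credit-bureau model
y = np.random.randint(0, 2, 5000)       # binary risk labels

# Late fusion: concatenate both outputs, then fit a stacking LightGBM on top
X_fused = np.hstack([seq_emb, p_bureau])
stacker = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
stacker.fit(X_fused, y)
p_final = stacker.predict_proba(X_fused)[:, 1]  # final decision score
```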

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yonghao/risk-control-sequence-models"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.