---
tags:
- ml-intern
---
# Risk-Control Sequence Models: Survey Report & Code Templates

## 📋 File List

| File | Contents | Size |
|---|---|---|
| `app_sequence_model.py` | App-install sequence modeling: CoLES+GRU pretraining → fine-tuning → LightGBM → graph augmentation | ~870 lines |
| `credit_bureau_model.py` | Credit-bureau modeling: TabM + PLE + FT-Transformer + LightGBM + threshold calibration + PSI monitoring (PSI sketch below) | ~950 lines |
| `fusion_model.py` | Late fusion: combines the two models' outputs into a final decision | ~150 lines |
| `research_report.md` | Full literature survey (method comparison + hyperparameters + paper links) | detailed |
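
`credit_bureau_model.py` lists PSI monitoring among its components. The snippet below is a minimal, self-contained sketch of a Population Stability Index check (quantile bins taken from the baseline scores, a small epsilon against empty bins); it is illustrative, not the script's actual implementation:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between baseline and live score distributions."""
    eps = 1e-6
    # Bin edges from quantiles of the baseline distribution
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    # Assign both samples to the same bins; clip keeps outliers in the edge bins
    e_idx = np.clip(np.searchsorted(edges, expected, side="right") - 1, 0, n_bins - 1)
    a_idx = np.clip(np.searchsorted(edges, actual, side="right") - 1, 0, n_bins - 1)
    e_frac = np.bincount(e_idx, minlength=n_bins) / len(expected) + eps
    a_frac = np.bincount(a_idx, minlength=n_bins) / len(actual) + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Rule of thumb: PSI < 0.1 stable, 0.1–0.25 watch, > 0.25 significant shift
print(psi(np.random.beta(2, 5, 10_000), np.random.beta(2, 4, 10_000)))
```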

## 🚀 Quick Start

```bash
pip install torch pytorch-lifestream scikit-learn lightgbm pandas numpy scipy
# Optional: pip install rtdl_num_embeddings rtdl_revisiting_models pytorch-tabular node2vec networkx
```

1. Edit the feature field names in `CONFIG` (a hypothetical sketch of its shape follows this list)
2. Swap in your own data-loading code
3. Run
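
The actual `CONFIG` keys live in the scripts themselves; as orientation only, the shape is typically a mapping from roles to your column names. Every field name below is a hypothetical placeholder:

```python
# All field names below are hypothetical placeholders -- replace them with the
# column names used in your own tables before running the scripts.
CONFIG = {
    "id_col": "user_id",              # entity key shared by both data sources
    "label_col": "is_default",        # binary risk label
    "app_id_col": "app_id",           # categorical app identifier (sequence tokens)
    "event_time_col": "install_ts",   # timestamp that orders the install sequence
    "bureau_num_cols": ["credit_limit", "utilization", "n_inquiries_6m"],
    "bureau_cat_cols": ["employment_type", "region"],
}
```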

## 📑 Key Papers

### App Sequence Modeling

| Method | Paper | Link |
|---|---|---|
| CoLES + GRU ⭐ | CoLES: Contrastive Learning for Event Sequences (SIGMOD 2022) | https://arxiv.org/abs/2002.08232 |
| Graph-Augmented CoLES | Beyond Isolated Clients (2026) | https://arxiv.org/abs/2604.09085 |
| LBSF hierarchical folding | Long-term Behavior Sequence Folding (IEEE 2024) | https://arxiv.org/abs/2411.15056 |
| TabBERT | Tabular Transformers (IBM 2021) | https://arxiv.org/abs/2011.01843 |
| BehaveGPT | Foundation Model for User Behavior (2025) | https://arxiv.org/abs/2505.17631 |
| TransactionGPT | Visa 2025 | https://arxiv.org/abs/2511.08939 |
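
The starred row is the recommended approach. Below is a plain-PyTorch sketch of the idea (the scripts use the `pytorch-lifestream` library instead; all dimensions and names here are illustrative): sample two random subsequences of each user's app-install history, treat same-user pairs as positives and other users in the batch as negatives, and train a GRU encoder with an InfoNCE-style loss (the original CoLES paper uses a margin-based contrastive loss; InfoNCE is a common variant).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUSeqEncoder(nn.Module):
    """Embeds a padded sequence of app IDs into a fixed-size user vector."""
    def __init__(self, n_apps: int, emb_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(n_apps, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, seq):                      # seq: (B, T) int64
        _, h = self.gru(self.emb(seq))           # h: (1, B, hidden)
        return self.proj(h.squeeze(0))           # (B, hidden)

def random_slice(seq, min_len: int = 8):
    """One CoLES 'view': a random contiguous slice of every user's history."""
    T = seq.size(1)
    length = torch.randint(min_len, T + 1, (1,)).item()
    start = torch.randint(0, T - length + 1, (1,)).item()
    return seq[:, start:start + length]

def info_nce(z1, z2, tau: float = 0.1):
    """Same-user slice pairs are positives; other users in the batch are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                     # (B, B) similarity matrix
    return F.cross_entropy(logits, torch.arange(len(z1)))

# One label-free pretraining step on stand-in data
encoder = GRUSeqEncoder(n_apps=10_000)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
batch = torch.randint(1, 10_000, (64, 60))       # stand-in for real install sequences
loss = info_nce(encoder(random_slice(batch)), encoder(random_slice(batch)))
opt.zero_grad(); loss.backward(); opt.step()
# After pretraining, the frozen embeddings become features for LightGBM.
```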

### Credit-Bureau Data Modeling

| Method | Paper | Link |
|---|---|---|
| LightGBM/XGBoost ⭐ | Why tree-based models still outperform DL (NeurIPS 2022) | https://arxiv.org/abs/2207.08815 |
| TabM + PLE ⭐ | Advancing Tabular DL (ICLR 2025) | https://arxiv.org/abs/2410.24210 |
| FT-Transformer | Revisiting DL for Tabular Data (NeurIPS 2021) | https://arxiv.org/abs/2106.11959 |
| PLE numerical embeddings | On Embeddings for Numerical Features (2022) | https://arxiv.org/abs/2203.05556 |
| SAINT | Improved NN for Tabular Data (2021) | https://arxiv.org/abs/2106.01342 |
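
The recommended recipe (see the takeaways below) is a LightGBM baseline blended 0.5:0.5 with a TabM+PLE model. Here is a runnable sketch of that blend on synthetic data; a logistic regression stands in for the neural model purely so the snippet executes, and you would swap in the real TabM+PLE probabilities in practice:

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bureau feature table
X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) LightGBM baseline first (per Grinsztajn et al., hard to beat on tabular data)
gbm = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.02, num_leaves=63)
gbm.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(200), lgb.log_evaluation(0)])
p_gbm = gbm.predict_proba(X_val)[:, 1]

# 2) Logistic regression stands in for the TabM+PLE model so the sketch runs
nn_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_nn = nn_model.predict_proba(X_val)[:, 1]

# 3) Equal-weight blend of the two probability streams
p_ens = 0.5 * p_gbm + 0.5 * p_nn
for name, p in [("gbm", p_gbm), ("nn", p_nn), ("blend", p_ens)]:
    print(name, round(roc_auc_score(y_val, p), 4))
```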

## 🔑 Key Takeaways

1. **App sequences**: use GRU + CoLES contrastive learning (label-free pretraining → LightGBM); don't default to a Transformer
2. **Credit-bureau data**: start with a LightGBM baseline, then add TabM+PLE, ensembled 0.5:0.5
3. **Build the two models separately**, then apply late fusion (vector concatenation → LightGBM stacking; minimal sketch below)
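
A minimal sketch of that late-fusion step, on random stand-in arrays. In the real pipeline the inputs would be the CoLES+GRU user embeddings and the bureau model's probabilities, and the upstream predictions should be out-of-fold to avoid target leakage:

```python
import lightgbm as lgb
import numpy as np

# Stand-in upstream outputs; in the real pipeline these come from the two models
seq_emb = np.random.randn(5000, 128)    # per-user embedding from the sequence encoder
p_bureau = np.random.rand(5000, 1)      # probability from the credit-bureau model
y = np.random.randint(0, 2, 5000)       # binary risk labels

# Late fusion: concatenate both outputs, then fit a stacking LightGBM on top
X_fused = np.hstack([seq_emb, p_bureau])
stacker = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
stacker.fit(X_fused, y)
p_final = stacker.predict_proba(X_fused)[:, 1]  # final decision score
```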

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yonghao/risk-control-sequence-models"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.