ccloud0525 committed on
Commit · 28e2d77
Parent(s): 0312e14
feat: "first commit"

Browse files
- README.md +0 -128
- bert_config/config.json +23 -0
- tokenizer.json → bert_config/tokenizer.json +0 -0
- tokenizer_config.json → bert_config/tokenizer_config.json +0 -0
- vocab.txt → bert_config/vocab.txt +0 -0
- config.json +4 -53
- configuration_aurora.py +1 -46
- modeling_aurora.py +1 -2
- vit_config/config.json +21 -0
- preprocessor_config.json → vit_config/preprocessor_config.json +0 -1
README.md
DELETED
@@ -1,128 +0,0 @@
---
license: mit
language:
- en
pipeline_tag: time-series-forecasting
tags:
- code
---

<div align="center">
<img alt="intro" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/ytpsIAr98keUvNouoOVmb.png" width="30%"/>
<h1> Aurora: Towards Universal Generative Multimodal Time Series Forecasting </h1>
The official code repo of our ICLR 2026 paper: <a href="https://arxiv.org/pdf/2509.22295">Aurora: Towards Universal Generative Multimodal Time Series Forecasting</a>

[](https://arxiv.org/pdf/2509.22295) [](https://www.python.org/) [](https://pytorch.org/)
</div>

## Introduction

Aurora is a highly capable multimodal time series foundation model. Built on **Modality-Guided Multi-head Self-Attention** and **Prototype-Guided Flow Matching**, Aurora can effectively exploit the domain-specific knowledge carried by other modalities and supports generative probabilistic forecasting, covering versatile forecasting scenarios.

As shown in **Figure 1**, to the best of our knowledge, Aurora is the first pretrained multimodal time series foundation model. Evaluated on three well-recognized benchmarks (TimeMMD, TSFM-Bench, and ProbTS), Aurora demonstrates state-of-the-art performance.

<div align="center">
<img alt="intro" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/YdsPeh5mrn_lef19vQXfa.png" width="60%"/>
</div>

## Architecture

We pretrain Aurora in a cross-modality paradigm that adopts Channel-Independence on the time series data and models the corresponding multimodal interaction to inject domain knowledge. Each variable of the time series is first normalized through Instance Normalization to mitigate value discrepancies. As shown in **Figure 2**, Aurora consists of two phases: 1) the Aurora Encoder tokenizes and encodes each modality into modal features, then fuses them into multimodal representations; 2) the Aurora Decoder uses a Condition Decoder to obtain the multimodal conditions of future tokens, a Prototype Retriever to retrieve future prototypes based on the domain knowledge, and conducts flow matching on them to make generative probabilistic forecasts.

<div align="center">
<img alt="intro" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/d82jT96jiGD0QL9s8RYg-.png" width="100%"/>
</div>

## Quickstart

We release the original code of Aurora in this repo. You can also download the pretrained checkpoints from our [huggingface](https://huggingface.co/DecisionIntelligence/Aurora) repo and put them in the folder: aurora/.

If you want to pretrain Aurora on your own time series corpus, you need to install the following packages:

```shell
$ pip install torch==2.4.0
$ pip install torchvision==0.19.0
$ pip install "transformers[torch]"
```

## Experiments

Refer to our [github repo](https://github.com/decisionintelligence/Aurora) for the complete experimental pipelines. For benchmarking (TSFM-Bench, ProbTS, TimeMMD, TFB, and EPF), install the additional packages listed in the requirement files under each folder; the datasets can be fetched from this [link](https://drive.google.com/file/d/12tJk858WaoG7ZVSvUq8KU1oHfGNJrARF/view?usp=drive_link). All experimental results can be reproduced by running the scripts in the benchmark folders:

```shell
# TimeMMD
TimeMMD/scripts/run_aurora_timemmd_zero_shot.sh

# EPF
EPF/scripts/run_aurora_short_term_zero_shot.sh

# ProbTS
ProbTS/scripts/run_aurora_probts.sh

# TSFM-Bench
TFB/scripts/run_aurora_tfb.sh

# TFB univariate
TFB/scripts/run_aurora_uni.sh
```

## Performance

**Aurora achieves consistent state-of-the-art performance on these 5 benchmarks:**

<div align="center">
<img alt="arch" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/Vh0ENMXJWwiPkWvMeeftG.png" width="100%"/>
</div>

<div align="center">
<img alt="arch" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/2nPl7KumS6DU2lRzm8ACr.png" width="100%"/>
</div>

<div align="center">
<img alt="arch" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/glgp6HoirIEO3yWBQD2Hw.png" width="100%"/>
</div>

<div align="center">
<img alt="arch" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/RmOgS8recYalH-FjsfEOM.png" width="100%"/>
</div>

<div align="center">
<img alt="arch" src="https://cdn-uploads.huggingface.co/production/uploads/66276727368ec2a0b933772c/JatnUn_fSmD2eJdMPb68y.png" width="100%"/>
</div>

## Citation

If you find this repo useful, please cite our paper.

```latex
@inproceedings{wu2026aurora,
  title     = {Aurora: Towards Universal Generative Multimodal Time Series Forecasting},
  author    = {Wu, Xingjian and Jin, Jianxin and Qiu, Wanghui and Chen, Peng and Shu, Yang and Yang, Bin and Guo, Chenjuan},
  booktitle = {ICLR},
  year      = {2026}
}
```

## Contact

If you have any questions or suggestions, feel free to contact:

- [Xingjian Wu](https://ccloud0525.github.io/) ([xjwu@stu.ecnu.edu.cn](mailto:xjwu@stu.ecnu.edu.cn))
- [Peng Chen](https://pengchen12.github.io/) ([pchen@stu.ecnu.edu.cn](mailto:pchen@stu.ecnu.edu.cn))

Or describe it in Issues.
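The per-variable Instance Normalization step described in the Architecture section of the deleted README can be sketched in a few lines. This is a minimal pure-Python illustration of the idea only, not the repo's actual implementation; the stored statistics are what allow forecasts to be mapped back to the original scale (the `revin` flag in `modeling_aurora.py` points at this reversible scheme):

```python
import math

def instance_normalize(series, eps=1e-5):
    """Normalize one variable of a time series to zero mean / unit variance,
    returning the statistics so outputs can be mapped back later."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in series], mean, std

def instance_denormalize(series, mean, std):
    """Invert the normalization on model outputs (e.g. generated future tokens)."""
    return [x * std + mean for x in series]
```

Because each variable carries its own `(mean, std)`, channels with very different magnitudes can share one model, which is the point of applying this per variable under Channel-Independence.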
bert_config/config.json
ADDED
@@ -0,0 +1,23 @@
{
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
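The values above match the stock `bert-base-uncased` configuration; in particular, the hidden size divides evenly across the attention heads. A quick sanity check with the standard `json` module (the config literal is inlined here for illustration):

```python
import json

# Subset of bert_config/config.json, inlined for a self-contained check.
bert_config = json.loads("""
{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "max_position_embeddings": 512,
  "vocab_size": 30522
}
""")

# Multi-head attention splits hidden_size evenly: 768 / 12 = 64 dims per head.
head_dim = bert_config["hidden_size"] // bert_config["num_attention_heads"]
assert bert_config["hidden_size"] % bert_config["num_attention_heads"] == 0
```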
tokenizer.json → bert_config/tokenizer.json
RENAMED
File without changes

tokenizer_config.json → bert_config/tokenizer_config.json
RENAMED
File without changes

vocab.txt → bert_config/vocab.txt
RENAMED
File without changes
config.json
CHANGED
@@ -1,19 +1,19 @@
 {
+  "_name_or_path": "aurora_base",
   "architectures": [
     "AuroraForPrediction"
   ],
-  "model_type": "aurora",
   "auto_map": {
     "AutoConfig": "configuration_aurora.AuroraConfig",
     "AutoModelForCausalLM": "modeling_aurora.AuroraForPrediction"
   },
-
-  "hidden_size": 256,
   "dropout_rate": 0.2,
   "hidden_act": "silu",
+  "hidden_size": 256,
   "token_len": 48,
   "intermediate_size": 512,
   "max_position_embeddings": 10000,
+  "model_type": "aurora",
   "num_attention_heads": 8,
   "num_enc_layers": 1,
   "num_dec_layers": 9,
@@ -27,60 +27,11 @@
   "mask_ratio": 0.5,
   "norm_mode": "batch",
   "num_prototypes": 1000,
-
   "num_retriever_enc_layers": 1,
   "num_retriever_dec_layers": 1,
   "num_text_cross_layers": 1,
   "num_vision_cross_layers": 1,
   "num_text_connect_layers": 1,
   "num_vision_connect_layers": 1,
-  "num_distill": 10,
-
-  "text_config": {
-    "_name_or_path": "google-bert/bert-base-uncased",
-    "architectures": [
-      "BertForMaskedLM"
-    ],
-    "attention_probs_dropout_prob": 0.1,
-    "gradient_checkpointing": false,
-    "hidden_act": "gelu",
-    "hidden_dropout_prob": 0.1,
-    "hidden_size": 768,
-    "initializer_range": 0.02,
-    "intermediate_size": 3072,
-    "layer_norm_eps": 1e-12,
-    "max_position_embeddings": 512,
-    "model_type": "bert",
-    "num_attention_heads": 12,
-    "num_hidden_layers": 12,
-    "pad_token_id": 0,
-    "position_embedding_type": "absolute",
-    "transformers_version": "4.6.0.dev0",
-    "type_vocab_size": 2,
-    "use_cache": true,
-    "vocab_size": 30522
-  },
-
-  "vision_config": {
-    "_name_or_path": "google/vit-base-patch16-224-in21k",
-    "architectures": [
-      "ViTModel"
-    ],
-    "attention_probs_dropout_prob": 0.0,
-    "hidden_act": "gelu",
-    "hidden_dropout_prob": 0.0,
-    "hidden_size": 768,
-    "image_size": 224,
-    "initializer_range": 0.02,
-    "intermediate_size": 3072,
-    "layer_norm_eps": 1e-12,
-    "model_type": "vit",
-    "num_attention_heads": 12,
-    "num_channels": 3,
-    "num_hidden_layers": 12,
-    "patch_size": 16,
-    "qkv_bias": true,
-    "transformers_version": "4.13.0.dev0"
-  }
-
+  "num_distill": 10
 }
configuration_aurora.py
CHANGED
@@ -1,4 +1,4 @@
-from transformers import PretrainedConfig, ViTConfig, BertConfig
+from transformers import PretrainedConfig
 
 
 class AuroraConfig(PretrainedConfig):
@@ -6,7 +6,6 @@ class AuroraConfig(PretrainedConfig):
 
     def __init__(
         self,
-        # --- Aurora Core Parameters ---
        token_len: int = 48,
        hidden_size: int = 512,
        intermediate_size: int = 1024,
@@ -17,8 +16,6 @@ class AuroraConfig(PretrainedConfig):
        rope_theta: int = 10000,
        dropout_rate: float = 0.2,
        max_position_embeddings: int = 10000,
-
-        # --- Diffusion / Flow Matching ---
        num_sampling_steps: int = 50,
        flow_loss_depth: int = 3,
        diffusion_batch_mul: int = 4,
@@ -26,8 +23,6 @@ class AuroraConfig(PretrainedConfig):
        mask_ratio: float = 0.5,
        norm_mode: str = 'batch',
        num_prototypes: int = 1024,
-
-        # --- Fusion Layers ---
        num_retriever_enc_layers: int = 1,
        num_retriever_dec_layers: int = 1,
        num_text_cross_layers: int = 1,
@@ -35,11 +30,6 @@ class AuroraConfig(PretrainedConfig):
        num_text_connect_layers: int = 1,
        num_vision_connect_layers: int = 1,
        num_distill: int = 10,
-
-        # --- Sub-Model Configurations (New) ---
-        vision_config=None,
-        text_config=None,
-
        **kwargs,
    ):
        self.token_len = token_len
@@ -67,41 +57,6 @@ class AuroraConfig(PretrainedConfig):
        self.num_vision_connect_layers = num_vision_connect_layers
        self.num_distill = num_distill
 
-        if vision_config is None:
-            self.vision_config = ViTConfig()
-        elif isinstance(vision_config, dict):
-            self.vision_config = ViTConfig(**vision_config)
-        else:
-            self.vision_config = vision_config
-
-        assert text_config is None
-
-        if text_config is None:
-            self.text_config = BertConfig()
-        elif isinstance(text_config, dict):
-            self.text_config = BertConfig(**text_config)
-        else:
-            self.text_config = text_config
-
        super().__init__(
            **kwargs,
        )
-
-    def to_dict(self):
-        """
-        Called when saving the configuration; the nested Config objects must be
-        converted back into dictionaries.
-        """
-        output = super().to_dict()
-
-        # Recursively convert sub-Config objects to dicts
-        if isinstance(self.vision_config, PretrainedConfig):
-            output["vision_config"] = self.vision_config.to_dict()
-        else:
-            output["vision_config"] = self.vision_config
-
-        if isinstance(self.text_config, PretrainedConfig):
-            output["text_config"] = self.text_config.to_dict()
-        else:
-            output["text_config"] = self.text_config
-
-        return output
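After this cleanup, AuroraConfig is a flat container of scalar hyperparameters. Its shape can be sketched without installing transformers (a hypothetical stand-in class for illustration only; the real class subclasses `PretrainedConfig`, which is what makes `save_pretrained`/`from_pretrained` serialization work without the deleted custom `to_dict`):

```python
class AuroraConfigSketch:
    """Stand-in mirroring the trimmed AuroraConfig: flat hyperparameters only,
    with unknown keyword arguments absorbed the way PretrainedConfig does."""

    def __init__(self, token_len=48, hidden_size=512, intermediate_size=1024,
                 dropout_rate=0.2, num_prototypes=1024, mask_ratio=0.5,
                 norm_mode="batch", num_distill=10, **kwargs):
        self.token_len = token_len
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.dropout_rate = dropout_rate
        self.num_prototypes = num_prototypes
        self.mask_ratio = mask_ratio
        self.norm_mode = norm_mode
        self.num_distill = num_distill
        # In the real class, super().__init__(**kwargs) handles the rest
        # (e.g. model_type, architectures from config.json).
        for key, value in kwargs.items():
            setattr(self, key, value)

# Values from the repo's config.json override the Python defaults.
cfg = AuroraConfigSketch(hidden_size=256, model_type="aurora")
```

Note the checked-in config.json overrides several defaults (e.g. `hidden_size` 512 → 256, `num_prototypes` 1024 → 1000), so the Python defaults alone do not describe the released checkpoint.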
modeling_aurora.py
CHANGED
@@ -500,7 +500,7 @@ class AuroraModel(nn.Module):
         )
 
 
-class AuroraForPrediction(TSGenerationMixin, AuroraPreTrainedModel):
+class AuroraForPrediction(AuroraPreTrainedModel, TSGenerationMixin):
     def __init__(self, config: AuroraConfig):
         super().__init__(config)
         self.config = config
@@ -537,7 +537,6 @@ class AuroraForPrediction(TSGenerationMixin, AuroraPreTrainedModel):
         revin: Optional[bool] = True,
         num_samples: Optional[int] = 1,
         inference_token_len: Optional[int] = 48,
-        **kwargs
     ):
         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
         output_hidden_states = output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
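The base-class reordering above is not cosmetic: Python resolves attributes left to right along the MRO, so after the change `AuroraPreTrainedModel` wins any name clash with `TSGenerationMixin`, and `super().__init__(config)` dispatches into it first. A minimal illustration with stand-in classes (the names here are stubs, not the repo's actual classes):

```python
class TSGenerationMixin:
    def generate(self):
        return "mixin"

class AuroraPreTrainedModelStub:
    def generate(self):
        return "pretrained"

# Old base order: the mixin's method shadows the base model's.
class OldOrder(TSGenerationMixin, AuroraPreTrainedModelStub):
    pass

# New base order: the pretrained base takes precedence.
class NewOrder(AuroraPreTrainedModelStub, TSGenerationMixin):
    pass
```

Whether the two real classes actually define overlapping methods is not visible in this diff, but the ordering determines which implementation runs whenever they do.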
vit_config/config.json
ADDED
@@ -0,0 +1,21 @@
{
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "qkv_bias": true,
  "transformers_version": "4.13.0.dev0"
}
preprocessor_config.json → vit_config/preprocessor_config.json
RENAMED
@@ -1,5 +1,4 @@
 {
-  "image_processor_type": "ViTImageProcessor",
   "do_normalize": true,
   "do_resize": true,
   "image_mean": [