MindscapeRAG committed on
Commit 320ace5 · verified · 1 Parent(s): 2d29009

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +202 -0
  2. adapter_config.json +36 -0
  3. adapter_model.safetensors +3 -0
  4. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model:
+ - Qwen/Qwen3-Embedding-8B
+ tags:
+ - embedding
+ - retriever
+ - RAG
+ pipeline_tag: feature-extraction
+ library_name: transformers
+ ---
+
+ # SFT-Emb-8B
+
+ [![Paper](https://img.shields.io/badge/Paper-arXiv%3A2512.17220-red)](https://arxiv.org/pdf/2512.17220)
+ [![Model](https://img.shields.io/badge/HuggingFace-SFT--Emb--8B-yellow)](https://huggingface.co/MindscapeRAG/SFT-Emb-8B)
+
+ This repository provides the inference implementation for **SFT-Emb**, a supervised fine-tuned embedding model serving as a baseline retriever in the **MiA-RAG** framework.
+
+ Unlike [**MiA-Emb**](https://huggingface.co/MindscapeRAG/MiA-Emb-8B), which conditions on both the query and a global summary (Mindscape), **SFT-Emb** operates on the **query alone**, without any global summary or residual connection. This makes it a standard retrieval baseline that does not leverage document-level semantic scaffolding.
+
+ ---
+
+ ## ✨ Key Features
+
+ - **Standard Query-Only Retrieval**
+   Encodes queries without any global summary, serving as a strong SFT baseline for comparison with Mindscape-aware models.
+
+ - **Dual-Granularity Retrieval**
+   - **Chunk Retrieval** for narrative passages (standard RAG)
+   - **Node Retrieval** for knowledge graph entities (GraphRAG-style)
+
+ - **Same Architecture, Simpler Input**
+   Built on the same Qwen3-Embedding-8B backbone and LoRA fine-tuning as MiA-Emb, but without the Mindscape summary injection or residual embedding mechanism.
+
+ ---
+
+ ## 🚀 Usage
+
+ ### Installation
+
+ ```bash
+ pip install torch "transformers>=4.53.0" peft
+ ```
+
+ Loading the released LoRA adapter through `transformers` requires `peft`. The optional `flash_attention_2` setting used below additionally requires the `flash-attn` package and can be removed to fall back to the default attention implementation.
+
+ ---
+
+ ### 1) Initialization
+
+ > SFT-Emb-8B is initialized from **`Qwen3-Embedding-8B`**.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModel
+
+ # Configuration
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Inference Parameters
+ node_delimiter = "<|repo_name|>"  # Special token for Node tasks
+
+ # Load Tokenizer (base)
+ tokenizer = AutoTokenizer.from_pretrained(
+     "Qwen/Qwen3-Embedding-8B",
+     trust_remote_code=True,
+     padding_side="left"
+ )
+
+ # Load Model (LoRA adapter on top of Qwen3-Embedding-8B; needs `peft` installed)
+ model = AutoModel.from_pretrained(
+     "MindscapeRAG/SFT-Emb-8B",
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2",  # optional; requires flash-attn
+     device_map={"": 0}
+ )
+ ```
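+
+ The snippet above relies on the `transformers` + `peft` integration to resolve this adapter-only repository. As an alternative, here is a minimal sketch of loading the base model and attaching the adapter explicitly (same repositories, assuming `peft` is installed); whether to merge the weights afterwards is a deployment choice, not something this repo prescribes:
+
+ ```python
+ import torch
+ from transformers import AutoModel
+ from peft import PeftModel
+
+ # Load the frozen base encoder
+ base = AutoModel.from_pretrained(
+     "Qwen/Qwen3-Embedding-8B",
+     torch_dtype=torch.bfloat16,
+     device_map={"": 0}
+ )
+
+ # Attach the SFT-Emb LoRA adapter from this repository
+ model = PeftModel.from_pretrained(base, "MindscapeRAG/SFT-Emb-8B")
+
+ # Optionally fold the LoRA weights into the base model for faster inference
+ model = model.merge_and_unload()
+ ```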
+
+ ---
+
+ ### 2) Chunk Retrieval
+
+ Use this mode to retrieve narrative text chunks. The query is encoded **without** any global summary.
+
+ ```python
+ def get_query_prompt(query):
+     """Construct input prompt (query-only, no summary)."""
+     task_desc = "Given a search query, retrieve relevant chunks or helpful entities summaries from the given context that answer the query"
+     return (
+         f"Instruct: {task_desc}\n"
+         f"Query: {query}{node_delimiter}"
+     )
+
+ def last_token_pool(last_hidden_states, attention_mask):
+     """Extract the last non-padding token embedding."""
+     left_padding = attention_mask[:, -1].sum() == attention_mask.shape[0]
+     if left_padding:
+         return last_hidden_states[:, -1]
+     sequence_lengths = attention_mask.sum(dim=1) - 1
+     batch_size = last_hidden_states.shape[0]
+     return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
+
+ def encode_chunk(texts):
+     batch = tokenizer(
+         texts,
+         max_length=4096,
+         padding=True,
+         truncation=True,
+         return_tensors="pt"
+     ).to(model.device)
+
+     outputs = model(**batch)
+
+     # Embedding (Last Token)
+     emb = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
+     emb = F.normalize(emb, p=2, dim=-1)
+     return emb
+
+
+ # --- Example ---
+ query = "Who is the protagonist?"
+ chunk = "Harry looked at the scar on his forehead."
+
+ # Encode
+ q_emb = encode_chunk([get_query_prompt(query)])
+ c_emb = encode_chunk([chunk])
+
+ # Score
+ score = q_emb @ c_emb.T
+ print(f"Chunk Similarity: {score.item():.4f}")
+ ```
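+
+ In practice a query is scored against many candidate chunks at once. Here is a small sketch of batch scoring and top-k selection built on the helpers above; the candidate chunks are illustrative, not from any real corpus:
+
+ ```python
+ # Hypothetical candidate pool; in a RAG setup these would be chunks of a long document
+ chunks = [
+     "Harry looked at the scar on his forehead.",
+     "The castle stood silent under the winter sky.",
+     "Hermione closed the heavy book with a sigh.",
+ ]
+
+ q_emb = encode_chunk([get_query_prompt("Who is the protagonist?")])   # (1, d)
+ c_embs = encode_chunk(chunks)                                         # (n, d)
+
+ # Cosine similarity (embeddings are already L2-normalized)
+ scores = (q_emb @ c_embs.T).squeeze(0)                                # (n,)
+
+ # Top-k chunks by similarity
+ topk = torch.topk(scores, k=2)
+ for rank, (idx, s) in enumerate(zip(topk.indices.tolist(), topk.values.tolist()), 1):
+     print(f"{rank}. score={s:.4f}  chunk={chunks[idx]}")
+ ```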
+
+ ---
+
+ ### 3) Node Retrieval
+
+ SFT-Emb can retrieve knowledge graph entities (**Nodes**). This mode extracts embeddings from the `<|repo_name|>` token position.
+
+ **Candidate format:**
+ `Entity Name : Entity Description`
+
+ Example:
+ `Mary Campbell Smith : Mary Campbell Smith is mentioned as the translator...`
+
+ ```python
+ def extract_specific_token(outputs, batch, token_id):
+     """Extract embedding at the position of a specific token."""
+     input_ids = batch["input_ids"]
+     hidden = outputs.last_hidden_state
+     mask = (input_ids == token_id)
+     # Take the last occurrence of the token for each sample
+     positions = mask.long().cumsum(dim=1).eq(mask.long().sum(dim=1, keepdim=True)) & mask
+     return hidden[positions]
+
+ def encode_node_query(texts, node_delimiter="<|repo_name|>"):
+     batch = tokenizer(texts, padding=True, return_tensors="pt").to(model.device)
+     outputs = model(**batch)
+
+     # Node Main Embedding: extract from <|repo_name|> position
+     node_id = tokenizer.encode(node_delimiter, add_special_tokens=False)[0]
+     q_emb_node = extract_specific_token(outputs, batch, node_id)
+     q_emb_node = F.normalize(q_emb_node, p=2, dim=-1)
+     return q_emb_node
+
+
+ # --- Example ---
+ query = "Who is the protagonist?"
+
+ # 1) Encode Query (Node Token)
+ q_emb_node = encode_node_query([get_query_prompt(query)])
+
+ # 2) Encode Entity Candidate
+ entity_text = "Harry Potter : The main protagonist of the series..."
+ n_emb = encode_chunk([entity_text])
+
+ # 3) Score
+ score = q_emb_node @ n_emb.T
+ print(f"Node Similarity: {score.item():.4f}")
+ ```
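+
+ The same pattern extends to ranking a set of entity candidates. A brief sketch with made-up entities in the `Entity Name : Entity Description` format:
+
+ ```python
+ # Hypothetical entity candidates (illustrative only)
+ entities = [
+     "Harry Potter : The main protagonist of the series...",
+     "Hedwig : Harry's snowy owl and companion.",
+     "Privet Drive : The street where the Dursleys live.",
+ ]
+
+ q_emb_node = encode_node_query([get_query_prompt("Who is the protagonist?")])  # (1, d)
+ n_embs = encode_chunk(entities)                                                # (n, d)
+
+ # Score every candidate and pick the best match
+ scores = (q_emb_node @ n_embs.T).squeeze(0)
+ best = torch.argmax(scores).item()
+ print(f"Best entity: {entities[best]}  (score={scores[best].item():.4f})")
+ ```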
+
+ ---
+
+ ## 📜 Citation
+
+ If you find this work useful, please cite:
+
+ ```bibtex
+ @misc{li2025mindscapeawareretrievalaugmentedgeneration,
+     title={Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding},
+     author={Yuqing Li and Jiangnan Li and Zheng Lin and Ziyan Zhou and Junjie Wu and Weiping Wang and Jie Zhou and Mo Yu},
+     year={2025},
+     eprint={2512.17220},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL},
+     url={https://arxiv.org/abs/2512.17220},
+ }
+ ```
+ ---
adapter_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3-Embedding-8B",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 256,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 128,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "o_proj",
+     "k_proj",
+     "v_proj"
+   ],
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_rslora": false
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0df25b17d1fe18ca7647a26e50917c9f750202b80f4af7798655d676c7fb5be
+ size 245404784
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f1ecc50f9f179dbc58cdb2ebaf492e4b9b058376d7fe95d5a7057787d25038b
+ size 8593