MindscapeRAG committed on
Commit 320ace5 · verified · 1 Parent(s): 2d29009

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +202 -0
  2. adapter_config.json +36 -0
  3. adapter_model.safetensors +3 -0
  4. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model:
+ - Qwen/Qwen3-Embedding-8B
+ tags:
+ - embedding
+ - retriever
+ - RAG
+ pipeline_tag: feature-extraction
+ library_name: transformers
+ ---
+
+ # SFT-Emb-8B
+
+ [![Paper](https://img.shields.io/badge/Paper-arXiv%3A2512.17220-red)](https://arxiv.org/pdf/2512.17220)
+ [![Model](https://img.shields.io/badge/HuggingFace-SFT--Emb--8B-yellow)](https://huggingface.co/MindscapeRAG/SFT-Emb-8B)
+
+ This repository provides the inference implementation for **SFT-Emb**, a supervised fine-tuned embedding model serving as a baseline retriever in the **MiA-RAG** framework.
+
+ Unlike [**MiA-Emb**](https://huggingface.co/MindscapeRAG/MiA-Emb-8B), which conditions on both the query and a global summary (Mindscape), **SFT-Emb** operates on the **query alone**, without any global summary or residual connection. This makes it a standard retrieval baseline that does not leverage document-level semantic scaffolding.
+
+ ---
+
+ ## ✨ Key Features
+
+ - **Standard Query-Only Retrieval**
+   Encodes queries without any global summary, serving as a strong SFT baseline for comparison with Mindscape-aware models.
+
+ - **Dual-Granularity Retrieval**
+   - **Chunk Retrieval** for narrative passages (standard RAG)
+   - **Node Retrieval** for knowledge graph entities (GraphRAG-style)
+
+ - **Same Architecture, Simpler Input**
+   Built on the same Qwen3-Embedding-8B backbone and LoRA fine-tuning as MiA-Emb, but without the Mindscape summary injection or residual embedding mechanism.
+
+ ---
+
+ ## 🚀 Usage
+
+ ### Installation
+
+ ```bash
+ pip install torch "transformers>=4.53.0" peft
+ ```
+
+ Loading the released LoRA adapter through `transformers` requires `peft`. The optional `flash_attention_2` setting used below additionally requires the `flash-attn` package and can be removed to fall back to the default attention implementation.
+
+ ---
+
+ ### 1) Initialization
+
+ > SFT-Emb-8B is initialized from **`Qwen3-Embedding-8B`**.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModel
+
+ # Configuration
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Inference Parameters
+ node_delimiter = "<|repo_name|>"  # Special token for Node tasks
+
+ # Load Tokenizer (base)
+ tokenizer = AutoTokenizer.from_pretrained(
+     "Qwen/Qwen3-Embedding-8B",
+     trust_remote_code=True,
+     padding_side="left"
+ )
+
+ # Load Model (LoRA adapter on top of Qwen3-Embedding-8B; needs `peft` installed)
+ model = AutoModel.from_pretrained(
+     "MindscapeRAG/SFT-Emb-8B",
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2",  # optional; requires flash-attn
+     device_map={"": 0}
+ )
+ ```
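+
+ The snippet above relies on the `transformers` + `peft` integration to resolve this adapter-only repository. As an alternative, here is a minimal sketch of loading the base model and attaching the adapter explicitly (same repositories, assuming `peft` is installed); whether to merge the weights afterwards is a deployment choice, not something this repo prescribes:
+
+ ```python
+ import torch
+ from transformers import AutoModel
+ from peft import PeftModel
+
+ # Load the frozen base encoder
+ base = AutoModel.from_pretrained(
+     "Qwen/Qwen3-Embedding-8B",
+     torch_dtype=torch.bfloat16,
+     device_map={"": 0}
+ )
+
+ # Attach the SFT-Emb LoRA adapter from this repository
+ model = PeftModel.from_pretrained(base, "MindscapeRAG/SFT-Emb-8B")
+
+ # Optionally fold the LoRA weights into the base model for faster inference
+ model = model.merge_and_unload()
+ ```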
+
+ ---
+
+ ### 2) Chunk Retrieval
+
+ Use this mode to retrieve narrative text chunks. The query is encoded **without** any global summary.
+
+ ```python
+ def get_query_prompt(query):
+     """Construct input prompt (query-only, no summary)."""
+     task_desc = "Given a search query, retrieve relevant chunks or helpful entities summaries from the given context that answer the query"
+     return (
+         f"Instruct: {task_desc}\n"
+         f"Query: {query}{node_delimiter}"
+     )
+
+ def last_token_pool(last_hidden_states, attention_mask):
+     """Extract the last non-padding token embedding."""
+     left_padding = attention_mask[:, -1].sum() == attention_mask.shape[0]
+     if left_padding:
+         return last_hidden_states[:, -1]
+     sequence_lengths = attention_mask.sum(dim=1) - 1
+     batch_size = last_hidden_states.shape[0]
+     return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
+
+ def encode_chunk(texts):
+     batch = tokenizer(
+         texts,
+         max_length=4096,
+         padding=True,
+         truncation=True,
+         return_tensors="pt"
+     ).to(model.device)
+
+     outputs = model(**batch)
+
+     # Embedding (Last Token)
+     emb = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
+     emb = F.normalize(emb, p=2, dim=-1)
+     return emb
+
+
+ # --- Example ---
+ query = "Who is the protagonist?"
+ chunk = "Harry looked at the scar on his forehead."
+
+ # Encode
+ q_emb = encode_chunk([get_query_prompt(query)])
+ c_emb = encode_chunk([chunk])
+
+ # Score
+ score = q_emb @ c_emb.T
+ print(f"Chunk Similarity: {score.item():.4f}")
+ ```
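+
+ In practice a query is scored against many candidate chunks at once. Here is a small sketch of batch scoring and top-k selection built on the helpers above; the candidate chunks are illustrative, not from any real corpus:
+
+ ```python
+ # Hypothetical candidate pool; in a RAG setup these would be chunks of a long document
+ chunks = [
+     "Harry looked at the scar on his forehead.",
+     "The castle stood silent under the winter sky.",
+     "Hermione closed the heavy book with a sigh.",
+ ]
+
+ q_emb = encode_chunk([get_query_prompt("Who is the protagonist?")])   # (1, d)
+ c_embs = encode_chunk(chunks)                                         # (n, d)
+
+ # Cosine similarity (embeddings are already L2-normalized)
+ scores = (q_emb @ c_embs.T).squeeze(0)                                # (n,)
+
+ # Top-k chunks by similarity
+ topk = torch.topk(scores, k=2)
+ for rank, (idx, s) in enumerate(zip(topk.indices.tolist(), topk.values.tolist()), 1):
+     print(f"{rank}. score={s:.4f}  chunk={chunks[idx]}")
+ ```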
+
+ ---
+
+ ### 3) Node Retrieval
+
+ SFT-Emb can retrieve knowledge graph entities (**Nodes**). This mode extracts embeddings from the `<|repo_name|>` token position.
+
+ **Candidate format:**
+ `Entity Name : Entity Description`
+
+ Example:
+ `Mary Campbell Smith : Mary Campbell Smith is mentioned as the translator...`
+
+ ```python
+ def extract_specific_token(outputs, batch, token_id):
+     """Extract embedding at the position of a specific token."""
+     input_ids = batch["input_ids"]
+     hidden = outputs.last_hidden_state
+     mask = (input_ids == token_id)
+     # Take the last occurrence of the token for each sample
+     positions = mask.long().cumsum(dim=1).eq(mask.long().sum(dim=1, keepdim=True)) & mask
+     return hidden[positions]
+
+ def encode_node_query(texts, node_delimiter="<|repo_name|>"):
+     batch = tokenizer(texts, padding=True, return_tensors="pt").to(model.device)
+     outputs = model(**batch)
+
+     # Node Main Embedding: extract from <|repo_name|> position
+     node_id = tokenizer.encode(node_delimiter, add_special_tokens=False)[0]
+     q_emb_node = extract_specific_token(outputs, batch, node_id)
+     q_emb_node = F.normalize(q_emb_node, p=2, dim=-1)
+     return q_emb_node
+
+
+ # --- Example ---
+ query = "Who is the protagonist?"
+
+ # 1) Encode Query (Node Token)
+ q_emb_node = encode_node_query([get_query_prompt(query)])
+
+ # 2) Encode Entity Candidate
+ entity_text = "Harry Potter : The main protagonist of the series..."
+ n_emb = encode_chunk([entity_text])
+
+ # 3) Score
+ score = q_emb_node @ n_emb.T
+ print(f"Node Similarity: {score.item():.4f}")
+ ```
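+
+ The same pattern extends to ranking a set of entity candidates. A brief sketch with made-up entities in the `Entity Name : Entity Description` format:
+
+ ```python
+ # Hypothetical entity candidates (illustrative only)
+ entities = [
+     "Harry Potter : The main protagonist of the series...",
+     "Hedwig : Harry's snowy owl and companion.",
+     "Privet Drive : The street where the Dursleys live.",
+ ]
+
+ q_emb_node = encode_node_query([get_query_prompt("Who is the protagonist?")])  # (1, d)
+ n_embs = encode_chunk(entities)                                                # (n, d)
+
+ # Score every candidate and pick the best match
+ scores = (q_emb_node @ n_embs.T).squeeze(0)
+ best = torch.argmax(scores).item()
+ print(f"Best entity: {entities[best]}  (score={scores[best].item():.4f})")
+ ```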
+
+ ---
+
+ ## 📜 Citation
+
+ If you find this work useful, please cite:
+
+ ```bibtex
+ @misc{li2025mindscapeawareretrievalaugmentedgeneration,
+     title={Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding},
+     author={Yuqing Li and Jiangnan Li and Zheng Lin and Ziyan Zhou and Junjie Wu and Weiping Wang and Jie Zhou and Mo Yu},
+     year={2025},
+     eprint={2512.17220},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL},
+     url={https://arxiv.org/abs/2512.17220},
+ }
+ ```
+ ---
adapter_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3-Embedding-8B",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 256,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 128,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "o_proj",
+     "k_proj",
+     "v_proj"
+   ],
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_rslora": false
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0df25b17d1fe18ca7647a26e50917c9f750202b80f4af7798655d676c7fb5be
+ size 245404784
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f1ecc50f9f179dbc58cdb2ebaf492e4b9b058376d7fe95d5a7057787d25038b
+ size 8593