# Qwen3-Embedding-8B → BART Inversion Model
Inverts 4096-dim Qwen3-Embedding-8B vectors back to text using a BART-base decoder.
## Usage
```python
import torch
import torch.nn as nn
import transformers
from safetensors.torch import load_file

# Load BART-base and build the embedding transform
bart = transformers.AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/bart-base")
embedding_transform = nn.Sequential(
    nn.Linear(4096, 4096), nn.LayerNorm(4096), nn.Dropout(0.1), nn.GELU(),
    nn.Linear(4096, 768 * 16),
)

# Split the checkpoint into the transform and the BART weights
state = load_file("model.safetensors", device="cpu")
et_state = {k.replace("embedding_transform.", ""): v
            for k, v in state.items() if k.startswith("embedding_transform.")}
embedding_transform.load_state_dict(et_state)
ed_state = {k.replace("encoder_decoder.", ""): v
            for k, v in state.items() if k.startswith("encoder_decoder.")}
bart.load_state_dict(ed_state, strict=False)

device = torch.device("cuda")
bart = bart.to(device).eval()
embedding_transform = embedding_transform.to(device).eval()

# Invert a 4096-dim embedding back to text
def invert(emb_vector):
    emb = torch.as_tensor(emb_vector, dtype=torch.float32).unsqueeze(0).to(device)
    with torch.no_grad():
        # Project the single vector into 16 BART-sized pseudo-token embeddings
        projected = embedding_transform(emb)
        inputs_embeds = projected.reshape(1, 16, 768)
        attention_mask = torch.ones(1, 16, device=device)
        output_ids = bart.generate(
            inputs_embeds=inputs_embeds, attention_mask=attention_mask,
            max_length=128, num_beams=4, early_stopping=True,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```
## Details
- Encoder: Qwen3-Embedding-8B (4096-dim, via OpenRouter or local)
- Decoder: BART-base with learned embedding transform (16 repeat tokens)
- Training data: 1M sentences from ClimbMix, embedded with Qwen3-Embedding-8B
- VRAM: ~2GB at inference
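The 16 "repeat tokens" above come from reshaping the transform's 12288-dim output into a sequence of BART-sized embeddings. A minimal sketch of the shape flow, using a randomly initialized transform (illustrative only — not the trained weights, so it produces no meaningful text):

```python
import torch
import torch.nn as nn

# Same architecture as the checkpoint's embedding_transform, but randomly
# initialized — this demonstrates only the tensor shapes, not real inversion.
transform = nn.Sequential(
    nn.Linear(4096, 4096), nn.LayerNorm(4096), nn.Dropout(0.1), nn.GELU(),
    nn.Linear(4096, 768 * 16),
)
transform.eval()

emb = torch.randn(1, 4096)                     # one Qwen3-Embedding-8B vector
with torch.no_grad():
    projected = transform(emb)                 # shape (1, 12288)
pseudo_tokens = projected.reshape(1, 16, 768)  # 16 pseudo-tokens for BART
```

BART-base's hidden size is 768, so each 4096-dim input vector is expanded into a length-16 "sentence" of encoder embeddings that the decoder can attend over.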