MiniLM-L12-v2 Memex Fine-tuned Embeddings

A fine-tuned sentence embedding model optimized for semantic similarity matching of structured memory documents. The documents contain facts, opinions, observations, and experiences formatted according to the Hindsight memory architecture.

Model Description

This model is a fine-tuned version of sentence-transformers/all-MiniLM-L12-v2, specifically trained to generate embeddings for documents formatted with epistemic type labels and contextual metadata.

The model is trained with Quantization-Aware Training (QAT) and exported to ONNX format with INT8 dynamic activation and INT4 weight quantization for efficient inference.

Key Features

  • Epistemic-Aware: Optimized for documents with type labels (World, Experience, Opinion, Observation)
  • Context-Sensitive: Leverages contextual metadata for improved semantic matching
  • Quantized: INT8/INT4 quantization for efficient deployment
  • ONNX Export: Ready for production deployment

Usage

With ONNX Runtime

import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

# Load the tokenizer shipped with the model and match the training-time settings
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_padding(pad_id=0, pad_token='[PAD]')
tokenizer.enable_truncation(max_length=512)

# Run the quantized ONNX graph on CPU
session = ort.InferenceSession("model.onnx", providers=['CPUExecutionProvider'])

def encode(texts: list[str]) -> np.ndarray:
    # Tokenize the batch and build the int64 inputs the graph expects
    encodings = tokenizer.encode_batch(texts)
    input_ids = np.array([e.ids for e in encodings], dtype=np.int64)
    attention_mask = np.array([e.attention_mask for e in encodings], dtype=np.int64)

    outputs = session.run(None, {
        'input_ids': input_ids,
        'attention_mask': attention_mask,
    })
    # First output of the exported graph
    return outputs[0]

embeddings = encode(["Your documents here"])
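
Depending on how the graph was exported, outputs[0] may already be the pooled 384-dimensional sentence embedding, or it may be token-level hidden states of shape (batch, seq_len, 384). In the latter case, apply the mean pooling and L2 normalization described under Model Architecture before using the vectors. A minimal sketch (numpy only):

def mean_pool_and_normalize(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Zero out padding positions, then average the remaining token embeddings
    mask = attention_mask[..., None].astype(np.float32)  # (batch, seq_len, 1)
    pooled = (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
    # L2-normalize so cosine similarity reduces to a dot product
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

If needed, call this on outputs[0] and attention_mask inside encode() before returning.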

With Sentence Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("JasperHG90/minilm-l12-v2-hindsight-embeddings")
embeddings = model.encode(["Your documents here"])
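
For retrieval, embed both the query and the formatted documents, then rank by cosine similarity. Continuing from the snippet above, a small illustrative example (the query string is made up; the documents are the examples from the formatting section below):

from sentence_transformers import util

query_embedding = model.encode(["Which database do we prefer?"])
document_embeddings = model.encode([
    "Opinion (Tech Preference): You strongly prefer PostgreSQL because of its JSONB support.",
    "World (Config): Production is pinned to Node 18.x LTS due to a dependency constraint.",
])
scores = util.cos_sim(query_embedding, document_embeddings)  # shape (1, 2); higher = more relevant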

Document Formatting

Documents should be formatted with type and context labels before embedding:

{Type} ({Context}): {Text}

Examples:

  • World (Config): Production is pinned to Node 18.x LTS due to a dependency constraint.
  • Experience (Decision): We decided to implement a circuit breaker pattern to prevent cascading failures.
  • Opinion (Tech Preference): You strongly prefer PostgreSQL because of its JSONB support.
  • Observation (Contact Info): Mike Smith is the account manager; his email is mike@vendor.com.

Supported Types:

  • World - Facts about the world
  • Experience - Personal events and actions
  • Opinion - Subjective beliefs and preferences
  • Observation - Derived or inferred information
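
A small helper for producing this format before indexing or querying (the function name and signature are illustrative, mirroring the training-data fields shown below):

def format_memory(text: str, type_: str, context: str) -> str:
    # Renders '{Type} ({Context}): {Text}', the format the model was fine-tuned on
    return f"{type_} ({context}): {text}"

document = format_memory(
    "We decided to implement a circuit breaker pattern to prevent cascading failures.",
    type_="Experience",
    context="Decision",
)
# 'Experience (Decision): We decided to implement a circuit breaker pattern to prevent cascading failures.'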

Training Details

Training Data

Fine-tuning uses synthetic triplet data built around structured memory documents.

Split   Samples
Train   300
Eval    50
Test    50

Each training example contains:

  • query: Natural language question
  • positive: Relevant structured document
  • negative: Distractor document (similar but irrelevant)

Example:

{
  "query": "What is our stance on remote work?",
  "positive": {
    "text": "I believe remote work requires asynchronous communication discipline to be effective.",
    "type": "Opinion",
    "context": "Work Philosophy"
  },
  "negative": {
    "text": "The company policy allows for 3 days of remote work per week.",
    "type": "World",
    "context": "HR Policy"
  }
}

Training Configuration

Parameter            Value
Base Model           sentence-transformers/all-MiniLM-L12-v2
Epochs               8
Batch Size           8
Learning Rate        1e-6
Loss Function        Multiple Negatives Ranking Loss
Max Sequence Length  512
Warmup Ratio         0.5
Quantization         INT8 dynamic activation, INT4 weights (QAT)

Loss Function

Uses Multiple Negatives Ranking Loss (MNRL), which treats the positive documents of the other examples in a batch as negatives for each query (in-batch negatives), effectively providing many negative samples without requiring mined hard negatives.
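
A minimal fine-tuning sketch with sentence-transformers using the configuration above; the actual training script (including QAT and the ONNX export) is not shown here, and the warmup-step calculation is an assumption based on the 0.5 warmup ratio:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# One (query, formatted positive) pair per triplet; MNRL treats the other
# in-batch positives as negatives. The dataset's 'negative' field could be
# appended as a third text, which MNRL would use as an extra hard negative.
train_examples = [
    InputExample(texts=[
        "What is our stance on remote work?",
        "Opinion (Work Philosophy): I believe remote work requires asynchronous communication discipline to be effective.",
    ]),
    # ... remaining training examples
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=8)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=8,
    optimizer_params={"lr": 1e-6},
    warmup_steps=int(0.5 * 8 * len(train_dataloader)),  # warmup ratio 0.5 over all training steps
)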

Evaluation Results

Test Set Performance

Metric       Baseline  Fine-tuned  Improvement
Accuracy@1   0.47      0.67        +43%
Accuracy@3   0.80      0.83        +4%
Accuracy@5   0.88      0.93        +6%
Accuracy@10  0.92      0.97        +5%
MRR@10       0.65      0.77        +18%

Eval Set Performance

Metric       Baseline  Fine-tuned  Improvement
Accuracy@1   0.54      0.70        +30%
Accuracy@3   0.88      0.86        -2%
Accuracy@5   0.96      0.96        0%
Accuracy@10  0.98      1.00        +2%
MRR@10       0.71      0.80        +13%
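
Accuracy@k is the fraction of queries whose positive document is retrieved within the top k results, and MRR@10 is the mean reciprocal rank of the first relevant result (0 if it falls outside the top 10). A small sketch of these metrics, assuming ranks holds the 1-based rank of each query's positive document (not the exact evaluation script):

import numpy as np

def accuracy_at_k(ranks: list[int], k: int) -> float:
    # Fraction of queries whose positive document appears in the top k
    return float(np.mean([rank <= k for rank in ranks]))

def mrr_at_k(ranks: list[int], k: int = 10) -> float:
    # Mean reciprocal rank, counting only hits within the top k
    return float(np.mean([1.0 / rank if rank <= k else 0.0 for rank in ranks]))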

Model Architecture

  • Type: Sentence Transformer (Bi-Encoder)
  • Base: MiniLM-L12 (12 transformer layers)
  • Pooling: Mean pooling over token embeddings
  • Normalization: L2 normalization
  • Output: 384-dimensional embedding vector
  • Quantization: INT8 dynamic activation + INT4 weight (via torchao)

Intended Use

This model is designed for:

  • Memory Retrieval: Finding relevant memories in agent memory systems
  • Semantic Search: Ranking documents by relevance to natural language queries
  • Clustering: Grouping related memories by type or topic
  • Deduplication: Identifying semantically similar documents (a simple sketch follows below)
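
For deduplication, one simple approach is to compute pairwise cosine similarities between memory embeddings and flag pairs above a threshold; the 0.9 threshold below is arbitrary and would need tuning on real data:

import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.9) -> list[tuple[int, int]]:
    # Assumes L2-normalized embeddings (as produced by this model),
    # so the dot product equals cosine similarity
    similarities = embeddings @ embeddings.T
    return [
        (i, j)
        for i in range(len(similarities))
        for j in range(i + 1, len(similarities))
        if similarities[i, j] >= threshold
    ]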

Limitations

  • Trained on synthetic data; may require domain-specific fine-tuning for production use
  • Optimized for the structured format described above; may not generalize well to unstructured text
  • English language only
  • Small training dataset (300 examples)

License

Apache 2.0 License
