MiniLM-L12-v2 Memex Fine-tuned Embeddings

A fine-tuned sentence embedding model optimized for semantic similarity matching of structured memory documents. The documents contain facts, opinions, observations, and experiences formatted according to the Hindsight memory architecture.

Model Description

This model is a fine-tuned version of sentence-transformers/all-MiniLM-L12-v2, specifically trained to generate embeddings for documents formatted with epistemic type labels and contextual metadata.

The model is trained with Quantization-Aware Training (QAT) and exported to ONNX format with INT8 dynamic activation and INT4 weight quantization for efficient inference.

Key Features

  • Epistemic-Aware: Optimized for documents with type labels (World, Experience, Opinion, Observation)
  • Context-Sensitive: Leverages contextual metadata for improved semantic matching
  • Quantized: INT8/INT4 quantization for efficient deployment
  • ONNX Export: Ready for production deployment

Usage

With ONNX Runtime

import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

# Load the tokenizer shipped with the model and match the training-time settings
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_padding(pad_id=0, pad_token='[PAD]')
tokenizer.enable_truncation(max_length=512)

# Run the quantized ONNX graph on CPU
session = ort.InferenceSession("model.onnx", providers=['CPUExecutionProvider'])

def encode(texts: list[str]) -> np.ndarray:
    # Tokenize the batch and build the int64 inputs the graph expects
    encodings = tokenizer.encode_batch(texts)
    input_ids = np.array([e.ids for e in encodings], dtype=np.int64)
    attention_mask = np.array([e.attention_mask for e in encodings], dtype=np.int64)

    outputs = session.run(None, {
        'input_ids': input_ids,
        'attention_mask': attention_mask,
    })
    # First output of the exported graph
    return outputs[0]

embeddings = encode(["Your documents here"])
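
Depending on how the graph was exported, outputs[0] may already be the pooled 384-dimensional sentence embedding, or it may be token-level hidden states of shape (batch, seq_len, 384). In the latter case, apply the mean pooling and L2 normalization described under Model Architecture before using the vectors. A minimal sketch (numpy only):

def mean_pool_and_normalize(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Zero out padding positions, then average the remaining token embeddings
    mask = attention_mask[..., None].astype(np.float32)  # (batch, seq_len, 1)
    pooled = (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
    # L2-normalize so cosine similarity reduces to a dot product
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

If needed, call this on outputs[0] and attention_mask inside encode() before returning.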

With Sentence Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("JasperHG90/minilm-l12-v2-hindsight-embeddings")
embeddings = model.encode(["Your documents here"])
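
For retrieval, embed both the query and the formatted documents, then rank by cosine similarity. Continuing from the snippet above, a small illustrative example (the query string is made up; the documents are the examples from the formatting section below):

from sentence_transformers import util

query_embedding = model.encode(["Which database do we prefer?"])
document_embeddings = model.encode([
    "Opinion (Tech Preference): You strongly prefer PostgreSQL because of its JSONB support.",
    "World (Config): Production is pinned to Node 18.x LTS due to a dependency constraint.",
])
scores = util.cos_sim(query_embedding, document_embeddings)  # shape (1, 2); higher = more relevant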

Document Formatting

Documents should be formatted with type and context labels before embedding:

{Type} ({Context}): {Text}

Examples:

  • World (Config): Production is pinned to Node 18.x LTS due to a dependency constraint.
  • Experience (Decision): We decided to implement a circuit breaker pattern to prevent cascading failures.
  • Opinion (Tech Preference): You strongly prefer PostgreSQL because of its JSONB support.
  • Observation (Contact Info): Mike Smith is the account manager; his email is mike@vendor.com.

Supported Types:

  • World - Facts about the world
  • Experience - Personal events and actions
  • Opinion - Subjective beliefs and preferences
  • Observation - Derived or inferred information
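
A small helper for producing this format before indexing or querying (the function name and signature are illustrative, mirroring the training-data fields shown below):

def format_memory(text: str, type_: str, context: str) -> str:
    # Renders '{Type} ({Context}): {Text}', the format the model was fine-tuned on
    return f"{type_} ({context}): {text}"

document = format_memory(
    "We decided to implement a circuit breaker pattern to prevent cascading failures.",
    type_="Experience",
    context="Decision",
)
# 'Experience (Decision): We decided to implement a circuit breaker pattern to prevent cascading failures.'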

Training Details

Training Data

Fine-tuning uses synthetic triplet data built around structured memory documents.

Split   Samples
Train   300
Eval    50
Test    50

Each training example contains:

  • query: Natural language question
  • positive: Relevant structured document
  • negative: Distractor document (similar but irrelevant)

Example:

{
  "query": "What is our stance on remote work?",
  "positive": {
    "text": "I believe remote work requires asynchronous communication discipline to be effective.",
    "type": "Opinion",
    "context": "Work Philosophy"
  },
  "negative": {
    "text": "The company policy allows for 3 days of remote work per week.",
    "type": "World",
    "context": "HR Policy"
  }
}

Training Configuration

Parameter            Value
Base Model           sentence-transformers/all-MiniLM-L12-v2
Epochs               8
Batch Size           8
Learning Rate        1e-6
Loss Function        Multiple Negatives Ranking Loss
Max Sequence Length  512
Warmup Ratio         0.5
Quantization         INT8 dynamic activation, INT4 weights (QAT)

Loss Function

Uses Multiple Negatives Ranking Loss (MNRL), which treats the positive documents of the other examples in a batch as negatives for each query (in-batch negatives), effectively providing many negative samples without requiring mined hard negatives.
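
A minimal fine-tuning sketch with sentence-transformers using the configuration above; the actual training script (including QAT and the ONNX export) is not shown here, and the warmup-step calculation is an assumption based on the 0.5 warmup ratio:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# One (query, formatted positive) pair per triplet; MNRL treats the other
# in-batch positives as negatives. The dataset's 'negative' field could be
# appended as a third text, which MNRL would use as an extra hard negative.
train_examples = [
    InputExample(texts=[
        "What is our stance on remote work?",
        "Opinion (Work Philosophy): I believe remote work requires asynchronous communication discipline to be effective.",
    ]),
    # ... remaining training examples
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=8)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=8,
    optimizer_params={"lr": 1e-6},
    warmup_steps=int(0.5 * 8 * len(train_dataloader)),  # warmup ratio 0.5 over all training steps
)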

Evaluation Results

Test Set Performance

Metric       Baseline  Fine-tuned  Improvement
Accuracy@1   0.47      0.67        +43%
Accuracy@3   0.80      0.83        +4%
Accuracy@5   0.88      0.93        +6%
Accuracy@10  0.92      0.97        +5%
MRR@10       0.65      0.77        +18%

Eval Set Performance

Metric       Baseline  Fine-tuned  Improvement
Accuracy@1   0.54      0.70        +30%
Accuracy@3   0.88      0.86        -2%
Accuracy@5   0.96      0.96        0%
Accuracy@10  0.98      1.00        +2%
MRR@10       0.71      0.80        +13%
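
Accuracy@k is the fraction of queries whose positive document is retrieved within the top k results, and MRR@10 is the mean reciprocal rank of the first relevant result (0 if it falls outside the top 10). A small sketch of these metrics, assuming ranks holds the 1-based rank of each query's positive document (not the exact evaluation script):

import numpy as np

def accuracy_at_k(ranks: list[int], k: int) -> float:
    # Fraction of queries whose positive document appears in the top k
    return float(np.mean([rank <= k for rank in ranks]))

def mrr_at_k(ranks: list[int], k: int = 10) -> float:
    # Mean reciprocal rank, counting only hits within the top k
    return float(np.mean([1.0 / rank if rank <= k else 0.0 for rank in ranks]))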

Model Architecture

  • Type: Sentence Transformer (Bi-Encoder)
  • Base: MiniLM-L12 (12 transformer layers)
  • Pooling: Mean pooling over token embeddings
  • Normalization: L2 normalization
  • Output: 384-dimensional embedding vector
  • Quantization: INT8 dynamic activation + INT4 weight (via torchao)

Intended Use

This model is designed for:

  • Memory Retrieval: Finding relevant memories in agent memory systems
  • Semantic Search: Ranking documents by relevance to natural language queries
  • Clustering: Grouping related memories by type or topic
  • Deduplication: Identifying semantically similar documents (a simple sketch follows below)
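
For deduplication, one simple approach is to compute pairwise cosine similarities between memory embeddings and flag pairs above a threshold; the 0.9 threshold below is arbitrary and would need tuning on real data:

import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.9) -> list[tuple[int, int]]:
    # Assumes L2-normalized embeddings (as produced by this model),
    # so the dot product equals cosine similarity
    similarities = embeddings @ embeddings.T
    return [
        (i, j)
        for i in range(len(similarities))
        for j in range(i + 1, len(similarities))
        if similarities[i, j] >= threshold
    ]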

Limitations

  • Trained on synthetic data; may require domain-specific fine-tuning for production use
  • Optimized for the structured format described above; may not generalize well to unstructured text
  • English language only
  • Small training dataset (300 examples)

License

Apache 2.0 License
