# LycheeMem BERT-Tiny Memory Reranker v0

This repository provides the optional v0 transformer reranker checkpoint for
LycheeMem semantic memory search. The model scores (query, memory candidate)
pairs and is used as a conservative reranker over a wider memory candidate pool.

Load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("LycheeMem/reranker")
model = AutoModelForSequenceClassification.from_pretrained("LycheeMem/reranker")
```
The reranker is default-off in LycheeMem. It only changes memory search when the user installs the optional rerank dependencies, downloads this checkpoint, and explicitly enables the transformer rerank hook.
## Model

- name: `LycheeMem/reranker`
- base_model: `prajjwal1/bert-tiny`
- task: memory evidence reranking
- architecture: `AutoModelForSequenceClassification`
- runtime: local checkpoint, default-off LycheeMem hook
- version: v0.1.0
## Intended Use

Use this checkpoint with LycheeMem's experimental transformer reranker hook:

```shell
pip install "lycheemem[rerank]"

EXPERIMENTAL_TRANSFORMER_RERANK=true
TRANSFORMER_RERANK_MODEL_PATH=/path/to/lycheemem-reranker-v0
TRANSFORMER_RERANK_MAX_REPLACEMENTS=1
TRANSFORMER_RERANK_MERGE_MARGIN=0.3
TRANSFORMER_RERANK_WIDE_TOP_K=50
```
If dependencies or the local checkpoint are missing, LycheeMem falls back to baseline memory search.
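As a rough sketch, the opt-in-plus-fallback condition can be expressed as follows. The function and variable names here are illustrative, not LycheeMem's actual internals:

```python
import importlib.util
import os

def transformer_rerank_enabled(model_path):
    """Return True only when the hook is explicitly opted in, the rerank
    dependencies are importable, and the local checkpoint directory exists.
    Otherwise baseline memory search runs unchanged."""
    opted_in = os.environ.get("EXPERIMENTAL_TRANSFORMER_RERANK", "").lower() == "true"
    deps_ok = (importlib.util.find_spec("transformers") is not None
               and importlib.util.find_spec("torch") is not None)
    checkpoint_ok = os.path.isdir(model_path)
    return opted_in and deps_ok and checkpoint_ok
```

Any failed check silently degrades to the baseline path rather than raising, which matches the default-off design.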
## Training Data
The checkpoint was trained on LoCoMo-derived memory evidence reranking bundles. Each training example pairs a user question with candidate memory texts and evidence IDs derived from the LoCoMo benchmark.
The source repository does not include LoCoMo data, generated caches, or training outputs. Reproduction notes are maintained in the LycheeMem source repository.
## Metrics
All metrics below measure evidence retrieval/reranking, not final LLM answer
quality. The primary metric is whether at least one gold evidence item appears
in the returned top-10 candidates (hit@10).
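The hit@10 computation is simple enough to state in code; this is a minimal sketch of the metric as described, not the evaluation harness itself:

```python
def hit_at_k(gold_ids, ranked_ids, k=10):
    """True if at least one gold evidence ID appears in the top-k candidates."""
    return any(g in ranked_ids[:k] for g in gold_ids)

def hit_rate(examples, k=10):
    """Fraction of (gold_ids, ranked_ids) examples with a top-k hit."""
    hits = sum(hit_at_k(gold, ranked, k) for gold, ranked in examples)
    return hits / len(examples)
```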
### LoCoMo Evidence Retrieval

System memory backend, 200 QA:

- baseline: 124/200 = 0.620
- v0: 130/200 = 0.650
- added/lost/net: +7/-1/+6

System LanceDB backend, 200 QA:

- baseline: 124/200 = 0.620
- v0: 131/200 = 0.655
- added/lost/net: +8/-1/+7

Full-memory cache, 5 seeds:

- held added/lost/net: +115/-7/+108
- added/lost ratio: 16.43

Split checks:

- interleave held: 466/765 -> 495/765, net +29
- prefix held: 473/766 -> 501/766, net +28
- conversation-heldout held: 476/772 -> 504/772, net +28
### Candidate Context Probe

Same checkpoint, different candidate text construction:

- single-turn v0: 998/1531 = 0.651862, net +67
- context-candidate v0: 1013/1531 = 0.661659, net +82
### Zero-Shot Evidence Selection

LongMemEval-S cleaned:

- baseline: 469/500 = 0.938
- wide: 500/500 = 1.000
- v0: 484/500 = 0.968
- added/lost/net: +16/-1/+15

MSC-MemFuse-MC10 turn-level:

- baseline: 142/299 = 0.475
- wide: 279/299 = 0.933
- v0: 152/299 = 0.508
- added/lost/net: +10/-0/+10

HotpotQA distractor sentence-level:

- baseline: 6957/7405 = 0.9395
- wide: 7405/7405 = 1.0000
- v0: 7076/7405 = 0.9556
- added/lost/net: +141/-22/+119
These zero-shot fixtures are intended to check whether the LoCoMo-trained v0 checkpoint transfers as an evidence selector. LongMemEval-S and MSC-MemFuse are memory/dialogue-style settings. HotpotQA is a wiki multi-hop supporting-sentence setting, so it is a useful but less direct transfer check.
## Limitations
- The checkpoint is trained on LoCoMo-derived evidence bundles and may not generalize to every private memory corpus.
- It assumes relevant evidence is already present in the wide candidate pool.
- It is not an RL policy and does not learn online by itself.
- The MSC-MemFuse fixture uses answer-string matching to infer evidence turns; this is a conservative heuristic, not original human evidence annotation.
- HotpotQA transfer is positive but shows more lost cases than the memory-style fixtures, so performance on dense wiki-style distractor pools should be monitored.
- The strongest current accuracy bottleneck appears to be candidate representation, especially single-turn evidence-boundary cases.
- The hook should remain default-off until a user or deployment explicitly opts in and monitors diagnostics.
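The answer-string matching heuristic used by the MSC-MemFuse fixture can be sketched roughly as follows. This is a simplification for illustration; the actual fixture construction lives in the LycheeMem source repository:

```python
def infer_evidence_turns(answer, turns):
    """Conservatively label dialogue turns as evidence when they contain the
    gold answer string (case-insensitive substring match). Turns that
    paraphrase the answer without quoting it are missed by design."""
    needle = answer.strip().lower()
    return [i for i, turn in enumerate(turns) if needle and needle in turn.lower()]
```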
## Runtime Behavior

LycheeMem's transformer reranker uses this checkpoint only after baseline memory search has produced a wider candidate pool. The current v0 policy is conservative:

- wide_top_k: 50
- max_replacements: 1
- merge_margin: 0.3
- runtime: local checkpoint only
- default behavior: disabled
In plain terms: baseline search retrieves memories first. The reranker only gets a narrow chance to replace one item in the final top-k when a better evidence candidate is already present in the wider candidate pool.
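In code terms, that conservative policy might look like the following sketch. Function and variable names are illustrative, not LycheeMem's actual API:

```python
def conservative_rerank(baseline_topk, wide_pool_scored,
                        max_replacements=1, merge_margin=0.3):
    """Replace at most `max_replacements` items of the baseline top-k with
    higher-scoring candidates from the wider pool.

    baseline_topk: list of (memory_id, reranker_score) from baseline search.
    wide_pool_scored: list of (memory_id, reranker_score) for the wide pool.
    """
    topk = list(baseline_topk)
    in_topk = {mid for mid, _ in topk}
    # Candidates not already in the top-k, best reranker score first.
    outside = sorted((c for c in wide_pool_scored if c[0] not in in_topk),
                     key=lambda c: c[1], reverse=True)
    replacements = 0
    for cand_id, cand_score in outside:
        if replacements >= max_replacements:
            break
        # Weakest current top-k item by reranker score.
        weakest = min(range(len(topk)), key=lambda i: topk[i][1])
        # Only replace when the candidate clearly beats it by the margin.
        if cand_score >= topk[weakest][1] + merge_margin:
            topk[weakest] = (cand_id, cand_score)
            replacements += 1
    return topk
```

With `max_replacements=1` and `merge_margin=0.3`, a single marginal score difference cannot disturb the baseline ranking; only a clearly better wide-pool candidate swaps in.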
## Files

Expected checkpoint directory:

- config.json
- model.safetensors
- run_meta.json
- special_tokens_map.json
- tokenizer_config.json
- vocab.txt
SHA256 checksums for the v0.1.0 checkpoint artifact:

```
ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a  config.json
0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0  model.safetensors
7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536  run_meta.json
b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3  special_tokens_map.json
e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73  tokenizer_config.json
07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3  vocab.txt
```
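To verify downloaded files against the checksums above, a small standard-library helper is enough (or use `sha256sum` directly):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example usage against the list above:
# assert sha256_of("config.json") == "ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a"
```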
## Citation and Scope
This checkpoint is part of LycheeMem's optional memory retrieval research path. It is not an RL policy and does not learn online by itself. Online feedback and personalization are handled by separate experimental components.
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="LycheeMem/reranker")
```