# LycheeMem BERT-Tiny Memory Reranker v0

This repository provides the optional v0 transformer reranker checkpoint for
LycheeMem semantic memory search. The model scores (query, memory candidate)
pairs and is used as a conservative reranker over a wider memory candidate pool.

Load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("LycheeMem/reranker")
model = AutoModelForSequenceClassification.from_pretrained("LycheeMem/reranker")
```
The reranker is default-off in LycheeMem. It only changes memory search when the user installs the optional rerank dependencies, downloads this checkpoint, and explicitly enables the transformer rerank hook.
## Model

- name: `LycheeMem/reranker`
- base_model: `prajjwal1/bert-tiny`
- task: memory evidence reranking
- architecture: `AutoModelForSequenceClassification`
- runtime: local checkpoint, default-off LycheeMem hook
- version: v0.1.0
## Intended Use

Use this checkpoint with LycheeMem's experimental transformer reranker hook:

```shell
pip install "lycheemem[rerank]"

EXPERIMENTAL_TRANSFORMER_RERANK=true
TRANSFORMER_RERANK_MODEL_PATH=/path/to/lycheemem-reranker-v0
TRANSFORMER_RERANK_MAX_REPLACEMENTS=1
TRANSFORMER_RERANK_MERGE_MARGIN=0.3
TRANSFORMER_RERANK_WIDE_TOP_K=50
```
If dependencies or the local checkpoint are missing, LycheeMem falls back to baseline memory search.
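As a rough sketch, the opt-in-plus-fallback condition can be expressed as follows. The function and variable names here are illustrative, not LycheeMem's actual internals:

```python
import importlib.util
import os

def transformer_rerank_enabled(model_path):
    """Return True only when the hook is explicitly opted in, the rerank
    dependencies are importable, and the local checkpoint directory exists.
    Otherwise baseline memory search runs unchanged."""
    opted_in = os.environ.get("EXPERIMENTAL_TRANSFORMER_RERANK", "").lower() == "true"
    deps_ok = (importlib.util.find_spec("transformers") is not None
               and importlib.util.find_spec("torch") is not None)
    checkpoint_ok = os.path.isdir(model_path)
    return opted_in and deps_ok and checkpoint_ok
```

Any failed check silently degrades to the baseline path rather than raising, which matches the default-off design.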
## Training Data
The checkpoint was trained on LoCoMo-derived memory evidence reranking bundles. Each training example pairs a user question with candidate memory texts and evidence IDs derived from the LoCoMo benchmark.
The source repository does not include LoCoMo data, generated caches, or training outputs. Reproduction notes are maintained in the LycheeMem source repository.
## Metrics
All metrics below measure evidence retrieval/reranking, not final LLM answer
quality. The primary metric is whether at least one gold evidence item appears
in the returned top-10 candidates (hit@10).
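The hit@10 computation is simple enough to state in code; this is a minimal sketch of the metric as described, not the evaluation harness itself:

```python
def hit_at_k(gold_ids, ranked_ids, k=10):
    """True if at least one gold evidence ID appears in the top-k candidates."""
    return any(g in ranked_ids[:k] for g in gold_ids)

def hit_rate(examples, k=10):
    """Fraction of (gold_ids, ranked_ids) examples with a top-k hit."""
    hits = sum(hit_at_k(gold, ranked, k) for gold, ranked in examples)
    return hits / len(examples)
```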
### LoCoMo Evidence Retrieval

System memory backend, 200 QA:

- baseline: 124/200 = 0.620
- v0: 130/200 = 0.650
- added/lost/net: +7/-1/+6

System LanceDB backend, 200 QA:

- baseline: 124/200 = 0.620
- v0: 131/200 = 0.655
- added/lost/net: +8/-1/+7

Full-memory cache, 5 seeds:

- held added/lost/net: +115/-7/+108
- added/lost ratio: 16.43

Split checks:

- interleave held: 466/765 -> 495/765, net +29
- prefix held: 473/766 -> 501/766, net +28
- conversation-heldout held: 476/772 -> 504/772, net +28
### Candidate Context Probe

Same checkpoint, different candidate text construction:

- single-turn v0: 998/1531 = 0.651862, net +67
- context-candidate v0: 1013/1531 = 0.661659, net +82
### Zero-Shot Evidence Selection

LongMemEval-S cleaned:

- baseline: 469/500 = 0.938
- wide: 500/500 = 1.000
- v0: 484/500 = 0.968
- added/lost/net: +16/-1/+15

MSC-MemFuse-MC10 turn-level:

- baseline: 142/299 = 0.475
- wide: 279/299 = 0.933
- v0: 152/299 = 0.508
- added/lost/net: +10/-0/+10

HotpotQA distractor sentence-level:

- baseline: 6957/7405 = 0.9395
- wide: 7405/7405 = 1.0000
- v0: 7076/7405 = 0.9556
- added/lost/net: +141/-22/+119
These zero-shot fixtures are intended to check whether the LoCoMo-trained v0 checkpoint transfers as an evidence selector. LongMemEval-S and MSC-MemFuse are memory/dialogue-style settings. HotpotQA is a wiki multi-hop supporting-sentence setting, so it is a useful but less direct transfer check.
## Limitations
- The checkpoint is trained on LoCoMo-derived evidence bundles and may not generalize to every private memory corpus.
- It assumes relevant evidence is already present in the wide candidate pool.
- It is not an RL policy and does not learn online by itself.
- The MSC-MemFuse fixture uses answer-string matching to infer evidence turns; this is a conservative heuristic, not original human evidence annotation.
- HotpotQA transfer is positive but shows more lost cases than the memory-style fixtures, so performance on dense wiki-style distractor pools should be monitored.
- The strongest current accuracy bottleneck appears to be candidate representation, especially single-turn evidence-boundary cases.
- The hook should remain default-off until a user or deployment explicitly opts in and monitors diagnostics.
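The answer-string matching heuristic used by the MSC-MemFuse fixture can be sketched roughly as follows. This is a simplification for illustration; the actual fixture construction lives in the LycheeMem source repository:

```python
def infer_evidence_turns(answer, turns):
    """Conservatively label dialogue turns as evidence when they contain the
    gold answer string (case-insensitive substring match). Turns that
    paraphrase the answer without quoting it are missed by design."""
    needle = answer.strip().lower()
    return [i for i, turn in enumerate(turns) if needle and needle in turn.lower()]
```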
## Runtime Behavior

LycheeMem's transformer reranker uses this checkpoint only after baseline memory search has produced a wider candidate pool. The current v0 policy is conservative:

- wide_top_k: 50
- max_replacements: 1
- merge_margin: 0.3
- runtime: local checkpoint only
- default behavior: disabled
In plain terms: baseline search retrieves memories first. The reranker only gets a narrow chance to replace one item in the final top-k when a better evidence candidate is already present in the wider candidate pool.
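In code terms, that conservative policy might look like the following sketch. Function and variable names are illustrative, not LycheeMem's actual API:

```python
def conservative_rerank(baseline_topk, wide_pool_scored,
                        max_replacements=1, merge_margin=0.3):
    """Replace at most `max_replacements` items of the baseline top-k with
    higher-scoring candidates from the wider pool.

    baseline_topk: list of (memory_id, reranker_score) from baseline search.
    wide_pool_scored: list of (memory_id, reranker_score) for the wide pool.
    """
    topk = list(baseline_topk)
    in_topk = {mid for mid, _ in topk}
    # Candidates not already in the top-k, best reranker score first.
    outside = sorted((c for c in wide_pool_scored if c[0] not in in_topk),
                     key=lambda c: c[1], reverse=True)
    replacements = 0
    for cand_id, cand_score in outside:
        if replacements >= max_replacements:
            break
        # Weakest current top-k item by reranker score.
        weakest = min(range(len(topk)), key=lambda i: topk[i][1])
        # Only replace when the candidate clearly beats it by the margin.
        if cand_score >= topk[weakest][1] + merge_margin:
            topk[weakest] = (cand_id, cand_score)
            replacements += 1
    return topk
```

With `max_replacements=1` and `merge_margin=0.3`, a single marginal score difference cannot disturb the baseline ranking; only a clearly better wide-pool candidate swaps in.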
## Files

Expected checkpoint directory:

- config.json
- model.safetensors
- run_meta.json
- special_tokens_map.json
- tokenizer_config.json
- vocab.txt
SHA256 checksums for the v0.1.0 checkpoint artifact:

```
ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a  config.json
0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0  model.safetensors
7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536  run_meta.json
b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3  special_tokens_map.json
e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73  tokenizer_config.json
07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3  vocab.txt
```
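To verify downloaded files against the checksums above, a small standard-library helper is enough (or use `sha256sum` directly):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example usage against the list above:
# assert sha256_of("config.json") == "ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a"
```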
## Citation and Scope
This checkpoint is part of LycheeMem's optional memory retrieval research path. It is not an RL policy and does not learn online by itself. Online feedback and personalization are handled by separate experimental components.
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="LycheeMem/reranker")
```