Update model card with expanded zero-shot evaluation: LongMemEval-S, MSC-MemFuse-MC10, and HotpotQA. Checkpoint unchanged.
---
license: apache-2.0
base_model: prajjwal1/bert-tiny
library_name: transformers
pipeline_tag: text-classification
tags:
- lycheemem
- memory
- reranking
- evidence-retrieval
- bert-tiny
---

# LycheeMem BERT-Tiny Memory Reranker v0

This repository provides the optional v0 transformer reranker checkpoint for
LycheeMem semantic memory search. The model scores `(query, memory candidate)`
pairs and is used as a conservative reranker over a wider memory candidate pool.

The reranker is default-off in LycheeMem. It only changes memory search when the
user installs the optional rerank dependencies, downloads this checkpoint, and
explicitly enables the transformer rerank hook.

## Model

```text
name: LycheeMem/reranker
base_model: prajjwal1/bert-tiny
task: memory evidence reranking
architecture: AutoModelForSequenceClassification
runtime: local checkpoint, default-off LycheeMem hook
version: v0.1.0
```
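
For orientation, here is a minimal scoring sketch. It assumes the checkpoint
loads with the standard `transformers` sequence-classification API; the pair
packing and score-head handling are illustrative assumptions, not LycheeMem's
exact inference code.

```python
# Minimal (query, candidate) scoring sketch. Assumption: standard transformers
# sequence-classification head; LycheeMem's real input packing may differ.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CKPT = "/path/to/lycheemem-reranker-v0"  # local checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT).eval()

def score_pair(query: str, candidate: str) -> float:
    # Pack the (query, memory candidate) pair as a standard BERT text pair.
    inputs = tokenizer(query, candidate, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Use the last logit as the relevance score; a single-logit head reduces
    # to the same value.
    return logits[0, -1].item()
```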

## Intended Use

Use this checkpoint with LycheeMem's experimental transformer reranker hook:

```bash
# Install the optional rerank dependencies.
pip install "lycheemem[rerank]"

# Environment configuration for the LycheeMem process.
EXPERIMENTAL_TRANSFORMER_RERANK=true
TRANSFORMER_RERANK_MODEL_PATH=/path/to/lycheemem-reranker-v0
TRANSFORMER_RERANK_MAX_REPLACEMENTS=1
TRANSFORMER_RERANK_MERGE_MARGIN=0.3
TRANSFORMER_RERANK_WIDE_TOP_K=50
```

If dependencies or the local checkpoint are missing, LycheeMem falls back to
baseline memory search.
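
The opt-in contract can be summarized as in this sketch (the helper name and
argument shape are illustrative assumptions; the three conditions mirror the
requirements stated above):

```python
import os

def transformer_rerank_enabled(deps_available: bool, checkpoint_present: bool) -> bool:
    # All three conditions must hold; otherwise baseline search runs unchanged.
    opted_in = os.getenv("EXPERIMENTAL_TRANSFORMER_RERANK", "").lower() == "true"
    return opted_in and deps_available and checkpoint_present
```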

## Training Data

The checkpoint was trained on LoCoMo-derived memory evidence reranking bundles.
Each training example pairs a user question with candidate memory texts and
evidence IDs derived from the LoCoMo benchmark.

The source repository does not include LoCoMo data, generated caches, or training
outputs. Reproduction notes are maintained in the LycheeMem source repository.

## Metrics

All metrics below measure evidence retrieval/reranking, not final LLM answer
quality. The primary metric is whether at least one gold evidence item appears
in the returned top-10 candidates (`hit@10`).
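
For concreteness, per-question `hit@10` can be computed as in this sketch
(function and variable names are illustrative, not LycheeMem's evaluation
code):

```python
def hit_at_k(returned_ids: list[str], gold_ids: set[str], k: int = 10) -> bool:
    """True when at least one gold evidence ID appears in the top-k candidates."""
    return any(cid in gold_ids for cid in returned_ids[:k])

def hit_rate(examples: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    # Aggregate over a QA set: e.g. 130 hits over 200 questions -> 0.650.
    return sum(hit_at_k(ids, gold, k) for ids, gold in examples) / len(examples)
```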

### LoCoMo Evidence Retrieval

```text
System memory backend, 200 QA:
  baseline: 124/200 = 0.620
  v0:       130/200 = 0.650
  added/lost/net: +7/-1/+6

System LanceDB backend, 200 QA:
  baseline: 124/200 = 0.620
  v0:       131/200 = 0.655
  added/lost/net: +8/-1/+7

Full-memory cache, 5 seeds:
  held added/lost/net: +115/-7/+108
  added/lost ratio: 16.43

Split checks:
  interleave held:           466/765 -> 495/765, net +29
  prefix held:               473/766 -> 501/766, net +28
  conversation-heldout held: 476/772 -> 504/772, net +28
```

### Candidate Context Probe

Same checkpoint, different candidate text construction:

```text
single-turn v0:        998/1531 = 0.651862, net +67
context-candidate v0: 1013/1531 = 0.661659, net +82
```
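
As a rough illustration of the difference between the two constructions (the
window size and joining below are assumptions; the exact formatting used in
the probe is not specified here):

```python
def single_turn_candidate(turns: list[str], i: int) -> str:
    # Candidate text is the evidence turn alone.
    return turns[i]

def context_candidate(turns: list[str], i: int, window: int = 1) -> str:
    # Candidate text also carries +/- `window` neighboring turns
    # (window size is an assumed value for illustration).
    lo, hi = max(0, i - window), min(len(turns), i + window + 1)
    return "\n".join(turns[lo:hi])
```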

### Zero-Shot Evidence Selection

```text
LongMemEval-S cleaned:
  baseline: 469/500 = 0.938
  wide:     500/500 = 1.000
  v0:       484/500 = 0.968
  added/lost/net: +16/-1/+15

MSC-MemFuse-MC10 turn-level:
  baseline: 142/299 = 0.475
  wide:     279/299 = 0.933
  v0:       152/299 = 0.508
  added/lost/net: +10/-0/+10

HotpotQA distractor sentence-level:
  baseline: 6957/7405 = 0.9395
  wide:     7405/7405 = 1.0000
  v0:       7076/7405 = 0.9556
  added/lost/net: +141/-22/+119
```

These zero-shot fixtures check whether the LoCoMo-trained v0 checkpoint
transfers as an evidence selector; the `wide` rows report hit rates over the
wider candidate pool itself, i.e. the ceiling available to the reranker.
LongMemEval-S and MSC-MemFuse are memory/dialogue-style settings. HotpotQA is
a wiki multi-hop supporting-sentence setting, so it is a useful but less
direct transfer check.

## Limitations

- The checkpoint is trained on LoCoMo-derived evidence bundles and may not
  generalize to every private memory corpus.
- It assumes relevant evidence is already present in the wide candidate pool.
- It is not an RL policy and does not learn online by itself.
- The MSC-MemFuse fixture uses answer-string matching to infer evidence turns;
  this is a conservative heuristic, not original human evidence annotation.
- HotpotQA transfer is positive but has more lost cases than memory-style
  fixtures, so dense wiki distractors need monitoring.
- The strongest current accuracy bottleneck appears to be candidate
  representation, especially single-turn evidence-boundary cases.
- The hook should remain default-off until a user or deployment explicitly opts
  in and monitors diagnostics.

## Runtime Behavior

LycheeMem's transformer reranker uses this checkpoint only after baseline memory
search has produced a wider candidate pool. The current v0 policy is
conservative:

```text
wide_top_k: 50
max_replacements: 1
merge_margin: 0.3
runtime: local checkpoint only
default behavior: disabled
```

In plain terms: baseline search retrieves memories first. The reranker only gets
a narrow chance to replace one item in the final top-k when a better evidence
candidate is already present in the wider candidate pool.
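
A sketch of that replacement policy under the knobs above follows; the scoring
interface and the exact margin comparison are assumptions, not LycheeMem's
implementation.

```python
from typing import Callable

def conservative_rerank(
    top_k: list[str],
    wide_pool: list[str],
    score: Callable[[str], float],
    merge_margin: float = 0.3,
    max_replacements: int = 1,
) -> list[str]:
    # Replace at most `max_replacements` baseline items with a wide-pool
    # candidate that beats the weakest kept item by at least `merge_margin`.
    result = list(top_k)
    outside = [c for c in wide_pool if c not in result]
    for _ in range(max_replacements):
        if not outside or not result:
            break
        best = max(outside, key=score)
        worst_i = min(range(len(result)), key=lambda i: score(result[i]))
        if score(best) >= score(result[worst_i]) + merge_margin:
            result[worst_i] = best
            outside.remove(best)
        else:
            break  # margin not cleared: keep the baseline ranking unchanged
    return result
```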

## Files

Expected checkpoint directory:

```text
config.json
model.safetensors
run_meta.json
special_tokens_map.json
tokenizer_config.json
vocab.txt
```

SHA256 checksums for the v0.1.0 checkpoint artifact:

```text
ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a  config.json
0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0  model.safetensors
7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536  run_meta.json
b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3  special_tokens_map.json
e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73  tokenizer_config.json
07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3  vocab.txt
```
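
To check a downloaded checkpoint against this list, a sketch like the
following works (the `checksums.txt` filename and helper name are
illustrative; save the block above into that file first):

```python
import hashlib
from pathlib import Path

def verify_checkpoint(ckpt_dir: str, checksum_file: str = "checksums.txt") -> bool:
    ok = True
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split()
        actual = hashlib.sha256((Path(ckpt_dir) / name).read_bytes()).hexdigest()
        if actual != expected:
            print(f"MISMATCH: {name}")
            ok = False
    return ok
```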

## Citation and Scope

This checkpoint is part of LycheeMem's optional memory retrieval research path.
It is not an RL policy and does not learn online by itself. Online feedback and
personalization are handled by separate experimental components.
|