| --- |
| license: apache-2.0 |
| base_model: prajjwal1/bert-tiny |
| library_name: transformers |
| pipeline_tag: text-classification |
| tags: |
| - lycheemem |
| - memory |
| - reranking |
| - evidence-retrieval |
| - bert-tiny |
| --- |
| |
| # LycheeMem BERT-Tiny Memory Reranker v0 |
|
|
| This repository provides the optional v0 transformer reranker checkpoint for |
| LycheeMem semantic memory search. The model scores `(query, memory candidate)` |
| pairs and is used as a conservative reranker over a wider memory candidate pool. |
|
|
The reranker is default-off in LycheeMem. It changes memory search only when the
user installs the optional rerank dependencies, downloads this checkpoint, and
explicitly enables the transformer rerank hook.
|
|
| ## Model |
|
|
| ```text |
| name: LycheeMem/reranker |
| base_model: prajjwal1/bert-tiny |
| task: memory evidence reranking |
| architecture: AutoModelForSequenceClassification |
| runtime: local checkpoint, default-off LycheeMem hook |
| version: v0.1.0 |
| ``` |
|
|
| ## Intended Use |
|
|
| Use this checkpoint with LycheeMem's experimental transformer reranker hook: |
|
|
```bash
pip install "lycheemem[rerank]"

export EXPERIMENTAL_TRANSFORMER_RERANK=true
export TRANSFORMER_RERANK_MODEL_PATH=/path/to/lycheemem-reranker-v0
export TRANSFORMER_RERANK_MAX_REPLACEMENTS=1
export TRANSFORMER_RERANK_MERGE_MARGIN=0.3
export TRANSFORMER_RERANK_WIDE_TOP_K=50
```
|
|
| If dependencies or the local checkpoint are missing, LycheeMem falls back to |
| baseline memory search. |
|
|
| ## Training Data |
|
|
| The checkpoint was trained on LoCoMo-derived memory evidence reranking bundles. |
| Each training example pairs a user question with candidate memory texts and |
| evidence IDs derived from the LoCoMo benchmark. |
|
|
| The source repository does not include LoCoMo data, generated caches, or training |
| outputs. Reproduction notes are maintained in the LycheeMem source repository. |
|
|
| ## Metrics |
|
|
| All metrics below measure evidence retrieval/reranking, not final LLM answer |
| quality. The primary metric is whether at least one gold evidence item appears |
| in the returned top-10 candidates (`hit@10`). |
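Concretely, `hit@10` can be computed as below. The helper names are
illustrative, not part of LycheeMem's evaluation harness.

```python
# Illustrative hit@10: a question counts as a hit when at least one gold
# evidence ID appears among the top 10 returned candidate IDs.
def hit_at_k(ranked_ids: list[str], gold_ids: set[str], k: int = 10) -> bool:
    return any(cid in gold_ids for cid in ranked_ids[:k])

def hit_rate(examples: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    hits = sum(hit_at_k(ranked, gold, k) for ranked, gold in examples)
    return hits / len(examples)

# e.g. 130 hits over 200 QA pairs gives 0.650, matching the v0 row below.
```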
|
|
| ### LoCoMo Evidence Retrieval |
|
|
| ```text |
| System memory backend, 200 QA: |
| baseline: 124/200 = 0.620 |
| v0: 130/200 = 0.650 |
| added/lost/net: +7/-1/+6 |
| |
| System LanceDB backend, 200 QA: |
| baseline: 124/200 = 0.620 |
| v0: 131/200 = 0.655 |
| added/lost/net: +8/-1/+7 |
| |
| Full-memory cache, 5 seeds: |
| held added/lost/net: +115/-7/+108 |
| added/lost ratio: 16.43 |
| |
| Split checks: |
| interleave held: 466/765 -> 495/765, net +29 |
| prefix held: 473/766 -> 501/766, net +28 |
| conversation-heldout held: 476/772 -> 504/772, net +28 |
| ``` |
|
|
| ### Candidate Context Probe |
|
|
| Same checkpoint, different candidate text construction: |
|
|
| ```text |
| single-turn v0: 998/1531 = 0.651862, net +67 |
| context-candidate v0: 1013/1531 = 0.661659, net +82 |
| ``` |
|
|
| ### Zero-Shot Evidence Selection |
|
|
| ```text |
| LongMemEval-S cleaned: |
| baseline: 469/500 = 0.938 |
| wide: 500/500 = 1.000 |
| v0: 484/500 = 0.968 |
| added/lost/net: +16/-1/+15 |
| |
| MSC-MemFuse-MC10 turn-level: |
| baseline: 142/299 = 0.475 |
| wide: 279/299 = 0.933 |
| v0: 152/299 = 0.508 |
| added/lost/net: +10/-0/+10 |
| |
| HotpotQA distractor sentence-level: |
| baseline: 6957/7405 = 0.9395 |
| wide: 7405/7405 = 1.0000 |
| v0: 7076/7405 = 0.9556 |
| added/lost/net: +141/-22/+119 |
| ``` |
|
|
| These zero-shot fixtures are intended to check whether the LoCoMo-trained v0 |
| checkpoint transfers as an evidence selector. LongMemEval-S and MSC-MemFuse are |
| memory/dialogue-style settings. HotpotQA is a wiki multi-hop supporting-sentence |
| setting, so it is a useful but less direct transfer check. |
|
|
| ## Limitations |
|
|
| - The checkpoint is trained on LoCoMo-derived evidence bundles and may not |
| generalize to every private memory corpus. |
| - It assumes relevant evidence is already present in the wide candidate pool. |
| - It is not an RL policy and does not learn online by itself. |
| - The MSC-MemFuse fixture uses answer-string matching to infer evidence turns; |
| this is a conservative heuristic, not original human evidence annotation. |
- HotpotQA transfer is positive but shows more lost cases than the memory-style
  fixtures, so behavior on dense wiki distractor pools should be monitored.
| - The strongest current accuracy bottleneck appears to be candidate |
| representation, especially single-turn evidence-boundary cases. |
| - The hook should remain default-off until a user or deployment explicitly opts |
| in and monitors diagnostics. |
|
|
| ## Runtime Behavior |
|
|
| LycheeMem's transformer reranker uses this checkpoint only after baseline memory |
| search has produced a wider candidate pool. The current v0 policy is |
| conservative: |
|
|
| ```text |
| wide_top_k: 50 |
| max_replacements: 1 |
| merge_margin: 0.3 |
| runtime: local checkpoint only |
| default behavior: disabled |
| ``` |
|
|
| In plain terms: baseline search retrieves memories first. The reranker only gets |
| a narrow chance to replace one item in the final top-k when a better evidence |
| candidate is already present in the wider candidate pool. |
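A minimal sketch of this policy follows, assuming a `score` callable that wraps
the pair-scoring call; the exact merge rule in LycheeMem may differ.

```python
# Illustrative conservative merge: at most max_replacements wide-pool
# candidates may displace the weakest baseline results, and only when the
# reranker scores the challenger above the incumbent by merge_margin.
def conservative_merge(
    baseline_top_k: list[str],
    wide_pool: list[str],
    score,                      # callable: candidate text -> reranker score
    max_replacements: int = 1,
    merge_margin: float = 0.3,
) -> list[str]:
    merged = list(baseline_top_k)
    challengers = sorted(
        (c for c in wide_pool if c not in merged), key=score, reverse=True
    )
    for challenger in challengers[:max_replacements]:
        weakest = min(merged, key=score)
        # Replace only on a clear margin; otherwise keep the baseline result.
        if score(challenger) >= score(weakest) + merge_margin:
            merged[merged.index(weakest)] = challenger
    return merged
```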
|
|
| ## Files |
|
|
Expected checkpoint directory contents:
|
|
| ```text |
| config.json |
| model.safetensors |
| run_meta.json |
| special_tokens_map.json |
| tokenizer_config.json |
| vocab.txt |
| ``` |
|
|
| SHA256 checksums for the v0.1.0 checkpoint artifact: |
|
|
| ```text |
| ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a config.json |
| 0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0 model.safetensors |
| 7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536 run_meta.json |
| b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3 special_tokens_map.json |
| e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73 tokenizer_config.json |
| 07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3 vocab.txt |
| ``` |
|
|
| ## Citation and Scope |
|
|
| This checkpoint is part of LycheeMem's optional memory retrieval research path. |
| It is not an RL policy and does not learn online by itself. Online feedback and |
| personalization are handled by separate experimental components. |
|
|