---
license: apache-2.0
base_model: prajjwal1/bert-tiny
library_name: transformers
pipeline_tag: text-classification
tags:
- lycheemem
- memory
- reranking
- evidence-retrieval
- bert-tiny
---

# LycheeMem BERT-Tiny Memory Reranker v0

This repository provides the optional v0 transformer reranker checkpoint for LycheeMem semantic memory search. The model scores `(query, memory candidate)` pairs and is used as a conservative reranker over a wider memory candidate pool.

The reranker is default-off in LycheeMem. It only changes memory search when the user installs the optional rerank dependencies, downloads this checkpoint, and explicitly enables the transformer rerank hook.

## Model

```text
name: LycheeMem/reranker
base_model: prajjwal1/bert-tiny
task: memory evidence reranking
architecture: AutoModelForSequenceClassification
runtime: local checkpoint, default-off LycheeMem hook
version: v0.1.0
```

## Intended Use

Use this checkpoint with LycheeMem's experimental transformer reranker hook:

```bash
pip install "lycheemem[rerank]"

EXPERIMENTAL_TRANSFORMER_RERANK=true
TRANSFORMER_RERANK_MODEL_PATH=/path/to/lycheemem-reranker-v0
TRANSFORMER_RERANK_MAX_REPLACEMENTS=1
TRANSFORMER_RERANK_MERGE_MARGIN=0.3
TRANSFORMER_RERANK_WIDE_TOP_K=50
```

If dependencies or the local checkpoint are missing, LycheeMem falls back to baseline memory search.
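Outside of LycheeMem, the checkpoint can also be loaded directly with `transformers` for inspection. The sketch below is illustrative only: the sentence-pair encoding and the use of the last logit column as the relevance score are assumptions, not a documented LycheeMem contract.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Local checkpoint directory (see "Files" below for expected contents).
MODEL_PATH = "/path/to/lycheemem-reranker-v0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()

def score_pairs(query: str, candidates: list[str]) -> list[float]:
    """Score each (query, candidate) pair; higher means more relevant."""
    # Cross-encoder convention: query and candidate as a sentence pair.
    batch = tokenizer(
        [query] * len(candidates),
        candidates,
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    # Last logit column as score: a single-label head has one column,
    # a two-label head uses the second ("relevant") class.
    return logits[:, -1].tolist()

print(score_pairs(
    "Where did Alice say she went hiking?",
    ["Alice mentioned hiking in the Dolomites.", "Bob prefers green tea."],
))
```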
## Training Data

The checkpoint was trained on LoCoMo-derived memory evidence reranking bundles. Each training example pairs a user question with candidate memory texts and evidence IDs derived from the LoCoMo benchmark.

The source repository does not include LoCoMo data, generated caches, or training outputs. Reproduction notes are maintained in the LycheeMem source repository.

## Metrics

All metrics below measure evidence retrieval/reranking, not final LLM answer quality. The primary metric is whether at least one gold evidence item appears in the returned top-10 candidates (`hit@10`).
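In code, `hit@10` reduces to a membership check over the returned candidates; a minimal sketch (the string ID representation is an assumption):

```python
def hit_at_k(returned_ids: list[str], gold_ids: set[str], k: int = 10) -> bool:
    """True if at least one gold evidence ID appears in the top-k results."""
    return any(evidence_id in gold_ids for evidence_id in returned_ids[:k])
```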
### LoCoMo Evidence Retrieval

```text
System memory backend, 200 QA:
  baseline: 124/200 = 0.620
  v0:       130/200 = 0.650
  added/lost/net: +7/-1/+6

System LanceDB backend, 200 QA:
  baseline: 124/200 = 0.620
  v0:       131/200 = 0.655
  added/lost/net: +8/-1/+7

Full-memory cache, 5 seeds:
  held added/lost/net: +115/-7/+108
  added/lost ratio: 16.43

Split checks:
  interleave held:           466/765 -> 495/765, net +29
  prefix held:               473/766 -> 501/766, net +28
  conversation-heldout held: 476/772 -> 504/772, net +28
```

### Candidate Context Probe

Same checkpoint, different candidate text construction:

```text
single-turn v0:        998/1531 = 0.651862, net +67
context-candidate v0: 1013/1531 = 0.661659, net +82
```

### Zero-Shot Evidence Selection

```text
LongMemEval-S cleaned:
  baseline: 469/500 = 0.938
  wide:     500/500 = 1.000
  v0:       484/500 = 0.968
  added/lost/net: +16/-1/+15

MSC-MemFuse-MC10 turn-level:
  baseline: 142/299 = 0.475
  wide:     279/299 = 0.933
  v0:       152/299 = 0.508
  added/lost/net: +10/-0/+10

HotpotQA distractor sentence-level:
  baseline: 6957/7405 = 0.9395
  wide:     7405/7405 = 1.0000
  v0:       7076/7405 = 0.9556
  added/lost/net: +141/-22/+119
```

These zero-shot fixtures are intended to check whether the LoCoMo-trained v0 checkpoint transfers as an evidence selector. LongMemEval-S and MSC-MemFuse are memory/dialogue-style settings. HotpotQA is a wiki multi-hop supporting-sentence setting, so it is a useful but less direct transfer check.
## Limitations

- The checkpoint is trained on LoCoMo-derived evidence bundles and may not generalize to every private memory corpus.
- It assumes relevant evidence is already present in the wide candidate pool.
- It is not an RL policy and does not learn online by itself.
- The MSC-MemFuse fixture uses answer-string matching to infer evidence turns; this is a conservative heuristic, not original human evidence annotation.
- HotpotQA transfer is positive but shows more lost cases than the memory-style fixtures, so dense wiki-style distractor pools warrant monitoring.
- The strongest current accuracy bottleneck appears to be candidate representation, especially in single-turn evidence-boundary cases.
- The hook should remain default-off until a user or deployment explicitly opts in and monitors diagnostics.

## Runtime Behavior

LycheeMem's transformer reranker uses this checkpoint only after baseline memory search has produced a wider candidate pool. The current v0 policy is conservative:

```text
wide_top_k: 50
max_replacements: 1
merge_margin: 0.3
runtime: local checkpoint only
default behavior: disabled
```

In plain terms: baseline search retrieves memories first, and the reranker then gets one narrow chance to replace a single item in the final top-k when a better evidence candidate is already present in the wider candidate pool.
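A minimal sketch of this conservative merge step is shown below. All names are hypothetical and LycheeMem's actual implementation may differ; only the numeric defaults come from the configuration above.

```python
def conservative_merge(
    baseline_top_k: list[str],
    wide_pool: list[str],
    scores: dict[str, float],
    max_replacements: int = 1,
    merge_margin: float = 0.3,
) -> list[str]:
    """Let wide-pool candidates displace baseline results, conservatively.

    A candidate from the wide pool may only replace the weakest item in
    the baseline top-k when its reranker score beats that item's score
    by at least `merge_margin`, and at most `max_replacements` swaps
    are allowed per query.
    """
    merged = list(baseline_top_k)
    challengers = sorted(
        (c for c in wide_pool if c not in merged),
        key=lambda c: scores[c],
        reverse=True,  # strongest challenger first
    )
    for challenger in challengers[:max_replacements]:
        weakest = min(merged, key=lambda c: scores[c])
        if scores[challenger] >= scores[weakest] + merge_margin:
            merged[merged.index(weakest)] = challenger
        else:
            break  # challengers are sorted, so no later one can pass
    return merged
```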
## Files

Expected checkpoint directory:

```text
config.json
model.safetensors
run_meta.json
special_tokens_map.json
tokenizer_config.json
vocab.txt
```

SHA256 checksums for the v0.1.0 checkpoint artifact:

```text
ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a  config.json
0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0  model.safetensors
7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536  run_meta.json
b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3  special_tokens_map.json
e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73  tokenizer_config.json
07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3  vocab.txt
```
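A downloaded checkpoint can be verified against these sums with any SHA-256 tool; a small Python sketch (the directory path is a placeholder):

```python
import hashlib
from pathlib import Path

# Placeholder path; point this at your local checkpoint directory.
CHECKPOINT_DIR = Path("/path/to/lycheemem-reranker-v0")

FILES = [
    "config.json",
    "model.safetensors",
    "run_meta.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
]

# Print one "<sha256>  <name>" line per file, matching the list above.
for name in FILES:
    digest = hashlib.sha256((CHECKPOINT_DIR / name).read_bytes()).hexdigest()
    print(f"{digest}  {name}")
```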
## Citation and Scope

This checkpoint is part of LycheeMem's optional memory retrieval research path. It is not an RL policy and does not learn online by itself. Online feedback and personalization are handled by separate experimental components.