---
license: apache-2.0
base_model: prajjwal1/bert-tiny
library_name: transformers
pipeline_tag: text-classification
tags:
- lycheemem
- memory
- reranking
- evidence-retrieval
- bert-tiny
---

# LycheeMem BERT-Tiny Memory Reranker v0

This repository provides the optional v0 transformer reranker checkpoint for LycheeMem semantic memory search. The model scores `(query, memory candidate)` pairs and is used as a conservative reranker over a wider memory candidate pool.

The reranker is default-off in LycheeMem. It only changes memory search when the user installs the optional rerank dependencies, downloads this checkpoint, and explicitly enables the transformer rerank hook.

## Model

```text
name: LycheeMem/reranker
base_model: prajjwal1/bert-tiny
task: memory evidence reranking
architecture: AutoModelForSequenceClassification
runtime: local checkpoint, default-off LycheeMem hook
version: v0.1.0
```

## Intended Use

Use this checkpoint with LycheeMem's experimental transformer reranker hook:

```bash
pip install "lycheemem[rerank]"

EXPERIMENTAL_TRANSFORMER_RERANK=true
TRANSFORMER_RERANK_MODEL_PATH=/path/to/lycheemem-reranker-v0
TRANSFORMER_RERANK_MAX_REPLACEMENTS=1
TRANSFORMER_RERANK_MERGE_MARGIN=0.3
TRANSFORMER_RERANK_WIDE_TOP_K=50
```

If dependencies or the local checkpoint are missing, LycheeMem falls back to baseline memory search.
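Outside of LycheeMem, the checkpoint can also be loaded directly with `transformers` for inspection. The sketch below is illustrative only: the sentence-pair encoding and the use of the last logit column as the relevance score are assumptions, not a documented LycheeMem contract.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Local checkpoint directory (see "Files" below for expected contents).
MODEL_PATH = "/path/to/lycheemem-reranker-v0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()

def score_pairs(query: str, candidates: list[str]) -> list[float]:
    """Score each (query, candidate) pair; higher means more relevant."""
    # Cross-encoder convention: query and candidate as a sentence pair.
    batch = tokenizer(
        [query] * len(candidates),
        candidates,
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    # Last logit column as score: a single-label head has one column,
    # a two-label head uses the second ("relevant") class.
    return logits[:, -1].tolist()

print(score_pairs(
    "Where did Alice say she went hiking?",
    ["Alice mentioned hiking in the Dolomites.", "Bob prefers green tea."],
))
```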
## Training Data

The checkpoint was trained on LoCoMo-derived memory evidence reranking bundles. Each training example pairs a user question with candidate memory texts and evidence IDs derived from the LoCoMo benchmark.

The source repository does not include LoCoMo data, generated caches, or training outputs. Reproduction notes are maintained in the LycheeMem source repository.

## Metrics

All metrics below measure evidence retrieval/reranking, not final LLM answer quality. The primary metric is whether at least one gold evidence item appears in the returned top-10 candidates (`hit@10`).
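In code, `hit@10` reduces to a membership check over the returned candidates; a minimal sketch (the string ID representation is an assumption):

```python
def hit_at_k(returned_ids: list[str], gold_ids: set[str], k: int = 10) -> bool:
    """True if at least one gold evidence ID appears in the top-k results."""
    return any(evidence_id in gold_ids for evidence_id in returned_ids[:k])
```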
### LoCoMo Evidence Retrieval

```text
System memory backend, 200 QA:
  baseline: 124/200 = 0.620
  v0:       130/200 = 0.650
  added/lost/net: +7/-1/+6

System LanceDB backend, 200 QA:
  baseline: 124/200 = 0.620
  v0:       131/200 = 0.655
  added/lost/net: +8/-1/+7

Full-memory cache, 5 seeds:
  held added/lost/net: +115/-7/+108
  added/lost ratio: 16.43

Split checks:
  interleave held:           466/765 -> 495/765, net +29
  prefix held:               473/766 -> 501/766, net +28
  conversation-heldout held: 476/772 -> 504/772, net +28
```

### Candidate Context Probe

Same checkpoint, different candidate text construction:

```text
single-turn v0:        998/1531 = 0.651862, net +67
context-candidate v0: 1013/1531 = 0.661659, net +82
```

### Zero-Shot Evidence Selection

```text
LongMemEval-S cleaned:
  baseline: 469/500 = 0.938
  wide:     500/500 = 1.000
  v0:       484/500 = 0.968
  added/lost/net: +16/-1/+15

MSC-MemFuse-MC10 turn-level:
  baseline: 142/299 = 0.475
  wide:     279/299 = 0.933
  v0:       152/299 = 0.508
  added/lost/net: +10/-0/+10

HotpotQA distractor sentence-level:
  baseline: 6957/7405 = 0.9395
  wide:     7405/7405 = 1.0000
  v0:       7076/7405 = 0.9556
  added/lost/net: +141/-22/+119
```

These zero-shot fixtures are intended to check whether the LoCoMo-trained v0 checkpoint transfers as an evidence selector. LongMemEval-S and MSC-MemFuse are memory/dialogue-style settings. HotpotQA is a wiki multi-hop supporting-sentence setting, so it is a useful but less direct transfer check.
## Limitations

- The checkpoint is trained on LoCoMo-derived evidence bundles and may not generalize to every private memory corpus.
- It assumes relevant evidence is already present in the wide candidate pool.
- It is not an RL policy and does not learn online by itself.
- The MSC-MemFuse fixture uses answer-string matching to infer evidence turns; this is a conservative heuristic, not original human evidence annotation.
- HotpotQA transfer is positive but shows more lost cases than the memory-style fixtures, so dense wiki-style distractor pools warrant monitoring.
- The strongest current accuracy bottleneck appears to be candidate representation, especially in single-turn evidence-boundary cases.
- The hook should remain default-off until a user or deployment explicitly opts in and monitors diagnostics.

## Runtime Behavior

LycheeMem's transformer reranker uses this checkpoint only after baseline memory search has produced a wider candidate pool. The current v0 policy is conservative:

```text
wide_top_k: 50
max_replacements: 1
merge_margin: 0.3
runtime: local checkpoint only
default behavior: disabled
```

In plain terms: baseline search retrieves memories first, and the reranker then gets one narrow chance to replace a single item in the final top-k when a better evidence candidate is already present in the wider candidate pool.
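A minimal sketch of this conservative merge step is shown below. All names are hypothetical and LycheeMem's actual implementation may differ; only the numeric defaults come from the configuration above.

```python
def conservative_merge(
    baseline_top_k: list[str],
    wide_pool: list[str],
    scores: dict[str, float],
    max_replacements: int = 1,
    merge_margin: float = 0.3,
) -> list[str]:
    """Let wide-pool candidates displace baseline results, conservatively.

    A candidate from the wide pool may only replace the weakest item in
    the baseline top-k when its reranker score beats that item's score
    by at least `merge_margin`, and at most `max_replacements` swaps
    are allowed per query.
    """
    merged = list(baseline_top_k)
    challengers = sorted(
        (c for c in wide_pool if c not in merged),
        key=lambda c: scores[c],
        reverse=True,  # strongest challenger first
    )
    for challenger in challengers[:max_replacements]:
        weakest = min(merged, key=lambda c: scores[c])
        if scores[challenger] >= scores[weakest] + merge_margin:
            merged[merged.index(weakest)] = challenger
        else:
            break  # challengers are sorted, so no later one can pass
    return merged
```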
## Files

Expected checkpoint directory:

```text
config.json
model.safetensors
run_meta.json
special_tokens_map.json
tokenizer_config.json
vocab.txt
```

SHA256 checksums for the v0.1.0 checkpoint artifact:

```text
ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a  config.json
0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0  model.safetensors
7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536  run_meta.json
b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3  special_tokens_map.json
e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73  tokenizer_config.json
07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3  vocab.txt
```
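A downloaded checkpoint can be verified against these sums with any SHA-256 tool; a small Python sketch (the directory path is a placeholder):

```python
import hashlib
from pathlib import Path

# Placeholder path; point this at your local checkpoint directory.
CHECKPOINT_DIR = Path("/path/to/lycheemem-reranker-v0")

FILES = [
    "config.json",
    "model.safetensors",
    "run_meta.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
]

# Print one "<sha256>  <name>" line per file, matching the list above.
for name in FILES:
    digest = hashlib.sha256((CHECKPOINT_DIR / name).read_bytes()).hexdigest()
    print(f"{digest}  {name}")
```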
## Citation and Scope

This checkpoint is part of LycheeMem's optional memory retrieval research path. It is not an RL policy and does not learn online by itself. Online feedback and personalization are handled by separate experimental components.