LycheeMem committed
Commit 960236b · verified · 1 Parent(s): 221f262

Update model card with expanded zero-shot evaluation: LongMemEval-S, MSC-MemFuse-MC10, and HotpotQA. Checkpoint unchanged.

Files changed (1): README.md (+187, -0)
README.md (added):

---
license: apache-2.0
base_model: prajjwal1/bert-tiny
library_name: transformers
pipeline_tag: text-classification
tags:
- lycheemem
- memory
- reranking
- evidence-retrieval
- bert-tiny
---

# LycheeMem BERT-Tiny Memory Reranker v0

This repository provides the optional v0 transformer reranker checkpoint for
LycheeMem semantic memory search. The model scores `(query, memory candidate)`
pairs and is used as a conservative reranker over a wider memory candidate pool.

The reranker is default-off in LycheeMem. It only changes memory search when the
user installs the optional rerank dependencies, downloads this checkpoint, and
explicitly enables the transformer rerank hook.

## Model

```text
name: LycheeMem/reranker
base_model: prajjwal1/bert-tiny
task: memory evidence reranking
architecture: AutoModelForSequenceClassification
runtime: local checkpoint, default-off LycheeMem hook
version: v0.1.0
```

## Intended Use

Use this checkpoint with LycheeMem's experimental transformer reranker hook:

```bash
pip install "lycheemem[rerank]"

export EXPERIMENTAL_TRANSFORMER_RERANK=true
export TRANSFORMER_RERANK_MODEL_PATH=/path/to/lycheemem-reranker-v0
export TRANSFORMER_RERANK_MAX_REPLACEMENTS=1
export TRANSFORMER_RERANK_MERGE_MARGIN=0.3
export TRANSFORMER_RERANK_WIDE_TOP_K=50
```

If dependencies or the local checkpoint are missing, LycheeMem falls back to
baseline memory search.
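
For a sense of what the checkpoint does at the model level, the sketch below
loads it with `transformers` and scores a few `(query, memory candidate)` pairs.
The path, example texts, and sigmoid-on-a-single-logit readout are illustrative
assumptions, not the LycheeMem hook's actual API:

```python
# Minimal scoring sketch, assuming a single-logit relevance head.
# Adapt the readout if the checkpoint's config uses a different head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "/path/to/lycheemem-reranker-v0"  # local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()

query = "Where did the user say they went hiking last summer?"
candidates = [
    "I hiked in the Dolomites this July.",
    "I switched from coffee to tea last year.",
]

# Encode each (query, candidate) pair as a standard BERT sentence pair.
batch = tokenizer(
    [query] * len(candidates), candidates,
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    logits = model(**batch).logits

scores = logits.squeeze(-1).sigmoid().tolist()
for score, text in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {text}")
```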

## Training Data

The checkpoint was trained on LoCoMo-derived memory evidence reranking bundles.
Each training example pairs a user question with candidate memory texts and
evidence IDs derived from the LoCoMo benchmark.

The source repository does not include LoCoMo data, generated caches, or training
outputs. Reproduction notes are maintained in the LycheeMem source repository.

## Metrics

All metrics below measure evidence retrieval/reranking, not final LLM answer
quality. The primary metric is whether at least one gold evidence item appears
in the returned top-10 candidates (`hit@10`).
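
Concretely, `hit@10` can be computed as in the sketch below; the data
structures are illustrative, not LycheeMem's evaluation code:

```python
# Illustrative hit@10: a question counts as a hit if any gold evidence
# ID appears among the first 10 returned candidate IDs.
def hit_at_10(gold_ids: set[str], returned_ids: list[str]) -> bool:
    return any(g in returned_ids[:10] for g in gold_ids)

# Toy example: 2 of 3 questions hit -> 0.667.
examples = [
    ({"e1"}, ["e9", "e1", "e4"]),
    ({"e2", "e5"}, ["e7", "e8"]),
    ({"e3"}, ["e3"]),
]
hits = sum(hit_at_10(g, r) for g, r in examples)
print(f"hit@10 = {hits}/{len(examples)} = {hits / len(examples):.3f}")
```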

### LoCoMo Evidence Retrieval

```text
System memory backend, 200 QA:
  baseline: 124/200 = 0.620
  v0: 130/200 = 0.650
  added/lost/net: +7/-1/+6

System LanceDB backend, 200 QA:
  baseline: 124/200 = 0.620
  v0: 131/200 = 0.655
  added/lost/net: +8/-1/+7

Full-memory cache, 5 seeds:
  held added/lost/net: +115/-7/+108
  added/lost ratio: 16.43

Split checks:
  interleave held: 466/765 -> 495/765, net +29
  prefix held: 473/766 -> 501/766, net +28
  conversation-heldout held: 476/772 -> 504/772, net +28
```

### Candidate Context Probe

Same checkpoint, different candidate text construction:

```text
single-turn v0: 998/1531 = 0.651862, net +67
context-candidate v0: 1013/1531 = 0.661659, net +82
```
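
The two rows differ only in how candidate text is built. A rough sketch of the
distinction, assuming a conversation stored as `(speaker, text)` turns; the
exact windowing LycheeMem uses is not specified here and these helpers are
hypothetical:

```python
# Hypothetical candidate constructions for the probe above.
def single_turn_candidate(turns: list[tuple[str, str]], i: int) -> str:
    """Only the evidence turn itself."""
    speaker, text = turns[i]
    return f"{speaker}: {text}"

def context_candidate(turns: list[tuple[str, str]], i: int, window: int = 1) -> str:
    """The evidence turn plus `window` neighboring turns on each side."""
    lo, hi = max(0, i - window), min(len(turns), i + window + 1)
    return " ".join(f"{s}: {t}" for s, t in turns[lo:hi])
```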

### Zero-Shot Evidence Selection

```text
LongMemEval-S cleaned:
  baseline: 469/500 = 0.938
  wide: 500/500 = 1.000
  v0: 484/500 = 0.968
  added/lost/net: +16/-1/+15

MSC-MemFuse-MC10 turn-level:
  baseline: 142/299 = 0.475
  wide: 279/299 = 0.933
  v0: 152/299 = 0.508
  added/lost/net: +10/-0/+10

HotpotQA distractor sentence-level:
  baseline: 6957/7405 = 0.9395
  wide: 7405/7405 = 1.0000
  v0: 7076/7405 = 0.9556
  added/lost/net: +141/-22/+119
```

The `wide` rows report gold-evidence coverage of the wider candidate pool,
which bounds what any reranker over that pool can recover. These zero-shot
fixtures are intended to check whether the LoCoMo-trained v0 checkpoint
transfers as an evidence selector. LongMemEval-S and MSC-MemFuse are
memory/dialogue-style settings. HotpotQA is a wiki multi-hop supporting-sentence
setting, so it is a useful but less direct transfer check.

## Limitations

- The checkpoint is trained on LoCoMo-derived evidence bundles and may not
  generalize to every private memory corpus.
- It assumes relevant evidence is already present in the wide candidate pool.
- It is not an RL policy and does not learn online by itself.
- The MSC-MemFuse fixture uses answer-string matching to infer evidence turns;
  this is a conservative heuristic, not original human evidence annotation.
- HotpotQA transfer is positive but has more lost cases than memory-style
  fixtures, so dense wiki distractors need monitoring.
- The strongest current accuracy bottleneck appears to be candidate
  representation, especially single-turn evidence-boundary cases.
- The hook should remain default-off until a user or deployment explicitly opts
  in and monitors diagnostics.

## Runtime Behavior

LycheeMem's transformer reranker uses this checkpoint only after baseline memory
search has produced a wider candidate pool. The current v0 policy is
conservative:

```text
wide_top_k: 50
max_replacements: 1
merge_margin: 0.3
runtime: local checkpoint only
default behavior: disabled
```

In plain terms: baseline search retrieves memories first. The reranker then gets
only a narrow chance to replace one item in the final top-k, and only when a
better evidence candidate is already present in the wider candidate pool.
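
One way to picture this policy is the sketch below. It is an illustration of
the described behavior under the v0 defaults, not LycheeMem's actual
implementation; all names are hypothetical:

```python
# Hypothetical illustration of the conservative merge described above.
def conservative_merge(
    top_k: list[str],          # baseline top-k candidate IDs
    wide_pool: list[str],      # wider pool (wide_top_k=50)
    scores: dict[str, float],  # reranker score per candidate ID
    max_replacements: int = 1,
    merge_margin: float = 0.3,
) -> list[str]:
    merged = list(top_k)
    replacements = 0
    # Challengers: wide-pool candidates not already in the top-k, best first.
    challengers = sorted(
        (c for c in wide_pool if c not in merged),
        key=lambda c: scores[c], reverse=True,
    )
    for challenger in challengers:
        if replacements >= max_replacements:
            break
        weakest = min(merged, key=lambda c: scores[c])
        # Replace only on a clear margin, so ties leave the baseline intact.
        if scores[challenger] >= scores[weakest] + merge_margin:
            merged[merged.index(weakest)] = challenger
            replacements += 1
    return merged
```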

## Files

Expected checkpoint directory:

```text
config.json
model.safetensors
run_meta.json
special_tokens_map.json
tokenizer_config.json
vocab.txt
```

SHA256 checksums for the v0.1.0 checkpoint artifact:

```text
ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a  config.json
0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0  model.safetensors
7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536  run_meta.json
b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3  special_tokens_map.json
e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73  tokenizer_config.json
07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3  vocab.txt
```
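
A downloaded checkpoint can be verified against this table with a short loop
like the one below (paths assumed relative to the checkpoint directory):

```python
# Verify downloaded files against the SHA256 table above.
import hashlib
from pathlib import Path

expected = {
    "config.json": "ed54572648824881775812e8b2b0af9be1b720ebdbdf2d1b7c0d976c4ca14c8a",
    "model.safetensors": "0a328c53b55cbd49aeec0a44e6b9e2d02d09539e6784d93fc515ba815261fca0",
    "run_meta.json": "7841bca86e19c72c1cd0f4834efb5c413975ad01ffc5c7020328f4cc62b70536",
    "special_tokens_map.json": "b6d346be366a7d1d48332dbc9fdf3bf8960b5d879522b7799ddba59e76237ee3",
    "tokenizer_config.json": "e711904cac23112776b678356ccf702cf934babaa01125f698ac43bf9ad38e73",
    "vocab.txt": "07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3",
}

for name, digest in expected.items():
    actual = hashlib.sha256(Path(name).read_bytes()).hexdigest()
    print(name, "OK" if actual == digest else "MISMATCH")
```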

## Citation and Scope

This checkpoint is part of LycheeMem's optional memory retrieval research path.
It is not an RL policy and does not learn online by itself. Online feedback and
personalization are handled by separate experimental components.