# Resume

## Main conclusions from this chat

1. The main issue is data/sample construction, not just checkpoint choice.
   - `class_id` is token-level.
   - Labels are context-level and depend on the sampled `T_cutoff`.
   - Balanced token classes do not imply balanced future outcomes.
   - The model can easily learn an over-negative prior if the cache is built naively.
2. Cache balancing must happen at cache generation time.
   - Train-time weighting is too late to fix disk waste or missing context diversity.
   - The cache should control:
     - class balance
     - good/bad context balance within each class
3. Validation needs token isolation, not just class balancing.
   - The same token appearing in train and val through different contexts makes validation misleading.
4. Movement-threshold circularity is no longer the main blocker.
   - Movement labels are downstream of realized returns.
   - Movement thresholds should not control cache construction.
5. OHLC is important, but current usage looks like a trend/regime summary, not true pattern detection.
   - The chart branch is being used,
   - but probe tests suggest it mostly acts as continuation/trend context,
   - not as breakout / support-resistance / head-and-shoulders intelligence.
6. The current code is still regression-first.
   - The main target is multi-horizon return prediction.
   - There are also:
     - a quality head
     - a movement head
   - Calling the movement head "auxiliary" only matters if its loss contribution is actually secondary.

## What was implemented in code

### 1. Validation split

- `train.py`
- The validation split was changed to group by token identity (`source_token` / `token_address`) instead of only `class_id`.

### 2. Task tracker

- `TASK_LIST.md`
- Created as the running checklist for this work.

### 3. Context weighting signal

- `data/data_loader.py`
- Added a context-quality summary derived from realized multi-horizon returns.
- The current code computes:
  - `context_score`
  - `context_bucket` (`good` / `bad`)
- This is used for weighting and cache metadata.

### 4. Cached dataset weighting

- `data/data_loader.py`
- Sampler weights now account for:
  - class balance
  - good/bad context balance inside each class

### 5. Weighted cache generation

- `cache_dataset.py`
- The canonical builder now writes a context cache only.
- No `cache_mode`.
- No `max_samples`.
- The cache budget is context-based:
  - `--target_contexts_total`
  - or `--target_contexts_per_class`
- Quota-driven acceptance is applied before save:
  - level 1: class quotas
  - level 2: good/bad quotas within each class

### 6. Metadata rebuild

- `scripts/rebuild_metadata.py`
- Now rebuilds:
  - `file_class_map`
  - `file_context_bucket_map`
  - `file_context_summary_map`

### 7. `min_trades`

- Re-added properly.
- No longer a dead CLI arg.
- Now actually controls dataset/context eligibility thresholds.

### 8. Evaluate-script OHLC probes

- `scripts/evaluate_sample.py`
- Added these OHLC probe modes:
  - `ohlc_reverse`
  - `ohlc_shuffle_chunks`
  - `ohlc_mask_recent`
  - `ohlc_trend_only`
  - `ohlc_summary_shuffle`
  - `ohlc_detrend`
  - `ohlc_smooth`

### 9. Bad mint metadata fallback

- `data/data_loader.py`
- If mint metadata has invalid or zero supply/decimals, it now defaults to:
  - `total_supply = 1000000000000000`
  - `token_decimals = 6`

## Current cache-builder interface

`cache_dataset.py` now uses a canonical context-cache path only.

Relevant args:

- `--output_dir`
- `--start_date`
- `--min_trade_usd`
- `--min_trades`
- `--context_length`
- `--samples_per_token`
- `--target_contexts_per_class`
- `--target_contexts_total`
- `--good_ratio_nonzero`
- `--good_ratio_class0`
- `--num_workers`
- DB connection args

Rules:

- Choose exactly one of:
  - `--target_contexts_per_class`
  - `--target_contexts_total`
- If quota-driven caching is used, the current implementation expects `--num_workers 1`.

`--samples_per_token` is still present:

- It is not a cache budget.
- It is a candidate-generation knob.
- It may still be removable later if a better attempt-based planner replaces it.
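The two-level quota-driven acceptance described above (class quotas, then good/bad quotas within each class) can be sketched roughly as follows. This is an illustration only, not the actual `cache_dataset.py` code: the class name `QuotaTracker` and the single `good_ratio` parameter are hypothetical (the real builder distinguishes `--good_ratio_nonzero` and `--good_ratio_class0`).

```python
from collections import defaultdict

class QuotaTracker:
    """Illustrative two-level acceptance filter for cache building.

    Level 1: reject once a class has reached its per-class quota.
    Level 2: within a class, reject once the good or bad context
    bucket has reached its share of the class quota.
    """

    def __init__(self, per_class_quota: int, good_ratio: float = 0.5):
        self.per_class_quota = per_class_quota
        self.good_ratio = good_ratio
        self.counts = defaultdict(lambda: {"good": 0, "bad": 0})

    def accept(self, class_id: int, context_bucket: str) -> bool:
        c = self.counts[class_id]
        # Level 1: class quota.
        if c["good"] + c["bad"] >= self.per_class_quota:
            return False
        # Level 2: good/bad quota within the class.
        share = self.good_ratio if context_bucket == "good" else 1 - self.good_ratio
        if c[context_bucket] >= self.per_class_quota * share:
            return False
        c[context_bucket] += 1
        return True
```

A builder loop would call `accept(class_id, context_bucket)` for each candidate context before writing it to disk, and stop once all quotas are full. Note that this tracker is stateful and not thread-safe, which is consistent with the current `--num_workers 1` requirement for quota-driven caching.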
## Important conceptual corrections from this chat

1. Binary context type matters operationally, but a naive binary rule is dangerous.
   - Examples like `+1%` followed by `-20%` show why simplistic rules fail.
   - Context typing should come from realized multi-horizon behavior, not one crude shortcut.
2. Patching alone does not force pattern learning.
   - It only makes local pattern use possible.
   - The model can still rely on trend shortcuts unless the representation and training setup make that harder.
3. Support/resistance may be inferable from the current chart inputs in principle.
   - But the current encoder likely compresses too early and learns an easier shortcut instead.
4. For this question, the bottleneck is not just "what loss?" but also:
   - chart input representation
   - encoder compression
   - whether the model can preserve local 1s structure

## OHLC probe findings from this chat

The key probe result, repeated across runs:

- `ohlc_detrend` had the largest impact.
- `ohlc_trend_only` had the second-largest impact.
- `ohlc_smooth`, `ohlc_shuffle_chunks`, `ohlc_summary_shuffle`, `ohlc_reverse`, and `ohlc_mask_recent` had very small impact.

Interpretation:

- The model uses OHLC mostly as:
  - broad trend/regime context
  - a continuation-style directional signal
- Not as:
  - a local chart-pattern detector
  - support/resistance-aware trader logic
  - breakout/rejection/fair-value-gap style reasoning

This strongly suggests that many of the model's bearish/bad predictions are driven by trend-continuation behavior in the chart branch.

## What was explicitly rejected or corrected

- Do not keep treating `cache_mode` as a real concept.
- Do not let `max_samples` and context-budget args overlap.
- Do not call something "auxiliary" if its loss weight can dominate optimization.
- Do not assume OHLC importance implies chart-pattern understanding.
- Do not answer architecture questions by guessing from intention; inspect the actual code.

## What still matters next

1. Verify token-overlap assertions are enforced at train/val time, not just in the token-grouped split logic.
2. Rebuild the cache with the new quota-based builder and inspect the actual distributions.
3. Update the cache audit tooling.
4. Decide whether `samples_per_token` should be replaced by a better attempt-based planner.
5. Decide how the chart branch should evolve if the goal is trader-like 1s pattern reasoning rather than trend summary.

## Short current state

- Cache construction is much closer to the right direction now.
- Validation logic is less broken than before.
- OHLC is confirmed important,
- but OHLC currently behaves more like a trend/summary branch than a technical-pattern branch.
- The codebase still trains primarily on return regression, with extra heads layered on top.
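The token-grouped validation split plus the missing overlap assertion can be sketched as below. This is a minimal illustration, not the `train.py` implementation: the sample shape (a dict with a `token_address` key) and the hash-based assignment are assumptions, and the real loader may key on `source_token` instead.

```python
import hashlib

def token_grouped_split(samples, val_fraction=0.2):
    """Deterministically route each sample to train or val by hashing its
    token identity, so every context of a given token lands in one split.
    Hypothetical sample shape: dicts carrying a `token_address` key."""
    train, val = [], []
    for s in samples:
        digest = hashlib.sha256(s["token_address"].encode()).hexdigest()
        bucket = int(digest, 16) % 100
        (val if bucket < val_fraction * 100 else train).append(s)
    return train, val

def assert_no_token_overlap(train, val):
    """Hard check at train time: the split is invalid if any token
    appears in both sets, even through different contexts."""
    overlap = ({s["token_address"] for s in train}
               & {s["token_address"] for s in val})
    assert not overlap, f"token leakage between train and val: {sorted(overlap)[:5]}"
```

The point of the separate assertion is next-step item 1: grouping logic alone is not a guarantee, so `assert_no_token_overlap` should run on the final datasets every time training starts, catching leakage introduced by caching, resampling, or later code changes.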