Fix prompt truncation in inference_eval.py: max_seq_length 768 -> 2048
Our user prompts (system + observation with 7-step history + anomalies)
are 900-1200 tokens. The model was being loaded with max_seq_length=768,
silently truncating prompts on the left. The model was seeing the END of
the prompt (last few history entries) but missing the system prompt or
older context. This explains why the earlier 'v1 buggy eval' had muted
final_score even though belief_MAE was good — the model was generating
from incomplete context.
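
A quick way to confirm the overflow outside the harness is to tokenize one rendered prompt and count tokens. A minimal sketch, assuming a transformers-style chat template; the model path and messages payload are placeholders, not code from this repo:

# Hypothetical check, not part of inference_eval.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-model")  # stand-in for args.model_path
messages = [
    {"role": "system", "content": "...system prompt..."},
    {"role": "user", "content": "...observation with 7-step history + anomalies..."},
]
ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(len(ids))  # ~900-1200 for our prompts; anything over 768 was being truncated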
The HF Job logs surfaced this clearly:
'Unsloth: Input IDs of shape torch.Size([1, 986]) with length 986 >
the model's max sequence length of 768. We shall truncate it ourselves.'
Bumping to 2048 leaves room for ~1200 prompt + 256 generation + slack.
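Worked out with the figures above (a sanity check on the budget, not new measurements):

# Token budget under max_seq_length=2048.
PROMPT_WORST_CASE = 1200  # upper end of observed prompt lengths
MAX_NEW_TOKENS = 256      # generation budget for the CoT response
slack = 2048 - PROMPT_WORST_CASE - MAX_NEW_TOKENS
print(slack)  # 592 tokens of headroom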
training/inference_eval.py CHANGED
@@ -219,10 +219,16 @@ def main():
     if args.model_path and os.path.exists(args.model_path):
         try:
             from unsloth import FastLanguageModel
+            # max_seq_length=2048 must accommodate: user prompt with 7-step
+            # history + per-meter anomalies (~900-1200 tokens) PLUS
+            # max_new_tokens=256 for the CoT response. Earlier value of 768
+            # silently truncated prompts on the LEFT (kept end of prompt,
+            # lost system instructions or older meter history), producing
+            # incoherent model outputs.
             model, tokenizer = FastLanguageModel.from_pretrained(
                 model_name=args.model_path,
                 load_in_4bit=True,
-                max_seq_length=768,
+                max_seq_length=2048,
             )
             FastLanguageModel.for_inference(model)
             strategies.append("model")
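
For context, a hedged sketch of the generation call this budget serves; the actual prompt construction and decoding settings in inference_eval.py may differ:

# Assumed usage after FastLanguageModel.for_inference(model);
# `prompt` stands in for the rendered system + observation text.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)  # ~1200-token prompt + 256 new tokens fits in 2048
print(tokenizer.decode(outputs[0], skip_special_tokens=True))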