Update modeling_baichuan.py to handle empty Transformers cache objects in generation

#20

Recent Transformers versions can pass a Cache object into generate() before any key/value tensors have actually been populated. The Baichuan remote code still assumes the legacy tuple-style past_key_values and treats any truthy cache as populated.

This causes two problems during generation:

  • BaichuanModel.forward() reads past_key_values[0][0].shape[2] without checking whether the first cached key tensor actually exists.
  • prepare_inputs_for_generation() slices input_ids down to the last token whenever past_key_values is truthy, even if the cache is only an empty placeholder.
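One way to address both problems is to compute the cached length through a helper that treats None and empty cache objects as length 0. This is a hedged sketch, not the actual patch: `past_length` and `should_slice_to_last_token` are hypothetical helper names, and the stand-in objects below only mimic the two cache shapes involved (new-style Cache objects expose a get_seq_length() method in recent Transformers versions; the legacy format is a tuple of per-layer (key, value) pairs).

```python
# Hedged sketch of an empty-cache guard; helper names are hypothetical.
from types import SimpleNamespace

def past_length(past_key_values):
    """Number of tokens already cached; 0 for None or an empty cache."""
    if past_key_values is None:
        return 0
    # New-style Cache objects are truthy even when unpopulated, but
    # get_seq_length() returns 0 until KV tensors are appended.
    if hasattr(past_key_values, "get_seq_length"):
        return past_key_values.get_seq_length()
    # Legacy format: tuple of per-layer (key, value) pairs, where the key
    # is shaped (batch, num_heads, seq_len, head_dim).
    return past_key_values[0][0].shape[2]

def should_slice_to_last_token(past_key_values):
    """prepare_inputs_for_generation should only keep the last token
    when the cache actually contains earlier positions."""
    return past_length(past_key_values) > 0

# Stand-ins for demonstration (no torch/transformers dependency):
empty_cache = SimpleNamespace(get_seq_length=lambda: 0)
legacy_cache = ((SimpleNamespace(shape=(1, 8, 5, 64)), None),)  # 5 tokens

print(past_length(None))                        # 0
print(past_length(empty_cache))                 # 0
print(past_length(legacy_cache))                # 5
print(should_slice_to_last_token(empty_cache))  # False
```

With a guard like this, forward() can skip the shape[2] read when the length is 0, and prepare_inputs_for_generation() only truncates input_ids once the cache genuinely holds earlier positions.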
