Update modeling_baichuan.py to handle empty Transformers cache objects in generation
#20
by sylwia-kuros - opened
Recent Transformers versions can pass a cache object into generate() before any KV tensors have actually been populated. Baichuan remote code still assumes legacy tuple-style past_key_values and treats any truthy cache as populated.
This causes two problems during generation:
- BaichuanModel.forward() reads past_key_values[0][0].shape[2] without checking whether the first cached key is actually populated.
- prepare_inputs_for_generation() slices input_ids down to the last token whenever past_key_values is truthy, even when the cache is only an empty placeholder.