Reduce GPU memory usage in the runtime.
#14
by xiping - opened
After adding 'with torch.no_grad():', GPU memory usage drops from 11.77G to 3.19G when batch=4, token=1024.
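For anyone who wants to see the pattern, here is a minimal sketch of the change. The tiny `nn.Sequential` model and the random input are hypothetical stand-ins, not the actual model from this repo; only the `with torch.no_grad():` wrapper is the point. The savings come from autograd no longer keeping intermediate activations for a backward pass, so the exact numbers will depend on your model and inputs.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model (assumption, not this repo's model).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
model.eval()

inputs = torch.randn(4, 16)  # batch=4; feature size is arbitrary here

# Default forward pass: autograd records a graph and keeps the
# intermediate activations alive for a possible backward pass.
out_with_grad = model(inputs)

# Inference-only forward pass: no graph is recorded, so the
# intermediate activations can be freed as soon as they are used.
with torch.no_grad():
    out_no_grad = model(inputs)

print(out_with_grad.requires_grad)  # True
print(out_no_grad.requires_grad)    # False
```

The outputs are numerically the same in both cases; only the autograd bookkeeping differs.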
Thanks, it helped a lot! And it works for the rerank model as well.