"use_bidirectional_attention": true flag

#13
No description provided.
ybabakhin changed pull request status to merged

Did this flag break the implementation with the vLLM container from NVIDIA? See here:

https://forums.developer.nvidia.com/t/getting-nemotron-embed-working-on-dgx-spark/359447/2
