"use_bidirectional_attention": true flag
#13
by michaelfeil - opened
No description provided.
ybabakhin changed pull request status to merged
Did this flag break the implementation with the vLLM container from NVIDIA? See here:
https://forums.developer.nvidia.com/t/getting-nemotron-embed-working-on-dgx-spark/359447/2