Was the whole word masking technique used during pre-training?
#1
by ryo0634 - opened
Thank you for sharing the fantastic model!
I have a quick question: was the model trained with the whole word masking technique?
No, it was not trained with whole word masking (WWM); this follows the original paper.
Training DeBERTa with WWM is one of the possible future improvements.
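For anyone reading later who is unsure of the difference, here is a minimal sketch of token-level masking versus whole word masking. It is only an illustration, not the pre-training code used for this model: the function names, the `is_word_start` flags, the `[MASK]` string, and the 15% rate are all assumptions made for the example, and the usual 80/10/10 replacement scheme is left out for brevity.

```python
import random


def group_into_words(is_word_start):
    """Group token indices into words using a per-token word-start flag."""
    words = []
    for i, starts in enumerate(is_word_start):
        if starts or not words:
            words.append([i])      # a new word begins here
        else:
            words[-1].append(i)    # continuation of the previous word
    return words


def token_mask(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Token-level masking: each subword token is selected independently."""
    rng = rng or random.Random()
    return [mask_token if rng.random() < mask_prob else t for t in tokens]


def whole_word_mask(tokens, is_word_start, mask_prob=0.15,
                    mask_token="[MASK]", rng=None):
    """Whole word masking: a word is selected as a unit, so all of its
    subword tokens are masked together or not at all."""
    rng = rng or random.Random()
    masked = list(tokens)
    for word in group_into_words(is_word_start):
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked


# Hypothetical tokenization: "transformer" is split into two subwords.
tokens = ["the", "trans", "former", "works"]
is_word_start = [True, True, False, True]

print(token_mask(tokens, mask_prob=0.5, rng=random.Random(0)))
print(whole_word_mask(tokens, is_word_start, mask_prob=0.5, rng=random.Random(0)))
```

With token-level masking, "trans" can be masked while "former" stays visible, which makes the prediction easier. With WWM the two pieces are always hidden together.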
I see, thank you very much for your quick response.
ryo0634 changed discussion status to closed