Problem occurs with long context
#3
by Se-Hun - opened
I get an empty output string when a long context is passed to this model.
Based on my inference testing, the problem seems to occur when the input is longer than roughly 2000 tokens (around 2040 tokens).
Why does this problem occur? Is it caused by your dataset configuration?
Se-Hun changed discussion title from List of datasets to Problem occurs with long context
Did you check `max_position_embeddings` in config.json? I suspect this problem is caused by the token length. Also, check how the tokenizer handles the language of your data, because Llama's vocabulary does not contain many tokens beyond English subwords.
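As a minimal sketch of the check above: the config values below are stand-ins (the original Llama used `max_position_embeddings` of 2048, which matches the ~2000-token cutoff reported), and `fits_context` is a hypothetical helper, not part of any library.

```python
import json

# Stand-in for the model's config.json; real Llama configs include
# the max_position_embeddings key (2048 for the original Llama).
config_json = '{"model_type": "llama", "max_position_embeddings": 2048}'
config = json.loads(config_json)
limit = config["max_position_embeddings"]

def fits_context(n_prompt_tokens, n_new_tokens, limit=limit):
    # Prompt tokens plus generated tokens must stay within the limit;
    # positions beyond it were never seen during training, so generation
    # can degrade or come back empty.
    return n_prompt_tokens + n_new_tokens <= limit

print(fits_context(2040, 256))  # a ~2040-token prompt leaves almost no room to generate
```

Note that for non-English text the token count can be several times the character-based estimate, since Llama's tokenizer falls back to byte-level pieces for uncovered scripts, so the limit is reached much sooner.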