Add GSM8K eval result (79.2)
#11 opened 27 days ago by julien-c
typo spot: gready->greedy
#10 opened about 1 year ago by Jeol
Exact computations for multi-head latent attention
1
#9 opened about 1 year ago by mseeger
This is by far the best model I have seen until now.
🤝 1
2
#8 opened almost 2 years ago by ZeroWw
How many tokens per second does DeepSeek-V2 (236B) reach for inference on 8×A100?
1
#7 opened almost 2 years ago by harvin-cn
Can DeepSeek-V2 run on two nodes (each with 4 A100s)?
👍 1
1
#5 opened almost 2 years ago by jy395
Calculation of _mscale during YaRN RoPE scaling
1
#4 opened almost 2 years ago by sszymczyk
KeyError: 'sdpa'
1
#3 opened almost 2 years ago by minglingfeng
Smaller Models
👍 10
1
#2 opened almost 2 years ago by puffy310
KV Cache for compress_kv or key-value states
6
#1 opened almost 2 years ago by House-99