deepseek-ai/DeepSeek-V2

Text Generation

text-generation-inference

Add GSM8K eval result (79.2)

#11 opened 27 days ago by

typo spot: gready->greedy

#10 opened about 1 year ago by

Exact computations for multi-head latent attention

#9 opened about 1 year ago by

This is by far the best model I have seen until now.

#8 opened almost 2 years ago by

How many tokens per second when using DeepSeek-V2 (236B) as the inference model on 8×A100?

#7 opened almost 2 years ago by

Can DeepSeek-V2 run on two nodes (each with 4 A100)?

#5 opened almost 2 years ago by

Calculation of _mscale during YARN RoPE scaling

#4 opened almost 2 years ago by

KeyError: 'sdpa'

#3 opened almost 2 years ago by

Smaller Models

#2 opened almost 2 years ago by

KV Cache for compress_kv or key-value states

#1 opened almost 2 years ago by