aksinghyaani committed on
Commit b6aee65 · verified · 1 Parent(s): cff872c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -36,7 +36,7 @@ Nandi-Mini-600M introduces several efficiency-focused architectural optimization
 
 #### Shared KV (Shared Key-Value Vectors)
 
-Shared KV is one of the core architectural ideas explored in Nandi-Mini. Instead of storing separate Key and Value vectors, both share the same underlying representation, while a lightweight Key normalization step is applied specifically for attention computation.
+Shared KV is one of the core architectural ideas explored in Nandi-Mini. Instead of computing separate Key and Value projections, both reuse a shared latent representation, while a lightweight Key normalization step is applied specifically for attention computation.
 
 This design reduces KV-cache memory usage by ~50% during inference with only a small increase in compute overhead, since RoPE and Key normalization are applied dynamically during attention computation.
 
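
For readers of this diff, the Shared KV paragraph can be illustrated with a minimal single-head sketch in plain Python. It is not the Nandi-Mini implementation. The function names (`rms_norm`, `rope`, `shared_kv_attention`, `cached_floats`), the choice of RMS normalization for the Key step, the order "normalize, then RoPE", and the use of the cached latent directly as the Value are all assumptions made for illustration. The sketch only shows the idea stated in the README: one vector per token is cached, and the Key is derived from it at attention time.

```python
import math

def rms_norm(x, eps=1e-6):
    # Assumed form of the "lightweight Key normalization": plain RMS norm, no learned gain.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def rope(x, pos, base=10000.0):
    # Rotary position embedding: rotate each pair (x[i], x[i+1]) by pos * base^(-i/d).
    d = len(x)  # must be even
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def shared_kv_attention(q, cache, q_pos):
    """Attend from one query to a cache holding ONE shared latent vector per past token.

    Keys are rebuilt on the fly from the cached latents (normalize, then RoPE);
    the latents themselves serve as Values (an assumption of this sketch).
    """
    q = rope(q, q_pos)
    keys = [rope(rms_norm(c), pos) for pos, c in enumerate(cache)]
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    d = len(cache[0])
    return [sum(w * c[j] for w, c in zip(weights, cache)) for j in range(d)]

def cached_floats(n_tokens, d, shared):
    # Per head and layer: one d-vector per token if shared, otherwise separate K and V.
    return n_tokens * d if shared else 2 * n_tokens * d

# Hypothetical toy inputs: three cached tokens, head dimension 4.
cache = [[0.5, -1.0, 0.25, 2.0], [1.0, 0.0, -0.5, 0.5], [-0.25, 0.75, 1.5, -1.0]]
query = [0.1, 0.2, 0.3, 0.4]
print(shared_kv_attention(query, cache, q_pos=3))
print(cached_floats(3, 4, shared=True), "floats cached vs", cached_floats(3, 4, shared=False))
```

The per-step cost of recomputing the normalization and RoPE for every cached position is the "small increase in compute overhead" the README mentions; the halved count from `cached_floats` corresponds to the ~50% KV-cache reduction.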