aksinghyaani committed on
Commit 22a013d · verified · 1 Parent(s): 918168d

Update README.md

Files changed (1)
  1. README.md +35 -1
README.md CHANGED
@@ -72,7 +72,41 @@ Stay tuned!
 
 Nandi-Mini-500M introduces several efficiency-focused architectural optimizations designed for compact yet capable language models.
 
-#### Shared KV (Shared Key-Value Vectors)
+
+### Shared KV KV-Cache Memory Comparison
+
+The following comparison illustrates the KV-cache memory reduction enabled by Shared KV mode.
+
+```python
+import matplotlib.pyplot as plt
+
+modes = ["Vanilla KV", "Shared KV"]
+memory = [100, 50]
+
+plt.figure(figsize=(5,4))
+bars = plt.bar(modes, memory)
+
+plt.ylabel("Relative KV Cache Memory")
+plt.title("KV Cache Memory Usage")
+
+for bar, val in zip(bars, memory):
+    plt.text(
+        bar.get_x() + bar.get_width()/2,
+        val + 2,
+        f"{val}%",
+        ha='center'
+    )
+
+plt.ylim(0, 120)
+plt.show()
+```
+
+Expected result:
+
+- Vanilla KV → 100% KV-cache memory
+- Shared KV → ~50% KV-cache memory
+
+Shared KV trades a small increase in compute overhead for significantly lower memory usage, since RoPE and Key normalization are applied dynamically during attention computation.
 
 Shared KV is one of the core architectural ideas explored in Nandi-Mini. Instead of storing separate Key and Value vectors, both share the same underlying representation, while a lightweight Key normalization step is applied specifically for attention computation.
 
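The mechanism described in this README change (one cached vector per token, with Keys derived from it through Key normalization and RoPE at attention time, and Values taken directly from it) can be sketched in plain Python. This is an illustration under stated assumptions, not the Nandi-Mini implementation. The names `rms_norm`, `apply_rope`, `attend`, `VanillaKVCache`, `SharedKVCache`, and `kv_cache_bytes` are hypothetical. The Key normalization is assumed to be an unweighted RMSNorm, and the vanilla baseline is assumed to cache post-RoPE Keys. The model configuration in the byte calculation is made up.

```python
import math
import random

# Minimal sketch for ONE attention head. Assumptions, not Nandi-Mini code:
# Key normalization = unweighted RMSNorm, standard pairwise RoPE, and every
# function/class name below is hypothetical.


def rms_norm(x, eps=1e-6):
    """Rescale x to unit root-mean-square (no learned gain, for brevity)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def apply_rope(x, pos, base=10000.0):
    """Rotate each pair (x[i], x[i+1]), i even, by angle pos * base**(-i/d)."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(q))]


class VanillaKVCache:
    """Caches a rotated Key AND a Value per token: 2 vectors per token."""

    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        pos = len(self.keys)
        self.keys.append(apply_rope(k, pos))  # RoPE applied once, at write time
        self.values.append(v)

    def read(self):
        return self.keys, self.values

    def num_floats(self):
        return sum(map(len, self.keys)) + sum(map(len, self.values))


class SharedKVCache:
    """Caches ONE shared vector per token; Keys are rebuilt on every read."""

    def __init__(self):
        self.shared = []

    def append(self, s):
        self.shared.append(s)

    def read(self):
        # Extra compute: Key norm + RoPE are re-applied each time attention runs.
        keys = [apply_rope(rms_norm(s), pos) for pos, s in enumerate(self.shared)]
        return keys, self.shared  # Values are the stored vectors themselves

    def num_floats(self):
        return sum(map(len, self.shared))


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, shared=False):
    """Per-sequence cache size: vanilla stores K and V, Shared KV stores one."""
    tensors_per_token = 1 if shared else 2
    return (tensors_per_token * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_elem)


random.seed(0)
d, T = 8, 5
vanilla, shared = VanillaKVCache(), SharedKVCache()
for _ in range(T):
    vanilla.append([random.gauss(0, 1) for _ in range(d)],
                   [random.gauss(0, 1) for _ in range(d)])
    shared.append([random.gauss(0, 1) for _ in range(d)])

q = apply_rope([random.gauss(0, 1) for _ in range(d)], T - 1)  # newest position
out_vanilla = attend(q, *vanilla.read())
out_shared = attend(q, *shared.read())
print(vanilla.num_floats(), shared.num_floats())  # 80 40

# Hypothetical config (NOT Nandi-Mini-500M's real hyperparameters), fp16 cache.
cfg = dict(n_layers=24, n_kv_heads=4, head_dim=64, seq_len=4096)
print(kv_cache_bytes(**cfg) / 2**20, kv_cache_bytes(**cfg, shared=True) / 2**20)  # 96.0 48.0
```

In this sketch the saving is exactly half. Each token contributes one stored vector instead of two, no matter how long the sequence grows. The cost shows up in `SharedKVCache.read`, which reapplies the normalization and RoPE to every cached token on each decoding step.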