arxiv:2605.14200
Leena Chennuru Vankadara
leenachennuru
AI & ML interests
None yet
Recent Activity
authored a paper about 9 hours ago
How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization
authored a paper about 9 hours ago
On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling
authored a paper about 9 hours ago
μP$^2$: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling
Organizations
None yet