Hannu Varjoranta
varjoranta
AI & ML interests
Weight and KV cache compression for production LLM serving. Building turboquant-plus-vllm.
Recent Activity
updated a model 3 days ago: varjosoft/GLM-4.7-Flash-TQ3
published a model 3 days ago: varjosoft/GLM-4.7-Flash-TQ3
new activity 3 days ago: varjosoft/GLM-5.1-Open-TQ3: Pending GPU & vLLM validation