Nesso-0.4B-Instruct Q6_K GGUF

Small Italian/English model (~0.4B parameters) based on mii-llm/nesso-0.4B-instruct, quantized to Q6_K using llama.cpp.

Features

  • Quantization: Q6_K (excellent balance between quality, speed, and RAM usage)
  • Format: GGUF (compatible with llama.cpp, Ollama, LM Studio, KoboldCPP, etc.)
  • File size: approximately 580 MB
  • Base: converted from Hugging Face → GGUF f16 → quantized to Q6_K

How to use it

Example with llama.cpp:

./llama-cli -m nesso-0.4B-Q6_K.gguf --temp 0.7 --repeat-penalty 1.15 -p "Your prompt here"
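
The card also lists Ollama as a compatible runtime. As a sketch, the same GGUF file can be imported via a Modelfile; the file path is assumed to be the quantized file above, and the sampling parameters simply mirror the llama-cli flags in the example (they are not settings published with the model):

```
# Hypothetical Modelfile for importing the quantized GGUF into Ollama
FROM ./nesso-0.4B-Q6_K.gguf

# Mirror the sampling settings from the llama-cli example above
PARAMETER temperature 0.7
PARAMETER repeat_penalty 1.15
```

Then build and run the local model with `ollama create nesso -f Modelfile` followed by `ollama run nesso`.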