Octen-Embedding-0.6B – GGUF

GGUF conversion of Octen/Octen-Embedding-0.6B for use with CrispEmbed.

Model Details

  • Architecture: Qwen3 decoder with GQA (16 Q heads, 8 KV heads, head_dim=128)
  • Parameters: 0.6B (28 layers, 1024 hidden, 3072 intermediate)
  • Embedding dim: 1024
  • Pooling: Last-token
  • Tokenizer: GPT-2 BPE (151K vocab)
  • RoPE: theta=1,000,000
  • License: Apache-2.0
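The pooling and normalization steps above can be sketched in a few lines of numpy. This is a minimal illustration, assuming hidden states of shape (seq_len, 1024) from the final decoder layer; it is not the CrispEmbed implementation itself.

```python
import numpy as np

# Toy hidden states of shape (seq_len, hidden_dim); last-token pooling
# takes the final position's hidden state as the sentence embedding.
hidden_dim = 1024
seq_len = 5
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((seq_len, hidden_dim))

embedding = hidden_states[-1]                       # last-token pooling
embedding = embedding / np.linalg.norm(embedding)   # L2 normalization

print(embedding.shape)
```

The resulting vector is unit-length, so cosine similarity between two embeddings reduces to a dot product.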

Files

File                  Type  Size    CosSim vs HF
octen-0.6b-f32.gguf   F32   2.3 GB  0.9999
octen-0.6b-q8_0.gguf  Q8_0  609 MB  0.9993
octen-0.6b-q4_k.gguf  Q4_K  325 MB  0.9570

Usage with CrispEmbed

./crispembed -m octen-0.6b-q8_0.gguf "Hello world"
# prints 1024-dim L2-normalized embedding

# Server mode
./crispembed-server -m octen-0.6b-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/embed -d '{"texts": ["Hello world"]}'
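The same request can be issued from Python with the standard library. The request body matches the curl example above; the "embeddings" response field is an assumption, not documented behavior, and the network call is left commented out so the sketch runs without a server.

```python
import json
import urllib.request

def embed(texts, url="http://localhost:8080/embed"):
    # Hypothetical client for the /embed endpoint shown above.
    # The "embeddings" response key is an assumption.
    payload = json.dumps({"texts": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embeddings"]

# Build the request body without sending it (no server assumed here):
body = json.dumps({"texts": ["Hello world"]})
print(body)
# vectors = embed(["Hello world"])  # requires a running crispembed-server
```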

Conversion

Converted from the original PyTorch model using models/convert-decoder-embed-to-gguf.py from the CrispEmbed repo. Outputs verified against HuggingFace sentence-transformers: cosine similarity ≥ 0.999 at F32 and Q8_0 (see table above).
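The verification above amounts to a cosine-similarity check between GGUF and HuggingFace embeddings. A minimal sketch, using a toy perturbed vector in place of real model outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy check: a small quantization-like perturbation of a 1024-dim
# reference vector keeps cosine similarity above the 0.999 threshold.
rng = np.random.default_rng(1)
reference = rng.standard_normal(1024)
quantized = reference + 1e-3 * rng.standard_normal(1024)

sim = cosine_similarity(reference, quantized)
print(sim > 0.999)
```

In practice the two vectors would come from running the same input through the GGUF model and through sentence-transformers.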
