# Octen-Embedding-0.6B → GGUF
GGUF conversion of Octen/Octen-Embedding-0.6B for use with CrispEmbed.
## Model Details
- Architecture: Qwen3 decoder with GQA (16 Q heads, 8 KV heads, head_dim=128)
- Parameters: 0.6B (28 layers, 1024 hidden, 3072 intermediate)
- Embedding dim: 1024
- Pooling: Last-token (see the sketch after this list)
- Tokenizer: GPT-2 BPE (151K vocab)
- RoPE: theta=1,000,000
- License: Apache-2.0
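For context, last-token pooling takes the hidden state of the final non-padding token and L2-normalizes it to produce the 1024-dim embedding. A minimal PyTorch sketch, for illustration only; the tensor names follow HuggingFace conventions and this is not the CrispEmbed implementation:

```python
import torch

def last_token_pool(hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    # with 1s for real tokens, 0s for padding (assumes right-padding).
    last_idx = attention_mask.sum(dim=1) - 1          # last real token per row
    batch_idx = torch.arange(hidden_states.size(0))
    emb = hidden_states[batch_idx, last_idx]          # (batch, 1024)
    # L2-normalize so cosine similarity reduces to a dot product.
    return torch.nn.functional.normalize(emb, p=2, dim=1)
```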
## Files
| File | Type | Size | CosSim vs HF |
|---|---|---|---|
| octen-0.6b-f32.gguf | F32 | 2.3 GB | 0.9999 |
| octen-0.6b-q8_0.gguf | Q8_0 | 609 MB | 0.9993 |
| octen-0.6b-q4_k.gguf | Q4_K | 325 MB | 0.9570 |
## Usage with CrispEmbed

```bash
./crispembed -m octen-0.6b-q8_0.gguf "Hello world"
# prints 1024-dim L2-normalized embedding
```

Server mode:

```bash
./crispembed-server -m octen-0.6b-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/embed -d '{"texts": ["Hello world"]}'
```
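For programmatic access, here is a minimal Python client for the `/embed` endpoint above. This is a sketch: the response field name `embeddings` is an assumption, so check the actual CrispEmbed server schema before relying on it.

```python
import requests

def embed(texts: list[str],
          url: str = "http://localhost:8080/embed") -> list[list[float]]:
    # ASSUMPTION: server returns JSON shaped like {"embeddings": [[...], ...]}.
    resp = requests.post(url, json={"texts": texts})
    resp.raise_for_status()
    return resp.json()["embeddings"]

vecs = embed(["Hello world"])
print(len(vecs[0]))  # expect 1024 dimensions
```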
## Conversion
Converted from the original PyTorch model using `models/convert-decoder-embed-to-gguf.py` from the CrispEmbed repo. Outputs were verified against HuggingFace sentence-transformers (cosine similarity ≥ 0.999 for the F32 and Q8_0 files; see the table above).
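The verification can be reproduced with a short script. A sketch, assuming the `crispembed` CLI prints the embedding as whitespace-separated floats on stdout (that output format is an assumption):

```python
import subprocess
import numpy as np
from sentence_transformers import SentenceTransformer

text = "Hello world"

# Reference embedding from the original HF model (already L2-normalized).
st = SentenceTransformer("Octen/Octen-Embedding-0.6B")
ref = st.encode([text], normalize_embeddings=True)[0]

# GGUF embedding via the CLI.
# ASSUMPTION: the vector is printed as whitespace-separated floats.
out = subprocess.run(
    ["./crispembed", "-m", "octen-0.6b-q8_0.gguf", text],
    capture_output=True, text=True, check=True,
).stdout
gguf = np.array([float(x) for x in out.split()])

# Both vectors are unit-length, so cosine similarity is a dot product.
print("cosine:", float(ref @ gguf))  # expect ~0.9993 for Q8_0
```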
## Model tree for cstr/Octen-Embedding-0.6B-GGUF

- Base model: Qwen/Qwen3-0.6B-Base
- Finetuned: Qwen/Qwen3-Embedding-0.6B
- Finetuned: Octen/Octen-Embedding-0.6B (converted to GGUF here)