HRM-Text-1B GGUF
This repository contains a BF16 GGUF conversion of sapientinc/HRM-Text-1B and validated Q8_0, Q6_K, and Q5_K_M quantizations derived from that BF16 GGUF.
The GGUF files use:
- `general.architecture = hrm_text`
- BF16 source tensor storage or standard `llama.cpp` quantized tensor storage
- the original tokenizer from `tokenizer.json`
- no injected chat template
This is not a chat model and is not instruction tuned. "Useful output" for this repository means alignment with the original Transformers model on the same prompt, not chat-assistant behavior.
Compatibility Notice
Standard upstream llama.cpp, Ollama, LM Studio, and llama-cpp-python are not expected to load these files until hrm_text is supported upstream.
Use the included patch:
runtime/llama.cpp-hrm_text.patch
The patch was built against:
ggml-org/llama.cpp commit 6a257d44633d4a752183ed778b88d2924d0a6b9d
Only the normal causal generation path is implemented in the patched runtime. Prefix-LM bidirectional token_type_ids are not supported by the llama.cpp path in this release.
Files
| File | Description |
|---|---|
| `HRM-Text-1B-BF16.gguf` | BF16 GGUF conversion of sapientinc/HRM-Text-1B |
| `HRM-Text-1B-Q8_0.gguf` | Validated Q8_0 quantization from BF16 |
| `HRM-Text-1B-Q6_K.gguf` | Validated Q6_K quantization from BF16 |
| `HRM-Text-1B-Q5_K_M.gguf` | Validated Q5_K_M quantization from BF16 |
| `runtime/llama.cpp-hrm_text.patch` | Patch adding hrm_text conversion and runtime support to the clean llama.cpp base commit |
| `reports/validation/final_report.md` | Human-readable conversion and validation report |
| `reports/validation/quantization_report.md` | Quantization report, hashes, and pass/fail summary |
| `reports/validation/baseline_transformers.json` | Transformers baseline prompts, logits, and continuations |
| `reports/validation/bf16_tensor_validation.json` | Tensor-level GGUF validation |
| `reports/validation/bf16_vs_hf.json` | Runtime logit and text validation |
| `reports/validation/q8_0_vs_bf16.json` | Q8_0 vs BF16 runtime validation |
| `reports/validation/q6_k_vs_bf16.json` | Q6_K vs BF16 runtime validation |
| `reports/validation/q5_k_m_vs_bf16.json` | Q5_K_M vs BF16 runtime validation |
Provenance
| Item | Value |
|---|---|
| Source model | sapientinc/HRM-Text-1B |
| Source snapshot SHA | 2285b999f6fb8a5b16e0cc313a9e8e4fe447140d |
| Source `model.safetensors` SHA256 | F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584 |
| BF16 GGUF SHA256 | 2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010 |
| BF16 GGUF size | 2,367,995,648 bytes |
| llama.cpp base commit | 6a257d44633d4a752183ed778b88d2924d0a6b9d |
Available GGUF Files
| Variant | File | Size (bytes) | SHA256 |
|---|---|---|---|
| BF16 | `HRM-Text-1B-BF16.gguf` | 2367995648 | 2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010 |
| Q8_0 | `HRM-Text-1B-Q8_0.gguf` | 1259126560 | C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D |
| Q6_K | `HRM-Text-1B-Q6_K.gguf` | 972668704 | 24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C |
| Q5_K_M | `HRM-Text-1B-Q5_K_M.gguf` | 851509024 | F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7 |
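
The sizes and hashes above can be checked after download. The following is a minimal sketch, not part of the repository's tooling: the helper names `sha256_of` and `verify_file` are hypothetical, and the expected values are copied from the table.

```python
import hashlib
from pathlib import Path

# Expected (size in bytes, SHA256) pairs, copied from the table above.
EXPECTED = {
    "HRM-Text-1B-BF16.gguf": (2367995648, "2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010"),
    "HRM-Text-1B-Q8_0.gguf": (1259126560, "C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D"),
    "HRM-Text-1B-Q6_K.gguf": (972668704, "24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C"),
    "HRM-Text-1B-Q5_K_M.gguf": (851509024, "F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7"),
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_file(path, expected=EXPECTED):
    """Return True if the file's size and SHA256 match the expected entry."""
    path = Path(path)
    size, sha256 = expected[path.name]
    # Compare the cheap size check first, then the full hash.
    return path.stat().st_size == size and sha256_of(path) == sha256.upper()
```

For example, `verify_file("HRM-Text-1B-GGUF/HRM-Text-1B-Q8_0.gguf")` should return `True` for an intact download, assuming the files were saved to the `HRM-Text-1B-GGUF` directory.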
Validation Summary
Validation was performed from a clean source snapshot and a clean llama.cpp base checkout.
| Check | Result |
|---|---|
| Tensor validation | Pass, 259/259 tensors found and compared |
| Tensor values | BF16 tensor bits match HF after expected BF16 conversion |
| Prompt token IDs | Match for all validation prompts |
| Next-token top-1 | Match on 4/4 prompts |
| Top-10 overlap | 10/10 for all prompts |
| Text validation | BF16 GGUF continuations are aligned with Transformers baseline |
Quantized variants were validated against the BF16 GGUF:
| Variant | Token IDs | Top-1 matches | Min top-10 overlap | New loop check | Result |
|---|---|---|---|---|---|
| Q8_0 | Pass | 4/4 | 9/10 | Pass | Pass |
| Q6_K | Pass | 4/4 | 9/10 | Pass | Pass |
| Q5_K_M | Pass | 4/4 | 9/10 | Pass | Pass |
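
To make the top-1 and top-10 columns concrete, here is a minimal sketch of how such metrics can be computed from two next-token logit vectors. It is an illustration only, not the validation script used for this repository; the function names and the tie-breaking rule (lower token ID wins) are assumptions.

```python
def top_k_ids(logits, k):
    """Return the IDs of the k largest logits, ties broken by lower token ID."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


def top1_match(ref_logits, cand_logits):
    """True if both runtimes pick the same most likely next token."""
    return top_k_ids(ref_logits, 1) == top_k_ids(cand_logits, 1)


def top_k_overlap(ref_logits, cand_logits, k=10):
    """Number of token IDs shared by the two top-k sets (order ignored)."""
    return len(set(top_k_ids(ref_logits, k)) & set(top_k_ids(cand_logits, k)))
```

Under this definition, a "9/10" entry means that nine of the ten highest-ranked token IDs in the reference also appear in the candidate's top ten.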
Full-vocab mean absolute logit error:
| Prompt | MAE |
|---|---|
| `The quick brown fox` | 0.0199148655 |
| `In a distant future, humanity` | 0.0051696529 |
| `Question: What is 2+2?\nAnswer:` | 0.0076530445 |
| `def fibonacci(n):` | 0.0045031775 |
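
For reference, a full-vocabulary mean absolute error is the average of the absolute per-token logit differences between two runtimes for the same prompt. The sketch below shows that calculation on plain Python lists; the function name is hypothetical and it is not the script that produced the numbers above.

```python
def mean_absolute_logit_error(ref_logits, cand_logits):
    """Mean of |ref - cand| over every vocabulary entry."""
    if len(ref_logits) != len(cand_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    if not ref_logits:
        raise ValueError("logit vectors must not be empty")
    total = sum(abs(r - c) for r, c in zip(ref_logits, cand_logits))
    return total / len(ref_logits)
```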
The original model already repeats on some prompts. Repetition by itself is not treated as a conversion failure unless it is newly introduced by the GGUF runtime. The BF16 GGUF validation did not reproduce the unrelated garbage pattern seen in a previous broken conversion attempt.
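
One way to express the "newly introduced repetition" idea is to flag a continuation only when it ends in a loop and the baseline continuation for the same prompt does not. The sketch below is an assumption about how such a check could look, not the check used in the reports; the loop definition (a short unit repeated at the tail) and the `max_period` and `min_repeats` thresholds are hypothetical.

```python
def has_tail_loop(tokens, max_period=8, min_repeats=3):
    """True if the sequence ends with a unit of up to max_period tokens
    repeated at least min_repeats times in a row."""
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(tokens) < needed:
            break  # longer periods would need even more tokens
        tail = tokens[-needed:]
        unit = tail[:period]
        if all(tail[i] == unit[i % period] for i in range(needed)):
            return True
    return False


def new_loop_introduced(baseline_tokens, candidate_tokens, **kwargs):
    """True only if the candidate loops and the baseline does not."""
    return has_tail_loop(candidate_tokens, **kwargs) and not has_tail_loop(
        baseline_tokens, **kwargs
    )
```

With this framing, a prompt on which the Transformers baseline already loops does not count against the GGUF runtime.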
Example Runtime Setup
Download this repository:
pip install -U huggingface_hub
hf download sinimiini/HRM-Text-1B-GGUF --local-dir HRM-Text-1B-GGUF
Patch and build llama.cpp:
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 6a257d44633d4a752183ed778b88d2924d0a6b9d
git apply ..\HRM-Text-1B-GGUF\runtime\llama.cpp-hrm_text.patch
cmake -B build -S . -DGGML_NATIVE=OFF
cmake --build build --config Release --target llama-cli llama-completion llama-results
Run a short causal-generation smoke test:
.\build\bin\Release\llama-cli.exe -m ..\HRM-Text-1B-GGUF\HRM-Text-1B-BF16.gguf -p "The quick brown fox" -n 32 --temp 0 --no-conversation
Depending on the generator binary and llama.cpp build type, the executable may be under build\bin\llama-cli.exe instead of build\bin\Release\llama-cli.exe.
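
If the smoke test is scripted, the two possible locations can be probed in order. This is a small sketch with a hypothetical helper name, assuming the Windows executable name used above and the two paths mentioned in this section.

```python
from pathlib import Path


def find_llama_cli(llama_cpp_dir, name="llama-cli.exe"):
    """Return the first existing llama-cli path under the build tree, or None."""
    root = Path(llama_cpp_dir)
    candidates = [
        root / "build" / "bin" / "Release" / name,  # multi-config generators
        root / "build" / "bin" / name,              # single-config generators
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```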
Limitations
- `hrm_text` is a custom GGUF architecture in this conversion.
- Generic GGUF runners will not work until they implement the HRM runtime graph.
- Prefix-LM bidirectional attention with `token_type_ids` is not implemented in the patched `llama.cpp` path.
License
The source model is released under the Apache 2.0 license. See LICENSE.