OLMo-3 7B Instruct (1-Bit Experimental)
This is an experimental 1-bit quantized version of the OLMo-3 7B Instruct model, built with Quantization-Aware Distillation (QAD). Notably, the entire model, including the embeddings, has been compressed to 1-bit.
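The card does not spell out the exact QAD recipe, but the core idea of 1-bit weight quantization can be illustrated with a minimal sketch. The per-row scaling below is an assumption (a common BitNet-style choice that minimizes L2 reconstruction error), not a description of this model's actual kernels:

```python
import numpy as np

def binarize(w):
    """1-bit weight quantization sketch: keep only the sign of each
    weight, plus one per-row float scale. mean(|w|) is the scale that
    minimizes the L2 error of the reconstruction scale * sign(w)."""
    scale = np.abs(w).mean(axis=1, keepdims=True)  # one float per row
    signs = np.where(w >= 0, 1.0, -1.0)            # the 1-bit payload
    return signs, scale

def dequantize(signs, scale):
    """Approximate reconstruction used at inference time."""
    return signs * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
signs, scale = binarize(w)
w_hat = dequantize(signs, scale)
```

In a real 1-bit model the signs are bit-packed (8 weights per byte), and QAD trains the full-precision "shadow" weights against a teacher model so the quantized forward pass stays usable.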
Current Development Status
The model was trained for 12 hours on a cluster of 4x B200 GPUs. It is currently a technical proof of concept and is not intended for production use.
- Performance: The model is capable of processing basic English and short sequences.
- Known Issues: Due to the experimental nature and training duration, users may encounter frequent repetition loops and limited context tracking.
Usage and Implementation
The required 1-bit kernels have been merged into mainline llama.cpp; any recent llama.cpp build will work:
```
llama-server -m olmo3-7b-1bit.gguf --port 8080
```
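Once the server is running, it can be queried over llama.cpp's OpenAI-compatible HTTP API. A minimal sketch of building such a request is below; the endpoint path and field names follow the OpenAI chat-completions convention, and the port matches the command above (the POST itself is commented out so the snippet stands alone):

```python
import json

def build_chat_request(prompt, max_tokens=64):
    """Builds the JSON body for llama-server's OpenAI-compatible
    /v1/chat/completions endpoint. llama-server serves whatever model
    was loaded with -m, so the 'model' field is mostly informational."""
    return {
        "model": "olmo3-7b-1bit",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("Say hello in one short sentence.")
payload = json.dumps(body).encode()

# To actually send it (requires the server from the command above):
# import urllib.request
# req = urllib.request.urlopen(
#     "http://localhost:8080/v1/chat/completions", data=payload)
# print(json.load(req)["choices"][0]["message"]["content"])
```

Given the known issues above, keeping `max_tokens` small is a sensible default for this checkpoint.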
Future Roadmap
Future iterations will focus on extending the training duration and refining dataset selection. These steps are expected to significantly stabilize the 1-bit quantization and enhance the model's reasoning capabilities.
License: Apache 2.0

Base Model: allenai/Olmo-3-7B-Instruct