OLMo-3 7B Instruct (1-Bit Experimental)

This is an experimental 1-bit quantized version of the OLMo-3 7B Instruct model. It was developed using Quantization Aware Distillation (QAD) techniques. Notably, the entire architecture, including the embeddings, has been fully compressed to 1-bit.
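To make "1-bit" concrete, here is a minimal sketch of what 1-bit weight quantization means in the simplest case: each weight keeps only its sign, plus one shared per-tensor scale. This is an illustration of the general idea, not the actual OLMo-3 kernels or the QAD training procedure; the function names are hypothetical.

```python
# Illustrative 1-bit (sign) quantization with a per-tensor scale.
# Not the actual OLMo-3 kernels; a minimal sketch of the concept.

def quantize_1bit(weights):
    """Reduce each weight to its sign; keep one mean-absolute-value scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale

def dequantize_1bit(signs, scale):
    """Reconstruct approximate weights from signs and the shared scale."""
    return [s * scale for s in signs]

weights = [0.8, -0.3, 0.1, -0.9]
signs, scale = quantize_1bit(weights)   # signs fit in 1 bit each
approx = dequantize_1bit(signs, scale)  # coarse reconstruction of the weights
```

Each sign needs only one bit of storage, which is where the ~16x size reduction over fp16 comes from; the distillation step (QAD) then trains the quantized student to match the full-precision teacher's outputs despite this coarse reconstruction.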

Current Development Status

The model was trained for 12 hours on a cluster of 4x B200 GPUs. Please note that it currently serves as a technical proof of concept and is not intended for production environments.

  • Performance: The model can handle basic English prompts and short sequences.
  • Known Issues: Given the experimental setup and short training run, expect frequent repetition loops and limited context tracking.

Usage and Implementation

The required 1-bit kernels have been merged into mainline llama.cpp; any recent llama.cpp build will run the model.

llama-server -m olmo3-7b-1bit.gguf --port 8080
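Once the server is running, it can be queried over llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below uses only the Python standard library; the helper names, prompt, and sampling values are illustrative, and the port matches the command above.

```python
# Minimal client for the llama-server started above, via its
# OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

def build_chat_request(prompt, url="http://localhost:8080/v1/chat/completions"):
    """Build an HTTP request for the chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,     # keep generations short; the model tends to loop on long outputs
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def send_chat_request(req):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("Say hello."))` returns the model's reply as a string.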

Future Roadmap

Future iterations will focus on extending the training duration and refining dataset selection. These steps are expected to significantly stabilize the 1-bit quantization and enhance the model's reasoning capabilities.


License: Apache 2.0
Base Model: allenai/Olmo-3-7B-Instruct

Format: GGUF
Model size: 7B params
Architecture: olmo2


Quantized model: cturan/Olmo-3-7B-Instruct-Q1_0