OLMo-3 7B Instruct (1-Bit Experimental)

This is an experimental 1-bit quantized version of the OLMo-3 7B Instruct model. It was developed using Quantization Aware Distillation (QAD) techniques. Notably, the entire architecture, including the embeddings, has been fully compressed to 1-bit.
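To make "1-bit" concrete, here is a minimal sketch of what 1-bit weight quantization means in the simplest case: each weight keeps only its sign, plus one shared per-tensor scale. This is an illustration of the general idea, not the actual OLMo-3 kernels or the QAD training procedure; the function names are hypothetical.

```python
# Illustrative 1-bit (sign) quantization with a per-tensor scale.
# Not the actual OLMo-3 kernels; a minimal sketch of the concept.

def quantize_1bit(weights):
    """Reduce each weight to its sign; keep one mean-absolute-value scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale

def dequantize_1bit(signs, scale):
    """Reconstruct approximate weights from signs and the shared scale."""
    return [s * scale for s in signs]

weights = [0.8, -0.3, 0.1, -0.9]
signs, scale = quantize_1bit(weights)   # signs fit in 1 bit each
approx = dequantize_1bit(signs, scale)  # coarse reconstruction of the weights
```

Each sign needs only one bit of storage, which is where the ~16x size reduction over fp16 comes from; the distillation step (QAD) then trains the quantized student to match the full-precision teacher's outputs despite this coarse reconstruction.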

Current Development Status

The model was trained for 12 hours on a cluster of 4x B200 GPUs. Please note that it currently serves as a technical proof of concept and is not intended for production environments.

  • Performance: The model can handle basic English prompts and short sequences.
  • Known Issues: Given the experimental setup and short training run, expect frequent repetition loops and limited context tracking.

Usage and Implementation

The required 1-bit kernels have been merged into mainline llama.cpp; any recent llama.cpp build will run the model.

llama-server -m olmo3-7b-1bit.gguf --port 8080
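Once the server is running, it can be queried over llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below uses only the Python standard library; the helper names, prompt, and sampling values are illustrative, and the port matches the command above.

```python
# Minimal client for the llama-server started above, via its
# OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

def build_chat_request(prompt, url="http://localhost:8080/v1/chat/completions"):
    """Build an HTTP request for the chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,     # keep generations short; the model tends to loop on long outputs
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def send_chat_request(req):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("Say hello."))` returns the model's reply as a string.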

Future Roadmap

Future iterations will focus on extending the training duration and refining dataset selection. These steps are expected to significantly stabilize the 1-bit quantization and enhance the model's reasoning capabilities.


License: Apache 2.0
Base Model: allenai/Olmo-3-7B-Instruct

Format: GGUF
Model size: 7B params
Architecture: olmo2


Quantized model: cturan/Olmo-3-7B-Instruct-Q1_0