Model Card for gpt-oss-10b-quants

This repository contains multiple quantized versions of the GPT-OSS-10B model in GGUF format.
It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

Model Details

Model Description

  • Developed by: leeminwaan
  • Funded by: Independent project
  • Shared by: leeminwaan
  • Model type: Decoder-only transformer language model
  • Language(s) (NLP): English (primary); multilingual capabilities not benchmarked
  • License: Apache-2.0
  • Finetuned from model: openai/gpt-oss-20b (via pruning and expert selection)

Model Sources

  • Repository: Hugging Face Repo
  • Paper: Not available
  • Demo: To be released

Uses

Direct Use

  • Text generation
  • Experimentation with quantization formats
  • Running benchmarks on low-resource hardware

Downstream Use

  • Fine-tuning for chatbots, classification, summarization, or research projects
  • Integration into lightweight inference pipelines

Out-of-Scope Use

  • High-stakes decision making (medical, legal, financial)
  • Content moderation without further fine-tuning
  • Applications requiring guaranteed factual accuracy

Bias, Risks, and Limitations

  • May reproduce societal biases from training data
  • Limited evaluation on multilingual or domain-specific text
  • Quantization may degrade accuracy slightly compared to full precision

Recommendations

  • Run evaluations before production deployment
  • Do not use outputs as factual truth without human verification

How to Get Started with the Model

# Download a single quantized GGUF file from the Hugging Face Hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download("leeminwaan/gpt-oss-10.6B-GGUF", "gpt-oss-10b-q4_k_m.gguf")
print("Downloaded:", model_path)

Quantized versions available:

  • Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
  • Q4_0, Q4_1, Q4_K_S, Q4_K_M
  • Q5_0, Q5_1, Q5_K_S, Q5_K_M
  • Q6_K, Q8_0
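As a rough guide, a quant's on-disk size scales with its bits per weight. A minimal sketch of that estimate; the bits-per-weight figures below are approximate values assumed for illustration, not measurements from this release:

```python
# Rough GGUF file-size estimator from bits per weight.
# These bits-per-weight values are approximate and assumed here.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8,
    "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5,
}

def estimated_size_gb(n_params: float, quant: str) -> float:
    """Estimate file size in GB for n_params weights at the given quant."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9

# ~10B parameters at Q4_K_M
print(f"{estimated_size_gb(10e9, 'Q4_K_M'):.1f} GB")  # 6.0 GB
```

Lower-bit quants trade quality for size: Q2_K lands near the bottom of the ~3GB to ~7GB range, Q8_0 well above the mid-range.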

Training Details

Training Data

  • Based on GPT-OSS-20B pretraining corpus (public large-scale web text, open datasets).
  • No additional fine-tuning was performed for this release.

Training Procedure

  • Original GPT-OSS-20B → MoE experts pruned to reduce the model to ~10B parameters → quantized to the GGUF formats listed above.
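The expert-selection step can be illustrated conceptually: rank each MoE layer's experts by some importance score and keep the top k. The scoring criterion and the actual pruning pipeline used for this release are not documented here, so everything below is a hypothetical sketch:

```python
# Conceptual sketch of MoE expert pruning: keep the top-k experts per
# layer by an importance score (e.g. routing frequency). All expert
# names and scores here are made up for illustration.

def prune_experts(scores: dict[str, float], keep: int) -> list[str]:
    """Return the names of the `keep` highest-scoring experts."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]

layer_scores = {"e0": 0.31, "e1": 0.05, "e2": 0.22, "e3": 0.42}
print(prune_experts(layer_scores, keep=2))  # ['e3', 'e0']
```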

Preprocessing

  • Standard tokenization, no special preprocessing for quantization.

Training Hyperparameters

  • Quantization only; no gradient updates performed.
  • Storage optimized for GGUF inference.

Speeds, Sizes, Times

  • Full FP16 checkpoint: ~20B parameters → pruned to ~10B parameters → GGUF quantizations ranging from ~3GB to ~7GB on disk.

Evaluation

Testing Data

  • No dedicated evaluation dataset; informal testing on open prompts.

Factors

  • Quantization level strongly affects perplexity and memory footprint.

Metrics

  • Perplexity (approximate, not benchmarked formally).
  • Memory usage on consumer GPUs/CPUs.
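Perplexity is the exponential of the mean negative log-likelihood over a token sequence; lower is better. A minimal sketch of the computation itself, with made-up per-token probabilities:

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """exp of the mean negative log-probability over the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities assigned by a model
print(round(perplexity([0.25, 0.5, 0.125, 0.5]), 3))  # 3.364
```

A model that assigned probability 1.0 to every token would score a perplexity of exactly 1; quantization error shows up as a small upward shift in this number.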

Results

  • Q8_0 maintains near full precision quality.
  • Q4_K_M and Q5_K_M provide a good trade-off between size, speed, and quality.

Summary

Quantized models are suitable for lightweight inference and experimentation.

Model Examination

  • No interpretability analysis yet.

Technical Specifications

Model Architecture and Objective

  • Decoder-only Transformer
  • Optimized for text generation

Compute Infrastructure

Hardware

  • Single RTX 3090 (24GB VRAM) for quantization tasks

Software

  • llama.cpp for quantization
  • Python 3.10, huggingface_hub

Citation

BibTeX:

@misc{gptoss10bquants,
  title={GPT-OSS-10B Quantized Models},
  author={leeminwaan},
  year={2025},
  howpublished={\url{https://huggingface.co/leeminwaan/gpt-oss-10b-quants}}
}

APA:

leeminwaan. (2025). GPT-OSS-10B Quantized Models [Computer software]. Hugging Face. https://huggingface.co/leeminwaan/gpt-oss-10b-quants

Glossary

  • Quantization: Reducing precision of weights to lower memory usage.
  • GGUF: A binary file format for model weights and metadata, used by llama.cpp for inference.
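To illustrate the quantization idea from the glossary, here is a minimal symmetric 8-bit round-trip. This is a deliberate simplification: the GGUF k-quants listed above use block-wise schemes with per-block scales, not a single global scale.

```python
# Simplified symmetric int8 quantization: map floats to integer codes
# in [-127, 127] via one shared scale, then dequantize. Real GGUF
# quants operate on blocks of weights with per-block scales.

def quantize_q8(weights: list[float]) -> tuple[list[int], float]:
    """Quantize weights to int8 codes plus a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from codes and scale."""
    return [c * scale for c in codes]

w = [0.5, -1.27, 0.03, 1.0]
q, s = quantize_q8(w)
print(q)                  # integer codes, 1 byte each
print(dequantize(q, s))   # approximately the original weights
```

The round-trip error is bounded by half the scale step, which is why higher-bit quants (more, finer steps) stay closer to full precision.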

More Information

  • This project is experimental.
  • Expect further updates and quantization benchmarks.

Model Card Authors

  • leeminwaan

Model Card Contact
