# Model Card for gpt-oss-10b-quants
This repository contains multiple quantized versions of the GPT-OSS-10B model in GGUF format.
It is intended for efficient inference on consumer hardware, making large model deployment more accessible.
## Model Details

### Model Description
- Developed by: leeminwaan
- Funded by: Independent project
- Shared by: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary), multilingual capabilities not benchmarked
- License: Apache-2.0
- Finetuned from model: openai/gpt-oss-20b (via pruning and expert selection)
### Model Sources

- Repository: https://huggingface.co/leeminwaan/gpt-oss-10b-quants
- Paper: Not available
- Demo: To be released
## Uses

### Direct Use
- Text generation
- Experimentation with quantization formats
- Running benchmarks on low-resource hardware
### Downstream Use
- Fine-tuning for chatbots, classification, summarization, or research projects
- Integration into lightweight inference pipelines
### Out-of-Scope Use
- High-stakes decision making (medical, legal, financial)
- Content moderation without further fine-tuning
- Applications requiring guaranteed factual accuracy
## Bias, Risks, and Limitations
- May reproduce societal biases from training data
- Limited evaluation on multilingual or domain-specific text
- Quantization may degrade accuracy slightly compared to full precision
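To see why lower bit-widths cost accuracy, here is an illustrative round-trip using simple symmetric round-to-nearest quantization. This is a teaching sketch, not the k-quant scheme llama.cpp actually uses; the weight values are made up.

```python
# Illustrative only: symmetric round-to-nearest quantization, to show why
# precision drops as bit-width shrinks. Not the llama.cpp k-quant scheme.
def quantize_roundtrip(weights, bits):
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    quantized = [round(w / scale) for w in weights]
    return [q * scale for q in quantized]   # dequantize back to floats

weights = [0.013, -0.402, 0.257, 0.991, -0.776]  # hypothetical weight values
for bits in (8, 4, 2):
    restored = quantize_roundtrip(weights, bits)
    err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{bits}-bit max round-trip error: {err:.4f}")
```

The maximum reconstruction error grows as bits decrease, which is the accuracy/size trade-off the quant levels below expose.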
### Recommendations
- Run evaluations before production deployment
- Do not use outputs as factual truth without human verification
## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download

# Download a single quantized GGUF file from the Hub
model_path = hf_hub_download("leeminwaan/gpt-oss-10.6B-GGUF", "gpt-oss-10b-q4_k_m.gguf")
print("Downloaded:", model_path)
```
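The downloaded file can then be run with llama.cpp. A minimal sketch that assembles a `llama-cli` invocation (binary name and flags as in current llama.cpp builds; adjust for your build):

```python
# Sketch: build a llama.cpp `llama-cli` command for a local GGUF file.
# Assumes a llama.cpp build whose CLI is named `llama-cli`.
def build_llama_cmd(model_path: str, prompt: str, n_predict: int = 64) -> list[str]:
    """Assemble an argument list: -m model, -p prompt, -n tokens to generate."""
    return ["llama-cli", "-m", model_path, "-p", prompt, "-n", str(n_predict)]

cmd = build_llama_cmd("gpt-oss-10b-q4_k_m.gguf", "Explain quantization in one sentence.")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)  (requires llama-cli on PATH)
```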
Quantized versions available:
- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0
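The nominal bit-width in each quant name gives a rough lower bound on file size. The sketch below is illustrative arithmetic, not a measurement of this repository's files: real GGUF files are larger because k-quants mix precisions and store per-block scales.

```python
def estimate_gguf_size_gb(n_params: float, quant: str) -> float:
    """Rough size estimate from the nominal bit-width in a quant name
    (e.g. 'Q4_K_M' -> 4 bits). Real files run larger: k-quants keep some
    tensors at higher precision and store quantization scales."""
    bits = int(quant[1])                    # leading digit after 'Q'
    return n_params * bits / 8 / 1e9        # bytes -> GB (decimal)

for q in ("Q2_K", "Q4_K_M", "Q8_0"):
    print(f"{q}: ~{estimate_gguf_size_gb(10.6e9, q):.1f} GB (lower bound)")
```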
## Training Details

### Training Data
- Based on GPT-OSS-20B pretraining corpus (public large-scale web text, open datasets).
- No additional fine-tuning was performed for this release.
### Training Procedure

- Original GPT-OSS-20B → pruned to ~10B parameters via expert selection → quantized to GGUF formats.
#### Preprocessing
- Standard tokenization, no special preprocessing for quantization.
#### Training Hyperparameters
- Quantization only; no gradient updates performed.
- Storage optimized for GGUF inference.
#### Speeds, Sizes, Times

- Full FP16 checkpoint (~20B parameters) → pruned to ~10B parameters → GGUF quantizations ranging from ~3 GB to ~7 GB.
## Evaluation

### Testing Data
- No dedicated evaluation dataset; informal testing on open prompts.
### Factors
- Quantization level strongly affects perplexity and memory footprint.
### Metrics
- Perplexity (approximate, not benchmarked formally).
- Memory usage on consumer GPUs/CPUs.
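For reference, perplexity is the exponentiated mean negative log-likelihood per token. A minimal sketch over hypothetical per-token log-probabilities:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probs: a model uniform over a 50k vocab has perplexity 50000.
print(perplexity([math.log(1 / 50000)] * 4))
```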
### Results

- Q8_0 maintains near-full-precision quality.
- Q4_K_M and Q5_K_M offer a good trade-off between memory footprint and output quality.
#### Summary
Quantized models are suitable for lightweight inference and experimentation.
## Model Examination
- No interpretability analysis yet.
## Technical Specifications

### Model Architecture and Objective
- Decoder-only Transformer
- Optimized for text generation
### Compute Infrastructure

#### Hardware
- Single RTX 3090 (24GB VRAM) for quantization tasks
#### Software
- llama.cpp for quantization
- Python 3.10, huggingface_hub
## Citation

**BibTeX:**

```bibtex
@misc{gptoss10bquants,
  title={GPT-OSS-10B Quantized Models},
  author={leeminwaan},
  year={2025},
  howpublished={\url{https://huggingface.co/leeminwaan/gpt-oss-10b-quants}}
}
```
**APA:**
leeminwaan. (2025). GPT-OSS-10B Quantized Models [Computer software]. Hugging Face. https://huggingface.co/leeminwaan/gpt-oss-10b-quants
## Glossary
- Quantization: Reducing precision of weights to lower memory usage.
- GGUF: Optimized format for llama.cpp inference.
## More Information
- This project is experimental.
- Expect further updates and quantization benchmarks.
## Model Card Authors

leeminwaan

## Model Card Contact

Please open a discussion on the Hugging Face repository.