πŸš€ v0.1.6: Real-time Metrics & Blackwell-Optimized Docker (Recommended)

This model is fully compatible with DGX-Spark-llama.cpp-Bench, a state-of-the-art inference engine optimized for NVIDIA Blackwell (DGX Spark) hardware.

🌟 Key Features (v0.1.6)

  • Real-time Performance Metrics: Now visualizes Input TPS and Output TPS during streaming.
  • Improved Reasoning UI: Seamlessly renders and stabilizes the model's Chain-of-Thought (CoT).
  • Blackwell Optimization: Native support for ARM64/SM121 and CUDA 13.0 FP4.

🐳 Quick Start

# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.6

For more details, visit our GitHub Repository.




πŸš€ v0.1.5: Real-time Metrics & Blackwell-Optimized Docker (Recommended)

This model is fully compatible with DGX-Spark-llama.cpp-Bench, a state-of-the-art inference engine optimized for NVIDIA Blackwell (DGX Spark) hardware.

🌟 Key Features (v0.1.5)

  • Real-time Performance Metrics: Now visualizes Input TPS and Output TPS during streaming.
  • Improved Reasoning UI: Seamlessly renders and stabilizes the model's Chain-of-Thought (CoT).
  • Blackwell Optimization: Native support for ARM64/SM121 and CUDA 13.0 FP4.

🐳 Quick Start

# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.5

For more details, visit our GitHub Repository.





πŸš€ v0.1.4: Quick Start with Blackwell-Optimized Docker (Recommended)

This model is fully compatible with DGX-Spark-llama.cpp-Bench. Experience the best performance on NVIDIA Blackwell (DGX Spark) hardware with our optimized inference engine.

🌟 Key Features (v0.1.4)

  • Blackwell Optimized: Native support for ARM64/SM121 and CUDA 13.0 FP4.
  • Intelligent Reasoning UI: Automatic extraction and visualization of reasoning processes (CoT).
  • One-Click Deployment: Standardized environment via GHCR Docker image.

🐳 How to Run

# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.4

# Follow the instructions in our repo to serve this model
# GitHub: https://github.com/sowilow/DGX-Spark-llama.cpp-Bench




πŸš€ Quick Start with Docker (Recommended)

You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).

1. Pull the Docker Image

docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest

2. Run the Inference Server
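A minimal invocation, mirroring the llama-server command shown further down in this card. The `./models` mount path and the published port are assumptions; adjust them for your setup.

```shell
# Mount a local ./models directory and start llama-server inside the container.
# -p 8080:8080 publishes the HTTP API to the host (assumed default port).
docker run --gpus all -p 8080:8080 -v $(pwd)/models:/model \
    ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
    llama-server -m /model/gpt-oss-20b-q4_mxfp4.gguf -ngl 99 -c 8192
```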

For detailed configuration and usage, visit the GitHub Repository.


gpt-oss-20b-DGX-Spark-GGUF


This repository provides GGUF quantized versions of OpenAI's gpt-oss-20b, optimized specifically for NVIDIA Blackwell (DGX Spark) architectures.

These models were converted and quantized using llama.cpp with support for the gpt_oss architecture.

Model Highlights

  • Optimized for Blackwell: Specifically tuned for high-performance inference on NVIDIA DGX Spark (SM120/SM121).
  • Flexible Quantization:
    • Q4_MXFP4: 4-bit MXFP4 quantization (recommended for efficiency).
    • Q8_0: 8-bit quantization (recommended for maximum precision).
  • MoE Architecture: 21B total parameters with 3.6B active parameters, leveraging Mixture-of-Experts for high efficiency.
  • Long Context: Supports up to 131k context length.

Quantization Details

| File | Quant Method | Bitrate | Size | Description |
| --- | --- | --- | --- | --- |
| gpt-oss-20b-q4_mxfp4.gguf | Q4_MXFP4 | 4.5 bpw | ~12 GB | Balanced performance and quality. |
| gpt-oss-20b-q8_0.gguf | Q8_0 | 8.5 bpw | ~22 GB | Standard 8-bit quantization. |
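The approximate sizes follow directly from parameter count times bits-per-weight; a quick sanity check using the table's bpw figures (real files differ slightly because of metadata and mixed per-tensor quantization types):

```shell
# Approximate GGUF file size = total params * bits-per-weight / 8 bytes.
awk 'BEGIN {
  params = 21e9    # 21B total parameters (3.6B active per token, MoE)
  printf "Q4_MXFP4: ~%.1f GB\n", params * 4.5 / 8 / 1e9   # ~11.8 GB
  printf "Q8_0:     ~%.1f GB\n", params * 8.5 / 8 / 1e9   # ~22.3 GB
}'
```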

Quick Start (llama.cpp)

To run these models on a DGX Spark system:

  1. Pull the optimized Docker image:

    docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest
    
  2. Run with llama-server:

    docker run --gpus all -v $(pwd)/models:/model \
        ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
        llama-server -m /model/gpt-oss-20b-q4_mxfp4.gguf -ngl 99 -c 8192
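Once running, llama-server exposes an OpenAI-compatible HTTP API. A minimal query, assuming the container's port 8080 is published to the host (e.g. by adding -p 8080:8080 to the docker run above):

```shell
# Query the OpenAI-compatible chat endpoint served by llama-server.
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 64
        }'
```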
    

Original Model Information

This is a quantized version of openai/gpt-oss-20b. Please refer to the original model card for details on training, safety, and benchmarks.

Citation

@misc{openai2025gptoss120bgptoss20bmodel,
      title={gpt-oss-120b & gpt-oss-20b Model Card}, 
      author={OpenAI},
      year={2025},
      eprint={2508.10925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10925}, 
}