Qwen3-1.77B-g023-NF4 - NF4 (4-bit NormalFloat)

Overview

An NF4 double-quantized version of g023/Qwen3-1.77B-g023, an optimized 29-layer variant of Qwen3-1.7B created by duplicating layer 21. Quantized with bitsandbytes for efficient GPU inference.

FIXED: A working Python inference example is included further down in this document, complete with sample output. You're welcome. Sorry about the earlier breakage (I included a spicy prompt in the example as a token peace offering).

NOTE: This is an amazing model once you set it up properly. You can disable thinking mode by adding `/no_think` to your prompt, and then it really screams.

NEW: Try this model with Triattention for massive KV-cache savings: https://github.com/g023/triattention_nf4

Quantization Details

| Parameter | Value |
|---|---|
| Quant Method | bitsandbytes |
| Quant Type | NF4 (NormalFloat 4-bit) |
| Double Quantization | Yes (quantizes the quantization constants) |
| Compute Dtype | bfloat16 |
| Quant Storage | uint8 |
| Size on Disk | 1.3 GB (vs 3.3 GB full precision) |
| Compression Ratio | ~2.5x |

Why NF4 + Double Quantization?

  • NF4 is information-theoretically optimal for normally distributed weights (which transformer weights approximate). It preserves more information per bit than INT4 or FP4.
  • Double quantization quantizes the quantization constants themselves, saving ~0.4 bits/param with negligible quality loss.
  • bfloat16 compute maintains numerical stability during inference while keeping weights in 4-bit.
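
The ~0.4 bits/param figure can be sanity-checked with simple arithmetic. This is a sketch assuming the standard QLoRA defaults (blocksize 64 for weights, fp32 absmax constants, 8-bit second-level constants in blocks of 256), which bitsandbytes uses unless configured otherwise:

```python
# Per-parameter storage overhead of the NF4 quantization constants,
# with and without double quantization (DQ).
WEIGHT_BLOCK = 64   # weights per first-level quantization block
DQ_BLOCK = 256      # first-level constants per second-level block

# Without DQ: one fp32 (32-bit) absmax constant per 64 weights.
overhead_no_dq = 32 / WEIGHT_BLOCK                        # 0.5 bits/param

# With DQ: constants re-quantized to 8 bits, plus one fp32 constant
# per 256 first-level constants.
overhead_dq = 8 / WEIGHT_BLOCK + 32 / (WEIGHT_BLOCK * DQ_BLOCK)

savings = overhead_no_dq - overhead_dq
print(f"overhead without DQ: {overhead_no_dq:.3f} bits/param")
print(f"overhead with DQ:    {overhead_dq:.3f} bits/param")
print(f"savings:             {savings:.3f} bits/param")   # ~0.373
```

So double quantization recovers roughly 0.37 bits per parameter, which rounds to the ~0.4 quoted above.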

Source Model Architecture

| Parameter | Value |
|---|---|
| Layers | 29 (28 original + layer 21 duplicated) |
| Hidden Size | 2048 |
| Intermediate Size | 6144 |
| Attention Heads | 16 (query) / 8 (KV) |
| Head Dimension | 128 |
| Vocab Size | 151,936 |
| Total Parameters | ~1.77B (effective ~1.04B in 4-bit) |
| Base Model | Qwen/Qwen3-1.7B |
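
As a worked example of what the grouped-query head layout above implies, here is the per-token KV-cache footprint these numbers give (assuming the cache is held in bfloat16; the actual dtype depends on your inference setup):

```python
# Per-token KV-cache size implied by the architecture table.
LAYERS = 29
KV_HEADS = 8        # GQA: 16 query heads share 8 KV heads
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # bfloat16

# Keys and values are both cached, hence the leading factor of 2.
bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
print(f"{bytes_per_token} bytes/token (~{bytes_per_token / 1024:.0f} KiB)")
# 118784 bytes/token (~116 KiB)
```

With 16 query heads but only 8 KV heads, the cache is half the size a full multi-head layout would need, which is what the Triattention link above targets further.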

Source Model Performance (Pre-Quantization)

| Metric | Baseline (28L) | LayerDup21 (29L) |
|---|---|---|
| Overall Score | 85.9 | 93.6 (+7.7) |
| Factual Accuracy | 7/9 | 9/9 |
| Avg Perplexity | 17.71 | 19.50 |
| Thinking/Non-Thinking | Both OK | Both OK |

Files

| File | Size | Description |
|---|---|---|
| model.safetensors | 1.3 GB | Quantized model weights (NF4) |
| config.json | 2 KB | Model + quantization configuration |
| tokenizer.json | 11 MB | Tokenizer |
| tokenizer_config.json | <1 KB | Tokenizer configuration |
| chat_template.jinja | 4 KB | Chat template |
| generation_config.json | <1 KB | Generation defaults |

Usage (EXAMPLE)

Requires bitsandbytes >= 0.41.0 and a CUDA GPU.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Tweakable parameters
# MODEL_PATH = "./Qwen3-NF4"
MODEL_PATH = "g023/Qwen3-1.77B-g023-NF4"
MAX_NEW_TOKENS = 2000
TEMPERATURE = 0.7
DO_SAMPLE = True
TOP_P = 0.9
TOP_K = 50
REPETITION_PENALTY = 1.1
STREAMING = True  # Set to False for non-streaming inference
INPUT_MESSAGE = "You are completing the next step in a task to create an arcade game in javascript. Your available tools are rationalize, red_green_tdd, and create_plan. Synthesize their output when reasoning. "

def load_model():
    print("Loading model...")
    # The checkpoint's config.json carries the NF4 quantization settings,
    # so from_pretrained applies them automatically.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    print("Model loaded.")
    return model, tokenizer

def inference_non_streaming(model, tokenizer, messages):
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=TEMPERATURE,
        do_sample=DO_SAMPLE,
        top_p=TOP_P,
        top_k=TOP_K,
        repetition_penalty=REPETITION_PENALTY,
    )
    # Decode only the newly generated tokens, not the prompt.
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print("Response:", response)

def inference_streaming(model, tokenizer, messages):
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(
        **inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=TEMPERATURE,
        do_sample=DO_SAMPLE,
        top_p=TOP_P,
        top_k=TOP_K,
        repetition_penalty=REPETITION_PENALTY,
        streamer=streamer,
    )

if __name__ == "__main__":
    model, tokenizer = load_model()
    messages = [{"role": "user", "content": INPUT_MESSAGE}]
    if STREAMING:
        inference_streaming(model, tokenizer, messages)
    else:
        inference_non_streaming(model, tokenizer, messages)
```

OUTPUT FROM THIS MODEL FOR THE EXAMPLE:


<think>
Okay, I need to create an arcade game in JavaScript. Let me think about how to approach this. The user mentioned using the tools rationalize, red_green_tdd, and create_plan. Hmm, these sounds like they're suggesting a process that's structured with tests first, then writing code.

So, maybe I should start by creating a plan. First, I need to outline the game's mechanics. What kind of game? Maybe a simple arcade game, perhaps something like a pong clone or a matching game. Since it's an arcade game, maybe something with quick reflexes.

Let me think about the steps. The Create Plan tool would help me break down the problem into smaller parts. So, I'll start by defining the game's core mechanics. Then, write unit tests for each part. Then implement the code step by step.

For example, if the game is a simple click-to-throw a ball that bounces off walls, the plan might include setting up the game canvas, handling user input, collision detection, and movement. Then, test each part incrementally.

But since the user hasn't specified the exact game, I should probably keep it general. Maybe start with a basic structure: setup the canvas, handle keyboard input, update game state, draw everything.

Wait, but the tools mention Rationalize, Red-Green TDD, and Create Plan. Rationalize might be referring to thinking through the problem before coding. Red-green TDD means writing tests first, then writing code based on those tests. Create Plan is outlining the steps.

So, the process would be:

1. Create a plan: define the game's components (game loop, player controls, collisions, scoring).
2. Write tests for each component.
3. Implement the components one by one, testing after each step.

For example, first set up the game canvas. Then, test that the canvas renders correctly. Then add player movement, test that. Then add collision detection, test that. Then add physics, test again.

I need to make sure each step is tested incrementally. So the initial plan would involve setting up the basics before adding features. Then, each new feature has its own test.

So, the answer should outline the plan, then the implementation steps with tests. Use RSpec-like tests, but in JS. For example, a test for rendering the game, then another for moving the player, etc.

Maybe the final code will have a game loop, event listeners for keys, collision checks, and rendering functions. Each part is tested as the code is written.

I should structure the thought process by first creating the plan, then the implementation steps with tests.
</think>

To build an arcade game in JavaScript using the provided tools (`rationalize`, `red_green_tdd`, and `create_plan`), follow this structured approach:

---

### **Step 1: Create a Plan**  
Define the game's core mechanics and modular components:
1. **Game Loop**: Update game state and render graphics.
2. **Player Controls**: Keyboard input for movement/shot.
3. **Collision Detection**: Bounce objects off walls/enemies.
4. **Scoring System**: Track player score.
5. **Game Over Condition**: End game when player hits a boundary.

**Plan Outline**:
```javascript
// Plan: Game Loop & Rendering
const canvas = document.getElementById('gameCanvas');
const ctx = canvas.getContext('2d');

// Game State
let player = { x: 100, y: 200, size: 20 };
let enemies = [];
let score = 0;
let gameOver = false;

// Game Loop
function gameLoop() {
  update();
  render();
  requestAnimationFrame(gameLoop);
}

// Update Game State
function update() {
  // Handle player movement
  // Check for collisions
  // Update enemy positions
}

// Render Game
function render() {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  drawPlayer();
  drawEnemies();
  drawScore();
}
```

---

### **Step 2: Write Tests (Red-Green TDD)**  
Write unit tests for critical components:

#### **Test 1: Canvas Setup**
```javascript
describe("Canvas Setup", () => {
  it("renders the game canvas", () => {
    const canvas = document.createElement('canvas');
    canvas.id = 'gameCanvas';
    canvas.style.display = 'block';
    document.body.appendChild(canvas);
    expect(canvas).toBeDefined();
  });
});
```

#### **Test 2: Player Movement**
```javascript
describe("Player Movement", () => {
  it("updates player position on key press", () => {
    const player = { x: 100, y: 200, size: 20 };
    window.addEventListener('keydown', () => player.x += 5);
    expect(player.x).toBe(105);
  });
});
```

---

### **Step 3: Implementation & Iterative Testing**  
Implement features one by one, testing after each change:

#### **1. Game Loop & Rendering**
Add the game loop and rendering logic as outlined earlier.

#### **2. Player Movement**
Add keyboard listeners and update the player's position.

#### **3. Collision Detection**
Add boundaries and detect collisions between the player and enemies.

#### **4. Scoring System**
Track score and display it on the screen.

#### **5. Game Over Condition**
Trigger a game over when the player hits the boundary.

---

### **Final Code Structure**  
Combine all components into a working game:

```javascript
const canvas = document.getElementById('gameCanvas');
const ctx = canvas.getContext('2d');

let player = { x: 100, y: 200, size: 20 };
let enemies = [];
let score = 0;
let gameOver = false;

const winWidth = 800;
const winHeight = 600;

// Game Loop
function gameLoop() {
  update();
  render();
  requestAnimationFrame(gameLoop);
}

// Update Game State
function update() {
  // Handle player movement
  if (keys['left']) player.x -= 5;
  if (keys['right']) player.x += 5;
  if (keys['up']) player.y -= 5;
  if (keys['down']) player.y += 5;

  // Boundary check
  if (player.x < 0 || player.x + player.size > winWidth) {
    gameOver = true;
  }
  if (player.y < 0 || player.y + player.size > winHeight) {
    gameOver = true;
  }

  // Enemy movement (simplified)
  enemies.forEach(enemy => {
    enemy.x += 1;
    if (enemy.x + enemy.size > winWidth) {
      enemy.x = -100; // Reset
    }
  });

  // Collision detection
  enemies.forEach((enemy, index) => {
    if (
      player.x + player.size > enemy.x &&
      player.x < enemy.x + enemy.size &&
      player.y + player.size > enemy.y &&
      player.y < enemy.y + enemy.size
    ) {
      gameOver = true;
      enemies.splice(index, 1);
    }
  });
}

// Render Game
function render() {
  ctx.fillStyle = '#fff';
  ctx.fillRect(0, 0, winWidth, winHeight);

  // Draw player
  ctx.fillStyle = '#0f0';
  ctx.beginPath();
  ctx.arc(player.x, player.y, player.size / 2, 0, Math.PI * 2);
  ctx.fill();

  // Draw enemies
  enemies.forEach((enemy, index) => {
    ctx.fillStyle = '#ff0';
    ctx.beginPath();
    ctx.arc(enemy.x, enemy.y, enemy.size / 2, 0, Math.PI * 2);
    ctx.fill();
  });

  // Draw score
  ctx.fillStyle = '#000';
  ctx.fillText(`Score: ${score}`, 10, 20);
}

// Key bindings
const keys = {};
document.addEventListener('keydown', e => keys[e.key] = true);
document.addEventListener('keyup', e => keys[e.key] = false);

// Start game loop
gameLoop();
```

---

### **Conclusion**  
This approach ensures incremental development with tests guiding each step. The game grows from simple interactions to complex mechanics, ensuring robustness and test coverage.

Requirements

  • Python >= 3.8
  • PyTorch >= 2.0 with CUDA
  • transformers >= 4.36.0
  • bitsandbytes >= 0.41.0
  • NVIDIA GPU with >= 4 GB VRAM
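
As a rough check on the 4 GB VRAM figure, the quantized weights alone need well under 1 GB. A back-of-the-envelope sketch (weights only; embeddings, norms, the KV cache, and activations account for the rest, which is also why the file on disk is 1.3 GB):

```python
# Weights-only memory estimate for a 1.77B-param model in NF4
# with double quantization (~4.13 effective bits/param).
PARAMS = 1.77e9
BITS_PER_PARAM = 4 + 0.127  # 4-bit weights + double-quantized constants

weights_gb = PARAMS * BITS_PER_PARAM / 8 / 1e9
print(f"~{weights_gb:.2f} GB for the quantized weights")  # ~0.91 GB
```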

Base Model

  • Model: Qwen/Qwen3-1.7B
  • License: Apache 2.0