Qwen3-4B-Opus-Reasoning-SFT
A supervised fine-tune of Qwen/Qwen3-4B-Instruct-2507, trained with Clawpathy on the Tinker platform.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | Supervised Fine-Tuning |
| Dataset | nohurry/Opus-4.6-Reasoning-3000x-filtered |
| LoRA rank | 32 |
| Learning rate | 0.00018 |
| Steps | 50 |
| Final loss | 1.236125 |
| Tinker sampler path | tinker://332646c7-9a78-5c49-a58c-3ca971b142ce:train:0/sampler_weights/clawpathy-c942fbe6-f6b-final |
| Tinker state path | tinker://332646c7-9a78-5c49-a58c-3ca971b142ce:train:0/weights/clawpathy-c942fbe6-f6b-state |
Evaluation
{ "job_id": "c942fbe6-f6b", "model": "Trained: Qwen/Qwen3-4B-Instruct-2507 (job c942fbe6-f6b)", "eval_type": "benchmark", "benchmarks": [ "Dataset SFT-match (50)", "MMLU (0-shot, 200)" ], "benchmark_names": [ "Dataset SFT-match (50)", "MMLU (0-shot, 200)" ], "domain": "general", "accuracy": 0.21, "metrics": { "dataset_sft_match/token_f1": 0.3774, "dataset_sft_match/exact_match": 0.0, "dataset_sft_match/evaluated": 50, "dataset_sft_match/split": "train", "dataset_sft_match/format": "instruction", "dataset_sft_match/examples": [ { "expected": "# Solution: Calculating Bus Rental and Toll Costs\n\n## Step 1: Find the total number of people going on the field trip\n\n- Fifth-grade students: 252\n- Teachers: 8\n- Total people:", "output": "## Solution\n\nStep 1: Calculate the total number of people going on the trip\n\n$$252 \text{ students} + 8 \text{ teachers} = 260 \text{ people}$$\n\nStep 2: Determine how many bu", "token_f1": 0.3976, "exact": false }, { "expected": "## Missing Problem Description\n\nYour request appears to be incomplete. You've provided the flushing instructions for an interactive problem, but the actual problem descriptio", "output": "I notice you've provided instructions for an interactive problem, but you haven't actually included the problem statement or the specific question you'd like me to solve.\n\nCould yo", "token_f1": 0.2332, "exact": false }, { "expected": "Answer: no\n\nExplanation:\n\nThe premise states that "Thorn thought the same thing." 
For this statement to make sense, there must be another person (or people) who had this th", "output": "## Solution\n\nStep 1: Analyze the premise\n- "Apparently Thorn thought the same thing" indicates that Thorn's thoughts aligned with someone else's thoughts\n- The word "same" impl", "token_f1": 0.4814, "exact": false }, { "expected": "## Solution\n\nThe question asks for automated software specifically designed to create energy profile diagrams.\n\n### Analysis of Options:\n\n| Option | Software | Type | Automated", "output": "# Automated Software for Energy Profile Diagrams\n\n## Analysis of Each Option\n\nA: TikZ - This is a LaTeX package for creating graphics. While it can produce energy profile diagr", "token_f1": 0.391, "exact": false }, { "expected": "No, these sentences are not paraphrases of each other. \n\nA paraphrase would convey the same meaning or information using different wording. However, these two sentences describe en", "output": "## Analysis\n\nLet me compare these two sentences to determine if they are paraphrases.\n\nSentence 1: "In Ventura County, the number of closings and advisories dropped by 73 perce", "token_f1": 0.3219, "exact": false }, { "expected": "The answer is C (contradiction).\n\nThe two sentences directly contradict each other. Sentence 1 explicitly states that the woman is wearing a WHITE dress while riding a bicycle. How", "output": "# Analysis of Sentence Agreement\n\n## Step 1: Identify the key information in each sentence\n\nSentence 1: "A woman wearing a white dress riding a bicycle."\n- Subject: A woman\n- C", "token_f1": 0.3739, "exact": false }, { "expected": "# Article Based on Provided Highlights\n\n## Clinton to Suspend Presidential Campaign Saturday, Sources Say\n\nIn a significant development in the Democratic presidential race, sources", "output": "# Clinton Suspends Presidential Campaign Amid Democratic Unity Efforts\n\n## Key Developments in the 2016 Democratic Primary Race\n\nWASHINGTON, D.C. 
\u2014 Democratic presidential nomi", "token_f1": 0.4765, "exact": false }, { "expected": "# Problem Analysis\n\nBased on the example provided, I can identify this as a modular arithmetic transformation problem.\n\n## Problem Statement (Reconstructed)\n\nGiven:\n- A modulus", "output": "I notice you've provided an example but haven't actually asked a question. Let me analyze what this example demonstrates:\n\n## What this example shows:\n\nProblem Type: This appea", "token_f1": 0.1667, "exact": false }, { "expected": "It appears you've provided the output format for a competitive programming problem, but I'm missing the actual problem statement. To give you a detailed solution with verification,", "output": "I notice that your message appears to be a partial problem statement, but it's incomplete. You've provided:\n\n- A conditional structure: "Otherwise, print an integer m..."\n- A descr", "token_f1": 0.4075, "exact": false }, { "expected": "Based on the context provided, the answer is:\n\nAll federal funds, including grants, loans, unions and employers who accepted taxpayer funds, and other forms of financial assist", "output": "# Answer Extraction\n\n## Step-by-Step Analysis\n\nStep 1: Identify the question\n- "What was forced to comply?"\n\nStep 2: Locate relevant information in the context\n- The key se", "token_f1": 0.5455, "exact": false } ], "mmlu/accuracy": 0.21, "mmlu/evaluated": 200, "mmlu/correct": 42, "mmlu/split": "validation", "mmlu/limit": 200, "mmlu/examples": [ { "question": "The cyclic subgroup of Z_24 generated by 18 has order", "gold": "A", "pred": "A", "raw": "# Solution\n\n## Step 1: Understanding the Problem\n\nWe need to find the order of t", "correct": true }, { "question": "Find the order of the factor group Z_6/<3>.", "gold": "B", "pred": "C", "raw": "# Solution: Order of the Factor Group Z_6/<3>\n\n## Step 1: Understand the problem", "correct": false }, { "question": "Statement 1 | A permutation that is a product of m even permutations 
and n odd permutations is an even permutation if and only if n is even. Statement 2 | Every", "gold": "A", "pred": "A", "raw": "# Solution\n\n## Statement 1 Analysis\n\nStatement 1: A permutation that is a pr", "correct": true }, { "question": "Find the order of the factor group (Z_4 x Z_12)/(<2> x <2>)", "gold": "C", "pred": "A", "raw": "# Solution\n\n## Step 1: Understand the problem\nWe need to find the order of the f", "correct": false }, { "question": "Find the maximum possible order for some element of Z_4 x Z_6.", "gold": "C", "pred": "A", "raw": "# Finding the Maximum Order in Z_4 \u00d7 Z_6\n\n## Step 1: Understanding the Problem\n\n", "correct": false }, { "question": "Statement 1 | The symmetric group S_3 is cyclic. Statement 2 | Every group is isomorphic to some group of permutations.", "gold": "D", "pred": "A", "raw": "# Solution\n\n## Statement 1: "The symmetric group S_3 is cyclic."\n\nAnalysis:\n", "correct": false }, { "question": "Statement 1 | If a and b are elements of finite order in an Abelian group, then |ab| is the lcm (|a|,|b|). Statement 2 | If g is a group element and g^n = e, th", "gold": "B", "pred": "A", "raw": "# Verification of Group Theory Statements\n\n## Statement 1 Analysis\nStatement 1", "correct": false }, { "question": "Statement 1 | If f is a homomorphism from G to K and H is normal in G then f(H) is normal in K. Statement 2 | If f is a homomorphism from G to a group and H is ", "gold": "D", "pred": "A", "raw": "# Solution\n\n## Statement 1 Analysis\nIf f is a homomorphism from G to K and H i", "correct": false }, { "question": "Find the maximum possible order for an element of S_n for n = 7.", "gold": "B", "pred": "A", "raw": "# Solution: Finding the Maximum Order of an Element in S\u2087\n\n## Step 1: Understand", "correct": false }, { "question": "Statement 1 | Every integral domain has a field of quotients. 
Statement 2 | A polynomial of degree n over a ring can have at most n zeros counting multiplicity.", "gold": "C", "pred": "A", "raw": "# Solution\n\n## Statement 1: Every integral domain has a field of quotients\n\n**TR", "correct": false } ] }, "results": [ { "benchmark": "dataset_sft_match/token_f1", "score": 0.3774 }, { "benchmark": "dataset_sft_match/exact_match", "score": 0.0 }, { "benchmark": "dataset_sft_match/evaluated", "score": 50.0 }, { "benchmark": "mmlu/accuracy", "score": 0.21 }, { "benchmark": "mmlu/evaluated", "score": 200.0 }, { "benchmark": "mmlu/correct", "score": 42.0 }, { "benchmark": "mmlu/limit", "score": 200.0 } ], "code_eval": null, "stored_eval": null, "dataset_id": "nohurry/Opus-4.6-Reasoning-3000x-filtered", "status": "completed", "requested_eval_tasks": [], "executed_eval_tasks": [ "dataset_sft_match", "internal/mmlu_0_shot" ], "task_errors": {}, "is_proxy": false, "is_complete": true }
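The report above does not say how `dataset_sft_match/token_f1` is tokenized or normalized. As a rough guide to reading that number, here is a minimal sketch of a SQuAD-style token-overlap F1, assuming lowercased whitespace tokens; the actual harness may differ. The function names `token_f1` and `mmlu_accuracy` are hypothetical and not part of Clawpathy or Tinker.

```python
from collections import Counter


def token_f1(expected: str, output: str) -> float:
    """Token-overlap F1 between a reference and a model output.

    Assumption: tokens are lowercased whitespace-separated words.
    The real evaluation may normalize text differently.
    """
    exp_tokens = expected.lower().split()
    out_tokens = output.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not exp_tokens or not out_tokens:
        return float(exp_tokens == out_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(exp_tokens) & Counter(out_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)


def mmlu_accuracy(correct: int, evaluated: int) -> float:
    """Accuracy as the fraction of evaluated questions answered correctly."""
    return correct / evaluated


if __name__ == "__main__":
    # Two of three tokens are shared, so precision = recall = 2/3.
    print(token_f1("the cat sat", "the cat ran"))
    # Reproduces the reported mmlu/accuracy from mmlu/correct and mmlu/evaluated.
    print(mmlu_accuracy(42, 200))
```

Under this definition, two long answers that reach the same conclusion with different wording and formatting can still score well below 1.0, which is one reason `exact_match` can be 0.0 while `token_f1` is nonzero. The second helper simply restates the reported figures: 42 correct out of 200 evaluated gives 0.21.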