grpo-tax-qwen-3b-GGUF

Built with NEO - Your Autonomous AI Agent

GGUF quantized versions of Qwen2.5-3B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on tax and financial reasoning tasks.

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning Method | GRPO (Group Relative Policy Optimization) |
| Domain | Tax & Financial Reasoning |
| Architecture | Qwen2 |
| Context Length | 32,768 tokens |
| Format | GGUF |

Available Quantizations

| File | Quantization | Size | Use Case |
|---|---|---|---|
| grpo-tax-qwen-3b-Q4_K_M.gguf | Q4_K_M | ~2.0 GB | Best balance of speed and quality |
| grpo-tax-qwen-3b-Q8_0.gguf | Q8_0 | ~3.2 GB | Higher quality, more RAM required |
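As a rough sanity check on the sizes above, a GGUF file weighs in at roughly parameter count times average bits per weight; a minimal Python sketch (the bits-per-weight figures are approximate averages for these quant mixes, not exact values for this file):

```python
def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size estimate in GB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

# ~3.1B parameters (approximate for a "3B" Qwen2.5 model)
print(approx_gguf_size_gb(3.1e9, 4.85))  # Q4_K_M, close to the ~2.0 GB above
print(approx_gguf_size_gb(3.1e9, 8.5))   # Q8_0, close to the ~3.2 GB above
```

Actual file sizes differ slightly because k-quants mix precisions across tensors and the file includes metadata.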

Usage

With llama.cpp

```bash
# Download the model into the current directory
huggingface-cli download daksh-neo/grpo-tax-qwen-3b-gguf grpo-tax-qwen-3b-Q4_K_M.gguf --local-dir .

# Run inference
./llama-cli -m grpo-tax-qwen-3b-Q4_K_M.gguf \
  -p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.7
```
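The prompt passed to `llama-cli` above follows Qwen's ChatML template. A minimal Python sketch of how such a prompt string is assembled, useful if you build prompts programmatically:

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-formatted prompt as used by Qwen2.5 models:
    each turn is wrapped in <|im_start|>role ... <|im_end|> markers,
    and the string ends with an open assistant turn for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a tax expert assistant.",
    "What is the standard deduction for 2024?",
)
print(prompt)
```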

With Ollama

```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-3b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF

ollama create grpo-tax-qwen-3b -f Modelfile
ollama run grpo-tax-qwen-3b
```
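Generation settings can also be pinned in the Modelfile with `PARAMETER` directives before running `ollama create` (the values below are illustrative defaults, not tuned for this model):

```
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER stop "<|im_end|>"
```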

With Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="daksh-neo/grpo-tax-qwen-3b-gguf",
    filename="grpo-tax-qwen-3b-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful tax assistant."},
        {"role": "user", "content": "Explain what a W-2 form is."},
    ]
)
print(response["choices"][0]["message"]["content"])
```

Training Details

This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning method that dispenses with the separate value (critic) model used in PPO-style RLHF. For each prompt, GRPO samples a group of responses, scores them, and reinforces answers that score above the group's average. Here it was applied to tax and financial reasoning tasks.
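The group-relative idea can be sketched in a few lines: score each sampled response and normalize its reward against the group's statistics (a simplified illustration of the advantage computation, not the actual training code used here):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Advantage of each sampled response relative to its group:
    (reward - group mean) / group std. Responses scoring above the
    group average get positive advantages and are reinforced; those
    below average get negative advantages and are discouraged."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one tax question, scored by a reward function
advantages = group_relative_advantages([0.2, 0.9, 0.5, 0.4])
print(advantages)
```

In full GRPO these advantages weight a clipped policy-gradient update, as in PPO, but the group baseline replaces the learned value model.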

Training focus areas:

  • Federal and state tax regulations
  • Tax form interpretation (W-2, 1099, Schedule C, etc.)
  • Deductions and credits
  • Tax planning strategies
  • Financial compliance questions

Limitations

  • This model is fine-tuned on tax knowledge up to its training cutoff and may not reflect the latest tax law changes.
  • Always consult a qualified tax professional for official tax advice.
  • The model is not a substitute for professional legal or financial guidance.

License

Apache 2.0 - see the Qwen2.5 license for base model terms.

