grpo-tax-qwen-3b-GGUF

Built with NEO - Your Autonomous AI Agent

GGUF quantized versions of Qwen2.5-3B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on tax and financial reasoning tasks.

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning Method | GRPO (Group Relative Policy Optimization) |
| Domain | Tax & Financial Reasoning |
| Architecture | Qwen2 |
| Context Length | 32,768 tokens |
| Format | GGUF |

Available Quantizations

| File | Quantization | Size | Use Case |
|---|---|---|---|
| grpo-tax-qwen-3b-Q4_K_M.gguf | Q4_K_M | ~2.0 GB | Best balance of speed and quality |
| grpo-tax-qwen-3b-Q8_0.gguf | Q8_0 | ~3.2 GB | Higher quality, more RAM required |
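As a rough sanity check on the sizes above, a GGUF file weighs in at roughly parameter count times average bits per weight; a minimal Python sketch (the bits-per-weight figures are approximate averages for these quant mixes, not exact values for this file):

```python
def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size estimate in GB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

# ~3.1B parameters (approximate for a "3B" Qwen2.5 model)
print(approx_gguf_size_gb(3.1e9, 4.85))  # Q4_K_M, close to the ~2.0 GB above
print(approx_gguf_size_gb(3.1e9, 8.5))   # Q8_0, close to the ~3.2 GB above
```

Actual file sizes differ slightly because k-quants mix precisions across tensors and the file includes metadata.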

Usage

With llama.cpp

```bash
# Download the model into the current directory
huggingface-cli download daksh-neo/grpo-tax-qwen-3b-gguf grpo-tax-qwen-3b-Q4_K_M.gguf --local-dir .

# Run inference
./llama-cli -m grpo-tax-qwen-3b-Q4_K_M.gguf \
  -p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.7
```
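The prompt passed to `llama-cli` above follows Qwen's ChatML template. A minimal Python sketch of how such a prompt string is assembled, useful if you build prompts programmatically:

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-formatted prompt as used by Qwen2.5 models:
    each turn is wrapped in <|im_start|>role ... <|im_end|> markers,
    and the string ends with an open assistant turn for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a tax expert assistant.",
    "What is the standard deduction for 2024?",
)
print(prompt)
```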

With Ollama

```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-3b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF

ollama create grpo-tax-qwen-3b -f Modelfile
ollama run grpo-tax-qwen-3b
```
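Generation settings can also be pinned in the Modelfile with `PARAMETER` directives before running `ollama create` (the values below are illustrative defaults, not tuned for this model):

```
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER stop "<|im_end|>"
```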

With Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="daksh-neo/grpo-tax-qwen-3b-gguf",
    filename="grpo-tax-qwen-3b-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful tax assistant."},
        {"role": "user", "content": "Explain what a W-2 form is."},
    ]
)
print(response["choices"][0]["message"]["content"])
```

Training Details

This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning method that dispenses with the separate value (critic) model used in PPO-style RLHF. For each prompt, GRPO samples a group of responses, scores them, and reinforces answers that score above the group's average. Here it was applied to tax and financial reasoning tasks.
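The group-relative idea can be sketched in a few lines: score each sampled response and normalize its reward against the group's statistics (a simplified illustration of the advantage computation, not the actual training code used here):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Advantage of each sampled response relative to its group:
    (reward - group mean) / group std. Responses scoring above the
    group average get positive advantages and are reinforced; those
    below average get negative advantages and are discouraged."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one tax question, scored by a reward function
advantages = group_relative_advantages([0.2, 0.9, 0.5, 0.4])
print(advantages)
```

In full GRPO these advantages weight a clipped policy-gradient update, as in PPO, but the group baseline replaces the learned value model.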

Training focus areas:

  • Federal and state tax regulations
  • Tax form interpretation (W-2, 1099, Schedule C, etc.)
  • Deductions and credits
  • Tax planning strategies
  • Financial compliance questions

Limitations

  • This model is fine-tuned on tax knowledge up to its training cutoff and may not reflect the latest tax law changes.
  • Always consult a qualified tax professional for official tax advice.
  • The model is not a substitute for professional legal or financial guidance.

License

Apache 2.0 - see the Qwen2.5 license for base model terms.

