# grpo-tax-qwen-3b-GGUF
Built with NEO – Your Autonomous AI Agent
GGUF quantized versions of Qwen2.5-3B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on tax and financial reasoning tasks.
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning Method | GRPO (Group Relative Policy Optimization) |
| Domain | Tax & Financial Reasoning |
| Architecture | Qwen2 |
| Context Length | 32,768 tokens |
| Format | GGUF |
## Available Quantizations

| File | Quantization | Size | Use Case |
|---|---|---|---|
| grpo-tax-qwen-3b-Q4_K_M.gguf | Q4_K_M | ~2.0 GB | Best balance of speed and quality |
| grpo-tax-qwen-3b-Q8_0.gguf | Q8_0 | ~3.2 GB | Higher quality, more RAM required |
## Usage

### With llama.cpp
```bash
# Download the model
huggingface-cli download daksh-neo/grpo-tax-qwen-3b-gguf grpo-tax-qwen-3b-Q4_K_M.gguf

# Run inference
./llama-cli -m grpo-tax-qwen-3b-Q4_K_M.gguf \
  -p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.7
```
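The prompt string passed to `llama-cli` follows Qwen's ChatML template. As an illustration of how that string is assembled from a message list, here is a small helper; the function itself is hypothetical (llama-cpp-python and Ollama apply the template for you):

```python
def build_chatml_prompt(messages):
    """Render a list of {role, content} messages into Qwen's ChatML format.

    Illustrative sketch only: shows the <|im_start|>/<|im_end|> framing
    used in the llama-cli prompt above.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Generation continues from the open assistant turn:
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a tax expert assistant."},
    {"role": "user", "content": "What is the standard deduction for 2024?"},
])
print(prompt)
```

This reproduces the exact string shown in the `llama-cli` invocation above.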
### With Ollama

```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-3b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF

ollama create grpo-tax-qwen-3b -f Modelfile
ollama run grpo-tax-qwen-3b
```
### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="daksh-neo/grpo-tax-qwen-3b-gguf",
    filename="grpo-tax-qwen-3b-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful tax assistant."},
        {"role": "user", "content": "Explain what a W-2 form is."},
    ]
)
print(response["choices"][0]["message"]["content"])
```
## Training Details
This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning fine-tuning method (a PPO variant) that optimizes the model's responses on tax and financial reasoning tasks without requiring a separate value (critic) model. For each prompt, GRPO samples a group of candidate responses, scores them, and reinforces the answers that score above the group average.
Training focus areas:
- Federal and state tax regulations
- Tax form interpretation (W-2, 1099, Schedule C, etc.)
- Deductions and credits
- Tax planning strategies
- Financial compliance questions
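The group-relative scoring at the heart of GRPO can be sketched in a few lines. This is a minimal illustration of the advantage computation, not the training code used for this model; the reward values are made up:

```python
def group_relative_advantages(rewards):
    """GRPO's core idea: score each sampled response relative to its group.

    Advantages are the rewards centered on the group mean and normalized
    by the group's standard deviation, so no learned value model is needed.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against division by zero for uniform groups
    return [(r - mean) / std for r in rewards]


# Four sampled answers to one tax question, scored by a reward function
# (reward values are illustrative):
advantages = group_relative_advantages([0.2, 0.9, 0.5, 0.4])
# Responses scoring above the group mean receive positive advantages
# and are reinforced; below-average responses are discouraged.
```

Because the baseline is the group mean rather than a critic's value estimate, the method only needs the sampled responses and their rewards.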
## Limitations
- This model is fine-tuned on tax knowledge up to its training cutoff and may not reflect the latest tax law changes.
- Always consult a qualified tax professional for official tax advice.
- The model is not a substitute for professional legal or financial guidance.
## Related Models

- daksh-neo/grpo-tax-qwen-1.5b-gguf – smaller 1.5B version for resource-constrained environments
## License

Apache 2.0 – see the Qwen2.5 license for base model terms.