Fine-tuned Llama 3.2 1B Instruct model for Texas Hold'em poker decisions.
## Training
- Base model: meta-llama/Llama-3.2-1B-Instruct
- Dataset: RZ412/PokerBench
- Method: LoRA fine-tuning with unsloth-mlx
- LoRA config: r=32, alpha=64, target_modules=[q_proj, k_proj, v_proj, o_proj]
- Training data: 60k preflop + 50k postflop samples
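The LoRA hyperparameters above can be restated as a small config sketch. The key names (`r`, `lora_alpha`, `target_modules`) follow common LoRA conventions and are illustrative; the exact unsloth-mlx config format may differ:

```python
# Illustrative restatement of the LoRA hyperparameters listed above.
# Key names follow common LoRA conventions, not necessarily the
# exact unsloth-mlx config schema.
lora_config = {
    "r": 32,                # LoRA rank
    "lora_alpha": 64,       # LoRA alpha
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# Standard LoRA scaling factor applied to the adapter output: alpha / r
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```

With alpha set to twice the rank, the adapter updates are scaled by a factor of 2 relative to the frozen weights.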
## Performance
Evaluated on the PokerBench test sets:
| Model | Preflop | Postflop |
|---|---|---|
| Base model | 7% | 13% |
| Fine-tuned | 83% | 55% |
| Improvement | +76 pts | +42 pts |
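The improvement row is the simple difference in accuracy between the fine-tuned and base models, in percentage points:

```python
# Accuracy figures from the table above, as fractions
base = {"preflop": 0.07, "postflop": 0.13}
fine_tuned = {"preflop": 0.83, "postflop": 0.55}

# Improvement in percentage points: fine-tuned minus base
improvement = {k: round((fine_tuned[k] - base[k]) * 100) for k in base}
print(improvement)  # {'preflop': 76, 'postflop': 42}
```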