Llama 3 8B - PokerBench SFT

Llama 3.1 8B Instruct fine-tuned for poker decision-making with LoRA, trained on the PokerBench dataset.

Training Details

  • Base Model: Meta-Llama-3.1-8B-Instruct
  • Training Data: PokerBench (RZ412/PokerBench)
  • Method: LoRA fine-tuning (merged)
  • Training Steps: 5,000
  • Batch Size: 128
  • Learning Rate: 1e-6
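As a sketch, a comparable LoRA run can be configured with PEFT and TRL. The training steps, effective batch size, and learning rate below come from the table above; the LoRA rank, alpha, and target modules are assumptions, since the card does not state them:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("RZ412/PokerBench", split="train")

peft_config = LoraConfig(
    r=16,               # assumption: rank not stated in the card
    lora_alpha=32,      # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="llama3-8b-pokerbench-sft",
    max_steps=5_000,                # from the card
    per_device_train_batch_size=8,  # 8 x 16 accumulation = effective
    gradient_accumulation_steps=16, #   batch size of 128 (from the card)
    learning_rate=1e-6,             # from the card
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    train_dataset=dataset,
    args=training_args,
    peft_config=peft_config,
)
trainer.train()
```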

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "YiPz/llama3-8b-pokerbench-sft", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YiPz/llama3-8b-pokerbench-sft")

messages = [
    {"role": "system", "content": "You are an expert poker player. Respond with your action in <action></action> tags."},
    {"role": "user", "content": "Your poker scenario..."}
]

inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=32, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, skipping the prompt echo
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Output Format

Actions are returned in <action></action> tags:

  • <action>fold</action>
  • <action>call</action>
  • <action>check</action>
  • <action>raise 15</action>
  • <action>bet 10</action>

GGUF Versions

Quantized GGUF versions for llama.cpp/Ollama are available at YiPz/llama3-8b-pokerbench-sft-gguf.
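As a sketch, a GGUF file from that repository can be loaded into Ollama with a Modelfile like the one below. The filename is a placeholder, not an actual file in the repo; check the repository for the real quantization names:

```
FROM ./llama3-8b-pokerbench-sft.Q4_K_M.gguf
SYSTEM "You are an expert poker player. Respond with your action in <action></action> tags."
PARAMETER temperature 0.1
```

Then build and run it with `ollama create pokerbench -f Modelfile` followed by `ollama run pokerbench`.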

License

This model inherits the Llama 3.1 Community License from its base model.
