XAT928/gemma-3-1b-it-jp-lora-20250909

LoRA adapter for google/gemma-3-1b-it (Japanese SFT).

Summary

  • Base model: google/gemma-3-1b-it
  • Adapter type: LoRA (PEFT; saved via save_pretrained)
  • Train steps: 200 (training stopped early at max_steps)
  • BF16: True
  • Dataset (after preprocessing):
    • izumi-lab/llm-japanese-dataset-vanilla:train (281,334 usable examples after cleanup)
    • Local ichikara-instruction-003-003-1.json: 7,599 corrupted lines, 0 usable → skipped
  • Tokenizer note: pad_token = eos_token; eos_token_id updated to 1 (see the sketch after this list)
  • Save path: /content/out_stage1/adapter
  • Exported: 2025-09-09 07:06:33
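
The tokenizer adjustments noted above can be reproduced at load time. A minimal sketch, assuming you want the same pad/eos handling at inference that was used during training:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it", use_fast=True)
# Reuse the EOS token for padding, mirroring the training-time setup
tok.pad_token = tok.eos_token
# The card records eos_token_id being updated to 1 during training; uncomment to
# replicate that explicitly (assumption: token id 1 is the intended <eos> id here).
# tok.eos_token_id = 1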

Usage

Load the adapter with PEFT

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and tokenizer
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it", torch_dtype=torch.bfloat16)
tok  = AutoTokenizer.from_pretrained("google/gemma-3-1b-it", use_fast=True)

# Attach the LoRA adapter from this repo on top of the base model
model = PeftModel.from_pretrained(base, "XAT928/gemma-3-1b-it-jp-lora-20250909")
model.eval()

# "Please answer the following question politely and concisely. / Q: What is the elevation of Mt. Fuji?"
prompt = "次の問いに丁寧で簡潔に答えてください。\n\nQ: 富士山の標高は?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
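
If you prefer a standalone checkpoint, the adapter can also be merged into the base weights with PEFT's merge_and_unload. A minimal sketch; the output directory name is hypothetical:

# Fold the LoRA weights into the base model and drop the PEFT wrapper
merged = model.merge_and_unload()
merged.save_pretrained("gemma-3-1b-it-jp-merged")  # hypothetical local path
tok.save_pretrained("gemma-3-1b-it-jp-merged")

The merged model can then be reloaded with AutoModelForCausalLM alone, without the peft dependency.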

Training (facts)

  • LR: 1.5e-4
  • per_device_train_batch_size: 1
  • grad_accumulation_steps: 32
  • max_seq_len: 2048
  • max_steps: 200
  • optimizer: paged_adamw_8bit
  • gradient_checkpointing: True
  • seed: 42
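
For reference, the settings above map onto transformers.TrainingArguments roughly as sketched below. The actual training script is not included in this repo, so the output directory, data pipeline, and LoRA configuration are assumptions:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="/content/out_stage1",   # assumed; the adapter was saved under /content/out_stage1/adapter
    learning_rate=1.5e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    max_steps=200,
    optim="paged_adamw_8bit",           # requires bitsandbytes
    gradient_checkpointing=True,
    bf16=True,
    seed=42,
)
# max_seq_len: 2048 is enforced when tokenizing/packing the SFT examples
# (e.g. via the SFT trainer's max sequence length), not via TrainingArguments.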

Limitations

  • This repo contains the LoRA adapter only; use it together with google/gemma-3-1b-it.