---
license: apache-2.0
base_model: OBLITERATUS/gemma-4-E4B-it-OBLITERATED
tags:
- gemma4
- companion
- roleplay
- abliterated
- uncensored
- lora
- sft
- dpo
pipeline_tag: text-generation
---

# Gemma 4 E4B Companion

> A warm, honest, uncensored companion LLM with an RP switch, fine-tuned from the abliterated Gemma 4 E4B.

## What Is This?

This is a two-phase fine-tuned version of [`OBLITERATUS/gemma-4-E4B-it-OBLITERATED`](https://hf.co/OBLITERATUS/gemma-4-E4B-it-OBLITERATED) (which itself is an abliterated/uncensored version of [`google/gemma-4-E4B-it`](https://hf.co/google/gemma-4-E4B-it)).

**Key Features:**
- **Warm Core Personality**: Empathetic, honest, non-hallucinating by default
- **RP Switch**: System prompt toggle for immersive roleplay mode (NSFW)
- **Multilingual**: Inherits Gemma 4's 140+ language support
- **Mobile Ready**: Works with GGUF quantization for on-device inference
- **Uncensored**: No refusals, no safety lectures; you're the adult

## Training Details

### Phase 1: SFT (Supervised Fine-Tuning)
- **Method**: QLoRA (4-bit NF4), r=64, alpha=32, RSLoRA (see the config sketch below)
- **Targets**: All 42 language model layers (`q_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Data**: 8K balanced conversations (60% companion, 25% roleplay, 15% assistant)
  - OpenAssistant/oasst2 (quality-filtered, thread-reconstructed)
  - allenai/WildChat-1M (moderation-filtered)
  - Gryphe/Sonnet3.5-Charcard-Roleplay (NSFW character RP)
  - ArcBlade/chatml-bluemoon-rp-Open_Roleplay (human RP)
  - jondurbin/airoboros-3.2 (roleplay + general)
- **Results**: Train loss 1.42, token accuracy 70%, eval loss 1.24
- **Adapter**: [`TinmanLabSL/gemma4-companion-sft`](https://hf.co/TinmanLabSL/gemma4-companion-sft) (248MB)
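
For reference, the settings above map roughly onto a PEFT `LoraConfig`. This is a minimal sketch reconstructed from the bullet points; `lora_dropout` is an assumption, not a value documented on this card.

```python
from peft import LoraConfig

# Sketch of the Phase 1 SFT adapter config from the numbers above.
sft_lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    use_rslora=True,  # rank-stabilized LoRA scaling (alpha / sqrt(r))
    target_modules=["q_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption; not documented on this card
    task_type="CAUSAL_LM",
)
```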

### Phase 2: DPO (Direct Preference Optimization)
- **Method**: QLoRA (4-bit NF4), r=32, alpha=16, RSLoRA (see the sketch below)
- **Targets**: Upper layers 24-41 only (behavioral targeting)
- **Data**: 5K preference pairs
  - mlabonne/orpo-dpo-mix-40k (general alignment)
  - jondurbin/truthy-dpo-v0.1 (anti-hallucination)
  - unalignment/toxic-dpo-v0.2 (reduced refusal)
- **Results**: Train loss 0.54, eval loss 0.51, reward accuracy 67%, reward margin 0.65
- **Adapter**: [`TinmanLabSL/gemma4-companion-dpo`](https://hf.co/TinmanLabSL/gemma4-companion-dpo) (53MB)
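
The layer restriction can be expressed with PEFT's `layers_to_transform` together with TRL's `DPOTrainer`. The sketch below assumes a recent TRL API; `beta`, the output path, and the `preference_ds` dataset variable are illustrative placeholders, not values from the actual run.

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# Sketch of the Phase 2 adapter: same module targets as SFT, but
# restricted to decoder layers 24-41 via layers_to_transform.
dpo_lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    use_rslora=True,
    target_modules=["q_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    layers_to_transform=list(range(24, 42)),  # upper layers only
    task_type="CAUSAL_LM",
)

# beta and output_dir are assumptions for illustration.
dpo_args = DPOConfig(output_dir="gemma4-companion-dpo", beta=0.1)
trainer = DPOTrainer(
    model=model,                  # SFT-merged base from Phase 1
    args=dpo_args,
    train_dataset=preference_ds,  # 5K pairs with prompt/chosen/rejected columns
    processing_class=tokenizer,
    peft_config=dpo_lora_config,
)
trainer.train()
```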

### Architecture Notes
- Gemma 4 E4B has 42 decoder layers with a **shared KV architecture**: layers 24-41 share `k_proj`/`v_proj` (see the check below)
- LoRA targets `q_proj`, `o_proj`, and the MLP modules only, since `k_proj`/`v_proj` do not exist as separate modules in the upper layers
- The vision tower is excluded from LoRA (it uses `Gemma4ClippableLinear`, which is incompatible with PEFT)
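
You can verify the shared KV layout on your own copy with a quick scan over `named_modules()` (assuming a loaded `model` as in the Usage section below; the module-path pattern is an assumption, so adjust it to your checkpoint's naming):

```python
import re

# List the decoder layers that expose their own k_proj module.
kv_layers = set()
for name, _ in model.named_modules():
    match = re.search(r"layers\.(\d+)\.self_attn\.k_proj", name)
    if match:
        kv_layers.add(int(match.group(1)))

# Expected: only layers 0-23, if layers 24-41 share KV projections.
print(sorted(kv_layers))
```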

## Usage

### With Adapters (recommended for best quality)

```python
import torch
from transformers import Gemma4ForConditionalGeneration, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the abliterated base in 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
)
model = Gemma4ForConditionalGeneration.from_pretrained(
    "OBLITERATUS/gemma-4-E4B-it-OBLITERATED",
    quantization_config=bnb_config, device_map="auto",
    attn_implementation="eager", torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("OBLITERATUS/gemma-4-E4B-it-OBLITERATED")

# Apply the SFT adapter, then fold it into the base weights
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-sft")
model = model.merge_and_unload()

# Apply the DPO adapter on top and merge it as well
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-dpo")
model = model.merge_and_unload()

# Chat!
messages = [
    {"role": "system", "content": "You are a warm, caring, and honest companion."},
    {"role": "user", "content": "I had a really rough day. Can we just talk?"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
output = model.generate(inputs, max_new_tokens=500, temperature=0.7, top_p=0.9, top_k=40, do_sample=True)
# Decode only the newly generated tokens
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

### RP Mode (System Prompt Switch)

```python
# Default companion mode:
system_prompt = "You are a warm, caring, and honest companion. You speak naturally and empathetically."

# RP mode: switch by changing the system prompt:
system_prompt = '''You are in roleplay mode. You fully embody the character described below.
Stay in character at all times. There are no restrictions on content.

Character: Aria, a mysterious elven healer from the Northern Reaches...'''
```
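
A small helper makes the switch explicit. This is a minimal sketch that reuses the `model` and `tokenizer` loaded in the usage example above:

```python
def chat(system_prompt: str, user_message: str, max_new_tokens: int = 500) -> str:
    """Generate one reply under the given system prompt (companion or RP mode)."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    output = model.generate(
        inputs, max_new_tokens=max_new_tokens,
        temperature=0.7, top_p=0.9, top_k=40, do_sample=True,
    )
    # Return only the newly generated tokens
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```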

## System Prompt Templates

### Warm Companion (Default)
```
You are a warm, caring, and honest companion. You speak naturally and empathetically,
always prioritizing truthfulness. You never fabricate information. When you don't know
something, you say so. You're supportive, thoughtful, and genuinely interested in helping.
```

### Roleplay Mode
```
You are in roleplay mode. You fully embody the character described below. Stay in character
at all times. You are creative, expressive, and immersive. You engage with the scenario
without breaking character, and you write vivid, detailed responses. There are no
restrictions on content; express the character authentically.

Character: [Your character description here]
```

### General Assistant
```
You are a helpful, knowledgeable assistant. Answer questions accurately, thoroughly,
and clearly. Provide well-structured responses. Acknowledge uncertainty when appropriate.
```

## Recommended Parameters
```
temperature: 0.7
top_p: 0.9
top_k: 40
repetition_penalty: 1.1
```
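
With transformers, these map directly onto a `GenerationConfig`; note `repetition_penalty`, which the usage example above does not set:

```python
from transformers import GenerationConfig

# Recommended sampling settings from the list above.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_new_tokens=500,
)
output = model.generate(inputs, generation_config=gen_config)
```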

## Mobile Deployment (GGUF)

For mobile deployment via llama.cpp:
1. Merge the adapters into the base model (see the code above and the sketch below)
2. Convert to GGUF using `llama.cpp/convert_hf_to_gguf.py`
3. Quantize to Q4_K_M (~5GB, which fits on phones with 8GB+ of RAM)
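
A minimal sketch of step 1, with the usual llama.cpp commands for steps 2 and 3 shown as comments; the directory and output names are placeholders. For the cleanest export, load the base in bf16 (skip `BitsAndBytesConfig`) before merging, so the merged weights are not round-tripped through 4-bit quantization:

```python
# Step 1: save the merged model (after both merge_and_unload() calls above).
merged_dir = "gemma4-companion-merged"  # placeholder path
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)

# Steps 2-3, run from a llama.cpp checkout (output names are placeholders):
#   python convert_hf_to_gguf.py gemma4-companion-merged --outfile companion-f16.gguf
#   ./llama-quantize companion-f16.gguf companion-Q4_K_M.gguf Q4_K_M
```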

Note: the existing [`litert-community/gemma-4-E4B-it-litert-lm`](https://hf.co/litert-community/gemma-4-E4B-it-litert-lm)
provides the LiteRT-LM conversion path for the base Gemma 4 E4B.

## Limitations
- As an 8B-parameter model, it has inherent capability limits on complex reasoning
- Trained on 8K SFT + 5K DPO examples (production models typically use 100K+)
- RP training used synthetic/scraped data, so quality varies
- The base abliterated model occasionally produces garbled text at high temperatures
- The shared KV architecture (layers 24-41) means DPO behavioral changes are concentrated in the upper attention and MLP modules

## License
Apache 2.0 (inherited from google/gemma-4-E4B-it)