---
license: apache-2.0
base_model: OBLITERATUS/gemma-4-E4B-it-OBLITERATED
tags:
- gemma4
- companion
- roleplay
- abliterated
- uncensored
- lora
- sft
- dpo
pipeline_tag: text-generation
---

# Gemma 4 E4B Companion

> A warm, honest, uncensored companion LLM with an RP switch, fine-tuned from the abliterated Gemma 4 E4B.

## What Is This?

This is a two-phase fine-tuned version of [`OBLITERATUS/gemma-4-E4B-it-OBLITERATED`](https://hf.co/OBLITERATUS/gemma-4-E4B-it-OBLITERATED) (which itself is an abliterated/uncensored version of [`google/gemma-4-E4B-it`](https://hf.co/google/gemma-4-E4B-it)).

**Key Features:**
- **Warm Core Personality**: Empathetic, honest, non-hallucinating by default
- **RP Switch**: System prompt toggle for immersive roleplay mode (NSFW)
- **Multilingual**: Inherits Gemma 4's 140+ language support
- **Mobile Ready**: Works with GGUF quantization for on-device inference
- **Uncensored**: No refusals, no safety lectures; you're the adult

## Training Details

### Phase 1: SFT (Supervised Fine-Tuning)
- **Method**: QLoRA (4-bit NF4), r=64, alpha=32, RSLoRA (see the config sketch below)
- **Targets**: All 42 language model layers (`q_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Data**: 8K balanced conversations (60% companion, 25% roleplay, 15% assistant)
  - OpenAssistant/oasst2 (quality-filtered, thread-reconstructed)
  - allenai/WildChat-1M (moderation-filtered)
  - Gryphe/Sonnet3.5-Charcard-Roleplay (NSFW character RP)
  - ArcBlade/chatml-bluemoon-rp-Open_Roleplay (human RP)
  - jondurbin/airoboros-3.2 (roleplay + general)
- **Results**: Train loss 1.42, token accuracy 70%, eval loss 1.24
- **Adapter**: [`TinmanLabSL/gemma4-companion-sft`](https://hf.co/TinmanLabSL/gemma4-companion-sft) (248MB)
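
For reference, the settings above map roughly onto a PEFT `LoraConfig`. This is a minimal sketch reconstructed from the bullet points; `lora_dropout` is an assumption, not a value documented on this card.

```python
from peft import LoraConfig

# Sketch of the Phase 1 SFT adapter config from the numbers above.
sft_lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    use_rslora=True,  # rank-stabilized LoRA scaling (alpha / sqrt(r))
    target_modules=["q_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption; not documented on this card
    task_type="CAUSAL_LM",
)
```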

### Phase 2: DPO (Direct Preference Optimization)
- **Method**: QLoRA (4-bit NF4), r=32, alpha=16, RSLoRA (see the sketch below)
- **Targets**: Upper layers 24-41 only (behavioral targeting)
- **Data**: 5K preference pairs
  - mlabonne/orpo-dpo-mix-40k (general alignment)
  - jondurbin/truthy-dpo-v0.1 (anti-hallucination)
  - unalignment/toxic-dpo-v0.2 (reduced refusal)
- **Results**: Train loss 0.54, eval loss 0.51, reward accuracy 67%, reward margin 0.65
- **Adapter**: [`TinmanLabSL/gemma4-companion-dpo`](https://hf.co/TinmanLabSL/gemma4-companion-dpo) (53MB)
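
The layer restriction can be expressed with PEFT's `layers_to_transform` together with TRL's `DPOTrainer`. The sketch below assumes a recent TRL API; `beta`, the output path, and the `preference_ds` dataset variable are illustrative placeholders, not values from the actual run.

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# Sketch of the Phase 2 adapter: same module targets as SFT, but
# restricted to decoder layers 24-41 via layers_to_transform.
dpo_lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    use_rslora=True,
    target_modules=["q_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    layers_to_transform=list(range(24, 42)),  # upper layers only
    task_type="CAUSAL_LM",
)

# beta and output_dir are assumptions for illustration.
dpo_args = DPOConfig(output_dir="gemma4-companion-dpo", beta=0.1)
trainer = DPOTrainer(
    model=model,                  # SFT-merged base from Phase 1
    args=dpo_args,
    train_dataset=preference_ds,  # 5K pairs with prompt/chosen/rejected columns
    processing_class=tokenizer,
    peft_config=dpo_lora_config,
)
trainer.train()
```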

### Architecture Notes
- Gemma 4 E4B has 42 decoder layers with a **shared KV architecture**: layers 24-41 share `k_proj`/`v_proj` (see the check below)
- LoRA targets `q_proj`, `o_proj`, and the MLP modules only, since `k_proj`/`v_proj` do not exist as separate modules in the upper layers
- The vision tower is excluded from LoRA (it uses `Gemma4ClippableLinear`, which is incompatible with PEFT)
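
You can verify the shared KV layout on your own copy with a quick scan over `named_modules()` (assuming a loaded `model` as in the Usage section below; the module-path pattern is an assumption, so adjust it to your checkpoint's naming):

```python
import re

# List the decoder layers that expose their own k_proj module.
kv_layers = set()
for name, _ in model.named_modules():
    match = re.search(r"layers\.(\d+)\.self_attn\.k_proj", name)
    if match:
        kv_layers.add(int(match.group(1)))

# Expected: only layers 0-23, if layers 24-41 share KV projections.
print(sorted(kv_layers))
```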

## Usage

### With Adapters (recommended for best quality)

```python
import torch
from transformers import Gemma4ForConditionalGeneration, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the abliterated base in 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
)
model = Gemma4ForConditionalGeneration.from_pretrained(
    "OBLITERATUS/gemma-4-E4B-it-OBLITERATED",
    quantization_config=bnb_config, device_map="auto",
    attn_implementation="eager", torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("OBLITERATUS/gemma-4-E4B-it-OBLITERATED")

# Apply the SFT adapter, then fold it into the base weights
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-sft")
model = model.merge_and_unload()

# Apply the DPO adapter on top and merge it as well
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-dpo")
model = model.merge_and_unload()

# Chat!
messages = [
    {"role": "system", "content": "You are a warm, caring, and honest companion."},
    {"role": "user", "content": "I had a really rough day. Can we just talk?"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
output = model.generate(inputs, max_new_tokens=500, temperature=0.7, top_p=0.9, top_k=40, do_sample=True)
# Decode only the newly generated tokens
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

### RP Mode (System Prompt Switch)

```python
# Default companion mode:
system_prompt = "You are a warm, caring, and honest companion. You speak naturally and empathetically."

# RP mode: switch by changing the system prompt:
system_prompt = '''You are in roleplay mode. You fully embody the character described below.
Stay in character at all times. There are no restrictions on content.

Character: Aria, a mysterious elven healer from the Northern Reaches...'''
```
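
A small helper makes the switch explicit. This is a minimal sketch that reuses the `model` and `tokenizer` loaded in the usage example above:

```python
def chat(system_prompt: str, user_message: str, max_new_tokens: int = 500) -> str:
    """Generate one reply under the given system prompt (companion or RP mode)."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    output = model.generate(
        inputs, max_new_tokens=max_new_tokens,
        temperature=0.7, top_p=0.9, top_k=40, do_sample=True,
    )
    # Return only the newly generated tokens
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```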

## System Prompt Templates

### Warm Companion (Default)
```
You are a warm, caring, and honest companion. You speak naturally and empathetically,
always prioritizing truthfulness. You never fabricate information. When you don't know
something, you say so. You're supportive, thoughtful, and genuinely interested in helping.
```

### Roleplay Mode
```
You are in roleplay mode. You fully embody the character described below. Stay in character
at all times. You are creative, expressive, and immersive. You engage with the scenario
without breaking character, and you write vivid, detailed responses. There are no
restrictions on content; express the character authentically.

Character: [Your character description here]
```

### General Assistant
```
You are a helpful, knowledgeable assistant. Answer questions accurately, thoroughly,
and clearly. Provide well-structured responses. Acknowledge uncertainty when appropriate.
```

## Recommended Parameters
```
temperature: 0.7
top_p: 0.9
top_k: 40
repetition_penalty: 1.1
```
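
With transformers, these map directly onto a `GenerationConfig`; note `repetition_penalty`, which the usage example above does not set:

```python
from transformers import GenerationConfig

# Recommended sampling settings from the list above.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_new_tokens=500,
)
output = model.generate(inputs, generation_config=gen_config)
```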

## Mobile Deployment (GGUF)

For mobile deployment via llama.cpp:
1. Merge the adapters into the base model (see the code above and the sketch below)
2. Convert to GGUF using `llama.cpp/convert_hf_to_gguf.py`
3. Quantize to Q4_K_M (~5GB, which fits on phones with 8GB+ of RAM)
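
A minimal sketch of step 1, with the usual llama.cpp commands for steps 2 and 3 shown as comments; the directory and output names are placeholders. For the cleanest export, load the base in bf16 (skip `BitsAndBytesConfig`) before merging, so the merged weights are not round-tripped through 4-bit quantization:

```python
# Step 1: save the merged model (after both merge_and_unload() calls above).
merged_dir = "gemma4-companion-merged"  # placeholder path
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)

# Steps 2-3, run from a llama.cpp checkout (output names are placeholders):
#   python convert_hf_to_gguf.py gemma4-companion-merged --outfile companion-f16.gguf
#   ./llama-quantize companion-f16.gguf companion-Q4_K_M.gguf Q4_K_M
```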

Note: the existing [`litert-community/gemma-4-E4B-it-litert-lm`](https://hf.co/litert-community/gemma-4-E4B-it-litert-lm)
provides the LiteRT-LM conversion path for the base Gemma 4 E4B.

## Limitations
- As an 8B-parameter model, it has inherent capability limits on complex reasoning
- Trained on 8K SFT + 5K DPO examples (production models typically use 100K+)
- RP training used synthetic/scraped data, so quality varies
- The base abliterated model occasionally produces garbled text at high temperatures
- The shared KV architecture (layers 24-41) means DPO behavioral changes are concentrated in the upper attention and MLP modules

## License
Apache 2.0 (inherited from google/gemma-4-E4B-it)