mark smith committed (verified) · commit b0abe5f · 1 parent: 719724b

Update README.md · files changed (1): README.md (+119, −22)

---
# Phi-3 Grown Chat Model (Continual LoRA Adaptation)

![Phi-3 Mini](https://huggingface.co/unsloth/Phi-3-mini-4k-instruct/resolve/main/thumbnail.png)

**A custom continual-learning chat model based on Phi-3-mini-4k-instruct.**
Trained with sequential LoRA adapters that simulate "growing new neuron connections" for each learning phase, designed to mitigate catastrophic forgetting.

- **Base Model**: [unsloth/Phi-3-mini-4k-instruct](https://huggingface.co/unsloth/Phi-3-mini-4k-instruct) (3.82B parameters)
- **Total Effective Size**: ~4.1B parameters (base + ~360M from 3 stacked LoRA adapters)
- **Dataset**: [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), a high-quality multi-turn conversation dataset
- **Training Method**: Continual learning via sequential LoRA (each phase adds new trainable connections while previously learned weights stay frozen)
- **Phases**:
  1. General Chat
  2. Reasoning & Q&A
  3. Roleplay & Long Context

This model is tuned for natural conversation, reasoning, creative roleplay, and instruction following. It is 4-bit quantized for efficiency and runs fast even on consumer GPUs.

## Quick Start / Inference

### Installation (One-Time Setup)

```bash
# Install Unsloth (fastest path for Phi-3 + LoRA inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
```
### Run Inference (Chat with the Model)

```python
from unsloth import FastLanguageModel
import torch

# Load the model (4-bit for efficiency)
model, tokenizer = FastLanguageModel.from_pretrained(
    "yourusername/phi3-grown-chat",  # Replace with your HF repo (or local path: "./phi3-grown-chat-model")
    dtype=None,          # Auto-detect (float16/bf16)
    load_in_4bit=True,   # Saves VRAM
)

# Enable fast inference
FastLanguageModel.for_inference(model)

# Chat loop example
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    messages = [{"role": "user", "content": user_input}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        temperature=0.8,
        do_sample=True,
        top_p=0.95,
    )

    # Decode only the newly generated tokens. With skip_special_tokens=True the
    # <|assistant|> marker is stripped, so splitting on it is unreliable; slicing
    # off the prompt tokens extracts the assistant response robustly.
    response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    print("Assistant:", response.strip())
```
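For reference, `apply_chat_template` renders Phi-3-style turns into a simple tagged format. Below is a minimal illustrative stand-in (an assumption for clarity only; the authoritative template lives in the tokenizer config, so always prefer `apply_chat_template` in real code):

```python
# Illustrative sketch of the Phi-3 chat format (not the real template engine).
# Each turn becomes <|role|>\n{content}<|end|>\n; with add_generation_prompt=True
# a trailing <|assistant|>\n cues the model to continue as the assistant.
def render_phi3_chat(messages, add_generation_prompt=True):
    text = "".join(f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages)
    if add_generation_prompt:
        text += "<|assistant|>\n"
    return text

print(render_phi3_chat([{"role": "user", "content": "Hello!"}]))
# <|user|>
# Hello!<|end|>
# <|assistant|>
```

Knowing the layout helps when post-processing raw decoded output: everything up to the final assistant tag is just the prompt echoed back.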
### Example Prompts to Test

- "Hello! Tell me a fun fact about space."
- "Explain quantum computing like I'm 10 years old."
- "You are a pirate captain. Tell me about your greatest adventure."
- "Write a Python function to check if a number is prime."
- Long context: paste a paragraph and ask questions about it.
## Training Details (How It Was Built)

This model uses continual learning with stacked LoRA adapters:

- Base model frozen.
- Each phase adds a new LoRA adapter (r=64, ~119M trainable params per phase).
- Trained sequentially on splits of UltraChat_200k (~69k examples per phase).
- Tooling: Unsloth + TRL `SFTTrainer` (about 2x faster than a standard setup).
- Quick demo run: 60 steps per phase (~30 min total on a T4 GPU).
- For stronger results, increase `max_steps` to 300-500 per phase.

Full training code (Colab-ready) is available in the repo files or the original notebook.
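The "~119M trainable params per phase" figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes the usual unfused target modules (q/k/v/o plus gate/up/down projections) and Phi-3-mini's published dimensions (hidden size 3072, MLP size 8192, 32 layers); it is an estimate from config values, not output of the training run:

```python
# Rough LoRA parameter count for Phi-3-mini with r=64.
# Each LoRA pair adds r * (in_features + out_features) params per linear layer.
r = 64
hidden, mlp, layers = 3072, 8192, 32

per_layer = (
    4 * r * (hidden + hidden)   # q_proj, k_proj, v_proj, o_proj
    + 2 * r * (hidden + mlp)    # gate_proj, up_proj
    + 1 * r * (mlp + hidden)    # down_proj
)
total = per_layer * layers
print(f"~{total / 1e6:.1f}M trainable params per phase")      # ~119.5M
print(f"~{3 * total / 1e6:.1f}M across 3 stacked adapters")   # ~358.6M
```

This lands on the ~119M-per-phase and ~360M-total figures quoted above.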
## Limitations

- Short demo training run: solid results, but not state-of-the-art (responses may occasionally repeat).
- Text-only (no vision/multimodal support).
- Primarily English (UltraChat is mostly English).
## How to Improve / Extend

Want to grow it more?

- Add a Phase 4: fine-tune on a coding dataset (i.e., add a new LoRA adapter for programming).
- Retrain with higher `max_steps` or a larger `r=128` for more connections.
- Merge the LoRAs fully with `model.merge_and_unload()` for a single-file upload.
## License

Same as the base Phi-3 model: MIT (permissive for research and commercial use).

Made with ❤️ by Mark as a continual-learning experiment. If you use or fork this, star the repo! 🚀