---
library_name: transformers
license: apache-2.0
language:
- en
base_model: google/gemma-3-1b-it
tags:
- gemma
- finetune
- qlora
- chatbot
- tars
---
# Model Card for TARS (Gemma 3 1B Fine-tune)
This is a fine-tuned version of `google/gemma-3-1b-it` trained to act as the **TARS astronaut assistant** from *Interstellar*.
It is designed to be professional on task-related queries and witty in off-topic chat, with its tone guided by a simulated user-emotion tag in the prompt.
---
## Model Details
### Model Description
This model is a QLoRA fine-tune of `google/gemma-3-1b-it` on a custom synthetic dataset.
The goal was to create a chatbot that embodies the **TARS persona**:
- **Task-Oriented:** Professional, direct, and helpful for mission-related queries.
- **Persona-Driven:** Witty, empathetic, or humorous for off-topic or personal chat.
- **Emotion-Aware:** The model's response style is influenced by a `[Detected Emotion: ...]` tag.
- **Developed by:** [am-om](https://huggingface.co/am-om)
- **Shared by:** Om Singh
- **Model type:** Causal language model
- **Language(s):** English (`en`)
- **License:** apache-2.0
- **Finetuned from model:** `google/gemma-3-1b-it`
---
## Model Sources
- **Repository:** https://huggingface.co/am-om/tars_ai
---
## Uses
### Direct Use
This model is intended for **direct use as a chatbot**, following a specific prompt format.
⚠️ **Important:** This model requires a specific prompt format that includes a detected emotion.
Do **not** send raw text as the user query.
#### Prompt Format
The user turn *must* follow this structure:
```
[Detected Emotion: {emotion}]
[User Query: {your_text_here}]
```
**Example:**
```
[Detected Emotion: anxious]
[User Query: Are we going to make it?]
```
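Because the model expects every user turn in exactly this tagged form, it can help to build the string in one place rather than by hand. The sketch below is a minimal helper, assuming only the format documented above; the name `format_tars_prompt` is hypothetical and not part of this repository. Stripping whitespace and rejecting empty fields are design choices of the sketch, meant to catch malformed turns before they reach the model.

```python
def format_tars_prompt(emotion: str, query: str) -> str:
    """Build user-turn content in the tag format this model expects.

    `format_tars_prompt` is a hypothetical helper name; the returned string
    mirrors the documented prompt format.
    """
    emotion = emotion.strip()
    query = query.strip()
    # Both tags are required, so refuse to build a turn with an empty field
    if not emotion:
        raise ValueError("an emotion label is required")
    if not query:
        raise ValueError("a user query is required")
    return f"[Detected Emotion: {emotion}]\n[User Query: {query}]"


if __name__ == "__main__":
    # Reproduces the example prompt shown above
    print(format_tars_prompt("anxious", "Are we going to make it?"))
```

The returned string is what goes into the `content` field of a `{"role": "user", ...}` message.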
### Out-of-Scope Use
This model is not intended for:
* Any use without the required `[Detected Emotion: ...]` and `[User Query: ...]` tags.
* Use as a base model for further fine-tuning.
* Any critical decision-making without human oversight.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Load the model and tokenizer from the Hub
model_id = "am-om/tars_ai"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# --- Define your chat history ---
# The system prompt is automatically loaded from the tokenizer's chat template.
messages = []

# Example query
user_query = "I'm feeling a bit lonely out here."
emotion = "sad"

# Format the input in the required emotion/query structure
formatted_input = f"[Detected Emotion: {emotion}]\n[User Query: {user_query}]"
messages.append({"role": "user", "content": formatted_input})

# --- Generate the response ---
# Passing the message list (rather than a pre-templated string) lets the
# pipeline apply the chat template itself, so <bos> is not added twice.
outputs = pipe(
    messages,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    pad_token_id=pipe.tokenizer.eos_token_id,
)

# The pipeline returns the whole conversation; the last message is the new reply
response = outputs[0]["generated_text"][-1]["content"].strip()
print(f"TARS: {response}")
```
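The example above generates a single reply. For a multi-turn conversation, each new user turn still needs the emotion and query tags, and the model's replies have to be stored in the history so later turns see them. The sketch below shows one way to do that, assuming Gemma-style chat templates that expect user and assistant turns to alternate. `chat_turn` and `generate_fn` are hypothetical names; `generate_fn` stands for any callable that maps a message list to reply text, and the demo uses a stand-in instead of the real model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    emotion: str,
    query: str,
    generate_fn: Callable[[List[Message]], str],
) -> str:
    """Append one tagged user turn, get a reply, and record it in the history.

    `chat_turn` and `generate_fn` are hypothetical names for this sketch.
    """
    # Every user turn carries the required emotion/query tags
    content = f"[Detected Emotion: {emotion}]\n[User Query: {query}]"
    messages.append({"role": "user", "content": content})
    reply = generate_fn(messages).strip()
    # Store the reply so the next turn sees the full, alternating conversation
    messages.append({"role": "assistant", "content": reply})
    return reply


if __name__ == "__main__":
    # Stand-in generator for illustration only; swap in a real model call
    def echo_generate(msgs: List[Message]) -> str:
        return f"Received {len(msgs)} message(s)."

    history: List[Message] = []
    print(chat_turn(history, "sad", "I'm feeling a bit lonely out here.", echo_generate))
    print(chat_turn(history, "curious", "What's our fuel status?", echo_generate))
```

With the `pipe` object from the getting-started code, `generate_fn` could be `lambda m: pipe(m, max_new_tokens=256)[0]["generated_text"][-1]["content"]`, since the text-generation pipeline accepts a message list and returns the conversation with the new assistant message appended.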
## Training Details
### Training Data
This model was fine-tuned on a custom, synthetically generated dataset of 344 prompt/response pairs. The dataset was designed to teach the model to differentiate between task-oriented and persona-driven queries based on the emotion tag.
### Training Procedure
The model was fine-tuned using QLoRA for 3 epochs. The adapter (from checkpoint-156, the best-performing epoch) was then merged with the base model.
#### Training Hyperparameters
* **Framework:** TRL (Transformer Reinforcement Learning)
* **Quantization:** 4-bit (`bnb_4bit_quant_type="nf4"`)
* **LoRA `r`:** 16
* **LoRA `alpha`:** 32
* **LoRA `dropout`:** 0.05
* **Optimizer:** `paged_adamw_8bit`
* **Learning Rate:** 5e-5
* **LR Scheduler:** `constant`
* **Epochs:** 3
* **Batch Size:** 4
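For reference, the standard LoRA update below shows how `r` and `alpha` interact: the frozen base weight $W_0 \in \mathbb{R}^{d \times k}$ (held in 4-bit NF4 under QLoRA) is left untouched, and a low-rank product $BA$ is added, scaled by $\alpha / r$. In PEFT this scaling is `lora_alpha / r` (unless rank-stabilized scaling is enabled), and `lora_dropout` (0.05 here) is applied only to the input of the adapter branch. With the values above, the adapter output is scaled by 2. The card does not list which weight matrices were targeted, so $d$ and $k$ are left generic.

```latex
\begin{aligned}
h &= W_0 x + \frac{\alpha}{r}\, B A x,
  \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k} \\
\frac{\alpha}{r} &= \frac{32}{16} = 2
\end{aligned}
```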
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
* **Hardware Type:** NVIDIA T4
* **Hours used:** ~4 hours
* **Cloud Provider:** Google Colab
* **Compute Region:** Not recorded
* **Carbon Emitted:** ~5.5 g CO2eq (estimated)
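The basic estimate behind figures like this is energy used multiplied by the carbon intensity of the local grid. The sketch below computes that product, assuming power draw in watts, runtime in hours, and intensity in grams of CO2eq per kWh; `estimate_co2_grams` is a hypothetical helper, and it omits any provider-specific adjustments the calculator may apply. Because the compute region was not recorded, the carbon intensity is left as an input rather than filled in.

```python
def estimate_co2_grams(power_watts: float, hours: float, grams_per_kwh: float) -> float:
    """Estimate emissions as energy used times grid carbon intensity.

    energy (kWh)        = power (W) * time (h) / 1000
    emissions (g CO2eq) = energy (kWh) * intensity (g CO2eq per kWh)

    `estimate_co2_grams` is a hypothetical helper for illustration.
    """
    if power_watts < 0 or hours < 0 or grams_per_kwh < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = power_watts * hours / 1000.0
    return energy_kwh * grams_per_kwh
```

For this run, `power_watts` would be at most the NVIDIA T4's 70 W rated board power (actual draw is often lower), `hours` about 4, and `grams_per_kwh` the grid intensity of whichever region the Colab instance ran in.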
## Technical Specifications
### Model Architecture and Objective
This is a standard decoder-only Transformer (Gemma 3) fine-tuned with a Causal Language Modeling objective.
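The causal language modeling objective is the average negative log-likelihood of each token given the tokens before it, written below for a sequence $x_1, \dots, x_T$. Under QLoRA only the adapter parameters are updated, so $\theta$ here covers the LoRA weights while the base model stays frozen. The card does not say whether prompt tokens were masked out of the loss, so this shows the plain objective over all $T$ tokens.

```latex
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```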
### Compute Infrastructure
#### Hardware
* NVIDIA T4 16GB (Google Colab)
#### Software
* `transformers`
* `trl`
* `bitsandbytes`
* `accelerate`
* `peft`
## Model Card Authors
Om Singh ([am-om](https://huggingface.co/am-om))
## Model Card Contact
[am-om](https://huggingface.co/am-om)