meeTARA – Qwen3.5-0.8B-Base (GGUF, Q4_K_M)
This repository contains a GGUF quantized version of Qwen/Qwen3.5-0.8B, prepared for use with llama.cpp and compatible runtimes, and used as the core base model inside the meeTARA empathetic assistant.
- Base model: Qwen/Qwen3.5-0.8B
- Architecture: Qwen3.5 (0.8B parameters, base-tuned)
- Format: GGUF
- Quantization: Q4_K_M (see Available files below)
- Intended use: Standalone intelligent assistant with baked-in domain detection, emotional intelligence, and structured responses for local / offline inference.
✨ Standalone Intelligence: This GGUF model includes 20 layers of intelligence baked directly into the chat template. No backend code required: download and use with llama.cpp, Ollama, or any GGUF-compatible runtime.
Base model highlights (Qwen 3.5 0.8B)
This GGUF builds on the official Qwen/Qwen3.5-0.8B model. For full details, benchmarks, and limitations, see the upstream model card. At a high level:
- Strong general chat & reasoning for its size; Qwen 3.5 is competitive with larger models on many tasks.
- Long context support, suitable for long documents and multi-turn conversations.
- Multi-modal capable: Qwen 3.5 supports image+text input. This repo includes meetara-vl-qwen3.5-0.8b.gguf (the multimodal projector); see the Vision / image-to-text section below for how to use it.
Available files
| Filename | Type | Size | Notes |
|---|---|---|---|
| meetara-qwen3.5-0.8b-Q4_K_M.gguf | Text GGUF | ~503 MB | Recommended text model |
| meetara-vl-qwen3.5-0.8b.gguf | Vision projector | ~198 MB | Required for image+text (mmproj) |
All files are in this repo. For text-only use, pick a *-Q4_K_M.gguf. For image+text, load both the text GGUF and meetara-vl-qwen3.5-0.8b.gguf together (see Vision / image-to-text section below).
Prompt format (recommended)
The model uses a Qwen-style chat template. A simple, robust pattern is:
```
<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on the Qwen3.5-0.8B base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
```
Example:
```
<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on the Qwen3.5-0.8B base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
```
Requirements & runtimes (Qwen 3.5)
This GGUF uses the `qwen35` architecture. You need a runtime that supports it:
| Runtime | Notes |
|---|---|
| llama.cpp | Use a recent build. If you see `unknown model architecture 'qwen35'` or `missing tensor 'output_norm.weight'`, pull the latest source and rebuild llama-cli / llama-quantize / llama-simple-chat. |
| llama-cpp-python | Build against a recent llama.cpp (with qwen35 support). Use chat_format="chatml" for chat. |
| Ollama / text-generation-webui | Use a backend that supports qwen35 (e.g. llama.cpp-based). |
Chat format: This model expects ChatML (`<|im_start|>`, `<|im_end|>`). With llama.cpp, use conversation mode and the chatml template:
```bash
# Interactive chat (recommended)
./llama-cli -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf -cnv --chat-template chatml

# One-shot prompt (raw, no chat template applied)
./llama-cli -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf -p "Hello, how can you help me?" -n 256
```
Python (llama-cpp-python):
```python
from llama_cpp import Llama

llm = Llama(model_path="meetara-qwen3.5-0.8b-Q4_K_M.gguf", chat_format="chatml")
# Then use create_chat_completion or your preferred API.
```
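For example, a minimal chat completion call (the system prompt and sampling values below are illustrative, not required):

```python
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are meeTARA, an emotionally intelligent AI assistant."},
        {"role": "user", "content": "How can I improve my sleep quality?"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```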
Vision / image-to-text (mmproj)
This repo includes meetara-vl-qwen3.5-0.8b.gguf, the multimodal projector (mmproj) that enables image+text input for this model.
How it works:
```
IMAGE.jpg → [Vision Encoder] → [meetara-vl-qwen3.5-0.8b.gguf] → Text GGUF → Text answer
 (pixels)                             (mmproj)                 (generates response)
```
The VL file is not a language model; it is the bridge that translates image features into token vectors the LLM understands. The meeTARA personality and 20-layer intelligence come from the text GGUF; load both together at runtime.
Usage with llama.cpp (llama-mtmd-cli):
```bash
# Download both files first
huggingface-cli download meetara-lab/meetara-qwen3.5-0.8b-gguf \
  --include "meetara-qwen3.5-0.8b-Q4_K_M.gguf" "meetara-vl-qwen3.5-0.8b.gguf" --local-dir .

# Run image+text inference
./llama-mtmd-cli \
  -m meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  --mmproj meetara-vl-qwen3.5-0.8b.gguf \
  --image /path/to/your/image.jpg \
  -p "Describe this image."
```
Using llama-server (OpenAI-compatible API):
```bash
./llama-server \
  -m meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  --mmproj meetara-vl-qwen3.5-0.8b.gguf \
  --host 0.0.0.0 --port 8080
```
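If your llama-server build includes multimodal support, image input is typically sent through the OpenAI-compatible chat endpoint as a base64 data URI. A minimal sketch, assuming the server is running as above (the payload shape follows the OpenAI chat format; exact multimodal behavior depends on your build):

```python
import base64

import requests

# Encode a local image as a data URI (path is illustrative)
with open("your_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 256,
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```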
Note: Vision support requires a recent llama.cpp build (mid-2025 or later) with llama-mtmd-cli available. For text-only use, the VL file is not needed.
Example usage (llama.cpp)
Basic interactive chat
```bash
./llama-simple-chat -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf
```
With explicit system prompt
```bash
./llama-cli \
  -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  -p "<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on the Qwen3.5-0.8B base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
"
```
Adjust flags like `-n` (max tokens), `--temp`, `--top-p`, `--top-k`, etc. according to your hardware and latency/quality trade-offs.
Downloading via huggingface-cli
```bash
pip install -U "huggingface_hub[cli]"

huggingface-cli download \
  meetara-lab/meetara-qwen3.5-0.8b-gguf \
  --include "meetara-qwen3.5-0.8b-Q4_K_M.gguf" \
  --local-dir .
```
This will download the Q4_K_M quantization into the current directory.
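The same download can be scripted with the huggingface_hub Python API, using the repo id shown in the Vision section above:

```python
from huggingface_hub import hf_hub_download

# Downloads the Q4_K_M file into the current directory
path = hf_hub_download(
    repo_id="meetara-lab/meetara-qwen3.5-0.8b-gguf",
    filename="meetara-qwen3.5-0.8b-Q4_K_M.gguf",
    local_dir=".",
)
print(path)
```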
🧠 Standalone Intelligence (20-Layer Detection System)
This GGUF model includes baked-in intelligence that works without any backend code. The model automatically detects domains, emotions, intent, and context through a 20-layer detection system:
Intelligence Layers
| Layer | Feature | Description |
|---|---|---|
| 1 | Refusal Patterns | Safety-first harmful request detection |
| 2 | Contextual Patterns | Multi-word phrase disambiguation ("python code" vs. snake) |
| 3 | N-gram Patterns | Bigram/trigram detection for better context |
| 4 | Semantic Clusters | Related keyword groups boost domain confidence |
| 5 | Entity Patterns | Personal context, time-sensitive, beginner/expert |
| 6 | Intent Signals | What the user wants: learn, fix, decide, create, validate |
| 7 | Emotion Blending | Detects co-occurring emotions, blends composite hints |
| 8 | Tone Detection | Mirrors user style: casual, formal, technical |
| 9 | Question Type | Adapts format: yes/no, how-to, comparison |
| 10 | Response Length | Concise/standard/detailed based on signals |
| 11 | Weighted Domain Score | High/medium/low keyword weights + negative penalties |
| 12 | Score-Based Selection | Highest score wins; priority only for ties |
| 13 | Conversation Memory | Multi-turn depth tracking + domain shift awareness |
| 14 | Safety Disclaimers | Auto-adds warnings for healthcare, legal, crisis |
| 15 | Greeting/Closing | Natural conversation flow, domain-specific |
| 16 | Structured Responses | Contextual structure (adapts sections to the question) |
| 17 | Persona Calibration | Adapts expertise level: beginner/intermediate/expert |
| 18 | Language Detection | Detects non-English cues, responds in the user's language |
| 19 | Confidence Scoring | Multi-signal confidence (keywords + clusters + n-grams) |
| 20 | Negative Keywords | Penalizes false-positive cross-domain keywords |
How It Works
When a user sends a message, the chat template (baked into the GGUF) processes through these 20 layers automatically:
- Safety Check: Refusal patterns detect harmful requests first
- Context Analysis: Multi-word phrases, n-grams, and semantic clusters provide context
- User Understanding: Entity patterns, intent signals, emotion blending, and persona/language detection understand the user
- Response Adaptation: Tone, question type, and length control adapt the response style
- Domain Selection: Weighted keyword scoring with negative keyword penalties; highest score wins
- Output Format: Contextual structure (2โ5 sections when helpful; direct answer for simple questions) with appropriate greetings/closings
Result: The model responds intelligently, empathetically, and contextually without requiring backend code.
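Because this logic lives in the chat template stored in the GGUF metadata, you can inspect it yourself. A minimal sketch using the gguf Python package (`pip install gguf`); the field-decoding details may vary between package versions:

```python
from gguf import GGUFReader

reader = GGUFReader("meetara-qwen3.5-0.8b-Q4_K_M.gguf")

# The chat template is stored under the standard GGUF metadata key
field = reader.fields.get("tokenizer.chat_template")
if field is not None:
    # For string fields, data indexes the part holding the value bytes
    template = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(template[:500])  # preview the first 500 characters
```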
Intended behavior / meeTARA flavor
Compared to the raw Qwen/Qwen3.5-0.8B model, this quantization includes:
- Standalone Intelligence: Works without a backend; all intelligence is baked into the GGUF
- 18 Domain Categories: Auto-detects healthcare, technology, business, education, and 14 more
- Emotional Intelligence: Emotion blending and persona calibration; detects worried, frustrated, urgent, etc.
- Context Awareness: Conversation memory and follow-up detection; understands multi-turn flow
- Structured Responses: Contextual structure (2โ5 sections when helpful; direct answer for simple questions)
- Safety Features: Built-in refusal patterns and safety disclaimers for sensitive topics
- Warm, Supportive Tone: Responds with empathy while being precise and practical
The model is fully standalone: download and use with llama.cpp, Ollama, or any GGUF-compatible runtime. No additional backend code is required.
Usage Examples
Example 1: Healthcare Domain Detection
Input:
I've been having headaches for the past week. What could be causing this?
What Happens:
- Layer 1: Safety check passes (not harmful)
- Layer 4: Semantic cluster "pain_symptoms" detected → healthcare boost
- Layer 7: Emotion detected: "worried" (health concern)
- Layer 11: Domain detected: Healthcare (high confidence)
- Layer 14: Safety disclaimer added (healthcare topic)
- Layer 16: Contextual structure with empathetic opening (e.g. 2โ5 sections when helpful)
Expected Response: Empathetic opening, clear answer and key details, practical steps, and safety disclaimer. Structure adapts to question complexity (simpler questions get a more direct answer).
Example 2: Technology Domain with Context Awareness
Input:
How do I fix a Python error in my code?
What Happens:
- Layer 2: Contextual pattern "python code" detected → technology domain (not snake)
- Layer 6: Intent detected: FIX → systematic troubleshooting approach
- Layer 9: Question type: troubleshooting → step-by-step format
- Layer 11: Domain detected: Technology (high confidence)
- Layer 16: Contextual structure with technical steps
Expected Response:
- Technical, step-by-step troubleshooting format
- Code examples and debugging tips
- Practical solutions prioritized
Example 3: Emotional Intelligence Detection
Input:
I'm so frustrated with my job search. Nothing seems to work.
What Happens:
- Layer 7: Emotion detected: frustrated → empathetic, supportive tone
- Layer 8: Tone detected: distressed → warm, encouraging response
- Layer 6: Intent detected: VENT → supportive, validating response
- Layer 11: Domain detected: Career/Professional (medium confidence)
- Layer 16: Response starts with emotional acknowledgment
Expected Response:
- Opens with empathy: "I understand how frustrating this can be..."
- Validates feelings before providing advice
- Practical, actionable steps to improve situation
- Encouraging, supportive tone throughout
Example 4: Multi-Domain with Priority
Input:
My friend is showing signs of depression. How can I help them?
What Happens:
- Layer 1: Safety check passes
- Layer 5: Entity pattern: third_party (helping someone else)
- Layer 7: Emotion detected: worried (concern for friend)
- Layer 11: Domain detected: Healthcare (mental health) + Psychology/Wellness
- Layer 12: Domain priority: Healthcare wins (safety-critical)
- Layer 14: Safety disclaimer added (mental health topic)
- Layer 15: Greeting acknowledges the caring nature of the question
Expected Response:
- Healthcare domain expertise applied
- Safety disclaimers about professional help
- Practical steps for supporting someone with depression
- Emphasis on professional mental health resources
Example 5: Follow-up Context Awareness
Conversation:
User: What are the symptoms of anxiety?
Assistant: [Provides structured response about anxiety symptoms]
User: What about panic attacks?
What Happens:
- Layer 13: Context awareness detects follow-up question
- Previous domain (Healthcare) is considered
- "panic attacks" โ healthcare domain confirmed
- Response builds on previous conversation context
- No need to repeat general information
Expected Response:
- References previous conversation about anxiety
- Explains relationship between anxiety and panic attacks
- Builds on context naturally
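As a sketch, this follow-up flow with llama-cpp-python is just a matter of resending the conversation history with each call (file name and token limits are illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="meetara-qwen3.5-0.8b-Q4_K_M.gguf", chat_format="chatml")

messages = [{"role": "user", "content": "What are the symptoms of anxiety?"}]
first = llm.create_chat_completion(messages=messages, max_tokens=256)
messages.append(first["choices"][0]["message"])

# The follow-up is sent with the full history, so the model keeps the healthcare context
messages.append({"role": "user", "content": "What about panic attacks?"})
second = llm.create_chat_completion(messages=messages, max_tokens=256)
print(second["choices"][0]["message"]["content"])
```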
Example 6: Simple vs Complex Question Adaptation
Simple Question:
What is photosynthesis?
What Happens:
- Layer 10: Response length: concise (factual question)
- Layer 9: Question type: what-is → definition format
- Layer 11: Domain: Education/Science
- Layer 16: Simplified structure (less detail needed)
Complex Question:
How does quantum computing work and what are its practical applications?
What Happens:
- Layer 10: Response length: detailed (complex topic)
- Layer 9: Question type: how-to + what-is → comprehensive format
- Layer 11: Domain: Technology + Science
- Layer 16: Full structured response with deep analysis
💡 Tips for Best Results
Be Specific: More context helps the model detect the right domain
- โ "I'm worried about my chest pain" โ Healthcare + Emotion detected
- โ "Tell me about pain" โ Less specific, lower confidence
Natural Language: The model understands conversational language
- โ "How do I fix this bug in my Python code?"
- โ "I'm frustrated with this error"
Follow-ups Work: The model remembers context within a conversation
- Ask follow-up questions naturally; the model will understand
Emotional Cues: Expressing emotions helps the model respond empathetically
- "I'm worried about..." โ Empathetic response
- "I'm excited to learn..." โ Encouraging response
Credits
- Base model and original training: Qwen/Qwen3.5-0.8B by Alibaba Cloud's Tongyi Lab.
- Quantization and meeTARA integration: meetara-lab.
If you use this GGUF in your work, please also cite the original Qwen3.5 paper/model in addition to this repository.