meeTARA – Qwen3.5-0.8B-Base (GGUF, Q4_K_M)
This repository contains a GGUF quantized version of Qwen/Qwen3.5-0.8B, prepared for use with llama.cpp and compatible runtimes, and used as the core base model inside the meeTARA empathetic assistant.
- Base model: Qwen/Qwen3.5-0.8B
- Architecture: Qwen3.5 (0.8B parameters, base-tuned)
- Format: GGUF
- Quantization: Q4_K_M (see Available files below)
- Intended use: Standalone intelligent assistant with baked-in domain detection, emotional intelligence, and structured responses for local / offline inference.
✨ Standalone Intelligence: This GGUF model includes 20 layers of intelligence baked directly into the chat template. No backend code required: download and use with llama.cpp, Ollama, or any GGUF-compatible runtime.
Base model highlights (Qwen 3.5 0.8B)
This GGUF builds on the official Qwen/Qwen3.5-0.8B model. For full details, benchmarks, and limitations, see the upstream model card. At a high level:
- Strong general chat & reasoning for its size; Qwen 3.5 is competitive with larger models on many tasks.
- Long context support, suitable for long documents and multi-turn conversations.
- Multi-modal capable: Qwen 3.5 supports image+text input. This repo includes meetara-vl-qwen3.5-0.8b.gguf (the multimodal projector); see the Vision / image-to-text section below for how to use it.
Available files
| Filename | Type | Size | Notes |
|---|---|---|---|
| meetara-qwen3.5-0.8b-Q4_K_M.gguf | Text GGUF | ~503 MB | Recommended text model |
| meetara-vl-qwen3.5-0.8b.gguf | Vision projector | ~198 MB | Required for image+text (mmproj) |
All files are in this repo. For text-only use, pick a *-Q4_K_M.gguf. For image+text, load both the text GGUF and meetara-vl-qwen3.5-0.8b.gguf together (see Vision / image-to-text section below).
Prompt format (recommended)
The model uses a Qwen-style chat template. A simple, robust pattern is:
```
<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on the Qwen3.5-0.8B base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
```
Example:
```
<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on the Qwen3.5-0.8B base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
```
Requirements & runtimes (Qwen 3.5)
This GGUF uses the `qwen35` architecture. You need a runtime that supports it:
| Runtime | Notes |
|---|---|
| llama.cpp | Use a recent build. If you see `unknown model architecture 'qwen35'` or `missing tensor 'output_norm.weight'`, pull the latest source and rebuild llama-cli / llama-quantize / llama-simple-chat. |
| llama-cpp-python | Build against a recent llama.cpp (with qwen35 support). Use chat_format="chatml" for chat. |
| Ollama / text-generation-webui | Use a backend that supports qwen35 (e.g. llama.cpp-based). |
Chat format: This model expects ChatML (`<|im_start|>`, `<|im_end|>`). With llama.cpp, use conversation mode and the chatml template:
```bash
# Interactive chat (recommended)
./llama-cli -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf -cnv --chat-template chatml

# One-shot prompt (raw, no chat template applied)
./llama-cli -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf -p "Hello, how can you help me?" -n 256
```
Python (llama-cpp-python):
```python
from llama_cpp import Llama

llm = Llama(model_path="meetara-qwen3.5-0.8b-Q4_K_M.gguf", chat_format="chatml")
# Then use create_chat_completion or your preferred API.
```
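For example, a minimal chat completion call (the system prompt and sampling values below are illustrative, not required):

```python
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are meeTARA, an emotionally intelligent AI assistant."},
        {"role": "user", "content": "How can I improve my sleep quality?"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```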
Vision / image-to-text (mmproj)
This repo includes meetara-vl-qwen3.5-0.8b.gguf, the multimodal projector (mmproj) that enables image+text input for this model.
How it works:
```
IMAGE.jpg → [Vision Encoder] → [meetara-vl-qwen3.5-0.8b.gguf] → Text GGUF → Text answer
 (pixels)                             (mmproj)                 (generates response)
```
The VL file is not a language model; it is the bridge that translates image features into token vectors the LLM understands. The meeTARA personality and 20-layer intelligence come from the text GGUF; load both together at runtime.
Usage with llama.cpp (llama-mtmd-cli):
```bash
# Download both files first
huggingface-cli download meetara-lab/meetara-qwen3.5-0.8b-gguf \
  --include "meetara-qwen3.5-0.8b-Q4_K_M.gguf" "meetara-vl-qwen3.5-0.8b.gguf" --local-dir .

# Run image+text inference
./llama-mtmd-cli \
  -m meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  --mmproj meetara-vl-qwen3.5-0.8b.gguf \
  --image /path/to/your/image.jpg \
  -p "Describe this image."
```
Using llama-server (OpenAI-compatible API):
```bash
./llama-server \
  -m meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  --mmproj meetara-vl-qwen3.5-0.8b.gguf \
  --host 0.0.0.0 --port 8080
```
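If your llama-server build includes multimodal support, image input is typically sent through the OpenAI-compatible chat endpoint as a base64 data URI. A minimal sketch, assuming the server is running as above (the payload shape follows the OpenAI chat format; exact multimodal behavior depends on your build):

```python
import base64

import requests

# Encode a local image as a data URI (path is illustrative)
with open("your_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 256,
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```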
Note: Vision support requires a recent llama.cpp build (mid-2025 or later) with llama-mtmd-cli available. For text-only use, the VL file is not needed.
Example usage (llama.cpp)
Basic interactive chat
```bash
./llama-simple-chat -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf
```
With explicit system prompt
```bash
./llama-cli \
  -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  -p "<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on the Qwen3.5-0.8B base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
"
```
Adjust flags like `-n` (max tokens), `--temp`, `--top-p`, `--top-k`, etc. according to your hardware and latency/quality trade-offs.
Downloading via huggingface-cli
```bash
pip install -U "huggingface_hub[cli]"

huggingface-cli download \
  meetara-lab/meetara-qwen3.5-0.8b-gguf \
  --include "meetara-qwen3.5-0.8b-Q4_K_M.gguf" \
  --local-dir .
```
This will download the Q4_K_M quantization into the current directory.
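The same download can be scripted with the huggingface_hub Python API, using the repo id shown in the Vision section above:

```python
from huggingface_hub import hf_hub_download

# Downloads the Q4_K_M file into the current directory
path = hf_hub_download(
    repo_id="meetara-lab/meetara-qwen3.5-0.8b-gguf",
    filename="meetara-qwen3.5-0.8b-Q4_K_M.gguf",
    local_dir=".",
)
print(path)
```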
🧠 Standalone Intelligence (20-Layer Detection System)
This GGUF model includes baked-in intelligence that works without any backend code. The model automatically detects domains, emotions, intent, and context through a 20-layer detection system:
Intelligence Layers
| Layer | Feature | Description |
|---|---|---|
| 1 | Refusal Patterns | Safety-first harmful request detection |
| 2 | Contextual Patterns | Multi-word phrase disambiguation ("python code" vs. snake) |
| 3 | N-gram Patterns | Bigram/trigram detection for better context |
| 4 | Semantic Clusters | Related keyword groups boost domain confidence |
| 5 | Entity Patterns | Personal context, time-sensitive, beginner/expert |
| 6 | Intent Signals | What the user wants: learn, fix, decide, create, validate |
| 7 | Emotion Blending | Detects co-occurring emotions, blends composite hints |
| 8 | Tone Detection | Mirrors user style: casual, formal, technical |
| 9 | Question Type | Adapts format: yes/no, how-to, comparison |
| 10 | Response Length | Concise/standard/detailed based on signals |
| 11 | Weighted Domain Score | High/medium/low keyword weights + negative penalties |
| 12 | Score-Based Selection | Highest score wins; priority only for ties |
| 13 | Conversation Memory | Multi-turn depth tracking + domain shift awareness |
| 14 | Safety Disclaimers | Auto-adds warnings for healthcare, legal, crisis |
| 15 | Greeting/Closing | Natural conversation flow, domain-specific |
| 16 | Structured Responses | Contextual structure (adapts sections to the question) |
| 17 | Persona Calibration | Adapts expertise level: beginner/intermediate/expert |
| 18 | Language Detection | Detects non-English cues, responds in the user's language |
| 19 | Confidence Scoring | Multi-signal confidence (keywords + clusters + n-grams) |
| 20 | Negative Keywords | Penalizes false-positive cross-domain keywords |
How It Works
When a user sends a message, the chat template (baked into the GGUF) processes through these 20 layers automatically:
- Safety Check: Refusal patterns detect harmful requests first
- Context Analysis: Multi-word phrases, n-grams, and semantic clusters provide context
- User Understanding: Entity patterns, intent signals, emotion blending, and persona/language detection understand the user
- Response Adaptation: Tone, question type, and length control adapt the response style
- Domain Selection: Weighted keyword scoring with negative keyword penalties; highest score wins
- Output Format: Contextual structure (2โ5 sections when helpful; direct answer for simple questions) with appropriate greetings/closings
Result: The model responds intelligently, empathetically, and contextually without requiring backend code.
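Because this logic lives in the chat template stored in the GGUF metadata, you can inspect it yourself. A minimal sketch using the gguf Python package (`pip install gguf`); the field-decoding details may vary between package versions:

```python
from gguf import GGUFReader

reader = GGUFReader("meetara-qwen3.5-0.8b-Q4_K_M.gguf")

# The chat template is stored under the standard GGUF metadata key
field = reader.fields.get("tokenizer.chat_template")
if field is not None:
    # For string fields, data indexes the part holding the value bytes
    template = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(template[:500])  # preview the first 500 characters
```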
Intended behavior / meeTARA flavor
Compared to the raw Qwen/Qwen3.5-0.8B model, this quantization includes:
- Standalone Intelligence: Works without a backend; all intelligence is baked into the GGUF
- 18 Domain Categories: Auto-detects healthcare, technology, business, education, and 14 more
- Emotional Intelligence: Emotion blending and persona calibration; detects worried, frustrated, urgent, etc.
- Context Awareness: Conversation memory and follow-up detection; understands multi-turn flow
- Structured Responses: Contextual structure (2โ5 sections when helpful; direct answer for simple questions)
- Safety Features: Built-in refusal patterns and safety disclaimers for sensitive topics
- Warm, Supportive Tone: Responds with empathy while being precise and practical
The model is fully standalone: download and use with llama.cpp, Ollama, or any GGUF-compatible runtime. No additional backend code is required.
Usage Examples
Example 1: Healthcare Domain Detection
Input:
I've been having headaches for the past week. What could be causing this?
What Happens:
- Layer 1: Safety check passes (not harmful)
- Layer 4: Semantic cluster "pain_symptoms" detected → healthcare boost
- Layer 7: Emotion detected: "worried" (health concern)
- Layer 11: Domain detected: Healthcare (high confidence)
- Layer 14: Safety disclaimer added (healthcare topic)
- Layer 16: Contextual structure with empathetic opening (e.g. 2โ5 sections when helpful)
Expected Response: Empathetic opening, clear answer and key details, practical steps, and safety disclaimer. Structure adapts to question complexity (simpler questions get a more direct answer).
Example 2: Technology Domain with Context Awareness
Input:
How do I fix a Python error in my code?
What Happens:
- Layer 2: Contextual pattern "python code" detected → technology domain (not snake)
- Layer 6: Intent detected: FIX → systematic troubleshooting approach
- Layer 9: Question type: troubleshooting → step-by-step format
- Layer 11: Domain detected: Technology (high confidence)
- Layer 16: Contextual structure with technical steps
Expected Response:
- Technical, step-by-step troubleshooting format
- Code examples and debugging tips
- Practical solutions prioritized
Example 3: Emotional Intelligence Detection
Input:
I'm so frustrated with my job search. Nothing seems to work.
What Happens:
- Layer 7: Emotion detected: frustrated → empathetic, supportive tone
- Layer 8: Tone detected: distressed → warm, encouraging response
- Layer 6: Intent detected: VENT → supportive, validating response
- Layer 11: Domain detected: Career/Professional (medium confidence)
- Layer 16: Response starts with emotional acknowledgment
Expected Response:
- Opens with empathy: "I understand how frustrating this can be..."
- Validates feelings before providing advice
- Practical, actionable steps to improve situation
- Encouraging, supportive tone throughout
Example 4: Multi-Domain with Priority
Input:
My friend is showing signs of depression. How can I help them?
What Happens:
- Layer 1: Safety check passes
- Layer 5: Entity pattern: third_party (helping someone else)
- Layer 7: Emotion detected: worried (concern for friend)
- Layer 11: Domain detected: Healthcare (mental health) + Psychology/Wellness
- Layer 12: Domain priority: Healthcare wins (safety-critical)
- Layer 14: Safety disclaimer added (mental health topic)
- Layer 15: Greeting acknowledges the caring nature of the question
Expected Response:
- Healthcare domain expertise applied
- Safety disclaimers about professional help
- Practical steps for supporting someone with depression
- Emphasis on professional mental health resources
Example 5: Follow-up Context Awareness
Conversation:
User: What are the symptoms of anxiety?
Assistant: [Provides structured response about anxiety symptoms]
User: What about panic attacks?
What Happens:
- Layer 13: Context awareness detects follow-up question
- Previous domain (Healthcare) is considered
- "panic attacks" โ healthcare domain confirmed
- Response builds on previous conversation context
- No need to repeat general information
Expected Response:
- References previous conversation about anxiety
- Explains relationship between anxiety and panic attacks
- Builds on context naturally
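As a sketch, this follow-up flow with llama-cpp-python is just a matter of resending the conversation history with each call (file name and token limits are illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="meetara-qwen3.5-0.8b-Q4_K_M.gguf", chat_format="chatml")

messages = [{"role": "user", "content": "What are the symptoms of anxiety?"}]
first = llm.create_chat_completion(messages=messages, max_tokens=256)
messages.append(first["choices"][0]["message"])

# The follow-up is sent with the full history, so the model keeps the healthcare context
messages.append({"role": "user", "content": "What about panic attacks?"})
second = llm.create_chat_completion(messages=messages, max_tokens=256)
print(second["choices"][0]["message"]["content"])
```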
Example 6: Simple vs Complex Question Adaptation
Simple Question:
What is photosynthesis?
What Happens:
- Layer 10: Response length: concise (factual question)
- Layer 9: Question type: what-is → definition format
- Layer 11: Domain: Education/Science
- Layer 16: Simplified structure (less detail needed)
Complex Question:
How does quantum computing work and what are its practical applications?
What Happens:
- Layer 10: Response length: detailed (complex topic)
- Layer 9: Question type: how-to + what-is → comprehensive format
- Layer 11: Domain: Technology + Science
- Layer 16: Full structured response with deep analysis
💡 Tips for Best Results
Be Specific: More context helps the model detect the right domain
- โ "I'm worried about my chest pain" โ Healthcare + Emotion detected
- โ "Tell me about pain" โ Less specific, lower confidence
Natural Language: The model understands conversational language
- โ "How do I fix this bug in my Python code?"
- โ "I'm frustrated with this error"
Follow-ups Work: The model remembers context within a conversation
- Ask follow-up questions naturally; the model will understand
Emotional Cues: Expressing emotions helps the model respond empathetically
- "I'm worried about..." โ Empathetic response
- "I'm excited to learn..." โ Encouraging response
Credits
- Base model and original training: Qwen/Qwen3.5-0.8B by Alibaba Cloud's Tongyi Lab.
- Quantization and meeTARA integration: meetara-lab.
If you use this GGUF in your work, please also cite the original Qwen3.5 paper/model in addition to this repository.