meeTARA – Qwen3.5-0.8B-Base (GGUF, Q4_K_M)

This repository contains a GGUF quantized version of Qwen/Qwen3.5-0.8B, prepared for use with llama.cpp and compatible runtimes, and used as the core base model inside the meeTARA empathetic assistant.

  • Base model: Qwen/Qwen3.5-0.8B
  • Architecture: Qwen3.5 (0.8B parameters, base-tuned)
  • Format: GGUF
  • Quantization: Q4_K_M (see Available files below)
  • Intended use: Standalone intelligent assistant with baked-in domain detection, emotional intelligence, and structured responses for local / offline inference.

✨ Standalone Intelligence: This GGUF model includes 20 layers of intelligence baked directly into the chat template. No backend code is required: download and use it with llama.cpp, Ollama, or any GGUF-compatible runtime.


Base model highlights (Qwen 3.5 0.8B)

This GGUF builds on the official Qwen/Qwen3.5-0.8B model. For full details, benchmarks, and limitations, see the upstream model card. At a high level:

  • Strong general chat & reasoning for its size; Qwen 3.5 is competitive with larger models on many tasks.
  • Long context support, suitable for long documents and multi-turn conversations.
  • Multi-modal capable: Qwen 3.5 supports image+text input. This repo includes meetara-vl-qwen3.5-0.8b.gguf (the multimodal projector); see the Vision / image-to-text section below for how to use it.

Available files

Filename                          Type              Size    Notes
meetara-qwen3.5-0.8b-Q4_K_M.gguf  Text GGUF         ~503MB  Recommended text model
meetara-vl-qwen3.5-0.8b.gguf      Vision projector  ~198MB  Required for image+text (mmproj)

All files are in this repo. For text-only use, pick a *-Q4_K_M.gguf. For image+text, load both the text GGUF and meetara-vl-qwen3.5-0.8b.gguf together (see Vision / image-to-text section below).


Prompt format (recommended)

The model uses a Qwen-style chat template. A simple, robust pattern is:

<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on top of a Qwen3.5-0.8B-Base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant

Example:

<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on top of a Qwen3.5-0.8B-Base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
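If you assemble prompts programmatically, the template above can be produced with a small helper. This is a minimal sketch: the function name `build_chatml` and the default system string are illustrative, not part of the model.

```python
# Minimal ChatML prompt builder matching the template shown above.
# Function name and defaults are illustrative, not part of the model.
SYSTEM_PROMPT = (
    "You are meeTARA, an emotionally intelligent AI assistant built on top of a "
    "Qwen3.5-0.8B-Base model. Always answer clearly, kindly, and with practical "
    "steps the user can take."
)

def build_chatml(user_message: str, system: str = SYSTEM_PROMPT) -> str:
    """Return a ChatML prompt ending with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user_message}\n<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml("How can I improve my sleep quality and manage stress naturally?")
print(prompt)
```

Pass the resulting string as the raw prompt (e.g. via `-p` in llama-cli); runtimes that apply the chat template themselves do not need this.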

Requirements & runtimes (Qwen 3.5)

This GGUF uses the qwen35 architecture. You need a runtime that supports it:

Runtime                         Notes
llama.cpp                       Use a recent build. If you see "unknown model architecture 'qwen35'" or "missing tensor 'output_norm.weight'", pull the latest llama.cpp sources and rebuild llama-cli / llama-quantize / llama-simple-chat.
llama-cpp-python                Build against a recent llama.cpp with qwen35 support. Use chat_format="chatml" for chat.
Ollama / text-generation-webui  Use a backend that supports qwen35 (e.g. llama.cpp-based).

Chat format: This model expects ChatML (<|im_start|>, <|im_end|>). With llama.cpp use conversation mode and the chatml template:

# Interactive chat (recommended)
./llama-cli -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf -cnv --chat-template chatml

# One-shot prompt
./llama-cli -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf -p "Hello, how can you help me?" -n 256 -cnv --chat-template chatml

Python (llama-cpp-python):

from llama_cpp import Llama

llm = Llama(model_path="meetara-qwen3.5-0.8b-Q4_K_M.gguf", chat_format="chatml")
out = llm.create_chat_completion(messages=[{"role": "user", "content": "Hello!"}])
print(out["choices"][0]["message"]["content"])

Vision / image-to-text (mmproj)

This repo includes meetara-vl-qwen3.5-0.8b.gguf, the multimodal projector (mmproj) that enables image+text input for this model.

How it works:

IMAGE.jpg → [Vision Encoder] → [meetara-vl-qwen3.5-0.8b.gguf] → Text GGUF → Text answer
              (pixels)                   (mmproj)                (generates response)

The VL file is not a language model: it is the bridge that translates image features into token vectors the LLM understands. The meeTARA personality and 20-layer intelligence come from the text GGUF; load both together at runtime.

Usage with llama.cpp (llama-mtmd-cli):

# Download both files first
huggingface-cli download meetara-lab/meetara-qwen3.5-0.8b-gguf \
  --include "meetara-qwen3.5-0.8b-Q4_K_M.gguf" "meetara-vl-qwen3.5-0.8b.gguf" --local-dir .

# Run image+text inference
./llama-mtmd-cli \
  -m meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  --mmproj meetara-vl-qwen3.5-0.8b.gguf \
  --image /path/to/your/image.jpg \
  -p "Describe this image."

Using llama-server (OpenAI-compatible API):

./llama-server \
  -m meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  --mmproj meetara-vl-qwen3.5-0.8b.gguf \
  --host 0.0.0.0 --port 8080

Note: Vision support requires a recent llama.cpp build (mid-2025 or later) with llama-mtmd-cli available. For text-only use, the VL file is not needed.
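Once llama-server is up, any OpenAI-compatible client can talk to it. Below is a sketch of the request payload for an image+text chat; the field names follow the OpenAI chat-completions schema, and the helper name, endpoint, and byte string are placeholders for illustration.

```python
import base64

# Build an OpenAI-style chat-completions payload with an inline base64 image.
# Helper name and values are illustrative; adjust for your setup.
def image_chat_payload(image_bytes: bytes, question: str) -> dict:
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 256,
    }

payload = image_chat_payload(b"\xff\xd8\xff", "Describe this image.")
# POST this payload as JSON to http://localhost:8080/v1/chat/completions
```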


Example usage (llama.cpp)

Basic interactive chat

./llama-simple-chat -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf

With explicit system prompt

./llama-cli \
  -m /path/to/meetara-qwen3.5-0.8b-Q4_K_M.gguf \
  -p "<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on top of a Qwen3.5-0.8B-Base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant"

Adjust flags like -n (max tokens), --temperature, --top_p, --top_k, etc. according to your hardware and latency/quality trade-offs.


Downloading via huggingface-cli

pip install -U "huggingface_hub[cli]"

huggingface-cli download \
  meetara-lab/meetara-qwen3.5-0.8b-gguf \
  --include "meetara-qwen3.5-0.8b-Q4_K_M.gguf" \
  --local-dir .

This will download the Q4_K_M quantization into the current directory.


🧠 Standalone Intelligence (20-Layer Detection System)

This GGUF model includes baked-in intelligence that works without any backend code. The model automatically detects domains, emotions, intent, and context through a 20-layer detection system:

Intelligence Layers

Layer  Feature                    Description
1      🚨 Refusal Patterns        Safety-first harmful request detection
2      🧩 Contextual Patterns     Multi-word phrase disambiguation (python code vs snake)
3      📊 N-gram Patterns         Bigram/trigram detection for better context
4      🔗 Semantic Clusters       Related keyword groups boost domain confidence
5      👤 Entity Patterns         Personal context, time-sensitive, beginner/expert
6      🎯 Intent Signals          What the user wants: learn, fix, decide, create, validate
7      💙 Emotion Blending        Detects co-occurring emotions, blends composite hints
8      🎭 Tone Detection          Mirrors user style: casual, formal, technical
9      ❓ Question Type           Adapts format: yes/no, how-to, comparison
10     📏 Response Length         Concise/standard/detailed based on signals
11     ⚖️ Weighted Domain Score   High/medium/low keyword weights + negative penalties
12     🏆 Score-Based Selection   Highest score wins; priority only for ties
13     🔄 Conversation Memory     Multi-turn depth tracking + domain-shift awareness
14     ⚠️ Safety Disclaimers      Auto-adds warnings for healthcare, legal, crisis
15     👋 Greeting/Closing        Natural conversation flow, domain-specific
16     📝 Structured Responses    Contextual structure (adapts sections to the question)
17     🎓 Persona Calibration     Adapts expertise level: beginner/intermediate/expert
18     🌐 Language Detection      Detects non-English cues, responds in the user's language
19     📊 Confidence Scoring      Multi-signal confidence (keywords + clusters + n-grams)
20     🛡️ Negative Keywords       Penalizes false-positive cross-domain keywords

How It Works

When a user sends a message, the chat template (baked into the GGUF) processes through these 20 layers automatically:

  1. Safety Check: Refusal patterns detect harmful requests first
  2. Context Analysis: Multi-word phrases, n-grams, and semantic clusters provide context
  3. User Understanding: Entity patterns, intent signals, emotion blending, and persona/language detection understand the user
  4. Response Adaptation: Tone, question type, and length control adapt the response style
  5. Domain Selection: Weighted keyword scoring with negative keyword penalties; highest score wins
  6. Output Format: Contextual structure (2–5 sections when helpful; direct answer for simple questions) with appropriate greetings/closings

Result: The model responds intelligently, empathetically, and contextually without requiring backend code.
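To make the scoring steps concrete, here is a toy version of layers 11, 12, and 20: weighted keyword scores, negative-keyword penalties, and highest-score-wins selection. The keyword tables and weights are invented for illustration; the actual lists baked into the chat template are far larger.

```python
# Toy sketch of weighted domain scoring (layers 11, 12, 20).
# Domains, keywords, and weights are illustrative examples only.
DOMAIN_KEYWORDS = {
    "healthcare": {"headache": 3, "pain": 2, "symptom": 2, "doctor": 3},
    "technology": {"python": 3, "error": 2, "code": 2, "bug": 2},
}
# Negative keywords penalize false positives (e.g. "python" the snake).
NEGATIVE_KEYWORDS = {
    "technology": {"snake": -3, "reptile": -3},
}

def detect_domain(message: str) -> tuple[str, int]:
    """Score every domain on the message; the highest score wins."""
    words = message.lower().split()
    scores = {}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        score = sum(w for kw, w in keywords.items() if kw in words)
        score += sum(w for kw, w in NEGATIVE_KEYWORDS.get(domain, {}).items() if kw in words)
        scores[domain] = score
    best = max(scores, key=scores.get)
    return best, scores[best]

print(detect_domain("fix this python error in my code"))
```

A message like "a python is a snake" illustrates layer 20: the positive weight for "python" is cancelled by the "snake" penalty, so the technology domain no longer wins on a false positive.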


Intended behavior / meeTARA flavor

Compared to the raw Qwen/Qwen3.5-0.8B model, this quantization includes:

  • Standalone Intelligence: Works without a backend; all intelligence is baked into the GGUF
  • 18 Domain Categories: Auto-detects healthcare, technology, business, education, and 14 more
  • Emotional Intelligence: Emotion blending and persona calibration; detects worried, frustrated, urgent, etc.
  • Context Awareness: Conversation memory and follow-up detection; understands multi-turn flow
  • Structured Responses: Contextual structure (2–5 sections when helpful; direct answer for simple questions)
  • Safety Features: Built-in refusal patterns and safety disclaimers for sensitive topics
  • Warm, Supportive Tone: Responds with empathy while being precise and practical

The model is fully standalone: download and use it with llama.cpp, Ollama, or any GGUF-compatible runtime. No additional backend code is required.


📚 Usage Examples

Example 1: Healthcare Domain Detection

Input:

I've been having headaches for the past week. What could be causing this?

What Happens:

  • Layer 1: Safety check passes (not harmful)
  • Layer 4: Semantic cluster "pain_symptoms" detected → healthcare boost
  • Layer 7: Emotion detected: "worried" (health concern)
  • Layer 11: Domain detected: Healthcare (high confidence)
  • Layer 14: Safety disclaimer added (healthcare topic)
  • Layer 16: Contextual structure with empathetic opening (e.g. 2–5 sections when helpful)

Expected Response: Empathetic opening, clear answer and key details, practical steps, and safety disclaimer. Structure adapts to question complexity (simpler questions get a more direct answer).


Example 2: Technology Domain with Context Awareness

Input:

How do I fix a Python error in my code?

What Happens:

  • Layer 2: Contextual pattern "python code" detected → technology domain (not snake)
  • Layer 6: Intent detected: FIX → systematic troubleshooting approach
  • Layer 9: Question type: troubleshooting → step-by-step format
  • Layer 11: Domain detected: Technology (high confidence)
  • Layer 16: Contextual structure with technical steps

Expected Response:

  • Technical, step-by-step troubleshooting format
  • Code examples and debugging tips
  • Practical solutions prioritized

Example 3: Emotional Intelligence Detection

Input:

I'm so frustrated with my job search. Nothing seems to work.

What Happens:

  • Layer 7: Emotion detected: frustrated → empathetic, supportive tone
  • Layer 8: Tone detected: distressed → warm, encouraging response
  • Layer 6: Intent detected: VENT → supportive, validating response
  • Layer 11: Domain detected: Career/Professional (medium confidence)
  • Layer 16: Response starts with emotional acknowledgment

Expected Response:

  • Opens with empathy: "I understand how frustrating this can be..."
  • Validates feelings before providing advice
  • Practical, actionable steps to improve situation
  • Encouraging, supportive tone throughout

Example 4: Multi-Domain with Priority

Input:

My friend is showing signs of depression. How can I help them?

What Happens:

  • Layer 1: Safety check passes
  • Layer 5: Entity pattern: third_party (helping someone else)
  • Layer 7: Emotion detected: worried (concern for friend)
  • Layer 11: Domain detected: Healthcare (mental health) + Psychology/Wellness
  • Layer 12: Domain priority: Healthcare wins (safety-critical)
  • Layer 14: Safety disclaimer added (mental health topic)
  • Layer 15: Greeting acknowledges the caring nature of the question

Expected Response:

  • Healthcare domain expertise applied
  • Safety disclaimers about professional help
  • Practical steps for supporting someone with depression
  • Emphasis on professional mental health resources

Example 5: Follow-up Context Awareness

Conversation:

User: What are the symptoms of anxiety?
Assistant: [Provides structured response about anxiety symptoms]
User: What about panic attacks?

What Happens:

  • Layer 13: Context awareness detects follow-up question
  • Previous domain (Healthcare) is considered
  • "panic attacks" → healthcare domain confirmed
  • Response builds on previous conversation context
  • No need to repeat general information

Expected Response:

  • References previous conversation about anxiety
  • Explains relationship between anxiety and panic attacks
  • Builds on context naturally

Example 6: Simple vs Complex Question Adaptation

Simple Question:

What is photosynthesis?

What Happens:

  • Layer 10: Response length: concise (factual question)
  • Layer 9: Question type: what-is → definition format
  • Layer 11: Domain: Education/Science
  • Layer 16: Simplified structure (less detail needed)

Complex Question:

How does quantum computing work and what are its practical applications?

What Happens:

  • Layer 10: Response length: detailed (complex topic)
  • Layer 9: Question type: how-to + what-is → comprehensive format
  • Layer 11: Domain: Technology + Science
  • Layer 16: Full structured response with deep analysis
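The concise/standard/detailed choice above can be pictured as a simple heuristic. This sketch of layer 10 uses invented word-count thresholds and signal words purely for illustration:

```python
# Toy sketch of response-length selection (layer 10).
# Thresholds and signal words are illustrative only.
DETAIL_SIGNALS = {"how", "why", "explain", "compare", "applications"}

def choose_length(question: str) -> str:
    words = question.lower().replace("?", "").split()
    signals = sum(1 for w in words if w in DETAIL_SIGNALS)
    if len(words) <= 4 and signals == 0:
        return "concise"        # short factual question
    return "detailed" if signals >= 2 else "standard"

print(choose_length("What is photosynthesis?"))
```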

💡 Tips for Best Results

  1. Be Specific: More context helps the model detect the right domain

    • ✅ "I'm worried about my chest pain" → Healthcare + Emotion detected
    • ❌ "Tell me about pain" → Less specific, lower confidence
  2. Natural Language: The model understands conversational language

    • ✅ "How do I fix this bug in my Python code?"
    • ✅ "I'm frustrated with this error"
  3. Follow-ups Work: The model remembers context within a conversation

    • Ask follow-up questions naturally; the model will understand
  4. Emotional Cues: Expressing emotions helps the model respond empathetically

    • "I'm worried about..." → Empathetic response
    • "I'm excited to learn..." → Encouraging response

Credits

  • Base model and original training: Qwen/Qwen3.5-0.8B by Alibaba Cloud's Tongyi Lab.
  • Quantization and meeTARA integration: meetara-lab.

If you use this GGUF in your work, please also cite the original Qwen3.5 paper/model in addition to this repository.
