# Chatbox2 - Qwen3-14B Update

## Summary of Changes

Your chatbox has been successfully upgraded to **Qwen3-14B**, with support for both thinking and non-thinking modes!

## What Changed

### 1. **Model Upgrade**

- **Old Model**: `anaspro/Shako-iraqi-4B-it` (multimodal)
- **New Model**: `Qwen/Qwen3-14B` (text-only with thinking capabilities)

### 2. **New Features**

#### **Thinking Mode Toggle** 🤔

You can now switch between two modes:

- **Thinking Mode ON** (default):
   - Best for: math problems, coding, complex reasoning
   - The model shows its reasoning process in `<think>...</think>` tags
   - Uses Temperature=0.6, TopP=0.95, TopK=20
   - More detailed and thorough responses
- **Thinking Mode OFF**:
   - Best for: general conversation, quick responses
   - Faster responses without showing reasoning
   - Uses Temperature=0.7, TopP=0.8, TopK=20
   - More efficient for casual chat

### 3. **Updated Parameters**

- Max tokens increased from 2048 to 32768 (matching Qwen3's capabilities)
- Generation parameters optimized per mode
- Multimodal support (images/videos) removed, since Qwen3-14B is text-only

### 4. **UI Improvements**

- Added a checkbox to toggle thinking mode
- Updated title and description
- New examples showcasing both modes

## How to Use

### Basic Usage

1. Type your message in the textbox
2.
Adjust settings in the sidebar:
   - **System Prompt**: Customize the AI's behavior (default: Iraqi dialect)
   - **Max New Tokens**: Control response length (100-32768)
   - **Enable Thinking Mode**: Toggle between thinking/non-thinking

### When to Use Thinking Mode

✅ **Enable Thinking Mode for:**

- Math problems
- Coding challenges
- Complex logical reasoning
- Step-by-step explanations
- Problem-solving tasks

❌ **Disable Thinking Mode for:**

- General conversation
- Quick questions
- Creative writing
- Casual chat
- When you need faster responses

### Advanced: Soft Switching with `/think` and `/no_think`

When the **Enable Thinking Mode** checkbox is ON, you can dynamically control thinking behavior per message using soft switches:

- Add `/think` to your message to **force thinking** for that specific turn
- Add `/no_think` to your message to **skip thinking** for that specific turn

**Important Notes:**

- Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
- When using `/no_think`, the model still outputs `<think>...</think>` tags, but they will be empty
- The model follows the most recent instruction in multi-turn conversations
- You can add the switch anywhere in your message (beginning or end)

**Examples:**

```
User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.
```

```
User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
     💬 Response: The solutions are approximately...
```

```
User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
     💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
     💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
```

**When Soft Switches Don't Work:**

- If the "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
- The model will not generate any `<think>` tags regardless of `/think` or `/no_think` in your message

## Technical Details

### Dependencies Updated

- `transformers>=4.51.0` (required for Qwen3 support)
- Removed: `av`, `timm`, `gTTS` (no longer needed)

### Model Configuration

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```

### Generation Parameters

**Thinking Mode:**

- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0.0

**Non-Thinking Mode:**

- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Min-P: 0.0

## Running the Application

```bash
python app.py
```

The app will launch on `http://localhost:7860` by default.

## Notes

1. **Text-Only**: Qwen3-14B doesn't support images, videos, or audio. The multimodal features have been removed.
2. **Context Length**: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling (see the Qwen3 documentation).
3. **Iraqi Dialect**: The default system prompt is configured for Iraqi Arabic dialect. You can modify it in the System Prompt field.
4. **GPU Requirements**: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.

## Reference

For more information about Qwen3-14B capabilities, visit:

- Model Page: https://huggingface.co/Qwen/Qwen3-14B
- Documentation: https://qwenlm.github.io/blog/qwen3/

## Troubleshooting

**Issue**: `KeyError: 'qwen3'`
**Solution**: Make sure you have `transformers>=4.51.0` installed

**Issue**: Out of memory errors
**Solution**: Reduce `max_new_tokens` or use a smaller batch size

**Issue**: Slow responses
**Solution**: Disable thinking mode for faster generation
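## Appendix: Code Sketches

The per-mode generation parameters listed under "Generation Parameters" can be selected with a small helper. This is a minimal sketch; the `sampling_params` function name is mine, and the values come straight from the tables above:

```python
def sampling_params(thinking: bool) -> dict:
    """Return the generation settings for the given mode.

    Values match the Generation Parameters section:
    thinking mode uses lower temperature / higher top-p.
    """
    if thinking:
        return {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
    return {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}

# The resulting dict can be unpacked into model.generate(**sampling_params(True), ...)
print(sampling_params(True))
print(sampling_params(False))
```

Keeping the two presets in one place makes it easy to wire the "Enable Thinking Mode" checkbox directly to the `thinking` flag.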
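The 🤔 Thinking Process / 💬 Response split shown in the examples can be recovered from the model's raw text with string parsing. This is a sketch under the assumption that the output uses Qwen3's `<think>...</think>` delimiters as described above (including empty tags for `/no_think`); the `split_thinking` helper is hypothetical, not part of the app:

```python
def split_thinking(raw: str) -> tuple[str, str]:
    """Split raw model output into (thinking, response).

    With /no_think the <think></think> tags are still emitted but empty,
    so the thinking part comes back as "".
    """
    start, end = "<think>", "</think>"
    if start in raw and end in raw:
        before, _, rest = raw.partition(start)
        thinking, _, answer = rest.partition(end)
        return thinking.strip(), (before + answer).strip()
    # No tags at all (e.g. the thinking-mode checkbox is OFF)
    return "", raw.strip()

# Empty think block, as produced by /no_think:
print(split_thinking("<think>\n\n</think>\n\nParis is the capital of France."))
# → ('', 'Paris is the capital of France.')
```

The UI can then render the first element under the 🤔 label and the second under 💬, hiding the thinking panel when it is empty.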