# Chatbox2 - Qwen3-14B Update

## Summary of Changes

Your chatbox has been successfully upgraded to **Qwen3-14B**, with support for both thinking and non-thinking modes!

## What Changed

### 1. **Model Upgrade**

- **Old Model**: `anaspro/Shako-iraqi-4B-it` (multimodal)
- **New Model**: `Qwen/Qwen3-14B` (text-only with thinking capabilities)

### 2. **New Features**

#### **Thinking Mode Toggle** 🤔

You can now switch between two modes:

- **Thinking Mode ON** (default):
   - Best for: math problems, coding, complex reasoning
   - The model shows its reasoning process in `<think>...</think>` tags
   - Uses Temperature=0.6, TopP=0.95, TopK=20
   - More detailed and thorough responses
- **Thinking Mode OFF**:
   - Best for: general conversation, quick responses
   - Faster responses without showing reasoning
   - Uses Temperature=0.7, TopP=0.8, TopK=20
   - More efficient for casual chat

### 3. **Updated Parameters**

- Max tokens increased from 2048 to 32768 (matching Qwen3's capabilities)
- Generation parameters optimized per mode
- Multimodal support (images/videos) removed, since Qwen3-14B is text-only

### 4. **UI Improvements**

- Added a checkbox to toggle thinking mode
- Updated title and description
- New examples showcasing both modes

## How to Use

### Basic Usage

1. Type your message in the textbox
2.
Adjust settings in the sidebar:
   - **System Prompt**: Customize the AI's behavior (default: Iraqi dialect)
   - **Max New Tokens**: Control response length (100-32768)
   - **Enable Thinking Mode**: Toggle between thinking/non-thinking

### When to Use Thinking Mode

✅ **Enable Thinking Mode for:**

- Math problems
- Coding challenges
- Complex logical reasoning
- Step-by-step explanations
- Problem-solving tasks

❌ **Disable Thinking Mode for:**

- General conversation
- Quick questions
- Creative writing
- Casual chat
- When you need faster responses

### Advanced: Soft Switching with `/think` and `/no_think`

When the **Enable Thinking Mode** checkbox is ON, you can dynamically control thinking behavior per message using soft switches:

- Add `/think` to your message to **force thinking** for that specific turn
- Add `/no_think` to your message to **skip thinking** for that specific turn

**Important Notes:**

- Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
- When using `/no_think`, the model still outputs `<think>...</think>` tags, but they will be empty
- The model follows the most recent instruction in multi-turn conversations
- You can add the switch anywhere in your message (beginning or end)

**Examples:**

```
User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.
```

```
User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
     💬 Response: The solutions are approximately...
```

```
User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
     💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
     💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
```

**When Soft Switches Don't Work:**

- If the "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
- The model will not generate any `<think>` tags regardless of `/think` or `/no_think` in your message

## Technical Details

### Dependencies Updated

- `transformers>=4.51.0` (required for Qwen3 support)
- Removed: `av`, `timm`, `gTTS` (no longer needed)

### Model Configuration

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```

### Generation Parameters

**Thinking Mode:**

- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0.0

**Non-Thinking Mode:**

- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Min-P: 0.0

## Running the Application

```bash
python app.py
```

The app will launch on `http://localhost:7860` by default.

## Notes

1. **Text-Only**: Qwen3-14B doesn't support images, videos, or audio. The multimodal features have been removed.
2. **Context Length**: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling (see the Qwen3 documentation).
3. **Iraqi Dialect**: The default system prompt is configured for Iraqi Arabic dialect. You can modify it in the System Prompt field.
4. **GPU Requirements**: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.

## Reference

For more information about Qwen3-14B capabilities, visit:

- Model Page: https://huggingface.co/Qwen/Qwen3-14B
- Documentation: https://qwenlm.github.io/blog/qwen3/

## Troubleshooting

**Issue**: `KeyError: 'qwen3'`
**Solution**: Make sure you have `transformers>=4.51.0` installed

**Issue**: Out of memory errors
**Solution**: Reduce `max_new_tokens` or use a smaller batch size

**Issue**: Slow responses
**Solution**: Disable thinking mode for faster generation
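## Appendix: Code Sketches

The per-mode generation parameters listed under "Generation Parameters" can be selected with a small helper. This is a minimal sketch; the `sampling_params` function name is mine, and the values come straight from the tables above:

```python
def sampling_params(thinking: bool) -> dict:
    """Return the generation settings for the given mode.

    Values match the Generation Parameters section:
    thinking mode uses lower temperature / higher top-p.
    """
    if thinking:
        return {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
    return {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}

# The resulting dict can be unpacked into model.generate(**sampling_params(True), ...)
print(sampling_params(True))
print(sampling_params(False))
```

Keeping the two presets in one place makes it easy to wire the "Enable Thinking Mode" checkbox directly to the `thinking` flag.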
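The 🤔 Thinking Process / 💬 Response split shown in the examples can be recovered from the model's raw text with string parsing. This is a sketch under the assumption that the output uses Qwen3's `<think>...</think>` delimiters as described above (including empty tags for `/no_think`); the `split_thinking` helper is hypothetical, not part of the app:

```python
def split_thinking(raw: str) -> tuple[str, str]:
    """Split raw model output into (thinking, response).

    With /no_think the <think></think> tags are still emitted but empty,
    so the thinking part comes back as "".
    """
    start, end = "<think>", "</think>"
    if start in raw and end in raw:
        before, _, rest = raw.partition(start)
        thinking, _, answer = rest.partition(end)
        return thinking.strip(), (before + answer).strip()
    # No tags at all (e.g. the thinking-mode checkbox is OFF)
    return "", raw.strip()

# Empty think block, as produced by /no_think:
print(split_thinking("<think>\n\n</think>\n\nParis is the capital of France."))
# → ('', 'Paris is the capital of France.')
```

The UI can then render the first element under the 🤔 label and the second under 💬, hiding the thinking panel when it is empty.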