Qwen3.5-9B-UD-japanese-imatrix developed by dahara1@webbigdata

Qwen3.5-9B 日本語能力に特化させたGGUFモデル
A GGUF build of Qwen3.5-9B specialized for Japanese-language proficiency.

特徴 / Features

一言で言えば沢山の細かい改善をして出来上がった強力なggufモデルです。
In short, it is a powerful GGUF model built from many small improvements.

このggufの特徴

コミュニティが過去に発見した不具合の修正を適用して誤作動割合を減らしています
UnslothのDynamic Quantization 2.0形式を採用しています
日本語が多めのキャリブレーションデータを使用しています

Features of this gguf

Fixes for bugs previously discovered by the community have been applied to reduce the malfunction rate.
This model uses Unsloth's Dynamic Quantization 2.0 format.
The calibration data contains a large proportion of Japanese text.

動かし方 / How to Run

GPUがなくても動きますが、システムメモリは16GB以上、ディスク容量が6GB以上必要です。
It will run without a GPU, but you will need at least 16GB of system memory and 6GB of disk space.

Linux terminalでの実行 / Running in a Linux terminal

llama.cppを使います。直近でQwen3.5対応のアップデートがいくつかあったため、最新版を使う事をおすすめします。(本件の動作確認はversion: 8007 (098595411)で行っています)
We use llama.cpp. Several updates for Qwen3.5 support landed recently, so we recommend using the latest version. (This model was verified with version: 8007 (098595411).)

llama.cppからお使いのハードウェア用のZIPファイルをダウンロードして設定します。
沢山種類があるので迷うかもしれませんが、ChatGPTなりGeminiなりClaudeなりに聞いて適切なものを選んでください
Download the zip file for your hardware from llama.cpp and set it up.
There are many builds to choose from, which can be confusing; ask ChatGPT, Gemini, or Claude to help you pick the right one.

ダウンロードしたzipを解凍後、ターミナルやPowerShellなどの端末から以下のコマンドを打ち込んで起動します
After unzipping the downloaded file, start the model by typing the following command in a terminal such as Terminal or PowerShell.

Linuxでのターミナルでの実行例です
Here is an example of running it in a Linux terminal:

まずhf commandをインストールしてください
First, install the hf command (the Hugging Face CLI, which ships with the huggingface_hub Python package).

# モデルのダウンロード / download model
hf download dahara1/Qwen3.5-9B-UD-japanese-imatrix Qwen3.5-9B-UD-Q4_K_XL.gguf --local-dir Qwen3.5-9B-UD-japanese-imatrix
# 念の為jinjaテンプレートのダウンロード / download jinja template
hf download dahara1/Qwen3.5-9B-UD-japanese-imatrix chat_template.jinja --local-dir Qwen3.5-9B-UD-japanese-imatrix

./llama-cli \
  -m Qwen3.5-9B-UD-japanese-imatrix/Qwen3.5-9B-UD-Q4_K_XL.gguf \
  --temp 0.6 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0.0 \
  --ctx-size 12000 \
  --presence_penalty 1.5 \
  --jinja \
  --chat-template-kwargs '{"enable_thinking":true}' \
  --chat-template-file Qwen3.5-9B-UD-japanese-imatrix/chat_template.jinja \
  -ub 2048 \
  -b 2048

ctx-sizeが扱える文章の長さです。長くすると複数ターンの長い会話も扱えるようになりますが、必要メモリ量も増えます。
ctx-size specifies the length of text that can be handled. Increasing this value allows for longer conversations with multiple turns, but it also increases the amount of memory required.
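
To make the memory trade-off concrete, here is a rough sketch of the standard key/value-cache size formula for a plain transformer. The layer count, KV-head count, and head dimension below are placeholder values chosen for illustration, not Qwen3.5-9B's actual architecture, and llama.cpp's real usage also depends on cache quantization and on any non-attention layers, so treat the output as an order-of-magnitude guide only.

```python
def kv_cache_bytes(ctx_size, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV-cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_size * bytes_per_elem


if __name__ == "__main__":
    # Placeholder architecture values (assumptions, NOT taken from the model card).
    arch = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    for ctx in (12000, 24000):
        gib = kv_cache_bytes(ctx, **arch) / 1024**3
        print(f"ctx-size {ctx}: about {gib:.2f} GiB of KV cache at 16-bit precision")
```

The estimate grows linearly with ctx-size, which is why doubling the context from 12000 to 24000 roughly doubles the cache on top of the fixed cost of the model weights.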

GPUをお持ちの方へ / For GPU users

16GBのGPUメモリがあると比較的快適に動かす事ができます。上記のコマンドに-ngl 99を追加してください
If you have 16GB of GPU memory, it will run relatively smoothly. Add -ngl 99 to the above command.

Windows AMD CPU / iGPU 用の例 / Example for Windows with an AMD CPU / iGPU

AMD Ryzen 9 7940HS w/ Radeon 780M Graphics システムメモリ32GBのミニPC、Vulkanセットアップ済み、GPUには8Gを割り当て済みのPCでのコマンド例
llama.cppはgithubより「Windows x64 (Vulkan)」をダウンロードします
-ngl 99 を付与すれば高速実行することができます

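Example command for a mini PC with an AMD Ryzen 9 7940HS w/ Radeon 780M Graphics, 32GB of system memory, Vulkan already set up, and 8GB allocated to the GPU. Download the "Windows x64 (Vulkan)" build of llama.cpp from GitHub.
Adding -ngl 99 makes it run faster.
The command below uses Command Prompt (cmd.exe) syntax, where ^ continues a line; in PowerShell the line-continuation character is a backtick instead.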

.\llama-server ^
  -m ..\Qwen3.5-9B-UD-japanese-imatrix\Qwen3.5-9B-UD-Q4_K_XL.gguf ^
  --host 0.0.0.0 ^
  --port 8081 ^
  --top-p 0.8 ^
  --top-k 20 ^
  --min-p 0.0 ^
  --ctx-size 24000 ^
  --presence_penalty 1.5 ^
  --chat-template-kwargs "{\"enable_thinking\":true}" ^
  --chat-template-file ..\Qwen3.5-9B-UD-japanese-imatrix\chat_template.jinja ^
  --jinja ^
  -ub 2048 ^
  -ngl 99 ^
  -b 2048

サンプルスクリプト / sample script

クライアント/サーバー型式でスクリプトでアクセスしたい場合は上記のAMD版のコマンドを参考にしてください
If you want to access it via script in a client/server format, please refer to the AMD version command above.

ブラウザで、モデルを実行しているサーバーのローカルアドレス、ポートを指定して開いて下さい。例(http://127.0.0.1:8081/)
In your browser, open the local address and port of the server running the model. For example, http://127.0.0.1:8081/
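
Besides the browser UI, the same server can be called from a script. The following is a minimal sketch, assuming llama-server is running at http://127.0.0.1:8081 as in the example above and serving its OpenAI-compatible /v1/chat/completions endpoint. The helper names build_chat_request and ask are made up for this illustration, and the model name is simply the label used in the demo script further down.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8081"  # assumption: host/port from the llama-server example


def build_chat_request(prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": "qwen3.5",  # label only; same name the demo script passes
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=120):
    """Send the request and return the assistant's reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With the server running, calling ask("日本の首都はどこですか？") should return the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.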

client script sample

ツールを利用したスーパーマーケットの案内音声作成AIエージェントのデモです
This is a demo of a tool-using AI agent that creates in-store announcement audio for a supermarket.

音声合成にはwebbigdata/VoiceCoreを使っていますが、お好みの音声合成に差し替え可能です
We are using webbigdata/VoiceCore for speech synthesis, but you can replace it with your preferred speech synthesis software.

VoiceCoreのサンプルスクリプト
VoiceCore sample script

VoiceCoreをvLLMで以下のエンドポイントで動かしている事が前提のデモです
This demo assumes that VoiceCore is being served by vLLM at the following endpoint.

http://192.168.1.16:8000/v1/completions

※モデルダウンロード時に以下のようなエラーが出る事がありました
*The following error sometimes occurred when downloading the model.

httpx.DecodingError: brotli: decoder process called with data when 'can_accept_more_data()' is False

以下のライブラリアップデートで解決したのでこのエラーが出た際は参考にしてください
Upgrading the following libraries resolved it, so try this if you encounter the error.

pip install --upgrade httpx brotli brotlicffi
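
Before and after upgrading, it can help to see which versions are actually installed. This is a small sketch using only the standard library; the helper name installed_versions is hypothetical, and since the source does not say which versions fix the error, the script only reports what it finds.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["httpx", "brotli", "brotlicffi"]).items():
        print(f"{name}: {ver or 'not installed'}")
```
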

import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import json
import sys
import time
import random
import re
import argparse
import subprocess
import urllib.request
from datetime import datetime
from openai import OpenAI

# ============================================================
#  ツールチェーンデモ — スーパー店内アナウンス生成エージェント
#  店長の指示 → 天気・イベント・ナレッジ等を収集 → 原稿生成 → TTS読み上げ
# ============================================================

client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="dummy"
)

# --- TTS設定 ---
TTS_SERVER_URL = "http://192.168.1.16:8000/v1/completions"
TTS_TOKENIZER_PATH = "webbigdata/VoiceCore_smoothquant"
TTS_MODEL_NAME = "VoiceCore_smoothquant"
TTS_SPEAKER = "matsukaze_male[neutral]"

# ============================================================
# 色定義
# ============================================================
class C:
    BOLD      = "\033[1m"
    DIM       = "\033[2m"
    RESET     = "\033[0m"
    CYAN      = "\033[96m"
    YELLOW    = "\033[93m"
    GREEN     = "\033[92m"
    BLUE      = "\033[94m"
    MAGENTA   = "\033[95m"
    THINK     = "\033[38;5;213m"  # 明るいピンクパープル（Think表示用）
    RED       = "\033[91m"
    WHITE     = "\033[97m"
    BG_RED    = "\033[41m"
    BG_GREEN  = "\033[42m"
    BG_BLUE   = "\033[44m"
    BG_MAGENTA = "\033[45m"
    BG_CYAN   = "\033[46m"
    GRAY      = "\033[90m"

# ============================================================
# 店舗ロケーション（4地点からランダムに1つ選択）
# ============================================================
STORE_LOCATIONS = [
    {
        "name": "フレッシュマート 練馬店",
        "area": "東京都練馬区",
        "forecast_code": "130000",
        "nearby_schools": ["練馬区立大泉小学校", "練馬区立大泉中学校"],
    },
    {
        "name": "フレッシュマート 梅田店",
        "area": "大阪府大阪市北区",
        "forecast_code": "270000",
        "nearby_schools": ["大阪市立扇町小学校", "大阪市立天満中学校"],
    },
    {
        "name": "フレッシュマート 博多店",
        "area": "福岡県福岡市博多区",
        "forecast_code": "400000",
        "nearby_schools": ["福岡市立博多小学校", "福岡市立博多中学校"],
    },
    {
        "name": "フレッシュマート 札幌店",
        "area": "北海道札幌市中央区",
        "forecast_code": "016000",
        "nearby_schools": ["札幌市立円山小学校", "札幌市立向陵中学校"],
    },
]

# 起動時に1つ選択
CURRENT_STORE = random.choice(STORE_LOCATIONS)

# ============================================================
# イベントパターン（春 or 秋をランダム選択）
# ============================================================
def _generate_events():
    pattern = random.choice(["spring", "autumn"])
    schools = CURRENT_STORE["nearby_schools"]

    if pattern == "spring":
        return {
            "season": "春",
            "general_events": [
                {"name": "お花見シーズン", "period": "3月下旬〜4月上旬", "note": "公園でのお花見が盛況"},
                {"name": "新生活準備", "period": "3月〜4月", "note": "引越し・一人暮らし開始"},
            ],
            "school_events": [
                {"school": schools[0], "event": "卒業式", "date": "今週水曜日"},
                {"school": schools[0], "event": "入学式", "date": "来週月曜日"},
                {"school": schools[1], "event": "入学式", "date": "来週火曜日"},
            ],
        }
    else:
        return {
            "season": "秋",
            "general_events": [
                {"name": "秋の行楽シーズン", "period": "10月", "note": "ピクニック・ハイキング需要"},
                {"name": "ハロウィン", "period": "10月末", "note": "お菓子・仮装グッズ需要"},
            ],
            "school_events": [
                {"school": schools[0], "event": "運動会", "date": "今週土曜日"},
                {"school": schools[1], "event": "文化祭", "date": "来週金曜日・土曜日"},
            ],
        }

CURRENT_EVENTS = _generate_events()

# ============================================================
# ナレッジDB（ベテラン店長・店員の知見）
# ============================================================
KNOWLEDGE_DB = [
    {
        "keywords": ["運動会", "体育祭", "スポーツ"],
        "content": (
            "【運動会シーズンの売れ筋 — 田中店長の経験則】\n"
            "・スポーツドリンク（2Lペットボトル）が通常の3倍売れる\n"
            "・お弁当用の唐揚げ・ウインナー・卵焼きの材料が前日夕方〜当日朝に集中\n"
            "・観戦用のビール（350ml缶6本パック）、チューハイも好調\n"
            "・レジャーシートや紙皿・紙コップも忘れずに前出し\n"
            "・日焼け止め・虫除けスプレーも意外と出る"
        ),
    },
    {
        "keywords": ["入学式", "卒業式", "入園", "卒園", "新生活", "セレモニー"],
        "content": (
            "【入学・卒業シーズンの売れ筋 — 佐藤副店長の経験則】\n"
            "・お赤飯、紅白まんじゅう、ケーキ材料が伸びる\n"
            "・記念写真の後に家族で食事するパターンが多く、夕方にお寿司や刺身が売れる\n"
            "・お祝い用ののし袋、祝儀袋を目立つ場所に\n"
            "・朝は慌ただしいのでおにぎりやサンドイッチ等の軽食も出る"
        ),
    },
    {
        "keywords": ["花見", "お花見", "桜", "ピクニック"],
        "content": (
            "【お花見シーズンの売れ筋 — 田中店長の経験則】\n"
            "・ビール、チューハイ、ワインなどアルコール類が爆発的に売れる\n"
            "・オードブル、お惣菜の盛り合わせ、寿司パックが人気\n"
            "・使い捨て容器、割り箸、ウェットティッシュ、ゴミ袋のセット売りが効果的\n"
            "・防寒用にカイロもまだ需要あり（夜は冷える）\n"
            "・デザートにいちご大福や団子を推すと反応が良い"
        ),
    },
    {
        "keywords": ["雨", "雨天", "台風", "梅雨", "天気が悪い"],
        "content": (
            "【雨の日の傾向 — 鈴木チーフの経験則】\n"
            "・来客数は2〜3割減るが、客単価は上がる傾向（まとめ買い）\n"
            "・鍋物、シチュー、カレーなど温かい料理の材料が伸びる\n"
            "・傘、カッパを入口付近に配置すると衝動買いされる\n"
            "・お惣菜やお弁当は少し多めに作っても売り切れる（自炊を避ける心理）"
        ),
    },
    {
        "keywords": ["暑い", "猛暑", "真夏", "熱中症"],
        "content": (
            "【猛暑日の傾向 — 田中店長の経験則】\n"
            "・アイス、かき氷、冷やし麺の売上が通常の2倍以上\n"
            "・スポーツドリンク、経口補水液は切らさないこと\n"
            "・冷しゃぶ、サラダ、そうめんつゆのセット提案が効果的\n"
            "・ビール・炭酸飲料の冷蔵在庫を頻繁にチェック"
        ),
    },
    {
        "keywords": ["ハロウィン", "仮装", "お菓子"],
        "content": (
            "【ハロウィンの売れ筋 — 佐藤副店長の経験則】\n"
            "・小分けの個包装お菓子（チョコ、キャンディ）が大量に売れる\n"
            "・かぼちゃ関連商品（まるごとかぼちゃ、かぼちゃプリン材料）\n"
            "・パーティー用のジュース、ポテトチップス、ポップコーン\n"
            "・仮装グッズは早めに展開しないと他店に取られる"
        ),
    },
    {
        "keywords": ["給料日", "月末", "25日"],
        "content": (
            "【給料日前後の傾向 — 鈴木チーフの経験則】\n"
            "・給料日直後はちょっと良い肉（ステーキ用、すき焼き用）が動く\n"
            "・刺身盛り合わせ、寿司パックなどのご褒美系惣菜が伸びる\n"
            "・ビール・ワインなどアルコールもワンランク上のものが売れる\n"
            "・逆に給料日前はもやし、豆腐、卵など節約食材を前面に"
        ),
    },
    {
        "keywords": ["週末", "土曜", "日曜", "休日"],
        "content": (
            "【週末の傾向 — 田中店長の経験則】\n"
            "・家族連れが増えるのでファミリーパック、大容量商品が動く\n"
            "・BBQ・焼肉用の肉、野菜、タレのセット提案が効果的\n"
            "・朝はパン・牛乳がよく出る（平日より遅い時間帯にピーク）\n"
            "・日曜夕方は翌週分のまとめ買い需要"
        ),
    },
]

def search_knowledge(query):
    """ナレッジDBをフリーワード検索"""
    results = []
    query_lower = query.strip().lower()
    for entry in KNOWLEDGE_DB:
        for kw in entry["keywords"]:
            # 空クエリは全キーワードの部分文字列になり全件ヒットしてしまうため除外する
            if query_lower and (kw in query_lower or query_lower in kw):
                results.append(entry["content"])
                break
    if not results:
        return json.dumps({
            "query": query,
            "found": False,
            "message": f"「{query}」に関する知見は見つかりませんでした。",
        }, ensure_ascii=False)
    return json.dumps({
        "query": query,
        "found": True,
        "count": len(results),
        "knowledge": "\n\n".join(results),
    }, ensure_ascii=False)


# ============================================================
# ツール実装
# ============================================================
def tool_get_store_info():
    """店舗情報を返す"""
    # 午前/午後ランダム
    is_morning = random.choice([True, False])
    if is_morning:
        period = "午前"
        hours = "9:00〜13:00"
        peak_note = "午前中のお買い物ピークは10:30〜11:30頃です"
    else:
        period = "午後"
        hours = "13:00〜21:00"
        peak_note = "夕方のお買い物ピークは16:00〜18:00頃です"

    return json.dumps({
        "store_name": CURRENT_STORE["name"],
        "area": CURRENT_STORE["area"],
        "current_period": period,
        "operating_hours": f"本日の営業時間: {hours}",
        "peak_note": peak_note,
        "nearby_schools": CURRENT_STORE["nearby_schools"],
    }, ensure_ascii=False)


def tool_get_current_datetime():
    """現在日時を返す"""
    now = datetime.now()
    weekdays = ["月", "火", "水", "木", "金", "土", "日"]
    wd = weekdays[now.weekday()]
    return json.dumps({
        "datetime": f"{now.year}年{now.month:02d}月{now.day:02d}日({wd}) {now.hour:02d}:{now.minute:02d}",
        "weekday": f"{wd}曜日",
        "is_weekend": now.weekday() >= 5,
        "day_of_month": now.day,
        "is_near_payday": 23 <= now.day <= 27,
    }, ensure_ascii=False)


def tool_get_weather():
    """気象庁の天気概況JSONを取得"""
    code = CURRENT_STORE["forecast_code"]
    area = CURRENT_STORE["area"]
    url = f"https://www.jma.go.jp/bosai/forecast/data/overview_forecast/{code}.json"

    try:
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        return json.dumps({
            "area": area,
            "reporting_time": data.get("reportDatetime", ""),
            "headline": data.get("headlineText", ""),
            "overview": data.get("text", ""),
        }, ensure_ascii=False)
    except Exception as e:
        return json.dumps({
            "area": area,
            "error": f"天気情報の取得に失敗: {str(e)}",
            "fallback": "天気情報を取得できませんでした。天気に関する言及は省略してください。",
        }, ensure_ascii=False)


def tool_get_events():
    """地域イベント・近隣学校行事を返す"""
    return json.dumps(CURRENT_EVENTS, ensure_ascii=False)


def tool_search_knowledge(query):
    """ベテラン店員のナレッジDBを検索"""
    return search_knowledge(query)


def tool_synthesize_speech(text):
    """VoiceCoreサーバーでTTS合成・再生"""
    print(f"\n    {C.GREEN}🔊 TTS合成開始...{C.RESET}")
    print(f"    {C.DIM}原稿: {text[:80]}...{C.RESET}")

    try:
        import torch
        from transformers import AutoTokenizer
        from snac import SNAC
        import sounddevice as sd
        import queue
        import threading

        # Tokenizer & SNACロード
        print(f"    {C.DIM}   Tokenizer/SNACモデルをロード中...{C.RESET}")
        tts_tokenizer = AutoTokenizer.from_pretrained(TTS_TOKENIZER_PATH)
        snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").to("cpu")

        start_token, end_tokens = [128259], [128009, 128260, 128261]
        audio_start_token = 128257

        prompt_ = (f"{TTS_SPEAKER}: " + text) if TTS_SPEAKER else text
        input_ids = tts_tokenizer.encode(prompt_)
        final_token_ids = start_token + input_ids + end_tokens

        payload = {
            "model": TTS_MODEL_NAME, "prompt": final_token_ids,
            "max_tokens": 8192, "temperature": 0.6, "top_p": 0.90,
            "repetition_penalty": 1.1, "stop_token_ids": [128258],
            "stream": True,
        }

        # SNACデコーダー
        def redistribute_codes(code_list):
            if len(code_list) % 7 != 0:
                return torch.tensor([])
            layer_1, layer_2, layer_3 = [], [], []
            for i in range(len(code_list) // 7):
                layer_1.append(code_list[7*i])
                layer_2.append(code_list[7*i+1] - 4096)
                layer_3.append(code_list[7*i+2] - (2*4096))
                layer_3.append(code_list[7*i+3] - (3*4096))
                layer_2.append(code_list[7*i+4] - (4*4096))
                layer_3.append(code_list[7*i+5] - (5*4096))
                layer_3.append(code_list[7*i+6] - (6*4096))
            codes = [torch.tensor(layer).unsqueeze(0) for layer in [layer_1, layer_2, layer_3]]
            return snac_model.decode(codes)

        # 音声再生ワーカー
        audio_queue = queue.Queue()
        def audio_playback_worker(q, stream):
            while True:
                data = q.get()
                if data is None:
                    break
                stream.write(data)

        playback_stream = sd.OutputStream(samplerate=24000, channels=1, dtype='float32')
        playback_stream.start()
        # daemon=True: 終了シグナル(None)を送る前に例外が起きてもプロセスが終了できるようにする
        playback_thread = threading.Thread(
            target=audio_playback_worker, args=(audio_queue, playback_stream), daemon=True
        )
        playback_thread.start()

        token_buffer = []
        found_audio_start = False
        CHUNK_SIZE = 28

        import requests as req_lib
        print(f"    {C.DIM}   TTSサーバーにリクエスト送信中...{C.RESET}")
        response = req_lib.post(TTS_SERVER_URL, headers={"Content-Type": "application/json"}, json=payload, stream=True)
        response.raise_for_status()
        print(f"    {C.GREEN}   ▶ 音声再生中...{C.RESET}")

        for line in response.iter_lines():
            if line:
                decoded_line = line.decode('utf-8')
                if decoded_line.startswith('data: '):
                    content = decoded_line[6:]
                    if content == '[DONE]':
                        break
                    chunk = json.loads(content)
                    text_chunk = chunk['choices'][0]['text']
                    if text_chunk:
                        token_buffer.extend(tts_tokenizer.encode(text_chunk, add_special_tokens=False))

                    if not found_audio_start:
                        try:
                            start_index = token_buffer.index(audio_start_token)
                            token_buffer = token_buffer[start_index + 1:]
                            found_audio_start = True
                        except ValueError:
                            continue

                    while len(token_buffer) >= CHUNK_SIZE:
                        tokens_to_process, token_buffer = token_buffer[:CHUNK_SIZE], token_buffer[CHUNK_SIZE:]
                        code_list = [t - 128266 for t in tokens_to_process]
                        samples = redistribute_codes(code_list)
                        if samples.numel() > 0:
                            audio_queue.put(samples.detach().squeeze().numpy())

        # 残りバッファ処理
        if found_audio_start and token_buffer:
            remaining = (len(token_buffer) // 7) * 7
            if remaining > 0:
                code_list = [t - 128266 for t in token_buffer[:remaining]]
                samples = redistribute_codes(code_list)
                if samples.numel() > 0:
                    audio_queue.put(samples.detach().squeeze().numpy())

        audio_queue.put(None)
        playback_thread.join()
        playback_stream.stop()
        playback_stream.close()

        return json.dumps({
            "status": "success",
            "message": "音声の再生が完了しました。",
        }, ensure_ascii=False)

    except Exception as e:
        return json.dumps({
            "status": "error",
            "message": f"TTS処理でエラーが発生しました: {str(e)}",
        }, ensure_ascii=False)


# ============================================================
# ツールディスパッチ
# ============================================================
def execute_tool(func_name, args):
    if func_name == "get_store_info":
        return tool_get_store_info()
    elif func_name == "get_current_datetime":
        return tool_get_current_datetime()
    elif func_name == "get_weather":
        return tool_get_weather()
    elif func_name == "get_events":
        return tool_get_events()
    elif func_name == "search_knowledge":
        return tool_search_knowledge(args.get("query", ""))
    elif func_name == "synthesize_speech":
        return tool_synthesize_speech(args.get("text", ""))
    else:
        return json.dumps({"error": f"Unknown tool: {func_name}"}, ensure_ascii=False)


# ============================================================
# ツール定義（LLMに渡す）
# ============================================================
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_store_info",
            "description": "店舗情報（店舗名、所在地、営業時間、近隣学校名など）を取得します。",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_datetime",
            "description": "現在の日時、曜日、給料日付近かどうかなどの情報を取得します。",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "店舗所在地の天気概況（天気予報テキスト）を気象庁から取得します。",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_events",
            "description": (
                "地域の一般的なイベント情報（お花見、ハロウィン等の季節イベント）と、"
                "近隣の学校行事（運動会、入学式、卒業式等）の情報を取得します。"
            ),
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_knowledge",
            "description": (
                "ベテラン店長・店員の経験に基づくナレッジDBを検索します。"
                "イベント名、天気、季節などのキーワードで検索すると、"
                "過去の販売傾向や売れ筋商品の知見が得られます。"
                "複数のキーワードで個別に検索すると、より多くの知見が得られます。"
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "検索キーワード（例: 運動会, 雨, お花見, 給料日）",
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "synthesize_speech",
            "description": (
                "完成した店内アナウンス原稿を音声合成（TTS）で読み上げます。"
                "全ての情報収集と原稿作成が完了した後に、最終的な原稿テキストを渡して呼び出してください。"
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "読み上げる店内アナウンス原稿のテキスト",
                    },
                },
                "required": ["text"],
            },
        },
    },
]


# ============================================================
# フォールバック: thinking内 <tool_call> のパーサー
# ============================================================
_TOOL_CALL_ID_COUNTER = 0

def _parse_tool_calls_from_text(text):
    """
    LLMがthinking内に出力した <tool_call>...</tool_call> をパースして
    tool_calls_list 互換の辞書リストに変換する。
    
    対応フォーマット:
      <tool_call>
      <function=func_name>
      <parameter=key>value</parameter>
      ...
      </function>
      </tool_call>
    """
    global _TOOL_CALL_ID_COUNTER
    results = []

    # <tool_call>...</tool_call> ブロックを全て抽出
    blocks = re.findall(r'<tool_call>(.*?)</tool_call>', text, re.DOTALL)
    if not blocks:
        return results

    for block in blocks:
        # 関数名を抽出
        func_match = re.search(r'<function=(\w+)>', block)
        if not func_match:
            continue
        func_name = func_match.group(1)

        # パラメータを抽出
        params = {}
        param_matches = re.findall(r'<parameter=(\w+)>\s*(.*?)\s*</parameter>', block, re.DOTALL)
        for key, value in param_matches:
            value = value.strip()
            # 数値・真偽値の変換
            if value.lower() == 'true':
                params[key] = True
            elif value.lower() == 'false':
                params[key] = False
            else:
                try:
                    params[key] = int(value)
                except ValueError:
                    try:
                        params[key] = float(value)
                    except ValueError:
                        params[key] = value

        _TOOL_CALL_ID_COUNTER += 1
        results.append({
            "id": f"fallback_{_TOOL_CALL_ID_COUNTER}",
            "name": func_name,
            "arguments": json.dumps(params, ensure_ascii=False),
        })

    return results


# ============================================================
# ストリーミング表示
# ============================================================
def stream_response(messages, debug=False, silent=False):
    if debug and not silent:
        print(f"\n  {C.YELLOW}⏳ LLM呼び出し中 (streaming)...{C.RESET}")

    try:
        stream = client.chat.completions.create(
            model="qwen3.5",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            temperature=0.8,
            stream=True,
        )
    except Exception as e:
        if not silent:
            print(f"\n  {C.BG_RED}{C.WHITE} ❌ API ERROR {C.RESET}")
            print(f"  {C.RED}{type(e).__name__}: {e}{C.RESET}")
        return None, None, None, "error"

    full_content = ""
    full_reasoning = ""
    tool_calls_map = {}
    finish_reason = None
    in_reasoning = False
    in_content = False

    for chunk in stream:
        delta = chunk.choices[0].delta if chunk.choices else None
        if not delta:
            continue
        if chunk.choices[0].finish_reason:
            finish_reason = chunk.choices[0].finish_reason

        reasoning_text = getattr(delta, "reasoning_content", None)
        if reasoning_text:
            full_reasoning += reasoning_text
            if not silent:
                if not in_reasoning:
                    in_reasoning = True
                    print(f"\n  {C.THINK}💭 <think>{C.RESET}")
                    print(f"  {C.THINK}", end="", flush=True)
                print(f"{C.THINK}{reasoning_text}{C.RESET}", end="", flush=True)
            else:
                in_reasoning = True

        if delta.content:
            text = delta.content
            full_content += text
            if not silent:
                if in_reasoning:
                    in_reasoning = False
                    print(f"{C.RESET}")
                    print(f"  {C.THINK}💭 </think>{C.RESET}")
                if not in_content:
                    in_content = True
                    print(f"\n  {C.WHITE}💬 ", end="", flush=True)
                print(f"{C.WHITE}{text}{C.RESET}", end="", flush=True)
            else:
                in_reasoning = False
                in_content = True

        if delta.tool_calls:
            for tc_delta in delta.tool_calls:
                idx = tc_delta.index
                if idx not in tool_calls_map:
                    tool_calls_map[idx] = {"id": tc_delta.id or "", "name": "", "arguments": ""}
                if tc_delta.id:
                    tool_calls_map[idx]["id"] = tc_delta.id
                if tc_delta.function:
                    if tc_delta.function.name:
                        tool_calls_map[idx]["name"] = tc_delta.function.name
                    if tc_delta.function.arguments:
                        tool_calls_map[idx]["arguments"] += tc_delta.function.arguments

    if not silent:
        if in_reasoning:
            print(f"{C.RESET}")
            print(f"  {C.THINK}💭 </think>{C.RESET}")
        if in_reasoning or in_content:
            print(f"{C.RESET}")

    tool_calls_list = [tool_calls_map[idx] for idx in sorted(tool_calls_map.keys())]

    # ─── フォールバック: thinking/content内の <tool_call> を自前パース ───
    if not tool_calls_list:
        raw_text = (full_reasoning or "") + (full_content or "")
        parsed = _parse_tool_calls_from_text(raw_text)
        if parsed:
            if not silent:
                print(f"\n  {C.YELLOW}⚠ LLMがthinking内にtool_callを出力 → フォールバックパース ({len(parsed)}件){C.RESET}")
            tool_calls_list = parsed
            finish_reason = "tool_calls"
            # tool_call部分をcontentから除去（履歴汚染を防ぐ）
            full_content = re.sub(
                r'<tool_call>.*?</tool_call>', '', full_content or '', flags=re.DOTALL
            ).strip()
            full_reasoning = re.sub(
                r'<tool_call>.*?</tool_call>', '', full_reasoning or '', flags=re.DOTALL
            ).strip()

    return full_content, full_reasoning, tool_calls_list, finish_reason


# ============================================================
# デバッグ用
# ============================================================
def dump_messages_summary(messages):
    print(f"\n  {C.DIM}{'─'*50}{C.RESET}")
    print(f"  {C.DIM}📋 メッセージ履歴: {len(messages)} 件{C.RESET}")
    for i, msg in enumerate(messages):
        role = msg.get("role", "?")
        content = msg.get("content", "")
        content_len = len(content) if isinstance(content, str) else 0
        has_tc = "tool_calls" in msg
        name = msg.get("name", "")

        if role == "system":
            print(f"  {C.DIM}  [{i}] system: ({content_len}文字){C.RESET}")
        elif role == "user":
            preview = (content[:40] + "...") if content_len > 40 else content
            print(f"  {C.DIM}  [{i}] user: \"{preview}\" ({content_len}文字){C.RESET}")
        elif role == "assistant":
            tc_info = ""
            if has_tc:
                tc_names = [tc.get("function", {}).get("name", "?") for tc in msg["tool_calls"]]
                tc_info = f" + tool_calls: [{', '.join(tc_names)}]"
            print(f"  {C.DIM}  [{i}] assistant: ({content_len}文字){tc_info}{C.RESET}")
        elif role == "tool":
            print(f"  {C.DIM}  [{i}] tool({name}): ({content_len}文字){C.RESET}")
    print(f"  {C.DIM}{'─'*50}{C.RESET}")


# ============================================================
# 店長入力のサンプル（デモ用）
# ============================================================
SAMPLE_INPUTS = [
    "今日の特売のノルウェー産サーモンはもう売り切れた。週末にもう一度セールするから予告して。あと、国産鶏もも肉がまだ大量に残ってるから強めに推して。",
    "午後から雨が降りそうだから、鍋物セットを推したい。あと白菜が入荷しすぎたので半額にする。",
    "明日が近所の小学校の運動会だから、お弁当材料をアピールして。唐揚げ用の鶏肉は今日中なら2割引にする。",
]


# ============================================================
# メインループ
# ============================================================
def main():
    parser = argparse.ArgumentParser(description="スーパー店内アナウンス生成デモ")
    parser.add_argument("--debug", action="store_true", help="デバッグ情報を表示")
    parser.add_argument("--sample", action="store_true", help="サンプル入力を使用")
    args = parser.parse_args()
    debug = args.debug

    print(f"\n{C.BOLD}{C.CYAN}{'='*62}{C.RESET}")
    print(f"{C.BOLD}{C.CYAN}  🏪 ツールチェーンデモ — スーパー店内アナウンス生成{C.RESET}")
    if debug:
        print(f"{C.BOLD}{C.YELLOW}  🔍 デバッグモード ON{C.RESET}")
    print(f"{C.BOLD}{C.CYAN}{'='*62}{C.RESET}")
    print(f"\n  {C.DIM}🏬 店舗: {CURRENT_STORE['name']} ({CURRENT_STORE['area']}){C.RESET}")
    print(f"  {C.DIM}📅 イベントパターン: {CURRENT_EVENTS['season']}{C.RESET}")
    print(f"  {C.DIM}🌤  天気地域コード: {CURRENT_STORE['forecast_code']}{C.RESET}\n")

    # 店長入力
    if args.sample:
        manager_input = random.choice(SAMPLE_INPUTS)
        print(f"  {C.BLUE}👤 店長（サンプル入力）:{C.RESET}")
        print(f"  {C.BLUE}  「{manager_input}」{C.RESET}\n")
    else:
        print(f"  {C.BLUE}👤 店長からの指示を入力してください:{C.RESET}")
        print(f"  {C.DIM}   例: 特売のサーモンは売り切れた。鶏もも肉がまだ残ってるので推して。{C.RESET}")
        manager_input = input(f"  {C.BLUE}> {C.RESET}")
        if not manager_input.strip():
            print(f"  {C.RED}入力が空です。終了します。{C.RESET}")
            return

    # システムプロンプト
    system_prompt = (
        "あなたはスーパーマーケットの店内アナウンス原稿を作成するAIアシスタントです。\n"
        "店長からの指示に基づいて、魅力的で効果的な店内放送用の原稿を作成してください。\n\n"
        "【手順】\n"
        "1. まず get_store_info, get_current_datetime, get_weather, get_events を呼び出して基本情報を収集する\n"
        "   - これらは並列で呼び出してください\n"
        "2. イベント情報や天気情報を元に、search_knowledge で関連するベテラン店員の知見を検索する\n"
        "   - 例: 運動会が近ければ「運動会」で検索、雨なら「雨」で検索など\n"
        "   - 複数のキーワードが考えられる場合は、それぞれ別々に検索してください\n"
        "3. 収集した全ての情報と店長の指示を総合して、店内アナウンス原稿を作成する\n"
        "4. 最後に synthesize_speech ツールで原稿を音声に変換して店内放送する\n\n"
        "【原稿作成のルール】\n"
        "- 明るく親しみやすいトーンで\n"
        "- 特売情報は具体的な商品名と価格/割引率を含める\n"
        "- 天気やイベントに関連した提案を自然に織り込む\n"
        "- 長すぎない（150〜300文字程度）\n"
        "- 「本日は〜」「いらっしゃいませ」などの定型的な書き出しでOK\n"
        "- ナレッジDBの知見を活用して、売れ筋商品の提案を盛り込む\n\n"
        "【重要】\n"
        "- 全てのツールを使って情報を集めてから原稿を作成してください\n"
        "- 原稿が完成したら必ず synthesize_speech で読み上げてください\n"
    )

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"【店長からの指示】\n{manager_input}"},
    ]

    step = 0
    max_steps = 15

    while step < max_steps:
        step += 1
        print(f"\n{C.BOLD}{C.WHITE}{C.BG_BLUE} STEP {step} {C.RESET}")

        if debug:
            dump_messages_summary(messages)

        full_content, full_reasoning, tool_calls_list, finish_reason = stream_response(messages, debug)

        if full_content is None:
            break

        if debug:
            print(f"\n  {C.CYAN}  finish_reason: {C.BOLD}{finish_reason}{C.RESET}")

        # ツール呼び出しなし → 最終応答 / No tool calls: treat this as the final response
        if not tool_calls_list:
            print(f"\n  {C.GREEN}✅ アナウンス原稿生成完了{C.RESET}")
            break

        # アシスタントメッセージを履歴に追加 / Append the assistant message to the history
        assistant_msg = {"role": "assistant", "content": full_content or ""}
        assistant_msg["tool_calls"] = [
            {
                "id": tc["id"],
                "type": "function",
                "function": {"name": tc["name"], "arguments": tc["arguments"]},
            }
            for tc in tool_calls_list
        ]
        messages.append(assistant_msg)

        # ツール実行 / Execute the requested tools
        print(f"\n  {C.YELLOW}{'─'*50}{C.RESET}")
        print(f"  {C.YELLOW}⚡ ツール実行: {len(tool_calls_list)}件{C.RESET}")

        for tc in tool_calls_list:
            func_name = tc["name"]
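            try:
                # 空文字列は引数なしとして扱う / Treat an empty string as "no arguments"
                tc_args = json.loads(tc["arguments"] or "{}")
            except json.JSONDecodeError:
                # JSONとして解釈できない場合は空の引数で続行 / Continue with empty arguments if the JSON is malformed
                print(f"    {C.RED}   ⚠ {func_name}: 引数のJSON解析に失敗したため、空の引数で実行します{C.RESET}")
                tc_args = {}

            # ツール名表示 / Show the tool name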
            icon_map = {
                "get_store_info": "🏬",
                "get_current_datetime": "📅",
                "get_weather": "🌤 ",
                "get_events": "🎉",
                "search_knowledge": "📚",
                "synthesize_speech": "🔊",
            }
            icon = icon_map.get(func_name, "⚙️")
            print(f"\n    {C.YELLOW}{icon} {func_name}{C.RESET}")

            if func_name == "search_knowledge":
                print(f"    {C.GRAY}   query: \"{tc_args.get('query', '')}\"{C.RESET}")
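            elif func_name == "synthesize_speech":
                text = tc_args.get("text", "")
                # 80文字を超える場合のみ省略記号を付ける / Add an ellipsis only when the text is truncated
                text_preview = text[:80] + ("..." if len(text) > 80 else "")
                print(f"    {C.GRAY}   text: \"{text_preview}\"{C.RESET}")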

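            result = execute_tool(func_name, tc_args)
            try:
                result_obj = json.loads(result)
            except (json.JSONDecodeError, TypeError):
                # execute_tool はJSON文字列を返す想定。解析できない場合は表示用に空のdictを使う
                # execute_tool is assumed to return a JSON string; fall back to an empty dict for display
                result_obj = {}

            # 結果表示 / Show the result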
            if func_name == "get_store_info":
                print(f"    {C.GREEN}   ✅ {result_obj.get('store_name', '')} / {result_obj.get('current_period', '')}{C.RESET}")
            elif func_name == "get_current_datetime":
                print(f"    {C.GREEN}   ✅ {result_obj.get('datetime', '')}{C.RESET}")
            elif func_name == "get_weather":
                if "error" in result_obj:
                    print(f"    {C.RED}   ❌ {result_obj['error']}{C.RESET}")
                else:
                    overview = result_obj.get("overview", "")[:80]
                    print(f"    {C.GREEN}   ✅ {overview}...{C.RESET}")
            elif func_name == "get_events":
                season = result_obj.get("season", "")
                ev_count = len(result_obj.get("general_events", [])) + len(result_obj.get("school_events", []))
                print(f"    {C.GREEN}   ✅ {season}パターン / {ev_count}件のイベント{C.RESET}")
            elif func_name == "search_knowledge":
                if result_obj.get("found"):
                    print(f"    {C.GREEN}   ✅ {result_obj.get('count', 0)}件の知見がヒット{C.RESET}")
                    # ナレッジ内容の一部を表示 / Show the first few lines of the knowledge text
                    knowledge = result_obj.get("knowledge", "")
                    for line in knowledge.split("\n")[:3]:
                        print(f"    {C.DIM}   {line}{C.RESET}")
                    print(f"    {C.DIM}   ...{C.RESET}")
                else:
                    print(f"    {C.YELLOW}   ⚠ ヒットなし{C.RESET}")
            elif func_name == "synthesize_speech":
                if result_obj.get("status") == "success":
                    print(f"    {C.GREEN}   ✅ 音声再生完了{C.RESET}")
                else:
                    print(f"    {C.RED}   ❌ {result_obj.get('message', '')}{C.RESET}")

            messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "name": func_name,
                "content": result,
            })

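        print(f"  {C.YELLOW}{'─'*50}{C.RESET}")
    else:
        # break せずに max_steps に達した場合 / The loop reached max_steps without a final response
        print(f"\n  {C.RED}⚠ 最大ステップ数（{max_steps}）に達したため終了します{C.RESET}")

    # サマリー / Summary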
    print(f"\n{C.DIM}{'='*62}{C.RESET}")
    print(f"{C.DIM}最終ステップ: {step}{C.RESET}")
    print(f"{C.DIM}メッセージ数: {len(messages)}{C.RESET}")
    print(f"{C.DIM}{'='*62}{C.RESET}\n")


if __name__ == "__main__":
    main()
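
The script above needs a running model server. To see only the shape of the message history that the loop builds, the following offline sketch may help. `fake_model` and `execute_tool_stub` are hypothetical stand-ins for `stream_response` and `execute_tool`, which are not reproduced here, and the weather payload is made up for illustration.

```python
import json


def execute_tool_stub(name, args):
    """Hypothetical stand-in for execute_tool: always returns a JSON string."""
    if name == "get_weather":
        return json.dumps({"overview": "rain in the afternoon"}, ensure_ascii=False)
    return json.dumps({"error": f"unknown tool: {name}"}, ensure_ascii=False)


def fake_model(messages):
    """Hypothetical stand-in for stream_response.

    Requests one tool on the first turn, then answers in plain text
    once a tool result is present in the history.
    """
    already_called = any(m["role"] == "tool" for m in messages)
    if not already_called:
        return "", [{"id": "call_1", "name": "get_weather", "arguments": "{}"}]
    return "Announcement draft based on the weather.", []


def run_loop(user_text, max_steps=5):
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        content, tool_calls = fake_model(messages)
        if not tool_calls:
            # No tool calls: this is the final answer.
            messages.append({"role": "assistant", "content": content})
            break
        # Record the assistant turn together with the tool calls it requested.
        messages.append({
            "role": "assistant",
            "content": content,
            "tool_calls": [
                {
                    "id": tc["id"],
                    "type": "function",
                    "function": {"name": tc["name"], "arguments": tc["arguments"]},
                }
                for tc in tool_calls
            ],
        })
        # Run each tool and feed the result back, linked by tool_call_id.
        for tc in tool_calls:
            args = json.loads(tc["arguments"] or "{}")
            messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "name": tc["name"],
                "content": execute_tool_stub(tc["name"], args),
            })
    return messages


history = run_loop("Promote hot-pot sets if it rains.")
print([m["role"] for m in history])
# ['user', 'assistant', 'tool', 'assistant']
```

Unlike the demo script, this sketch also appends the final assistant message to the history, which makes the complete exchange easier to inspect.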

ベンチマーク結果 / Benchmark Results

shisa-ai/M-IFEval を使って計測した日本語における指示追従性能は以下です。
Japanese instruction-following performance, measured with shisa-ai/M-IFEval, is shown below.

Unslothは量子化モデルで世界的に有名であるため、今回、彼らのモデルに挑戦しました。
英語をメインに使用する場合はUnslothのモデルの方が性能が高いと思われるので留意してください。
9Bが4Bより低くなっている理由はモデルが気をまわした結果、減点される事があるからです。
例えば、物語の書き始めは「一章」から始めてください、という指示に対して

# 一章

のように整形して減点される事があります。

Unsloth is world-renowned for its quantized models, so this time I measured this model against theirs.
Please note that Unsloth's model is likely to perform better if you mainly use English.
The 9B model scores lower than the 4B model because it sometimes goes beyond the literal instruction in trying to be helpful, and loses points as a result.
For example, given the instruction to begin the story with "Chapter One," it may format the opening as a heading:

# Chapter One

and lose points for the added formatting.
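
The deduction described above can be illustrated with a small sketch. Both checker functions are hypothetical and are not M-IFEval's actual scoring code. They only show why a response that wraps the required opening in a Markdown heading fails an exact-prefix check.

```python
import re


def starts_with_strict(response: str, required: str) -> bool:
    """Hypothetical strict check: the response must begin with the required text exactly."""
    return response.startswith(required)


def starts_with_lenient(response: str, required: str) -> bool:
    """Hypothetical lenient check: ignore leading whitespace and Markdown heading markers."""
    stripped = re.sub(r"^\s*#+\s*", "", response)
    return stripped.startswith(required)


plain = "一章\n昔々、ある村に..."
decorated = "# 一章\n\n昔々、ある村に..."

print(starts_with_strict(plain, "一章"))       # True
print(starts_with_strict(decorated, "一章"))   # False: the heading marker comes first
print(starts_with_lenient(decorated, "一章"))  # True
```

The lenient variant is an assumption made for contrast and should not be read as the benchmark's "Loose" criterion.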

Model Name	Strict Prompt	Strict Inst	Loose Prompt	Loose Inst
Unsloth-Q4_K_XL	0.5756	0.6062	0.6220	0.6416
Qwen3.5-9B-UD-japanese-imatrix-Q4_K_XL	0.6047	0.6504	0.6570	0.6903
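
To make the gap in the table concrete, the following sketch computes the per-metric difference between the two rows. The scores are copied from the table. The variable names are chosen here for illustration.

```python
metrics = ["Strict Prompt", "Strict Inst", "Loose Prompt", "Loose Inst"]

scores = {
    "Unsloth-Q4_K_XL": [0.5756, 0.6062, 0.6220, 0.6416],
    "Qwen3.5-9B-UD-japanese-imatrix-Q4_K_XL": [0.6047, 0.6504, 0.6570, 0.6903],
}

baseline = scores["Unsloth-Q4_K_XL"]
candidate = scores["Qwen3.5-9B-UD-japanese-imatrix-Q4_K_XL"]

# Difference per metric, rounded to the table's four decimal places.
deltas = {m: round(c - b, 4) for m, b, c in zip(metrics, baseline, candidate)}

for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.4f}")
# Strict Prompt: +0.0291
# Strict Inst: +0.0442
# Loose Prompt: +0.0350
# Loose Inst: +0.0487
```

On these numbers, the Japanese-calibrated quantization leads on all four metrics, by roughly 0.03 to 0.05.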

更新履歴 / Updates

2026/04/09: Fixed the prompt template to resolve a cache reuse issue.

謝辞 / Acknowledgments

Qwen
Unsloth
bartowski
llama.cpp
Thank you to all AI researchers and practitioners.

作成者 / Developer

開発：dahara1@Webbigdata / Developed by dahara1@Webbigdata

Format: GGUF
Model size: 9B params
Architecture: qwen35
