Osaurus-AI committed 12 days ago

Commit

9e6f5b7

verified ·

1 Parent(s): a59493f

DSV4-Flash JANGTQ2 (uniform 2-bit, --variant std, MTP preserved-disabled)


This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +2 -0
DeepSeek_V4.pdf +3 -0
LICENSE +21 -0
README.md +117 -0
config.json +145 -0
encoding/README.md +156 -0
encoding/__pycache__/encoding_dsv4.cpython-312.pyc +0 -0
encoding/__pycache__/encoding_dsv4.cpython-313.pyc +0 -0
encoding/encoding_dsv4.py +744 -0
encoding/test_encoding_dsv4.py +89 -0
encoding/tests/test_input_1.json +81 -0
encoding/tests/test_input_2.json +24 -0
encoding/tests/test_input_3.json +159 -0
encoding/tests/test_input_4.json +28 -0
encoding/tests/test_output_1.txt +36 -0
encoding/tests/test_output_2.txt +1 -0
encoding/tests/test_output_3.txt +38 -0
encoding/tests/test_output_4.txt +29 -0
generation_config.json +12 -0
jang_config.json +101 -0
jangtq_runtime.safetensors +3 -0
model-00001-of-00075.safetensors +3 -0
model-00002-of-00075.safetensors +3 -0
model-00003-of-00075.safetensors +3 -0
model-00004-of-00075.safetensors +3 -0
model-00005-of-00075.safetensors +3 -0
model-00006-of-00075.safetensors +3 -0
model-00007-of-00075.safetensors +3 -0
model-00008-of-00075.safetensors +3 -0
model-00009-of-00075.safetensors +3 -0
model-00010-of-00075.safetensors +3 -0
model-00011-of-00075.safetensors +3 -0
model-00012-of-00075.safetensors +3 -0
model-00013-of-00075.safetensors +3 -0
model-00014-of-00075.safetensors +3 -0
model-00015-of-00075.safetensors +3 -0
model-00016-of-00075.safetensors +3 -0
model-00017-of-00075.safetensors +3 -0
model-00018-of-00075.safetensors +3 -0
model-00019-of-00075.safetensors +3 -0
model-00020-of-00075.safetensors +3 -0
model-00021-of-00075.safetensors +3 -0
model-00022-of-00075.safetensors +3 -0
model-00023-of-00075.safetensors +3 -0
model-00024-of-00075.safetensors +3 -0
model-00025-of-00075.safetensors +3 -0
model-00026-of-00075.safetensors +3 -0
model-00027-of-00075.safetensors +3 -0
model-00028-of-00075.safetensors +3 -0
model-00029-of-00075.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.pdf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text

DeepSeek_V4.pdf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f4cbe4fcbd2888b25b2890a98cc6ef4ce0489df7c93e140b6f853c451d3f5c52
+size 4479907
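
The three added lines are a Git LFS pointer, not the PDF itself. As a minimal sketch of how such a pointer can be read (the helper name `parse_lfs_pointer` is hypothetical and not part of this repository), each line is treated as a key followed by a space and a value:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the shape "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the shape "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the hunk above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f4cbe4fcbd2888b25b2890a98cc6ef4ce0489df7c93e140b6f853c451d3f5c52
size 4479907
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["size_bytes"])
```

The `size` field is the byte length of the real object, so the PDF tracked here is about 4.5 MB.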

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2023 DeepSeek
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED

	@@ -0,0 +1,117 @@

+---
+license: mit
+license_name: deepseek-license
+library_name: mlx
+base_model: deepseek-ai/DeepSeek-V4-Flash
+base_model_relation: quantized
+pipeline_tag: text-generation
+tags:
+  - mlx
+  - jang
+  - jangtq
+  - jangtq2
+  - jangtq-prestack
+  - mxtq
+  - deepseek
+  - deepseek-v4
+  - deepseek-v4-flash
+  - moe
+  - mla
+  - hash-layers
+  - mtp
+  - apple-silicon
+  - osaurus
+---
+<p align="center"><img src="osaurus-x-banner.png" width="100%" alt="OsaurusAI"/></p>
+# DeepSeek-V4-Flash-JANGTQ2
+**DeepSeek-V4-Flash — 79.6 GB on disk** (down from 149 GB FP4+FP8 source) —
+uniform **2-bit JANGTQ** quantization on routed experts + 8-bit affine on
+everything else + preserved MTP head.
+- **Source:** [deepseek-ai/DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)
+  (43 transformer layers + 1 MTP head, **256 routed experts top-6 + 1
+  shared expert**, **3 hash layers**, MLA + mHC residuals, ~284 B total)
+- **Quantization:** uniform **2-bit MXTQ** on routed-expert MLP +
+  8-bit affine on attention (`wq_a/wq_b/wkv/wo_a/wo_b`) / shared
+  expert / Compressor / Indexer / embed / lm_head / MTP. RMSNorms,
+  router gate, mHC fn matrices, attn_sink, ape stay fp16/fp32
+  passthrough.
+- **Variant:** `std` (preserves MTP layer 43; one-token-per-forward
+  until a JANG runtime ships the accept/reject speculative-decode loop).
+  The companion `DeepSeek-V4-Flash-JANGTQ-K` variant drops MTP for a
+  smaller bundle.
+- **Routed-expert layout:** **pre-stacked** along axis 0 under
+  `ffn.experts.switch_mlp.{gate_proj, up_proj, down_proj}` per the
+  JANGTQ-PRESTACK STANDARD. Sidecar `jangtq_runtime.safetensors`
+  (~24 KB) ships both `(in=2048, bits=2)` and `(in=4096, bits=2)`
+  codebooks + sign-flip vectors for Swift runtimes.
+- **Bundle size:** **~79.6 GB on-disk**
+- **Runs on:** M4 Max 128 GB / M5 Max 128 GB / Mac Studio 192 GB+
+## Why top-6 + 2-bit holds
+DSV4-Flash routes through **6 of 256 experts per token** plus 1 always-on
+shared expert and 3 hash layers — so per-token output averages
+codebook noise across 7+ pathways. That's a much weaker quality
+constraint than top-1 architectures (where every token rides a single
+expert's quant error). MiniMax (top-2) and Hy3-preview (top-8) both
+ship coherent uniform JANGTQ2; DSV4 sits between them.
+## Loading (Python)
+```bash
+pip install jang-tools mlx-lm
+```
+```python
+from jang_tools.load_jangtq import load_jangtq_model
+model, tokenizer = load_jangtq_model("OsaurusAI/DeepSeek-V4-Flash-JANGTQ2")
+chat = tokenizer.apply_chat_template(
+    [{"role": "user", "content": "What is 2 + 2? Answer briefly."}],
+    tokenize=False,
+    add_generation_prompt=True,
+)
+```
+`load_jangtq_model` auto-registers `model_type=deepseek_v4` via
+`jang_tools.dsv4` before building the MLX skeleton. The loader applies
+the DSV4-specific MLA absorb + fp32 SDPA + mHC + Compressor + Indexer
+patches automatically.
+## Runtime support matrix
+| Surface | Status |
+|---|---|
+| `jang-tools` Python (`load_jangtq_model`) | ✅ working |
+| `vmlx-swift-lm` Swift | ✅ working — `DeepseekV4JANGTQ` family path |
+| MTP speculative decode | preserved-disabled — weights present (variant=std); accept/reject loop not yet in any JANG runtime |
+## Validated runtime contract
+- 43 transformer layers + 1 MTP head materialize; 40 sparse-MoE layers
+  hydrate routed experts via TurboQuantLinear (2-bit MXTQ).
+- 33,792 MXTQ tensors / 522 affine / 706 passthrough.
+- Capabilities: `family=deepseek_v4`, `reasoning_parser=deepseek_r1`,
+  `tool_parser=dsml`, `think_in_template=True`, `cache_type=mla`.
+## Reasoning + tools
+- **Reasoning parser:** `deepseek_r1`
+- **Tool parser:** `dsml` (DeepSeek Markup Language — distinct from
+  `deepseek_tool_parser`; see `~/jang/research/DSV4-EVAL-NUANCES.md`)
+- **Reasoning template:** `<｜thinking_begin｜>...<｜thinking_end｜>` blocks
+  via `enable_thinking=True` (default off — pass-through chat mode).
+  Greedy `T=0` with `enable_thinking=True` collapses into repetition on
+  DSV4; use `T=0.6` for pass@1 like the original DeepSeek release.
+- **Cache:** `mla` (Multi-head Latent Attention with kv_lora_rank=512)
+## Credits
+- **Quantization + MLX runtime:** Jinho Jang (eric@osaurus.ai)
+- **Source model:** DeepSeek AI
+- **License:** MIT, inherited from upstream
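
The "Why top-6 + 2-bit holds" argument in the README above can be made concrete with a back-of-the-envelope model. The sketch below assumes, purely for illustration, that every pathway contributes independent zero-mean quantization error with the same standard deviation and that the mixing weights are normalized to sum to 1. Neither assumption is stated in the model card, and `mixed_noise_std` is a hypothetical helper, not part of `jang-tools`.

```python
import math


def mixed_noise_std(weights, sigma=1.0):
    """Std of a weighted sum of independent errors, each with std `sigma`.

    Weights are normalized to sum to 1 before mixing (an assumption of this sketch).
    """
    total = sum(weights)
    normalized = [w / total for w in weights]
    # Variances of independent terms add, each scaled by its squared weight.
    return sigma * math.sqrt(sum(w * w for w in normalized))


top1 = mixed_noise_std([1.0])
top6_uniform = mixed_noise_std([1.0] * 6)
top6_skewed = mixed_noise_std([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])
print(f"top-1: {top1:.3f}  uniform top-6: {top6_uniform:.3f}  skewed top-6: {top6_skewed:.3f}")
```

Under these assumptions, uniform top-6 mixing reduces the error standard deviation by a factor of √6, and a router that concentrates weight on one expert recovers less of that benefit.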

config.json ADDED

	@@ -0,0 +1,145 @@

+{
+  "architectures": [
+    "DeepseekV4ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "eos_token_id": [
+    1,
+    128803
+  ],
+  "expert_dtype": "fp4",
+  "hc_eps": 1e-06,
+  "hc_mult": 4,
+  "hc_sinkhorn_iters": 20,
+  "head_dim": 512,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "index_head_dim": 128,
+  "index_n_heads": 64,
+  "index_topk": 512,
+  "initializer_range": 0.02,
+  "max_position_embeddings": 1048576,
+  "model_type": "deepseek_v4",
+  "moe_intermediate_size": 2048,
+  "n_routed_experts": 256,
+  "n_shared_experts": 1,
+  "norm_topk_prob": true,
+  "num_attention_heads": 64,
+  "num_experts_per_tok": 6,
+  "num_hidden_layers": 43,
+  "num_hash_layers": 3,
+  "num_key_value_heads": 1,
+  "num_nextn_predict_layers": 1,
+  "o_groups": 8,
+  "o_lora_rank": 1024,
+  "q_lora_rank": 1024,
+  "qk_rope_head_dim": 64,
+  "rms_norm_eps": 1e-06,
+  "rope_theta": 10000,
+  "routed_scaling_factor": 1.5,
+  "scoring_func": "sqrtsoftplus",
+  "sliding_window": 128,
+  "swiglu_limit": 10.0,
+  "tie_word_embeddings": false,
+  "topk_method": "noaux_tc",
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.57.1",
+  "use_cache": true,
+  "vocab_size": 129280,
+  "compress_rope_theta": 160000,
+  "compress_ratios": [
+    0,
+    0,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    128,
+    4,
+    0
+  ],
+  "rope_parameters": {
+    "beta_fast": 32.0,
+    "beta_slow": 1.0,
+    "factor": 16.0,
+    "original_max_position_embeddings": 65536,
+    "rope_type": "yarn",
+    "rope_theta": 10000.0
+  },
+  "quantization": {
+    "bits": 8,
+    "group_size": 32,
+    "mode": "affine",
+    "routed_expert_bits": 2,
+    "routed_expert_bit_plan": {
+      "default_bits": 2,
+      "codec": "mxtq"
+    },
+    "mxtq_bits": {
+      "routed_expert": 2,
+      "attention": 8,
+      "shared_expert": 8,
+      "compressor": 8,
+      "indexer": 8,
+      "embed_tokens": 8,
+      "lm_head": 8,
+      "norms_router_hc": 16
+    }
+  },
+  "weight_format": "mxtq",
+  "routed_expert_bits": 2,
+  "routed_expert_bit_plan": {
+    "default_bits": 2,
+    "codec": "mxtq"
+  },
+  "mxtq_bits": {
+    "routed_expert": 2,
+    "attention": 8,
+    "shared_expert": 8,
+    "compressor": 8,
+    "indexer": 8,
+    "embed_tokens": 8,
+    "lm_head": 8,
+    "norms_router_hc": 16
+  },
+  "mxtq_seed": 42,
+  "group_size": 32,
+  "_name_or_path": "DSV4-Flash-JANGTQ2"
+}
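
Two relationships between fields in this config can be checked with simple arithmetic. The sketch below copies the relevant values from the hunk above. The reading that `compress_ratios` holds one entry per transformer layer plus the MTP layer is an inference from the counts, not something the config states.

```python
# Values copied from config.json above; only the fields needed for the checks.
config = {
    "num_hidden_layers": 43,
    "num_nextn_predict_layers": 1,
    "max_position_embeddings": 1048576,
    "rope_parameters": {
        "factor": 16.0,
        "original_max_position_embeddings": 65536,
    },
    # Same 44 values as the listing: two zeros, then 4/128 alternating, ending in 4, 0.
    "compress_ratios": [0, 0] + [4, 128] * 20 + [4, 0],
}

rope = config["rope_parameters"]
# YaRN scaling: extended context = scaling factor * original context length.
yarn_context = int(rope["factor"] * rope["original_max_position_embeddings"])

ratio_count = len(config["compress_ratios"])
layer_count = config["num_hidden_layers"] + config["num_nextn_predict_layers"]

print(yarn_context == config["max_position_embeddings"], ratio_count, layer_count)
```

Both checks agree: the YaRN factor times the original context length equals `max_position_embeddings`, and the number of compression ratios equals the 43 hidden layers plus the 1 next-token-prediction layer.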

encoding/README.md ADDED

	@@ -0,0 +1,156 @@

+# DeepSeek-V4 Encoding
+This document describes the prompt encoding format used by DeepSeek-V4 series models. The encoding handles multi-turn conversations, tool calling, extended thinking (reasoning), and quick instruction tasks.
+A self-contained reference implementation is provided in `encoding_dsv4.py`.
+## Quick Start
+```python
+from encoding_dsv4 import encode_messages, parse_message_from_completion_text
+# Encode a conversation
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What is 2+2?"},
+]
+prompt = encode_messages(messages, thinking_mode="thinking")
+# => "<｜begin▁of▁sentence｜>You are a helpful assistant.<｜User｜>What is 2+2?<｜Assistant｜><think>"
+# Parse model output back to structured message
+completion = "Simple arithmetic.</think>2 + 2 = 4.<｜end▁of▁sentence｜>"
+parsed = parse_message_from_completion_text(completion, thinking_mode="thinking")
+# => {"role": "assistant", "reasoning_content": "Simple arithmetic.", "content": "2 + 2 = 4.", "tool_calls": []}
+```
+> **Note:** The `parse_message_from_completion_text` function is designed to handle well-formatted model output only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. For production use, additional error handling is recommended.
+## Message Format
+### Special Tokens
+| Token | Purpose |
+|-------|---------|
+| `<｜begin▁of▁sentence｜>` | Beginning of sequence (BOS) |
+| `<｜end▁of▁sentence｜>` | End of assistant turn (EOS) |
+| `<｜User｜>` | User turn prefix |
+| `<｜Assistant｜>` | Assistant turn prefix |
+| `<｜latest_reminder｜>` | Latest reminder (date, locale, etc.) |
+| `<think>` / `</think>` | Reasoning block delimiters |
+| `｜DSML｜` | DSML markup token |
+### Roles
+The encoding supports the following message roles: `system`, `user`, `assistant`, `tool`, `latest_reminder`, and `developer`.
+> **Note on the `developer` role:** The `developer` role is used exclusively in the internal search agent pipeline. It is not needed for general-purpose chat or tool-calling tasks, and the official API does not accept messages with this role.
+### Basic Chat
+A simple multi-turn conversation is encoded as:
+```
+<｜begin▁of▁sentence｜>{system_prompt}
+<｜User｜>{user_message}<｜Assistant｜></think>{response}<｜end▁of▁sentence｜>
+<｜User｜>{user_message_2}<｜Assistant｜></think>{response_2}<｜end▁of▁sentence｜>
+```
+- The BOS token is prepended at the very beginning of the conversation.
+- In **chat mode** (`thinking_mode="chat"`), `</think>` is placed right after `<｜Assistant｜>` to immediately close the thinking block, so the model generates content directly.
+### Interleaved Thinking Mode
+In **thinking mode** (`thinking_mode="thinking"`), the model produces explicit reasoning inside `<think>...</think>` blocks before responding.
+```
+<｜begin▁of▁sentence｜>{system_prompt}
+<｜User｜>{message}<｜Assistant｜><think>{reasoning}</think>{response}<｜end▁of▁sentence｜>
+```
+The `drop_thinking` parameter (default `True`) controls whether reasoning from earlier turns is preserved:
+- **Without tools**: `drop_thinking` takes effect. Reasoning content from assistant turns **before** the last user message is stripped. Only the final assistant turn retains its `<think>...</think>` block.
+- **With tools** (on system or developer message): `drop_thinking` is automatically disabled. All turns retain their reasoning, because tool-calling conversations require full context for the model to track multi-step reasoning across tool calls.
+### Tool Calling (DSML Format)
+Tools are defined on the `system` or `developer` message via the `tools` field (OpenAI-compatible format). When tools are present, the following schema block is injected into the system/user prompt:
+```
+## Tools
+You have access to a set of tools to help answer the user's question. You can invoke tools by writing a "<｜DSML｜tool_calls>" block like the following:
+<｜DSML｜tool_calls>
+<｜DSML｜invoke name="$TOOL_NAME">
+<｜DSML｜parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</｜DSML｜parameter>
+...
+</｜DSML｜invoke>
+<｜DSML｜invoke name="$TOOL_NAME2">
+...
+</｜DSML｜invoke>
+</｜DSML｜tool_calls>
+String parameters should be specified as is and set `string="true"`. For all other types (numbers, booleans, arrays, objects), pass the value in JSON format and set `string="false"`.
+If thinking_mode is enabled (triggered by <think>), you MUST output your complete reasoning inside <think>...</think> BEFORE any tool calls or final response.
+Otherwise, output directly after </think> with tool calls or final response.
+### Available Tool Schemas
+{tool_definitions_json}
+You MUST strictly follow the above defined tool name and parameter schemas to invoke tool calls.
+```
+An actual tool call in the assistant turn looks like:
+```xml
+<｜DSML｜tool_calls>
+<｜DSML｜invoke name="function_name">
+<｜DSML｜parameter name="param" string="true">string_value</｜DSML｜parameter>
+<｜DSML｜parameter name="count" string="false">5</｜DSML｜parameter>
+</｜DSML｜invoke>
+</｜DSML｜tool_calls><｜end▁of▁sentence｜>
+```
+- `string="true"`: the parameter value is a raw string.
+- `string="false"`: the parameter value is JSON (number, boolean, array, object).
+Tool execution results are wrapped in `<tool_result>` tags within user messages:
+```
+<｜User｜><tool_result>{result_json}</tool_result><｜Assistant｜><think>...
+```
+When multiple tool results are present, they are sorted by the order of the corresponding `tool_calls` in the preceding assistant message.
+### Reasoning Effort
+When `reasoning_effort="max"` is set, a special prefix is prepended at the very beginning of the prompt (before the system message) to instruct the model to maximize its reasoning depth:
+```
+Reasoning Effort: Absolute maximum with no shortcuts permitted.
+You MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.
+Explicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked.
+```
+### Quick Instruction Special Tokens
+Quick instruction tokens are used for auxiliary classification and generation tasks. They are appended to messages via the `"task"` field to trigger specialized model behavior for a single-token or short-form output.
+| Special Token | Description | Format |
+|:---|:---|:---|
+| `<｜action｜>` | Determines whether the user prompt requires a web search or can be answered directly. | `...<｜User｜>{prompt}<｜Assistant｜><think><｜action｜>` |
+| `<｜title｜>` | Generates a concise conversation title after the first assistant response. | `...<｜Assistant｜>{response}<｜end▁of▁sentence｜><｜title｜>` |
+| `<｜query｜>` | Generates search queries for the user prompt. | `...<｜User｜>{prompt}<｜query｜>` |
+| `<｜authority｜>` | Classifies the user prompt's demand for source authoritativeness. | `...<｜User｜>{prompt}<｜authority｜>` |
+| `<｜domain｜>` | Identifies the domain of the user prompt. | `...<｜User｜>{prompt}<｜domain｜>` |
+| `<｜extracted_url｜>` `<｜read_url｜>` | Determines whether each URL in the user prompt should be fetched and read. | `...<｜User｜>{prompt}<｜extracted_url｜>{url}<｜read_url｜>` |
+Usage in message format:
+- **`action`** on a user message: the `<｜action｜>` token is placed after the assistant prefix and thinking token, triggering a routing decision (e.g., "Search" or "Answer").
+- **Other tasks** (`query`, `authority`, `domain`, `read_url`) on a user message: the task token is appended directly after the user content.
+- **`title`** on an assistant message: the `<｜title｜>` token is appended after the assistant's EOS. The next assistant message provides the generated title.
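
The DSML tool-call rules described in the README above (raw text with `string="true"` for strings, JSON with `string="false"` for everything else) can be sketched independently of `encoding_dsv4.py`. The function names `render_invoke` and `render_tool_calls` below are hypothetical and chosen for illustration. The reference implementation in this commit remains the authority on exact whitespace and edge cases.

```python
import json

DSML = "｜DSML｜"  # DSML markup token, as listed in the special-token table


def render_invoke(name: str, arguments: dict) -> str:
    """Render one tool invocation with one parameter element per argument."""
    lines = [f'<{DSML}invoke name="{name}">']
    for key, value in arguments.items():
        is_string = isinstance(value, str)
        # Strings are emitted as-is; other types are serialized as JSON.
        body = value if is_string else json.dumps(value, ensure_ascii=False)
        flag = "true" if is_string else "false"
        lines.append(
            f'<{DSML}parameter name="{key}" string="{flag}">{body}</{DSML}parameter>'
        )
    lines.append(f"</{DSML}invoke>")
    return "\n".join(lines)


def render_tool_calls(calls) -> str:
    """Wrap (name, arguments) pairs in a single tool_calls block."""
    inner = "\n".join(render_invoke(name, args) for name, args in calls)
    return f"<{DSML}tool_calls>\n{inner}\n</{DSML}tool_calls>"


block = render_tool_calls([("function_name", {"param": "string_value", "count": 5})])
print(block)
```

Called with the arguments from the README's example, this prints the same block of invoke and parameter lines, without the trailing EOS token.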

encoding/__pycache__/encoding_dsv4.cpython-312.pyc ADDED

Binary file (28.6 kB).

encoding/__pycache__/encoding_dsv4.cpython-313.pyc ADDED

Binary file (28.6 kB).

encoding/encoding_dsv4.py ADDED

	@@ -0,0 +1,744 @@

+"""
+DeepSeek-V4 Encoding
+A self-contained implementation for encoding/decoding DeepSeek-V4 chat messages
+with tool calling, thinking mode, and quick instruction task support.
+"""
+from typing import Any, Dict, List, Union, Optional, Tuple
+import copy
+import json
+import re
+# ============================================================
+# Special Tokens
+# ============================================================
+bos_token: str = "<｜begin▁of▁sentence｜>"
+eos_token: str = "<｜end▁of▁sentence｜>"
+thinking_start_token: str = "<think>"
+thinking_end_token: str = "</think>"
+dsml_token: str = "｜DSML｜"
+USER_SP_TOKEN = "<｜User｜>"
+ASSISTANT_SP_TOKEN = "<｜Assistant｜>"
+LATEST_REMINDER_SP_TOKEN = "<｜latest_reminder｜>"
+# Task special tokens for internal classification tasks
+DS_TASK_SP_TOKENS = {
+    "action": "<｜action｜>",
+    "query": "<｜query｜>",
+    "authority": "<｜authority｜>",
+    "domain": "<｜domain｜>",
+    "title": "<｜title｜>",
+    "read_url": "<｜read_url｜>",
+}
+VALID_TASKS = set(DS_TASK_SP_TOKENS.keys())
+# ============================================================
+# Templates
+# ============================================================
+system_msg_template: str = "{content}"
+user_msg_template: str = "{content}"
+latest_reminder_msg_template: str = "{content}"
+assistant_msg_template: str = "{reasoning}{content}{tool_calls}" + eos_token
+assistant_msg_wo_eos_template: str = "{reasoning}{content}{tool_calls}"
+thinking_template: str = "{reasoning_content}"
+response_format_template: str = (
+    "## Response Format:\n\nYou MUST strictly adhere to the following schema to reply:\n{schema}"
+)
+tool_call_template: str = (
+    "<{dsml_token}invoke name=\"{name}\">\n{arguments}\n</{dsml_token}invoke>"
+)
+tool_calls_template = (
+    "<{dsml_token}{tc_block_name}>\n{tool_calls}\n</{dsml_token}{tc_block_name}>"
+)
+tool_calls_block_name: str = "tool_calls"
+tool_output_template: str = (
+    "<tool_result>{content}</tool_result>"
+)
+REASONING_EFFORT_MAX = (
+    "Reasoning Effort: Absolute maximum with no shortcuts permitted.\n"
+    "You MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.\n"
+    "Explicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked.\n\n"
+)
+TOOLS_TEMPLATE = """## Tools
+You have access to a set of tools to help answer the user's question. You can invoke tools by writing a "<{dsml_token}tool_calls>" block like the following:
+<{dsml_token}tool_calls>
+<{dsml_token}invoke name="$TOOL_NAME">
+<{dsml_token}parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</{dsml_token}parameter>
+...
+</{dsml_token}invoke>
+<{dsml_token}invoke name="$TOOL_NAME2">
+...
+</{dsml_token}invoke>
+</{dsml_token}tool_calls>
+String parameters should be specified as is and set `string="true"`. For all other types (numbers, booleans, arrays, objects), pass the value in JSON format and set `string="false"`.
+If thinking_mode is enabled (triggered by {thinking_start_token}), you MUST output your complete reasoning inside {thinking_start_token}...{thinking_end_token} BEFORE any tool calls or final response.
+Otherwise, output directly after {thinking_end_token} with tool calls or final response.
+### Available Tool Schemas
+{tool_schemas}
+You MUST strictly follow the above defined tool name and parameter schemas to invoke tool calls.
+"""
+# ============================================================
+# Utility Functions
+# ============================================================
+def to_json(value: Any) -> str:
+    """Serialize a value to JSON string."""
+    try:
+        return json.dumps(value, ensure_ascii=False)
+    except Exception:
+        return json.dumps(value, ensure_ascii=True)
+def tools_from_openai_format(tools):
+    """Extract function definitions from OpenAI-format tool list."""
+    return [tool["function"] for tool in tools]
+def tool_calls_from_openai_format(tool_calls):
+    """Convert OpenAI-format tool calls to internal format."""
+    return [
+        {
+            "name": tool_call["function"]["name"],
+            "arguments": tool_call["function"]["arguments"],
+        }
+        for tool_call in tool_calls
+    ]
+def tool_calls_to_openai_format(tool_calls):
+    """Convert internal tool calls to OpenAI format."""
+    return [
+        {
+            "type": "function",
+            "function": {
+                "name": tool_call["name"],
+                "arguments": tool_call["arguments"],
+            }
+        }
+        for tool_call in tool_calls
+    ]
+def encode_arguments_to_dsml(tool_call: Dict[str, str]) -> str:
+    """
+    Encode tool call arguments into DSML parameter format.
+    Args:
+        tool_call: Dict with "name" and "arguments" (JSON string) keys.
+    Returns:
+        DSML-formatted parameter string.
+    """
+    p_dsml_template = '<{dsml_token}parameter name="{key}" string="{is_str}">{value}</{dsml_token}parameter>'
+    P_dsml_strs = []
+    try:
+        arguments = json.loads(tool_call["arguments"])
+    except Exception:
+        # Not valid JSON: pass the raw value through as a single "arguments" parameter.
+        arguments = {"arguments": tool_call["arguments"]}
+    for k, v in arguments.items():
+        p_dsml_str = p_dsml_template.format(
+            dsml_token=dsml_token,
+            key=k,
+            is_str="true" if isinstance(v, str) else "false",
+            value=v if isinstance(v, str) else to_json(v),
+        )
+        P_dsml_strs.append(p_dsml_str)
+    return "\n".join(P_dsml_strs)
+def decode_dsml_to_arguments(tool_name: str, tool_args: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
+    """
+    Decode DSML parameters back to a tool call dict.
+    Args:
+        tool_name: Name of the tool.
+        tool_args: Dict mapping param_name -> (value, is_string_flag).
+    Returns:
+        Dict with "name" and "arguments" (JSON string) keys.
+    """
+    def _decode_value(key: str, value: str, string: str):
+        if string == "true":
+            value = to_json(value)
+        return f"{to_json(key)}: {value}"
+    tool_args_json = "{" + ", ".join([_decode_value(k, v, string=is_str) for k, (v, is_str) in tool_args.items()]) + "}"
+    return dict(name=tool_name, arguments=tool_args_json)
+def render_tools(tools: List[Dict[str, Union[str, Dict[str, Any]]]]) -> str:
+    """
+    Render tool schemas into the system prompt format.
+    Args:
+        tools: List of tool schema dicts (each with name, description, parameters).
+    Returns:
+        Formatted tools section string.
+    """
+    tools_json = [to_json(t) for t in tools]
+    return TOOLS_TEMPLATE.format(
+        tool_schemas="\n".join(tools_json),
+        dsml_token=dsml_token,
+        thinking_start_token=thinking_start_token,
+        thinking_end_token=thinking_end_token,
+    )
+def find_last_user_index(messages: List[Dict[str, Any]]) -> int:
+    """Find the index of the last user/developer message."""
+    last_user_index = -1
+    for idx in range(len(messages) - 1, -1, -1):
+        if messages[idx].get("role") in ["user", "developer"]:
+            last_user_index = idx
+            break
+    return last_user_index
+# ============================================================
+# Message Rendering
+# ============================================================
+def render_message(index: int, messages: List[Dict[str, Any]], thinking_mode: str, drop_thinking: bool = True, reasoning_effort: Optional[str] = None) -> str:
+    """
+    Render a single message at the given index into its encoded string form.
+    This is the core function that converts each message in the conversation
+    into the DeepSeek-V4 format.
+    Args:
+        index: Index of the message to render.
+        messages: Full list of messages in the conversation.
+        thinking_mode: Either "chat" or "thinking".
+        drop_thinking: Whether to drop reasoning content from earlier turns.
+        reasoning_effort: Optional reasoning effort level ("max", "high", or None).
+    Returns:
+        Encoded string for this message.
+    """
+    assert 0 <= index < len(messages)
+    assert thinking_mode in ["chat", "thinking"], f"Invalid thinking_mode `{thinking_mode}`"
+    prompt = ""
+    msg = messages[index]
+    last_user_idx = find_last_user_index(messages)
+    role = msg.get("role")
+    content = msg.get("content")
+    tools = msg.get("tools")
+    response_format = msg.get("response_format")
+    tool_calls = msg.get("tool_calls")
+    reasoning_content = msg.get("reasoning_content")
+    wo_eos = msg.get("wo_eos", False)
+    if tools:
+        tools = tools_from_openai_format(tools)
+    if tool_calls:
+        tool_calls = tool_calls_from_openai_format(tool_calls)
+    # Reasoning effort prefix (only at index 0 in thinking mode with max effort)
+    assert reasoning_effort in ['max', None, 'high'], f"Invalid reasoning effort: {reasoning_effort}"
+    if index == 0 and thinking_mode == "thinking" and reasoning_effort == 'max':
+        prompt += REASONING_EFFORT_MAX
+    if role == "system":
+        prompt += system_msg_template.format(content=content or "")
+        if tools:
+            prompt += "\n\n" + render_tools(tools)
+        if response_format:
+            prompt += "\n\n" + response_format_template.format(schema=to_json(response_format))
+    elif role == "developer":
+        assert content, f"Invalid message for role `{role}`: {msg}"
+        content_developer = USER_SP_TOKEN
+        content_developer += content
+        if tools:
+            content_developer += "\n\n" + render_tools(tools)
+        if response_format:
+            content_developer += "\n\n" + response_format_template.format(schema=to_json(response_format))
+        prompt += user_msg_template.format(content=content_developer)
+    elif role == "user":
+        prompt += USER_SP_TOKEN
+        # Handle content blocks (tool results mixed with text)
+        content_blocks = msg.get("content_blocks")
+        if content_blocks:
+            parts = []
+            for block in content_blocks:
+                block_type = block.get("type")
+                if block_type == "text":
+                    parts.append(block.get("text", ""))
+                elif block_type == "tool_result":
+                    tool_content = block.get("content", "")
+                    if isinstance(tool_content, list):
+                        text_parts = []
+                        for b in tool_content:
+                            if b.get("type") == "text":
+                                text_parts.append(b.get("text", ""))
+                            else:
+                                text_parts.append(f"[Unsupported {b.get('type')}]")
+                        tool_content = "\n\n".join(text_parts)
+                    parts.append(tool_output_template.format(content=tool_content))
+                else:
+                    parts.append(f"[Unsupported {block_type}]")
+            prompt += "\n\n".join(parts)
+        else:
+            prompt += content or ""
+    elif role == "latest_reminder":
+        prompt += LATEST_REMINDER_SP_TOKEN + latest_reminder_msg_template.format(content=content)
+    elif role == "tool":
+        raise NotImplementedError("deepseek_v4 merges tool messages into user; please preprocess with merge_tool_messages()")
+    elif role == "assistant":
+        thinking_part = ""
+        tc_content = ""
+        if tool_calls:
+            tc_list = [
+                tool_call_template.format(
+                    dsml_token=dsml_token,
+                    name=tc.get("name"),
+                    arguments=encode_arguments_to_dsml(tc)
+                )
+                for tc in tool_calls
+            ]
+            tc_content += '\n\n' + tool_calls_template.format(
+                dsml_token=dsml_token,
+                tool_calls="\n".join(tc_list),
+                tc_block_name=tool_calls_block_name,
+            )
+        summary_content = content or ""
+        rc = reasoning_content or ""
+        # Check if previous message has a task - if so, this is a task output (no thinking)
+        prev_has_task = index - 1 >= 0 and messages[index - 1].get("task") is not None
+        if thinking_mode == "thinking" and not prev_has_task:
+            if not drop_thinking or index > last_user_idx:
+                thinking_part = thinking_template.format(reasoning_content=rc) + thinking_end_token
+            else:
+                thinking_part = ""
+        if wo_eos:
+            prompt += assistant_msg_wo_eos_template.format(
+                reasoning=thinking_part,
+                content=summary_content,
+                tool_calls=tc_content,
+            )
+        else:
+            prompt += assistant_msg_template.format(
+                reasoning=thinking_part,
+                content=summary_content,
+                tool_calls=tc_content,
+            )
+    else:
+        raise NotImplementedError(f"Unknown role: {role}")
+    # Append transition tokens based on what follows
+    if index + 1 < len(messages) and messages[index + 1].get("role") not in ["assistant", "latest_reminder"]:
+        return prompt
+    task = messages[index].get("task")
+    if task is not None:
+        # Task special token for internal classification tasks
+        assert task in VALID_TASKS, f"Invalid task: '{task}'. Valid tasks are: {list(VALID_TASKS)}"
+        task_sp_token = DS_TASK_SP_TOKENS[task]
+        if task != "action":
+            # Non-action tasks: append task sp token directly after the message
+            prompt += task_sp_token
+        else:
+            # Action task: append Assistant + thinking token + action sp token
+            prompt += ASSISTANT_SP_TOKEN
+            prompt += thinking_end_token if thinking_mode != "thinking" else thinking_start_token
+            prompt += task_sp_token
+    elif messages[index].get("role") in ["user", "developer"]:
+        # Normal generation: append Assistant + thinking token
+        prompt += ASSISTANT_SP_TOKEN
+        if not drop_thinking and thinking_mode == "thinking":
+            prompt += thinking_start_token
+        elif drop_thinking and thinking_mode == "thinking" and index >= last_user_idx:
+            prompt += thinking_start_token
+        else:
+            prompt += thinking_end_token
+    return prompt
+# ============================================================
+# Preprocessing
+# ============================================================
+def merge_tool_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    """
+    Merge tool messages into the preceding user message using content_blocks format.
+    DeepSeek-V4 does not have a standalone "tool" role; instead, tool results
+    are encoded as <tool_result> blocks within user messages.
+    This function converts a standard OpenAI-format conversation (with separate
+    "tool" role messages) into V4 format where tool results are merged into
+    user messages.
+    Args:
+        messages: List of message dicts in OpenAI format.
+    Returns:
+        Processed message list with tool messages merged into user messages.
+    """
+    merged: List[Dict[str, Any]] = []
+    for msg in messages:
+        msg = copy.deepcopy(msg)
+        role = msg.get("role")
+        if role == "tool":
+            # Convert tool message to a user message with tool_result block
+            tool_block = {
+                "type": "tool_result",
+                "tool_use_id": msg.get("tool_call_id", ""),
+                "content": msg.get("content", ""),
+            }
+            # Merge into the previous message if it is a user message that already carries content_blocks
+            if merged and merged[-1].get("role") == "user" and "content_blocks" in merged[-1]:
+                merged[-1]["content_blocks"].append(tool_block)
+            else:
+                merged.append({
+                    "role": "user",
+                    "content_blocks": [tool_block],
+                })
+        elif role == "user":
+            text_block = {"type": "text", "text": msg.get("content", "")}
+            if merged and merged[-1].get("role") == "user" and "content_blocks" in merged[-1] and merged[-1].get("task") is None:
+                merged[-1]["content_blocks"].append(text_block)
+            else:
+                new_msg = {
+                    "role": "user",
+                    "content": msg.get("content", ""),
+                    "content_blocks": [text_block],
+                }
+                # Preserve the extra fields listed below (task, wo_eos, mask)
+                for key in ("task", "wo_eos", "mask"):
+                    if key in msg:
+                        new_msg[key] = msg[key]
+                merged.append(new_msg)
+        else:
+            merged.append(msg)
+    return merged
+def sort_tool_results_by_call_order(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    """
+    Sort tool_result blocks within user messages by the order of tool_calls
+    in the preceding assistant message.
+    Args:
+        messages: Preprocessed message list (after merge_tool_messages).
+    Returns:
+        Message list with sorted tool result blocks.
+    """
+    last_tool_call_order: Dict[str, int] = {}
+    for msg in messages:
+        role = msg.get("role")
+        if role == "assistant" and msg.get("tool_calls"):
+            last_tool_call_order = {}
+            for idx, tc in enumerate(msg["tool_calls"]):
+                tc_id = tc.get("id") or tc.get("function", {}).get("id", "")
+                if tc_id:
+                    last_tool_call_order[tc_id] = idx
+        elif role == "user" and msg.get("content_blocks"):
+            tool_blocks = [b for b in msg["content_blocks"] if b.get("type") == "tool_result"]
+            if len(tool_blocks) > 1 and last_tool_call_order:
+                sorted_blocks = sorted(
+                    tool_blocks,
+                    key=lambda b: last_tool_call_order.get(b.get("tool_use_id", ""), 0)
+                )
+                sorted_idx = 0
+                new_blocks = []
+                for block in msg["content_blocks"]:
+                    if block.get("type") == "tool_result":
+                        new_blocks.append(sorted_blocks[sorted_idx])
+                        sorted_idx += 1
+                    else:
+                        new_blocks.append(block)
+                msg["content_blocks"] = new_blocks
+    return messages
+# ============================================================
+# Main Encoding Function
+# ============================================================
+def encode_messages(
+    messages: List[Dict[str, Any]],
+    thinking_mode: str,
+    context: Optional[List[Dict[str, Any]]] = None,
+    drop_thinking: bool = True,
+    add_default_bos_token: bool = True,
+    reasoning_effort: Optional[str] = None,
+) -> str:
+    """
+    Encode a list of messages into the DeepSeek-V4 prompt format.
+    This is the main entry point for encoding conversations. It handles:
+    - BOS token insertion
+    - Thinking mode with optional reasoning content dropping
+    - Tool message merging into user messages
+    - Multi-turn conversation context
+    Args:
+        messages: List of message dicts to encode.
+        thinking_mode: Either "chat" or "thinking".
+        context: Optional preceding context messages (already encoded prefix).
+        drop_thinking: If True, drop reasoning_content from earlier assistant turns
+                      (only keep reasoning for messages after the last user message).
+        add_default_bos_token: Whether to prepend BOS token at conversation start.
+        reasoning_effort: Optional reasoning effort level ("max", "high", or None).
+    Returns:
+        The encoded prompt string.
+    """
+    context = context if context else []
+    # Preprocess: merge tool messages and sort tool results
+    messages = merge_tool_messages(messages)
+    messages = sort_tool_results_by_call_order(context + messages)[len(context):]
+    if context:
+        context = merge_tool_messages(context)
+        context = sort_tool_results_by_call_order(context)
+    full_messages = context + messages
+    prompt = bos_token if add_default_bos_token and len(context) == 0 else ""
+    # Resolve drop_thinking: if any message has tools defined, don't drop thinking
+    effective_drop_thinking = drop_thinking
+    if any(m.get("tools") for m in full_messages):
+        effective_drop_thinking = False
+    if thinking_mode == "thinking" and effective_drop_thinking:
+        full_messages = _drop_thinking_messages(full_messages)
+        # After dropping, recalculate how many messages to render
+        # (context may have shrunk too)
+        num_to_render = len(full_messages) - len(_drop_thinking_messages(context))
+        context_len = len(full_messages) - num_to_render
+    else:
+        num_to_render = len(messages)
+        context_len = len(context)
+    for idx in range(num_to_render):
+        prompt += render_message(
+            idx + context_len,
+            full_messages,
+            thinking_mode=thinking_mode,
+            drop_thinking=effective_drop_thinking,
+            reasoning_effort=reasoning_effort,
+        )
+    return prompt
+def _drop_thinking_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    """
+    Drop reasoning_content and non-essential messages before the last user message.
+    Behavior:
+    - Messages with role in ["user", "system", "tool", "latest_reminder", "direct_search_results"] are always kept.
+    - Messages at or after the last user index are always kept.
+    - Assistant messages before the last user get reasoning_content removed.
+    - Developer messages before the last user are dropped entirely.
+    """
+    last_user_idx = find_last_user_index(messages)
+    result = []
+    keep_roles = {"user", "system", "tool", "latest_reminder", "direct_search_results"}
+    for idx, msg in enumerate(messages):
+        role = msg.get("role")
+        if role in keep_roles or idx >= last_user_idx:
+            result.append(msg)
+        elif role == "assistant":
+            msg = copy.copy(msg)
+            msg.pop("reasoning_content", None)
+            result.append(msg)
+        # developer and other roles before last_user_idx are dropped
+    return result
+# ============================================================
+# Parsing (Decoding model output)
+# ============================================================
+def _read_until_stop(index: int, text: str, stop: List[str]) -> Tuple[int, str, Optional[str]]:
+    """
+    Read text from index until one of the stop strings is found.
+    Returns:
+        Tuple of (new_index, content_before_stop, matched_stop_string_or_None).
+    """
+    min_pos = len(text)
+    matched_stop = None
+    for s in stop:
+        pos = text.find(s, index)
+        if pos != -1 and pos < min_pos:
+            min_pos = pos
+            matched_stop = s
+    if matched_stop:
+        content = text[index:min_pos]
+        return min_pos + len(matched_stop), content, matched_stop
+    else:
+        content = text[index:]
+        return len(text), content, None
+def parse_tool_calls(index: int, text: str) -> Tuple[int, Optional[str], List[Dict[str, str]]]:
+    """
+    Parse DSML tool calls from text starting at the given index.
+    Args:
+        index: Starting position in text.
+        text: The full text to parse.
+    Returns:
+        Tuple of (new_index, last_stop_token, list_of_tool_call_dicts).
+        Each tool call dict has "name" and "arguments" keys.
+    """
+    tool_calls: List[Dict[str, Any]] = []
+    stop_token = None
+    tool_calls_end_token = f"</{dsml_token}{tool_calls_block_name}>"
+    while index < len(text):
+        index, sep, stop_token = _read_until_stop(index, text, [f"<{dsml_token}invoke", tool_calls_end_token])
+        if stop_token is None:
+            raise ValueError("Missing special token in tool calls")
+        if sep != ">\n":
+            raise ValueError(f"Tool call format error: expected '>\\n' but got '{sep}'")
+        if stop_token == tool_calls_end_token:
+            break
+        index, tool_name_content, stop_token = _read_until_stop(index, text, [f"<{dsml_token}parameter", f"</{dsml_token}invoke"])
+        p_tool_name = re.findall(r'^\s*name="(.*?)">\n$', tool_name_content, flags=re.DOTALL)
+        if len(p_tool_name) != 1:
+            raise ValueError(f"Tool name format error: '{tool_name_content}'")
+        tool_name = p_tool_name[0]
+        tool_args: Dict[str, Tuple[str, str]] = {}
+        while stop_token == f"<{dsml_token}parameter":
+            index, param_content, stop_token = _read_until_stop(index, text, [f"/{dsml_token}parameter"])
+            param_kv = re.findall(r'^ name="(.*?)" string="(true|false)">(.*?)<$', param_content, flags=re.DOTALL)
+            if len(param_kv) != 1:
+                raise ValueError(f"Parameter format error: '{param_content}'")
+            param_name, string, param_value = param_kv[0]
+            if param_name in tool_args:
+                raise ValueError(f"Duplicate parameter name: '{param_name}'")
+            tool_args[param_name] = (param_value, string)
+            index, content, stop_token = _read_until_stop(index, text, [f"<{dsml_token}parameter", f"</{dsml_token}invoke"])
+            if content != ">\n":
+                raise ValueError(f"Parameter format error: expected '>\\n' but got '{content}'")
+        tool_call = decode_dsml_to_arguments(tool_name=tool_name, tool_args=tool_args)
+        tool_calls.append(tool_call)
+    return index, stop_token, tool_calls
+def parse_message_from_completion_text(text: str, thinking_mode: str) -> Dict[str, Any]:
+    """
+    Parse a model completion text into a structured assistant message.
+    This function takes the raw text output from the model (a single assistant turn)
+    and extracts:
+    - reasoning_content (thinking block)
+    - content (summary/response)
+    - tool_calls (if any)
+    NOTE: This function is designed to parse only correctly formatted strings and
+    will raise ValueError for malformed output.
+    Args:
+        text: The raw completion text (including EOS token).
+        thinking_mode: Either "chat" or "thinking".
+    Returns:
+        Dict with keys: "role", "content", "reasoning_content", "tool_calls".
+        tool_calls are in OpenAI format.
+    """
+    summary_content, reasoning_content, tool_calls = "", "", []
+    index, stop_token = 0, None
+    tool_calls_start_token = f"\n\n<{dsml_token}{tool_calls_block_name}"
+    is_thinking = thinking_mode == "thinking"
+    is_tool_calling = False
+    if is_thinking:
+        index, content_delta, stop_token = _read_until_stop(index, text, [thinking_end_token, tool_calls_start_token])
+        reasoning_content = content_delta
+        assert stop_token == thinking_end_token, "Invalid thinking format: missing </think>"
+    index, content_delta, stop_token = _read_until_stop(index, text, [eos_token, tool_calls_start_token])
+    summary_content = content_delta
+    if stop_token == tool_calls_start_token:
+        is_tool_calling = True
+    else:
+        assert stop_token == eos_token, "Invalid format: missing EOS token"
+    if is_tool_calling:
+        index, stop_token, tool_calls = parse_tool_calls(index, text)
+        index, tool_ends_text, stop_token = _read_until_stop(index, text, [eos_token])
+        assert not tool_ends_text, "Unexpected content after tool calls"
+    assert len(text) == index and stop_token in [eos_token, None], "Unexpected content at end"
+    for sp_token in [bos_token, eos_token, thinking_start_token, thinking_end_token, dsml_token]:
+        assert sp_token not in summary_content and sp_token not in reasoning_content, \
+            f"Unexpected special token '{sp_token}' in content"
+    return {
+        "role": "assistant",
+        "content": summary_content,
+        "reasoning_content": reasoning_content,
+        "tool_calls": tool_calls_to_openai_format(tool_calls)
+    }
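The parser above is built on one scanning primitive: read forward from an index and cut at whichever stop string occurs first. The following is a minimal standalone sketch of that idea, not the file's own code. The name `read_until_stop` and the placeholder stop strings `</think>` and `<EOS>` are assumptions chosen for illustration; the real special tokens are defined elsewhere in `encoding_dsv4.py`.

```python
from typing import List, Optional, Tuple


def read_until_stop(index: int, text: str, stop: List[str]) -> Tuple[int, str, Optional[str]]:
    """Scan `text` from `index` and cut at the earliest occurrence of any stop string."""
    best_pos, matched = len(text), None
    for s in stop:
        pos = text.find(s, index)
        if pos != -1 and pos < best_pos:
            best_pos, matched = pos, s
    if matched is None:
        # No stop string found: consume the rest of the text.
        return len(text), text[index:], None
    # Resume scanning just past the matched stop string.
    return best_pos + len(matched), text[index:best_pos], matched


# Placeholder completion text (hypothetical tokens, for illustration only).
sample = "reasoning</think>answer<EOS>"
idx, reasoning, first_stop = read_until_stop(0, sample, ["</think>", "<EOS>"])
idx2, answer, second_stop = read_until_stop(idx, sample, ["<EOS>"])
```

Chaining calls this way, each starting where the previous one stopped, is how a completion can be split into a reasoning part and a summary part without a full grammar.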

encoding/test_encoding_dsv4.py ADDED

	@@ -0,0 +1,89 @@

+"""
+Test suite for DeepSeek-V4 Encoding.
+Run: python test_encoding_dsv4.py
+"""
+import json
+import os
+from encoding_dsv4 import encode_messages, parse_message_from_completion_text
+TESTS_DIR = os.path.join(os.path.dirname(__file__), "tests")
+def test_case_1():
+    """Thinking mode with tool calls (multi-turn, tool results merged into user)."""
+    with open(os.path.join(TESTS_DIR, "test_input_1.json"), encoding="utf-8") as f:
+        td = json.load(f)
+        messages = td["messages"]
+        messages[0]["tools"] = td["tools"]
+    gold = open(os.path.join(TESTS_DIR, "test_output_1.txt"), encoding="utf-8").read()
+    prompt = encode_messages(messages, thinking_mode="thinking")
+    assert prompt == gold
+    # Parse: assistant turn with tool call
+    marker = "<｜Assistant｜><think>"
+    first_start = prompt.find(marker) + len(marker)
+    first_end = prompt.find("<｜User｜>", first_start)
+    parsed_tc = parse_message_from_completion_text(prompt[first_start:first_end], thinking_mode="thinking")
+    assert parsed_tc["reasoning_content"] == "The user wants to know the weather in Beijing. I should use the get_weather tool."
+    assert parsed_tc["content"] == ""
+    assert len(parsed_tc["tool_calls"]) == 1
+    assert parsed_tc["tool_calls"][0]["function"]["name"] == "get_weather"
+    assert json.loads(parsed_tc["tool_calls"][0]["function"]["arguments"]) == {"location": "Beijing", "unit": "celsius"}
+    # Parse: final assistant turn with content
+    last_start = prompt.rfind(marker) + len(marker)
+    parsed_final = parse_message_from_completion_text(prompt[last_start:], thinking_mode="thinking")
+    assert parsed_final["reasoning_content"] == "Got the weather data. Let me format a nice response."
+    assert "22°C" in parsed_final["content"]
+    assert parsed_final["tool_calls"] == []
+    print("  [PASS] case 1: thinking with tools (encode + parse)")
+def test_case_2():
+    """Thinking mode without tools (drop_thinking removes earlier reasoning)."""
+    messages = json.load(open(os.path.join(TESTS_DIR, "test_input_2.json"), encoding="utf-8"))
+    gold = open(os.path.join(TESTS_DIR, "test_output_2.txt"), encoding="utf-8").read()
+    prompt = encode_messages(messages, thinking_mode="thinking")
+    assert prompt == gold
+    # Parse: last assistant turn
+    marker = "<｜Assistant｜><think>"
+    last_start = prompt.rfind(marker) + len(marker)
+    parsed = parse_message_from_completion_text(prompt[last_start:], thinking_mode="thinking")
+    assert parsed["reasoning_content"] == "The user asks about the capital of France. It is Paris."
+    assert parsed["content"] == "The capital of France is Paris."
+    assert parsed["tool_calls"] == []
+    # Verify drop_thinking: first assistant's reasoning should be absent
+    assert "The user said hello" not in prompt
+    print("  [PASS] case 2: thinking without tools (encode + parse)")
+def test_case_3():
+    """Interleaved thinking + search (developer with tools, latest_reminder)."""
+    messages = json.load(open(os.path.join(TESTS_DIR, "test_input_3.json"), encoding="utf-8"))
+    gold = open(os.path.join(TESTS_DIR, "test_output_3.txt"), encoding="utf-8").read()
+    assert encode_messages(messages, thinking_mode="thinking") == gold
+    print("  [PASS] case 3: interleaved thinking + search")
+def test_case_4():
+    """Quick instruction task with latest_reminder (chat mode, action task)."""
+    messages = json.load(open(os.path.join(TESTS_DIR, "test_input_4.json"), encoding="utf-8"))
+    gold = open(os.path.join(TESTS_DIR, "test_output_4.txt"), encoding="utf-8").read()
+    assert encode_messages(messages, thinking_mode="chat") == gold
+    print("  [PASS] case 4: quick instruction task")
+if __name__ == "__main__":
+    print("Running DeepSeek-V4 Encoding Tests...\n")
+    test_case_1()
+    test_case_2()
+    test_case_3()
+    test_case_4()
+    print("\nAll 4 tests passed!")
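Test case 1 checks that a parsed tool call decodes back to `{"location": "Beijing", "unit": "celsius"}`. A rough sketch of that decoding direction follows, assuming the rule stated in the tool prompt: `string="true"` values are taken verbatim and `string="false"` values are parsed as JSON. The helper `decode_parameters` and the regex are hypothetical; the repository's own `decode_dsml_to_arguments` is not shown in this part of the diff and may differ.

```python
import json
import re
from typing import Any, Dict

# Hypothetical pattern for one DSML parameter element (illustration only).
PARAM_RE = re.compile(
    r'<｜DSML｜parameter name="(.*?)" string="(true|false)">(.*?)</｜DSML｜parameter>',
    re.DOTALL,
)


def decode_parameters(block: str) -> Dict[str, Any]:
    """Turn DSML parameter elements into a plain arguments dict."""
    args: Dict[str, Any] = {}
    for name, is_string, raw in PARAM_RE.findall(block):
        if name in args:
            raise ValueError(f"Duplicate parameter name: '{name}'")
        # Strings are kept as-is; every other type is carried as JSON.
        args[name] = raw if is_string == "true" else json.loads(raw)
    return args


block = (
    '<｜DSML｜parameter name="location" string="true">Beijing</｜DSML｜parameter>\n'
    '<｜DSML｜parameter name="num_results" string="false">3</｜DSML｜parameter>'
)
decoded = decode_parameters(block)
```

The explicit `string` attribute is what lets the value `3` come back as an integer rather than the text `"3"`.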

encoding/tests/test_input_1.json ADDED

	@@ -0,0 +1,81 @@

+{
+    "tools": [
+        {
+            "type": "function",
+            "function": {
+                "name": "get_weather",
+                "description": "Get the weather for a specific location",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "location": {
+                            "type": "string",
+                            "description": "The city name"
+                        },
+                        "unit": {
+                            "type": "string",
+                            "enum": ["celsius", "fahrenheit"],
+                            "description": "Temperature unit"
+                        }
+                    },
+                    "required": ["location"]
+                }
+            }
+        },
+        {
+            "type": "function",
+            "function": {
+                "name": "search",
+                "description": "Search the web for information",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "query": {
+                            "type": "string",
+                            "description": "Search query"
+                        },
+                        "num_results": {
+                            "type": "integer",
+                            "description": "Number of results to return"
+                        }
+                    },
+                    "required": ["query"]
+                }
+            }
+        }
+    ],
+    "messages": [
+        {
+            "role": "system",
+            "content": "You are a helpful assistant."
+        },
+        {
+            "role": "user",
+            "content": "What's the weather in Beijing?"
+        },
+        {
+            "role": "assistant",
+            "reasoning_content": "The user wants to know the weather in Beijing. I should use the get_weather tool.",
+            "tool_calls": [
+                {
+                    "id": "call_001",
+                    "type": "function",
+                    "function": {
+                        "name": "get_weather",
+                        "arguments": "{\"location\": \"Beijing\", \"unit\": \"celsius\"}"
+                    }
+                }
+            ]
+        },
+        {
+            "role": "tool",
+            "tool_call_id": "call_001",
+            "content": "{\"temperature\": 22, \"condition\": \"sunny\", \"humidity\": 45}"
+        },
+        {
+            "role": "assistant",
+            "reasoning_content": "Got the weather data. Let me format a nice response.",
+            "content": "The weather in Beijing is currently sunny with a temperature of 22°C and 45% humidity."
+        }
+    ]
+}
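This fixture uses a standalone `"tool"` message, which the encoder folds into a user message as a `tool_result` block before rendering. The sketch below illustrates only that tool branch under simplifying assumptions: plain user messages pass through untouched here, whereas `merge_tool_messages` in the diff also wraps them in `content_blocks`. The function name `merge_tool_results` and the second call id are hypothetical.

```python
import copy
from typing import Any, Dict, List


def merge_tool_results(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fold consecutive 'tool' messages into one user message of tool_result blocks."""
    merged: List[Dict[str, Any]] = []
    for msg in messages:
        msg = copy.deepcopy(msg)  # never mutate the caller's messages
        if msg.get("role") != "tool":
            merged.append(msg)
            continue
        block = {
            "type": "tool_result",
            "tool_use_id": msg.get("tool_call_id", ""),
            "content": msg.get("content", ""),
        }
        prev = merged[-1] if merged else None
        if prev and prev.get("role") == "user" and "content_blocks" in prev:
            prev["content_blocks"].append(block)
        else:
            merged.append({"role": "user", "content_blocks": [block]})
    return merged


conversation = [
    {"role": "user", "content": "What's the weather in Beijing?"},
    {"role": "assistant", "tool_calls": [{"id": "call_001"}, {"id": "call_002"}]},
    {"role": "tool", "tool_call_id": "call_001", "content": "{\"temperature\": 22}"},
    {"role": "tool", "tool_call_id": "call_002", "content": "{\"humidity\": 45}"},
]
result = merge_tool_results(conversation)
result_ids = [b["tool_use_id"] for b in result[-1]["content_blocks"]]
```

Two tool messages in a row end up as a single user turn, which matches the single `<｜User｜>` segment carrying the tool result in `test_output_1.txt`.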

encoding/tests/test_input_2.json ADDED

	@@ -0,0 +1,24 @@

+[
+  {
+    "role": "system",
+    "content": "You are a helpful assistant."
+  },
+  {
+    "role": "user",
+    "content": "Hello"
+  },
+  {
+    "role": "assistant",
+    "reasoning_content": "The user said hello, I should greet back.",
+    "content": "Hi there! How can I help you?"
+  },
+  {
+    "role": "user",
+    "content": "What is the capital of France?"
+  },
+  {
+    "role": "assistant",
+    "reasoning_content": "The user asks about the capital of France. It is Paris.",
+    "content": "The capital of France is Paris."
+  }
+]
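With `drop_thinking` left at its default, the first assistant turn in this fixture loses its `reasoning_content`, while the turn after the last user message keeps it. A condensed sketch of that filtering is given below. The helper name `drop_earlier_reasoning` is hypothetical, and locating the last user message by `role == "user"` alone is an assumption, since `find_last_user_index` is not shown in this part of the diff.

```python
import copy
from typing import Any, Dict, List

KEEP_ROLES = {"user", "system", "tool", "latest_reminder", "direct_search_results"}


def drop_earlier_reasoning(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Strip reasoning from assistant turns that precede the last user message."""
    # Assumption: only role == "user" counts when locating the last user turn.
    last_user_idx = max(
        (i for i, m in enumerate(messages) if m.get("role") == "user"), default=-1
    )
    result = []
    for idx, msg in enumerate(messages):
        role = msg.get("role")
        if role in KEEP_ROLES or idx >= last_user_idx:
            result.append(msg)
        elif role == "assistant":
            trimmed = copy.copy(msg)  # shallow copy so the input stays intact
            trimmed.pop("reasoning_content", None)
            result.append(trimmed)
        # any other role before the last user message is dropped
    return result


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "reasoning_content": "greet back", "content": "Hi there!"},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "reasoning_content": "It is Paris.", "content": "Paris."},
]
filtered = drop_earlier_reasoning(history)
```

This is why the expected output for this case renders the first assistant turn as `<｜Assistant｜></think>` with no reasoning text, and only the last turn as `<think>...</think>`.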

encoding/tests/test_input_3.json ADDED

	@@ -0,0 +1,159 @@

+[
+  {
+    "role": "system",
+    "content": "该助手为DeepSeek，由深度求索公司创造。"
+  },
+  {
+    "role": "latest_reminder",
+    "content": "2026-02-21,星期六,广州,App,中文"
+  },
+  {
+    "role": "developer",
+    "content": "小柴胡冲剂和布洛芬能一起吃吗？\n\nCITATION FORMAT: 【{cursor_id}†L{start_line_id}(-L{end_line_id})?】",
+    "tools": [
+      {
+        "type": "function",
+        "function": {
+          "name": "search",
+          "description": "Web search. Split multiple queries with '||'.",
+          "parameters": {
+            "type": "object",
+            "properties": {
+              "queries": {
+                "type": "string",
+                "description": "query1||query2"
+              }
+            },
+            "required": [
+              "queries"
+            ],
+            "additionalProperties": false,
+            "$schema": "http://json-schema.org/draft-07/schema#"
+          }
+        }
+      },
+      {
+        "type": "function",
+        "function": {
+          "name": "open",
+          "description": "Batch open IDs (format 【{id}†...】) or URLs.",
+          "parameters": {
+            "type": "object",
+            "properties": {
+              "open_list": {
+                "type": "array",
+                "items": {
+                  "type": "object",
+                  "properties": {
+                    "id": {
+                      "description": "ID or URL",
+                      "anyOf": [
+                        {
+                          "type": "integer"
+                        },
+                        {
+                          "type": "string"
+                        }
+                      ],
+                      "default": -1
+                    },
+                    "cursor": {
+                      "type": "integer",
+                      "description": "",
+                      "default": -1
+                    },
+                    "loc": {
+                      "type": "integer",
+                      "description": "Start line",
+                      "default": -1
+                    },
+                    "num_lines": {
+                      "type": "integer",
+                      "description": "",
+                      "default": -1
+                    },
+                    "view_source": {
+                      "type": "boolean",
+                      "description": "",
+                      "default": false
+                    }
+                  },
+                  "additionalProperties": false
+                },
+                "description": ""
+              }
+            },
+            "required": [
+              "open_list"
+            ],
+            "additionalProperties": false,
+            "$schema": "http://json-schema.org/draft-07/schema#"
+          }
+        }
+      },
+      {
+        "type": "function",
+        "function": {
+          "name": "find",
+          "description": "Find exact text pattern in pages.",
+          "parameters": {
+            "type": "object",
+            "properties": {
+              "find_list": {
+                "type": "array",
+                "items": {
+                  "type": "object",
+                  "properties": {
+                    "pattern": {
+                      "type": "string",
+                      "description": ""
+                    },
+                    "cursor": {
+                      "type": "integer",
+                      "description": "",
+                      "default": -1
+                    }
+                  },
+                  "required": [
+                    "pattern"
+                  ],
+                  "additionalProperties": false
+                },
+                "description": ""
+              }
+            },
+            "required": [
+              "find_list"
+            ],
+            "additionalProperties": false,
+            "$schema": "http://json-schema.org/draft-07/schema#"
+          }
+        }
+      }
+    ]
+  },
+  {
+    "role": "assistant",
+    "content": "",
+    "reasoning_content": "用户想知道小柴胡冲剂和布洛芬能否一起服用。",
+    "tool_calls": [
+      {
+        "type": "function",
+        "function": {
+          "name": "search",
+          "arguments": "{\"queries\": \"小柴胡冲剂 布洛芬 相互作用 一起吃\"}"
+        }
+      }
+    ]
+  },
+  {
+    "role": "tool",
+    "content": "[0]"
+  },
+  {
+    "role": "assistant",
+    "content": "请及时就医。",
+    "reasoning_content": "现在开始组织回答。",
+    "tool_calls": []
+  }
+]

encoding/tests/test_input_4.json ADDED

	@@ -0,0 +1,28 @@

+[
+  {
+    "role": "system",
+    "content": "该助手为DeepSeek-V3，由深度求索公司创造。\n今天是2025年10月17日，星期五。"
+  },
+  {
+    "role": "latest_reminder",
+    "content": "2024-11-15,上海市,App,中文"
+  },
+  {
+    "role": "user",
+    "content": "热海大滚锅是世界著名温泉吗"
+  },
+  {
+    "role": "assistant",
+    "content": "关于热海大滚锅是否是世界著名温泉，可以这样总结：它在**中国乃至全球的地热奇观中占有重要地位**，但“世界著名”的称号更侧重于它作为**独特的地质现象和旅游景点**，而非普遍的温泉疗养体验。\n\n为了让你快速了解，我整理了一个简要的表格：\n\n| 维度 | 热海大滚锅的地位与特点 |\n| :--- | :--- |\n| **地质奇观** | **世界罕见**的高温地热系统，泉眼多、水温高、形态多样。 |\n| **国内声誉** | **中国三大地热区之一**，**国家5A级旅游景区**，自古闻名（徐霞客曾游历并记载）。 |\n| **国际知名度** | 在一些旅游平台被提及为“世界六大温泉”之一，但此说法流传不广，其国际声誉更多建立在地质独特性上。 |\n| **核心体验** | **观赏地热奇观**（如97℃沸腾的“大滚锅”）、**体验温泉煮鸡蛋**。 |\n\n### 💡 游玩攻略与温馨提示\n\n如果你计划前往热海大滚锅，这里有一些实用信息供你参考：\n\n- **门票与开放时间**：\n    - **门票**：景区门票约为**50元/人**。如果选择包含温泉沐浴的套餐，价格会更高，例如约**288元**。\n    - **开放时间**：景区一般**08:00-18:00**开放，但具体时间可能变动，建议提前核实。\n\n- **特色体验**：\n    - **温泉煮鸡蛋**：这几乎是必试项目。可以在景区门口购买用草绳串起的生鸡蛋（约5-8元/串），然后到“大滚锅”旁的指定区域蒸煮，几分钟便可熟食，趣味十足。\n    - **金汤足浴**：可以直接用从“大滚锅”流出的温泉水泡脚，缓解旅途疲劳。\n\n- **注意事项**：\n    - **安全第一**：“大滚锅”水温极高，务必遵守游览规则，在指定区域内观赏，切勿随意触碰泉水。\n    - **规划行程**：建议为热海景区预留**3-4小时**的游览时间。景区内步道不走回头路，出入口有观光车接送。\n\n希望这些信息能帮助你更好地了解热海大滚锅。如果你对腾冲的其他景点或者行程规划有更多疑问，我很乐意提供进一步的信息。",
+    "mask": 1
+  },
+  {
+    "role": "user",
+    "content": "世界著名温泉有哪些",
+    "task": "action"
+  },
+  {
+    "role": "assistant",
+    "content": "Search"
+  }
+]

encoding/tests/test_output_1.txt ADDED

	@@ -0,0 +1,36 @@

+<｜begin▁of▁sentence｜>You are a helpful assistant.
+## Tools
+You have access to a set of tools to help answer the user's question. You can invoke tools by writing a "<｜DSML｜tool_calls>" block like the following:
+<｜DSML｜tool_calls>
+<｜DSML｜invoke name="$TOOL_NAME">
+<｜DSML｜parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</｜DSML｜parameter>
+...
+</｜DSML｜invoke>
+<｜DSML｜invoke name="$TOOL_NAME2">
+...
+</｜DSML｜invoke>
+</｜DSML｜tool_calls>
+String parameters should be specified as is and set `string="true"`. For all other types (numbers, booleans, arrays, objects), pass the value in JSON format and set `string="false"`.
+If thinking_mode is enabled (triggered by <think>), you MUST output your complete reasoning inside <think>...</think> BEFORE any tool calls or final response.
+Otherwise, output directly after </think> with tool calls or final response.
+### Available Tool Schemas
+{"name": "get_weather", "description": "Get the weather for a specific location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city name"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit"}}, "required": ["location"]}}
+{"name": "search", "description": "Search the web for information", "parameters": {"type": "object", "properties": {"query": {"type": "string", "description": "Search query"}, "num_results": {"type": "integer", "description": "Number of results to return"}}, "required": ["query"]}}
+You MUST strictly follow the above defined tool name and parameter schemas to invoke tool calls.
+<｜User｜>What's the weather in Beijing?<｜Assistant｜><think>The user wants to know the weather in Beijing. I should use the get_weather tool.</think>
+<｜DSML｜tool_calls>
+<｜DSML｜invoke name="get_weather">
+<｜DSML｜parameter name="location" string="true">Beijing</｜DSML｜parameter>
+<｜DSML｜parameter name="unit" string="true">celsius</｜DSML｜parameter>
+</｜DSML｜invoke>
+</｜DSML｜tool_calls><｜end▁of▁sentence｜><｜User｜><tool_result>{"temperature": 22, "condition": "sunny", "humidity": 45}</tool_result><｜Assistant｜><think>Got the weather data. Let me format a nice response.</think>The weather in Beijing is currently sunny with a temperature of 22°C and 45% humidity.<｜end▁of▁sentence｜>
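The expected output above shows the OpenAI-style `arguments` JSON rendered as one DSML parameter element per key. The sketch below reproduces that rendering under the rule quoted in the tool prompt: strings are written as-is with `string="true"`, and every other type is written as JSON with `string="false"`. The function `render_parameters` is hypothetical and `ensure_ascii=False` is an assumption; the repository's `encode_arguments_to_dsml` is not shown in this part of the diff.

```python
import json
from typing import Any, Dict

DSML = "｜DSML｜"  # token as it appears in the expected output


def render_parameters(arguments: Dict[str, Any]) -> str:
    """Render an arguments dict as DSML parameter elements, one per line."""
    lines = []
    for name, value in arguments.items():
        if isinstance(value, str):
            is_string, text = "true", value
        else:
            # Assumption: non-string values are serialized with json.dumps.
            is_string, text = "false", json.dumps(value, ensure_ascii=False)
        lines.append(
            f'<{DSML}parameter name="{name}" string="{is_string}">{text}</{DSML}parameter>'
        )
    return "\n".join(lines)


args = json.loads('{"location": "Beijing", "unit": "celsius"}')
rendered = render_parameters(args)
numeric = render_parameters({"num_results": 3})
```

For the `get_weather` call in this test, `rendered` matches the two parameter lines shown in the expected output.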

encoding/tests/test_output_2.txt ADDED

	@@ -0,0 +1 @@


+<｜begin▁of▁sentence｜>You are a helpful assistant.<｜User｜>Hello<｜Assistant｜></think>Hi there! How can I help you?<｜end▁of▁sentence｜><｜User｜>What is the capital of France?<｜Assistant｜><think>The user asks about the capital of France. It is Paris.</think>The capital of France is Paris.<｜end▁of▁sentence｜>

encoding/tests/test_output_3.txt ADDED

	@@ -0,0 +1,38 @@

+<｜begin▁of▁sentence｜>该助手为DeepSeek，由深度求索公司创造。<｜latest_reminder｜>2026-02-21,星期六,广州,App,中文<｜User｜>小柴胡冲剂和布洛芬能一起吃吗？
+CITATION FORMAT: 【{cursor_id}†L{start_line_id}(-L{end_line_id})?】
+## Tools
+You have access to a set of tools to help answer the user's question. You can invoke tools by writing a "<｜DSML｜tool_calls>" block like the following:
+<｜DSML｜tool_calls>
+<｜DSML｜invoke name="$TOOL_NAME">
+<｜DSML｜parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</｜DSML｜parameter>
+...
+</｜DSML｜invoke>
+<｜DSML｜invoke name="$TOOL_NAME2">
+...
+</｜DSML｜invoke>
+</｜DSML｜tool_calls>
+String parameters should be specified as is and set `string="true"`. For all other types (numbers, booleans, arrays, objects), pass the value in JSON format and set `string="false"`.
+If thinking_mode is enabled (triggered by <think>), you MUST output your complete reasoning inside <think>...</think> BEFORE any tool calls or final response.
+Otherwise, output directly after </think> with tool calls or final response.
+### Available Tool Schemas
+{"name": "search", "description": "Web search. Split multiple queries with '||'.", "parameters": {"type": "object", "properties": {"queries": {"type": "string", "description": "query1||query2"}}, "required": ["queries"], "additionalProperties": false, "$schema": "http://json-schema.org/draft-07/schema#"}}
+{"name": "open", "description": "Batch open IDs (format 【{id}†...】) or URLs.", "parameters": {"type": "object", "properties": {"open_list": {"type": "array", "items": {"type": "object", "properties": {"id": {"description": "ID or URL", "anyOf": [{"type": "integer"}, {"type": "string"}], "default": -1}, "cursor": {"type": "integer", "description": "", "default": -1}, "loc": {"type": "integer", "description": "Start line", "default": -1}, "num_lines": {"type": "integer", "description": "", "default": -1}, "view_source": {"type": "boolean", "description": "", "default": false}}, "additionalProperties": false}, "description": ""}}, "required": ["open_list"], "additionalProperties": false, "$schema": "http://json-schema.org/draft-07/schema#"}}
+{"name": "find", "description": "Find exact text pattern in pages.", "parameters": {"type": "object", "properties": {"find_list": {"type": "array", "items": {"type": "object", "properties": {"pattern": {"type": "string", "description": ""}, "cursor": {"type": "integer", "description": "", "default": -1}}, "required": ["pattern"], "additionalProperties": false}, "description": ""}}, "required": ["find_list"], "additionalProperties": false, "$schema": "http://json-schema.org/draft-07/schema#"}}
+You MUST strictly follow the above defined tool name and parameter schemas to invoke tool calls.
+<｜Assistant｜><think>用户想知道小柴胡冲剂和布洛芬能否一起服用。</think>
+<｜DSML｜tool_calls>
+<｜DSML｜invoke name="search">
+<｜DSML｜parameter name="queries" string="true">小柴胡冲剂 布洛芬 相互作用 一起吃</｜DSML｜parameter>
+</｜DSML｜invoke>
+</｜DSML｜tool_calls><｜end▁of▁sentence｜><｜User｜><tool_result>[0]</tool_result><｜Assistant｜><think>现在开始组织回答。</think>请及时就医。<｜end▁of▁sentence｜>

encoding/tests/test_output_4.txt ADDED

	@@ -0,0 +1,29 @@

+<｜begin▁of▁sentence｜>该助手为DeepSeek-V3，由深度求索公司创造。
+今天是2025年10月17日，星期五。<｜latest_reminder｜>2024-11-15,上海市,App,中文<｜User｜>热海大滚锅是世界著名温泉吗<｜Assistant｜></think>关于热海大滚锅是否是世界著名温泉，可以这样总结：它在**中国乃至全球的地热奇观中占有重要地位**，但“世界著名”的称号更侧重于它作为**独特的地质现象和旅游景点**，而非普遍的温泉疗养体验。
+为了让你快速了解，我整理了一个简要的表格：
+| 维度 | 热海大滚锅的地位与特点 |
+| :--- | :--- |
+| **地质奇观** | **世界罕见**的高温地热系统，泉眼多、水温高、形态多样。 |
+| **国内声誉** | **中国三大地热区之一**，**国家5A级旅游景区**，自古闻名（徐霞客曾游历并记载）。 |
+| **国际知名度** | 在一些旅游平台被提及为“世界六大温泉”之一，但此说法流传不广，其国际声誉更多建立在地质独特性上。 |
+| **核心体验** | **观赏地热奇观**（如97℃沸腾的“大滚锅”）、**体验温泉煮鸡蛋**。 |
+### 💡 游玩攻略与温馨提示
+如果你计划前往热海大滚锅，这里有一些实用信息供你参考：
+- **门票与开放时间**：
+    - **门票**：景区门票约为**50元/人**。如果选择包含温泉沐浴的套餐，价格会更高，例如约**288元**。
+    - **开放时间**：景区一般**08:00-18:00**开放，但具体时间可能变动，建议提前核实。
+- **特色体验**：
+    - **温泉煮鸡蛋**：这几乎是必试项目。可以在景区门口购买用草绳串起的生鸡蛋（约5-8元/串），然后到“大滚锅”旁的指定区域蒸煮，几分钟便可熟食，趣味十足。
+    - **金汤足浴**：可以直接用从“大滚锅”流出的温泉水泡脚，缓解旅途疲劳。
+- **注意事项**：
+    - **安全第一**：“大滚锅”水温极高，务必遵守游览规则，在指定区域内观赏，切勿随意触碰泉水。
+    - **规划行程**：建议为热海景区预留**3-4小时**的游览时间。景区内步道不走回头路，出入口有观光车接送。
+希望这些信息能帮助你更好地了解热海大滚锅。如果你对腾冲的其他景点或者行程规划有更多疑问，我很乐意提供进一步的信息。<｜end▁of▁sentence｜><｜User｜>世界著名温泉有哪些<｜Assistant｜></think><｜action｜>Search<｜end▁of▁sentence｜>

generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": [
+    1,
+    128803
+  ],
+  "do_sample": true,
+  "temperature": 1.0,
+  "top_p": 1.0,
+  "transformers_version": "4.46.3"
+}
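Reviewer note: `eos_token_id` here is a list (`[1, 128803]`), so a generation loop has to stop on either ID. A minimal sketch of normalizing the field follows; the helper name `stop_ids` is hypothetical, and the sketch only assumes that the field is either an integer or a list of integers.

```python
def stop_ids(gen_cfg):
    """Return the set of token IDs that should end generation."""
    eos = gen_cfg["eos_token_id"]
    return set(eos) if isinstance(eos, list) else {eos}


gen_cfg = {"bos_token_id": 0, "eos_token_id": [1, 128803]}
ids = stop_ids(gen_cfg)
```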

jang_config.json ADDED

	@@ -0,0 +1,101 @@

+{
+  "weight_format": "mxtq",
+  "profile": "JANGTQ2",
+  "variant": "std",
+  "mxtq_seed": 42,
+  "drop_mtp": false,
+  "critical_f32_preserved": true,
+  "dsv4_runtime_requirements": {
+    "limited_swiglu_tq_patch": true,
+    "generic_mlx_sinks": false,
+    "native_cache_schema": "deepseek_v4_v7",
+    "generic_turboquant_kv": false
+  },
+  "quantization": {
+    "method": "affine+mxtq",
+    "routed_experts": {
+      "bits": 2,
+      "codec": "mxtq",
+      "bit_plan": {
+        "default_bits": 2,
+        "codec": "mxtq"
+      }
+    },
+    "non_routed": {
+      "bits": 8,
+      "codec": "affine",
+      "group_size": 32
+    },
+    "critical_control_tensors": "source-f32"
+  },
+  "source_model": "/Volumes/eric/models/Sources/DeepSeek-V4-Flash",
+  "source_config": {
+    "n_routed_experts": 256,
+    "num_hidden_layers": 43,
+    "n_hash_layers": 3
+  },
+  "routed_expert_bit_plan": {
+    "default_bits": 2,
+    "codec": "mxtq"
+  },
+  "mxtq_bits": {
+    "routed_expert": 2,
+    "attention": 8,
+    "shared_expert": 8,
+    "compressor": 8,
+    "indexer": 8,
+    "embed_tokens": 8,
+    "lm_head": 8,
+    "norms_router_hc": 16
+  },
+  "model_family": "deepseek_v4",
+  "chat": {
+    "encoder": "encoding_dsv4",
+    "encoder_fn": "encode_messages",
+    "chat_template_source": "builtin_encoding_module",
+    "has_tokenizer_chat_template": false,
+    "bos_token": "<\uff5cbegin\u2581of\u2581sentence\uff5c>",
+    "eos_token": "<\uff5cend\u2581of\u2581sentence\uff5c>",
+    "bos_token_id": 0,
+    "eos_token_id": 1,
+    "role_tokens": {
+      "user": "<\uff5cUser\uff5c>",
+      "assistant": "<\uff5cAssistant\uff5c>",
+      "latest_reminder": "<\uff5clatest_reminder\uff5c>"
+    },
+    "reasoning": {
+      "supported": true,
+      "modes": [
+        "chat",
+        "thinking"
+      ],
+      "default_mode": "chat",
+      "thinking_start": "<think>",
+      "thinking_end": "</think>",
+      "reasoning_effort_levels": [
+        "max",
+        "high",
+        null
+      ],
+      "drop_earlier_reasoning": true
+    },
+    "tool_calling": {
+      "supported": true,
+      "parser": "dsml",
+      "dsml_token": "\uff5cDSML\uff5c",
+      "tool_calls_block": "tool_calls",
+      "invoke_block": "invoke",
+      "parameter_block": "parameter",
+      "tool_output_tag": "tool_result"
+    },
+    "sampling_defaults": {
+      "temperature": 0.6,
+      "top_p": 0.95,
+      "repetition_penalty": 1.0,
+      "repetition_penalty_thinking": 1.0,
+      "repetition_penalty_chat": 1.05,
+      "max_new_tokens": 4096
+    }
+  },
+  "routed_expert_layout": "prestacked"
+}
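Reviewer note: `sampling_defaults` carries a generic `repetition_penalty` plus `repetition_penalty_thinking` and `repetition_penalty_chat`. The sketch below shows one way a runtime might resolve these per mode. That the per-mode key overrides the generic one is an assumption, and `sampling_for` is a hypothetical helper, not part of any runtime named in this commit.

```python
import json

# Subset of the "chat" object from jang_config.json above.
CHAT_CFG = json.loads("""
{
  "reasoning": {"modes": ["chat", "thinking"], "default_mode": "chat"},
  "sampling_defaults": {
    "temperature": 0.6,
    "top_p": 0.95,
    "repetition_penalty": 1.0,
    "repetition_penalty_thinking": 1.0,
    "repetition_penalty_chat": 1.05,
    "max_new_tokens": 4096
  }
}
""")


def sampling_for(chat_cfg, mode=None):
    """Resolve sampling parameters for a mode, defaulting to default_mode."""
    reasoning = chat_cfg["reasoning"]
    mode = mode or reasoning["default_mode"]
    if mode not in reasoning["modes"]:
        raise ValueError(f"unsupported mode: {mode}")
    d = chat_cfg["sampling_defaults"]
    # Assumption: a per-mode key, when present, overrides the generic value.
    penalty = d.get(f"repetition_penalty_{mode}", d["repetition_penalty"])
    return {
        "temperature": d["temperature"],
        "top_p": d["top_p"],
        "repetition_penalty": penalty,
        "max_new_tokens": d["max_new_tokens"],
    }


chat_params = sampling_for(CHAT_CFG)
thinking_params = sampling_for(CHAT_CFG, "thinking")
```

Note that these defaults (`temperature` 0.6, `top_p` 0.95) differ from `generation_config.json` (1.0 and 1.0), so a loader has to decide which file takes precedence.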

jangtq_runtime.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d9d6a16471eb320cb1182ff6eb3f00f0fd8ca47fe12a1f96a59b8058072f9ab
+size 24904
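Reviewer note: the three added lines are a Git LFS pointer (spec version, `oid sha256:<digest>`, and byte size), not the tensor data itself. A minimal sketch of parsing a pointer and checking downloaded bytes against it follows. The helper names are hypothetical, and the check reads the whole payload into memory, which would not suit the roughly 1 GB shards listed below.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4d9d6a16471eb320cb1182ff6eb3f00f0fd8ca47fe12a1f96a59b8058072f9ab
size 24904
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of an LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer, data):
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        pointer["algo"] == "sha256"
        and len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER)
```

For the model shards, hashing the file in chunks with `hashlib.sha256().update(...)` would be the more practical variant.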

model-00001-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:088ed989062e97cb71ccd12e92b8d3036dd7412e778b767b9cb61079c56bc2e4
+size 1075839457

model-00002-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a94150b65120ddde9f91557006a7cd39aade24ef14d385e62fbd59b8d3cfb7e9
+size 1075839682

model-00003-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ee2e073f0dd50373827026878836a4c651e496afd0fcdea2bedd1cf50433f4e
+size 1076888258

model-00004-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:73dd3f9012b95d8f7f20963b39107f22b0c6a843c4b7bacb1e78918894df3f8c
+size 1076888258

model-00005-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2e20638be8a0991503c4e0e6b451f7513ae6b9ed7b6b182486501e5c391c0e8d
+size 1075839682

model-00006-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b1d6bfb75d70461dbe705becf480bfde94e17ac3b0249534df21332345fafda4
+size 1076888266

model-00007-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d24390d9fc5810048f521917b338e1b836b4f0ea1dcae87298df13ccb367dbbd
+size 1076888266

model-00008-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7ead78397d9ef3327660433bbf89d16f69937242efbfd9b48fee6bf17420c81f
+size 1075839682

model-00009-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:df5285b4a5c4a30c2d4f9aa7a8cafe2021a8e1d87c22a2ae3efdafe3808e9560
+size 1076888266

model-00010-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a1aeba15162e6cf3dd5b5218c29e5a05232b15cf032cecf575152544b8266fa8
+size 1076888266

model-00011-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:034276980d07a512ffa9ba583fc7e4701a297bac92d5f7dda64c079497b2c566
+size 1075839682

model-00012-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4eb3a2429b0266bf4182aaa31ed6d9f1b5e21fd389b30613ca806af83dae0d86
+size 1076888266

model-00013-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5160c7c27a16295a7a5791332ef48adfaf7b7560c948e42d4f1f8ab3e25c2558
+size 1076888266

model-00014-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:20213048afa9fe1cdbc22efcd7d2d7a6209364716661e16e0377478b54b84917
+size 1075839682

model-00015-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec554fba2ae5cb80e315c4aa48118dca8573b0283f24162d133d9e325685a91d
+size 1076888266

model-00016-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7918bba6b49b2ac75fa6875dc09ba149c67776a11bff0cf6cc4da970d0979892
+size 1076888266

model-00017-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60309a58123006e2e2c127a1ad5ec519b8260e1837fa181cb678d50fc46032f7
+size 1075839682

model-00018-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:630aed5bc4cc32e8f003faaec7037c03d6cc3e2fbd8d311bfcc9534dc5e1f68e
+size 1076888266

model-00019-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fe2538af49fbefc5fb89d1acf976a7b92980633c47caa33c7fcb2759fb20331b
+size 1076888258

model-00020-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fc0e4ad24ed5556cf117585e3a33f91bd00b8541c67c7af660d7212d8f8b7da7
+size 1075839682

model-00021-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:32b8cff0de0ce15ccc5dc567ffcf4ec89e29ae09fc6eff669fc68c9656a280da
+size 1076888266

model-00022-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bba68aa12b96a7964b9fb13e27995f8f8133d175474fffe1ec48cb175d333a28
+size 1076888266

model-00023-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2fe567e503d44731bb7fcbb4e9a0e4f1f27e52b2fcb12582220e00ecd1075594
+size 1075839682

model-00024-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02e9b3d56f69b37354c712293b27469613f2af31b70a902bdce1359104aa88c4
+size 1076888266

model-00025-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47063266b0e44b21f763fb37dbb7679baa15b3d3403e4d13dcbe74c0a6a2ba60
+size 1076888266

model-00026-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e054e97984c9920b8b3c69cad1e79ff3af214d8d5567ef393ea1e68a61cc4bee
+size 1075839682

model-00027-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a88be58ebbdff6f50a41d5d7491652cbc00a6f21e9a3c189b0042b08711bbf1
+size 1076888266

model-00028-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4fd0c8aa1620f5ba038d2d810d96f5e194b2efd133a03eeadabfcaa6c6a84b17
+size 1076888266

model-00029-of-00075.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6948b397d12078aa8324652f4162a42c6ee94fb53eda25da013c6685a3a0c5b6
+size 1075839682