Luigi Claude Sonnet 4.5 committed on
Commit 8a9d263 · 1 Parent(s): 3ec1246

separate Qwen3 thinking blocks into thinking.txt and summary.txt


Add a parse_thinking_blocks() function to extract content between
`<think>` tags, writing the thinking content to thinking.txt and the
final summary to summary.txt. Maintains UTF-8 encoding and
Traditional Chinese (zh-TW) conversion.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Files changed (2)
  1. AGENTS.md +137 -0
  2. summarize_transcript.py +41 -3
AGENTS.md ADDED
@@ -0,0 +1,137 @@
# AGENTS.md - Tiny Scribe Project Guidelines

## Project Overview

Tiny Scribe is a Python CLI tool for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen) with llama-cpp-python. It supports live streaming output and Traditional Chinese (zh-TW) conversion via OpenCC.

## Build / Lint / Test Commands

**Run the script:**
```bash
python summarize_transcript.py -i ./transcripts/short.txt
python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
python summarize_transcript.py -c  # CPU only
```

**Linting (if ruff installed):**
```bash
ruff check .
ruff check --select I .  # Import sorting
```

**Type checking (if mypy installed):**
```bash
mypy summarize_transcript.py
```

**Running tests:**
```bash
# No test suite in the root project yet.
# Tests live in the llama-cpp-python/tests/ submodule; to run them:
cd llama-cpp-python && pip install ".[test]" && pytest tests/test_llama.py -v
```

**Single test:**
```bash
pytest tests/test_llama.py::test_function_name -v
```

## Code Style Guidelines

**Formatting:**
- Use 4 spaces for indentation
- Line length: 100 characters max
- Use double quotes for docstrings; single quotes are acceptable for strings
- Two blank lines before function definitions
- One blank line after docstrings

**Imports:**
```python
# Standard library first
import os
import argparse

# Third-party packages
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
from opencc import OpenCC
```

**Type Hints:**
- Use type hints for function parameters and return values
- Use `Optional[]` for nullable types
- Example: `def load_model(repo_id: str, filename: str, cpu_only: bool = False) -> Llama:`

**Naming Conventions:**
- `snake_case` for functions and variables
- `CamelCase` for classes
- `UPPER_CASE` for constants
- Descriptive names: `stream_summarize_transcript`, not `summ`

**Docstrings:**
- Use triple quotes for all public functions
- Include Args/Returns sections for complex functions
- Keep the first line as a brief summary

**Error Handling:**
- Use explicit error messages with f-strings
- Check file existence before operations
- Use `try/except` blocks for external API calls (Hugging Face, model loading)

## Dependencies

**Required:**
- `llama-cpp-python` - Core inference engine
- `huggingface-hub` - Model downloading
- `opencc` - Chinese text conversion

**Development (optional):**
- `pytest` - Testing framework
- `ruff` - Linting and formatting
- `mypy` - Type checking
- `black` - Code formatting

## Project Structure

```
tiny-scribe/
├── summarize_transcript.py   # Main CLI script
├── transcripts/              # Input transcript files
│   ├── short.txt
│   └── full.txt
├── summary.txt               # Generated output
├── llama-cpp-python/         # Git submodule
│   └── vendor/llama.cpp/     # Core C++ library
└── README.md                 # Project documentation
```

## Usage Patterns

**Model Loading:**
```python
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-0.6B-GGUF",
    filename="*Q4_0.gguf",
    n_gpu_layers=-1,  # -1 for all GPU, 0 for CPU
    n_ctx=32768,      # Context window size
)
```

**Streaming Chat Completion:**
```python
stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    stream=True,
    max_tokens=1024,
    temperature=0.6,
)
```

## Notes for AI Agents

- This is a simple utility project; no formal CI/CD or test suite in the root
- When modifying, maintain the existing streaming output pattern
- Always call `llm.reset()` after completion to ensure state isolation
- Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
- Default output language is Traditional Chinese (zh-TW) via OpenCC conversion
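The `repo_id:quant` model format from the notes above can be split with a small helper. This is a hypothetical sketch (`split_model_spec` is not a function in the project); the actual CLI parsing in summarize_transcript.py may differ.

```python
# Hypothetical helper illustrating the repo_id:quant model format,
# e.g. "unsloth/Qwen3-1.7B-GGUF:Q2_K_L". Not part of the project code.
def split_model_spec(spec: str):
    """Split 'repo_id:quant' into (repo_id, quant); quant is None if absent."""
    repo_id, sep, quant = spec.rpartition(":")
    if not sep:
        # No ':' present - the whole spec is the repo_id
        return spec, None
    return repo_id, quant

print(split_model_spec("unsloth/Qwen3-1.7B-GGUF:Q2_K_L"))
# -> ('unsloth/Qwen3-1.7B-GGUF', 'Q2_K_L')
```

The quant part would then feed the `filename` glob used in the Model Loading pattern (e.g. `f"*{quant}.gguf"`).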
summarize_transcript.py CHANGED

```diff
@@ -5,6 +5,8 @@ Script to summarize transcript using ERNIE-4.5-21B-A3B-PT-GGUF model with SYCL a

 import os
 import argparse
+import re
+from typing import Tuple
 from llama_cpp import Llama
 from huggingface_hub import hf_hub_download
 from opencc import OpenCC
@@ -32,6 +34,33 @@ def read_transcript(file_path):
         content = f.read()
     return content

+def parse_thinking_blocks(content: str) -> Tuple[str, str]:
+    """
+    Parse thinking blocks from Qwen3 model output.
+
+    Args:
+        content: Full model response containing thinking blocks and summary
+
+    Returns:
+        Tuple of (thinking_content, summary_content)
+        - thinking_content: All text between <think> tags (or empty string)
+        - summary_content: All text outside thinking blocks (or full content if no tags)
+    """
+    pattern = r'<think>(.*?)</think>'
+    matches = re.findall(pattern, content, re.DOTALL)
+
+    if not matches:
+        # No thinking blocks found - return entire content as summary
+        return ("", content)
+
+    # Extract all thinking blocks
+    thinking = '\n\n'.join(match.strip() for match in matches)
+
+    # Remove thinking blocks from content to get summary
+    summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()
+
+    return (thinking, summary)
+
 def stream_summarize_transcript(llm, transcript):
     """
     Perform live streaming summary by getting real-time token output from the model.
@@ -121,10 +150,19 @@ def main():
     summary = stream_summarize_transcript(llm, transcript)

     # Save summaries to files
-    with open("summary.txt", 'w', encoding='utf-8') as f:
-        f.write(summary)
+    # Parse thinking blocks and separate content
+    thinking_content, summary_content = parse_thinking_blocks(summary)
+
+    # Write thinking content if present
+    if thinking_content:
+        with open("thinking.txt", 'w', encoding='utf-8') as f:
+            f.write(thinking_content)
+        print(f"\n[Thinking content saved to thinking.txt ({len(thinking_content)} chars)]")

-    print("\nSummaries saved to summary.txt.")
+    # Write summary content
+    with open("summary.txt", 'w', encoding='utf-8') as f:
+        f.write(summary_content)
+    print(f"[Summary saved to summary.txt ({len(summary_content)} chars)]")

     # Clean up
     del llm
```
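The new `parse_thinking_blocks()` helper can be exercised on its own. The sketch below reproduces the function from this commit and runs it on a made-up Qwen3-style response (the sample text is illustrative, not real model output):

```python
import re
from typing import Tuple

# parse_thinking_blocks() as added in this commit, reproduced for a quick check.
def parse_thinking_blocks(content: str) -> Tuple[str, str]:
    """Split model output into (thinking_content, summary_content)."""
    pattern = r'<think>(.*?)</think>'
    matches = re.findall(pattern, content, re.DOTALL)
    if not matches:
        # No thinking blocks - everything is summary
        return ("", content)
    # Join all thinking blocks, then strip them out to leave the summary
    thinking = '\n\n'.join(match.strip() for match in matches)
    summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()
    return (thinking, summary)

sample = "<think>Step 1: skim the transcript.</think>\nFinal summary here."
thinking, summary = parse_thinking_blocks(sample)
print(thinking)  # -> Step 1: skim the transcript.
print(summary)   # -> Final summary here.
```

Note that `re.DOTALL` lets the pattern span multi-line thinking blocks, and the no-match fallback keeps the behavior safe for models that emit no `<think>` tags at all.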