separate Qwen3 thinking blocks into thinking.txt and summary.txt
Add parse_thinking_blocks() function to extract content between
`<think>...</think>` tags, writing thinking content to thinking.txt and the
final summary to summary.txt. Maintains UTF-8 encoding and
Traditional Chinese (zh-TW) conversion.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- AGENTS.md +137 -0
- summarize_transcript.py +41 -3
AGENTS.md
ADDED
@@ -0,0 +1,137 @@
# AGENTS.md - Tiny Scribe Project Guidelines

## Project Overview

Tiny Scribe is a Python CLI tool for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen) with llama-cpp-python. It supports live streaming output and Traditional Chinese (zh-TW) conversion via OpenCC.

## Build / Lint / Test Commands

**Run the script:**
```bash
python summarize_transcript.py -i ./transcripts/short.txt
python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
python summarize_transcript.py -c  # CPU only
```

**Linting (if ruff installed):**
```bash
ruff check .
ruff check --select I .  # Import sorting
```

**Type checking (if mypy installed):**
```bash
mypy summarize_transcript.py
```

**Running tests:**
```bash
# No test suite in root project yet
# Tests exist in llama-cpp-python/tests/ submodule
# To test llama-cpp-python:
cd llama-cpp-python && pip install ".[test]" && pytest tests/test_llama.py -v
```

**Single test:**
```bash
pytest tests/test_llama.py::test_function_name -v
```

## Code Style Guidelines

**Formatting:**
- Use 4 spaces for indentation
- Line length: 100 characters max
- Use double quotes for docstrings; single quotes are acceptable for strings
- Two blank lines before function definitions
- One blank line after docstrings

**Imports:**
```python
# Standard library first
import os
import argparse

# Third-party packages
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
from opencc import OpenCC
```

**Type Hints:**
- Use type hints for function parameters and return values
- Use `Optional[]` for nullable types
- Example: `def load_model(repo_id: str, filename: str, cpu_only: bool = False) -> Llama:`

**Naming Conventions:**
- `snake_case` for functions and variables
- `CamelCase` for classes
- `UPPER_CASE` for constants
- Descriptive names: `stream_summarize_transcript`, not `summ`

**Docstrings:**
- Use triple quotes for all public functions
- Include Args/Returns sections for complex functions
- Keep the first line as a brief summary

**Error Handling:**
- Use explicit error messages with f-strings
- Check file existence before operations
- Use `try/except` blocks for external API calls (Hugging Face, model loading)

## Dependencies

**Required:**
- `llama-cpp-python` - Core inference engine
- `huggingface-hub` - Model downloading
- `opencc` - Chinese text conversion

**Development (optional):**
- `pytest` - Testing framework
- `ruff` - Linting and formatting
- `mypy` - Type checking
- `black` - Code formatting

## Project Structure

```
tiny-scribe/
├── summarize_transcript.py   # Main CLI script
├── transcripts/              # Input transcript files
│   ├── short.txt
│   └── full.txt
├── summary.txt               # Generated output
├── llama-cpp-python/         # Git submodule
│   └── vendor/llama.cpp/     # Core C++ library
└── README.md                 # Project documentation
```

## Usage Patterns

**Model Loading:**
```python
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-0.6B-GGUF",
    filename="*Q4_0.gguf",
    n_gpu_layers=-1,  # -1 for all GPU, 0 for CPU
    n_ctx=32768,      # Context window size
)
```

**Streaming Chat Completion:**
```python
stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    stream=True,
    max_tokens=1024,
    temperature=0.6,
)
```

## Notes for AI Agents

- This is a simple utility project; no formal CI/CD or test suite in root
- When modifying, maintain the existing streaming output pattern
- Always call `llm.reset()` after completion to ensure state isolation
- Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
- Default language output is Traditional Chinese (zh-TW) via OpenCC conversion
summarize_transcript.py
CHANGED
```diff
@@ -5,6 +5,8 @@ Script to summarize transcript using ERNIE-4.5-21B-A3B-PT-GGUF model with SYCL a
 
 import os
 import argparse
+import re
+from typing import Tuple
 from llama_cpp import Llama
 from huggingface_hub import hf_hub_download
 from opencc import OpenCC
@@ -32,6 +34,33 @@ def read_transcript(file_path):
         content = f.read()
     return content
 
+def parse_thinking_blocks(content: str) -> Tuple[str, str]:
+    """
+    Parse thinking blocks from Qwen3 model output.
+
+    Args:
+        content: Full model response containing thinking blocks and summary
+
+    Returns:
+        Tuple of (thinking_content, summary_content)
+        - thinking_content: All text between <think> tags (or empty string)
+        - summary_content: All text outside thinking blocks (or full content if no tags)
+    """
+    pattern = r'<think>(.*?)</think>'
+    matches = re.findall(pattern, content, re.DOTALL)
+
+    if not matches:
+        # No thinking blocks found - return entire content as summary
+        return ("", content)
+
+    # Extract all thinking blocks
+    thinking = '\n\n'.join(match.strip() for match in matches)
+
+    # Remove thinking blocks from content to get summary
+    summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()
+
+    return (thinking, summary)
+
 def stream_summarize_transcript(llm, transcript):
     """
     Perform live streaming summary by getting real-time token output from the model.
@@ -121,10 +150,19 @@ def main():
     summary = stream_summarize_transcript(llm, transcript)
 
     # Save summaries to files
-
-
+    # Parse thinking blocks and separate content
+    thinking_content, summary_content = parse_thinking_blocks(summary)
+
+    # Write thinking content if present
+    if thinking_content:
+        with open("thinking.txt", 'w', encoding='utf-8') as f:
+            f.write(thinking_content)
+        print(f"\n[Thinking content saved to thinking.txt ({len(thinking_content)} chars)]")
 
-
+    # Write summary content
+    with open("summary.txt", 'w', encoding='utf-8') as f:
+        f.write(summary_content)
+    print(f"[Summary saved to summary.txt ({len(summary_content)} chars)]")
 
     # Clean up
     del llm
```
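The new `parse_thinking_blocks()` logic can be exercised standalone. The function body below mirrors the diff; the sample response string is invented for illustration:

```python
import re
from typing import Tuple

def parse_thinking_blocks(content: str) -> Tuple[str, str]:
    """Split model output into (thinking content, summary) - mirrors the diff above."""
    pattern = r'<think>(.*?)</think>'
    # re.DOTALL lets '.' match newlines, so multi-line thinking blocks are captured
    matches = re.findall(pattern, content, re.DOTALL)
    if not matches:
        return ("", content)
    thinking = '\n\n'.join(m.strip() for m in matches)
    summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()
    return (thinking, summary)

# Invented sample output for illustration
response = "<think>The user wants a summary.</think>Meeting summary: budget approved."
thinking, summary = parse_thinking_blocks(response)
print(thinking)  # The user wants a summary.
print(summary)   # Meeting summary: budget approved.
```

Note the fallback behavior: when the model emits no `<think>` tags (e.g., thinking is disabled), the entire response is treated as the summary and thinking.txt is not written.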