Luigi Claude Sonnet 4.5 committed on
Commit 8a9d263 · 1 Parent(s): 3ec1246

separate Qwen3 thinking blocks into thinking.txt and summary.txt


Add a parse_thinking_blocks() function to extract content between
`<think>` tags, writing the thinking content to thinking.txt and the
final summary to summary.txt. Maintains UTF-8 encoding and
Traditional Chinese (zh-TW) conversion.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Files changed (2)
  1. AGENTS.md +137 -0
  2. summarize_transcript.py +41 -3
AGENTS.md ADDED
@@ -0,0 +1,137 @@
# AGENTS.md - Tiny Scribe Project Guidelines

## Project Overview

Tiny Scribe is a Python CLI tool for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen) with llama-cpp-python. It supports live streaming output and Traditional Chinese (zh-TW) conversion via OpenCC.

## Build / Lint / Test Commands

**Run the script:**
```bash
python summarize_transcript.py -i ./transcripts/short.txt
python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
python summarize_transcript.py -c  # CPU only
```

**Linting (if ruff installed):**
```bash
ruff check .
ruff check --select I .  # Import sorting
```

**Type checking (if mypy installed):**
```bash
mypy summarize_transcript.py
```

**Running tests:**
```bash
# No test suite in the root project yet.
# Tests live in the llama-cpp-python/tests/ submodule; to run them:
cd llama-cpp-python && pip install ".[test]" && pytest tests/test_llama.py -v
```

**Single test:**
```bash
pytest tests/test_llama.py::test_function_name -v
```

## Code Style Guidelines

**Formatting:**
- Use 4 spaces for indentation
- Line length: 100 characters max
- Use double quotes for docstrings; single quotes are acceptable for strings
- Two blank lines before function definitions
- One blank line after docstrings

**Imports:**
```python
# Standard library first
import os
import argparse

# Third-party packages
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
from opencc import OpenCC
```

**Type Hints:**
- Use type hints for function parameters and return values
- Use `Optional[]` for nullable types
- Example: `def load_model(repo_id: str, filename: str, cpu_only: bool = False) -> Llama:`

**Naming Conventions:**
- `snake_case` for functions and variables
- `CamelCase` for classes
- `UPPER_CASE` for constants
- Descriptive names: `stream_summarize_transcript`, not `summ`

**Docstrings:**
- Use triple quotes for all public functions
- Include Args/Returns sections for complex functions
- Keep the first line as a brief summary

**Error Handling:**
- Use explicit error messages with f-strings
- Check file existence before operations
- Use `try/except` blocks for external API calls (Hugging Face, model loading)

## Dependencies

**Required:**
- `llama-cpp-python` - Core inference engine
- `huggingface-hub` - Model downloading
- `opencc` - Chinese text conversion

**Development (optional):**
- `pytest` - Testing framework
- `ruff` - Linting and formatting
- `mypy` - Type checking
- `black` - Code formatting

## Project Structure

```
tiny-scribe/
├── summarize_transcript.py   # Main CLI script
├── transcripts/              # Input transcript files
│   ├── short.txt
│   └── full.txt
├── summary.txt               # Generated output
├── llama-cpp-python/         # Git submodule
│   └── vendor/llama.cpp/     # Core C++ library
└── README.md                 # Project documentation
```

## Usage Patterns

**Model Loading:**
```python
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-0.6B-GGUF",
    filename="*Q4_0.gguf",
    n_gpu_layers=-1,  # -1 for all GPU, 0 for CPU
    n_ctx=32768,      # Context window size
)
```

**Streaming Chat Completion:**
```python
stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    stream=True,
    max_tokens=1024,
    temperature=0.6,
)
```

## Notes for AI Agents

- This is a simple utility project; no formal CI/CD or test suite in the root
- When modifying, maintain the existing streaming output pattern
- Always call `llm.reset()` after completion to ensure state isolation
- Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
- Default output language is Traditional Chinese (zh-TW) via OpenCC conversion
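The `repo_id:quant` model format from the notes above can be split with a small helper. This is a hypothetical sketch (`split_model_spec` is not a function in the project); the actual CLI parsing in summarize_transcript.py may differ.

```python
# Hypothetical helper illustrating the repo_id:quant model format,
# e.g. "unsloth/Qwen3-1.7B-GGUF:Q2_K_L". Not part of the project code.
def split_model_spec(spec: str):
    """Split 'repo_id:quant' into (repo_id, quant); quant is None if absent."""
    repo_id, sep, quant = spec.rpartition(":")
    if not sep:
        # No ':' present - the whole spec is the repo_id
        return spec, None
    return repo_id, quant

print(split_model_spec("unsloth/Qwen3-1.7B-GGUF:Q2_K_L"))
# -> ('unsloth/Qwen3-1.7B-GGUF', 'Q2_K_L')
```

The quant part would then feed the `filename` glob used in the Model Loading pattern (e.g. `f"*{quant}.gguf"`).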
summarize_transcript.py CHANGED

```diff
@@ -5,6 +5,8 @@ Script to summarize transcript using ERNIE-4.5-21B-A3B-PT-GGUF model with SYCL a

 import os
 import argparse
+import re
+from typing import Tuple
 from llama_cpp import Llama
 from huggingface_hub import hf_hub_download
 from opencc import OpenCC
@@ -32,6 +34,33 @@ def read_transcript(file_path):
         content = f.read()
     return content

+def parse_thinking_blocks(content: str) -> Tuple[str, str]:
+    """
+    Parse thinking blocks from Qwen3 model output.
+
+    Args:
+        content: Full model response containing thinking blocks and summary
+
+    Returns:
+        Tuple of (thinking_content, summary_content)
+        - thinking_content: All text between <think> tags (or empty string)
+        - summary_content: All text outside thinking blocks (or full content if no tags)
+    """
+    pattern = r'<think>(.*?)</think>'
+    matches = re.findall(pattern, content, re.DOTALL)
+
+    if not matches:
+        # No thinking blocks found - return entire content as summary
+        return ("", content)
+
+    # Extract all thinking blocks
+    thinking = '\n\n'.join(match.strip() for match in matches)
+
+    # Remove thinking blocks from content to get summary
+    summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()
+
+    return (thinking, summary)
+
 def stream_summarize_transcript(llm, transcript):
     """
     Perform live streaming summary by getting real-time token output from the model.
@@ -121,10 +150,19 @@ def main():
     summary = stream_summarize_transcript(llm, transcript)

     # Save summaries to files
-    with open("summary.txt", 'w', encoding='utf-8') as f:
-        f.write(summary)
+    # Parse thinking blocks and separate content
+    thinking_content, summary_content = parse_thinking_blocks(summary)
+
+    # Write thinking content if present
+    if thinking_content:
+        with open("thinking.txt", 'w', encoding='utf-8') as f:
+            f.write(thinking_content)
+        print(f"\n[Thinking content saved to thinking.txt ({len(thinking_content)} chars)]")

-    print("\nSummaries saved to summary.txt.")
+    # Write summary content
+    with open("summary.txt", 'w', encoding='utf-8') as f:
+        f.write(summary_content)
+    print(f"[Summary saved to summary.txt ({len(summary_content)} chars)]")

     # Clean up
     del llm
```
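The new `parse_thinking_blocks()` helper can be exercised on its own. The sketch below reproduces the function from this commit and runs it on a made-up Qwen3-style response (the sample text is illustrative, not real model output):

```python
import re
from typing import Tuple

# parse_thinking_blocks() as added in this commit, reproduced for a quick check.
def parse_thinking_blocks(content: str) -> Tuple[str, str]:
    """Split model output into (thinking_content, summary_content)."""
    pattern = r'<think>(.*?)</think>'
    matches = re.findall(pattern, content, re.DOTALL)
    if not matches:
        # No thinking blocks - everything is summary
        return ("", content)
    # Join all thinking blocks, then strip them out to leave the summary
    thinking = '\n\n'.join(match.strip() for match in matches)
    summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()
    return (thinking, summary)

sample = "<think>Step 1: skim the transcript.</think>\nFinal summary here."
thinking, summary = parse_thinking_blocks(sample)
print(thinking)  # -> Step 1: skim the transcript.
print(summary)   # -> Final summary here.
```

Note that `re.DOTALL` lets the pattern span multi-line thinking blocks, and the no-match fallback keeps the behavior safe for models that emit no `<think>` tags at all.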