Commit 66448b4 by mukunda1729 · verified · 1 parent: 8c4373e

Initial: token counter across model families

Files changed (3)
  1. README.md +27 -5
  2. app.py +85 -0
  3. requirements.txt +3 -0
README.md CHANGED
@@ -1,12 +1,34 @@
  ---
  title: Token Counter
- emoji: 🐠
- colorFrom: green
- colorTo: blue
+ emoji: 🔢
+ colorFrom: yellow
+ colorTo: gray
  sdk: gradio
- sdk_version: 6.13.0
+ sdk_version: "5.49.1"
+ python_version: "3.12"
  app_file: app.py
  pinned: false
+ license: mit
+ short_description: "Count tokens across Claude, GPT, Llama tokenizers."
+ tags:
+ - tokenization
+ - llm
+ - context-window
+ - agentfit
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Token Counter
+
+ Paste any text and see how it tokenizes across Claude, GPT, and other model families. Powered by [`agentfit`](https://pypi.org/project/agentfit-py/).
+
+ ## Why?
+
+ - Different tokenizers split the same string very differently; Claude often uses ~half the tokens GPT does for the same Chinese / emoji input.
+ - Useful when budgeting prompts and deciding which model to use for non-English content.
+ - Sanity-check your own token counter against a reference.
+
+ ## Related
+
+ - [`agentfit` on PyPI](https://pypi.org/project/agentfit-py/)
+ - [The Agent Reliability Stack](https://mukundakatta.github.io/agent-stack/)
+ - Companion dataset: [`token-counting-edge-cases`](https://huggingface.co/datasets/mukunda1729/token-counting-edge-cases)
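The last "Why?" bullet in the README (sanity-checking your own counter against a reference) can be sketched without any tokenizer installed. The snippet below is only an illustration and is not part of this commit: `compare_counters`, `my_counter`, and `reference_counter` are hypothetical names, and both counters are crude stand-ins (whitespace-separated words and a 4-characters-per-token heuristic), not real tokenizers.

```python
from typing import Callable, List, Tuple

Counter = Callable[[str], int]


def my_counter(text: str) -> int:
    """Hypothetical counter under test: one token per whitespace-separated word."""
    return len(text.split())


def reference_counter(text: str) -> int:
    """Hypothetical reference: about 4 characters per token, rounded up."""
    return -(-len(text) // 4)  # ceiling division; 0 for empty text


def compare_counters(
    samples: List[str],
    own: Counter,
    reference: Counter,
    tolerance: float = 0.5,
) -> List[Tuple[str, int, int, bool]]:
    """Return (sample, own_count, reference_count, within_tolerance) per sample."""
    report = []
    for sample in samples:
        a, b = own(sample), reference(sample)
        if b:
            rel_err = abs(a - b) / b
        else:
            # Reference says zero tokens: only an exact zero counts as a match.
            rel_err = 0.0 if a == 0 else float("inf")
        report.append((sample, a, b, rel_err <= tolerance))
    return report


if __name__ == "__main__":
    for row in compare_counters(["Hello, world!", "a b c d e f g h"],
                                my_counter, reference_counter):
        print(row)
```

In practice the reference would be an exact tokenizer for the target model, and the tolerance would be chosen to match how much slack the prompt budget allows.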
app.py ADDED
@@ -0,0 +1,85 @@
+ """Token counter - count tokens for text across multiple model families.
+
+ Uses agentfit's pluggable counter. Falls back to char-based estimates when
+ exact tokenizers aren't available.
+ """
+
+ import gradio as gr
+ from agentfit import count
+
+
+ MODELS = [
+     "claude-sonnet-4-6",
+     "claude-haiku-4-5",
+     "gpt-5",
+     "gpt-4.1",
+     "default",
+ ]
+
+
+ def count_tokens(text: str, comparison: str):
+     """Count tokens for the given text across selected models."""
+     if not text.strip():
+         return "_Enter some text to count tokens._"
+
+     models = [m.strip() for m in comparison.split(",") if m.strip()] if comparison else MODELS
+     rows = ["| Model | Tokens | Chars/token |", "|---|---:|---:|"]
+     char_count = len(text)
+     for m in models:
+         try:
+             n = count([{"role": "user", "content": text}], model=m)
+             ratio = f"{char_count / n:.2f}" if n else "-"
+             rows.append(f"| `{m}` | {n} | {ratio} |")
+         except Exception as e:
+             rows.append(f"| `{m}` | - | error: {e} |")
+     return "\n".join(rows) + f"\n\n**Input:** {char_count} chars · {len(text.split())} words"
+
+
+ with gr.Blocks(title="Token Counter - across model families", theme=gr.themes.Soft()) as demo:
+     gr.Markdown(
+         """
+         # Token Counter
+
+         Paste any text and see how it tokenizes across Claude, GPT, and other model families.
+         Powered by [`agentfit`](https://pypi.org/project/agentfit-py/) - pure Python, no API calls.
+
+         💡 Useful for: budgeting context windows, comparing tokenizer efficiency for non-English text, sanity-checking your own counter.
+         """
+     )
+
+     with gr.Row():
+         with gr.Column():
+             txt = gr.Textbox(
+                 value="The quick brown fox jumps over the lazy dog.",
+                 label="Text",
+                 lines=10,
+                 placeholder="Paste text here...",
+             )
+             models_in = gr.Textbox(
+                 value=", ".join(MODELS),
+                 label="Models (comma-separated)",
+             )
+             btn = gr.Button("Count", variant="primary")
+             out = gr.Markdown()
+             btn.click(count_tokens, inputs=[txt, models_in], outputs=out)
+
+     gr.Examples(
+         examples=[
+             ["Hello, world!", "claude-sonnet-4-6, gpt-5, default"],
+             ["你好世界 こんにちは 안녕하세요", "claude-sonnet-4-6, gpt-5, default"],
+             ["🚀🎉🤖✨", "claude-sonnet-4-6, gpt-5, default"],
+             ["function add(a, b) {\n return a + b;\n}", "claude-sonnet-4-6, gpt-5"],
+         ],
+         inputs=[txt, models_in],
+     )
+
+     gr.Markdown(
+         """
+         ---
+         Part of [The Agent Reliability Stack](https://mukundakatta.github.io/agent-stack/) · MIT licensed
+         """
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch(server_name="0.0.0.0", server_port=7860)
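The table-building part of `count_tokens` can be exercised without Gradio or `agentfit` if the counter is passed in as an argument. What follows is a sketch under stated assumptions, not the code in this commit: `parse_models`, `build_table`, and `char_estimate` are hypothetical names, and `char_estimate` only stands in for the char-based fallback that the module docstring mentions (how `agentfit.count` actually behaves is not shown in this diff).

```python
from typing import Callable, List


def char_estimate(text: str, model: str) -> int:
    """Hypothetical stand-in counter: ~4 characters per token; ignores `model`."""
    return -(-len(text) // 4)  # ceiling division


def parse_models(spec: str, default: List[str]) -> List[str]:
    """Split a comma-separated model list; use `default` if nothing usable remains."""
    models = [m.strip() for m in spec.split(",") if m.strip()]
    return models or default


def build_table(text: str, models: List[str],
                counter: Callable[[str, str], int]) -> str:
    """Render one Markdown table row per model, mirroring the app's layout."""
    rows = ["| Model | Tokens | Chars/token |", "|---|---:|---:|"]
    for m in models:
        try:
            n = counter(text, m)
            ratio = f"{len(text) / n:.2f}" if n else "n/a"
            rows.append(f"| `{m}` | {n} | {ratio} |")
        except Exception as e:  # keep one failing model from hiding the others
            rows.append(f"| `{m}` | n/a | error: {e} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(build_table("The quick brown fox jumps over the lazy dog.",
                      parse_models("claude-sonnet-4-6, gpt-5", ["default"]),
                      char_estimate))
```

One design difference worth noting: the expression in `count_tokens` falls back to `MODELS` only when the input string is empty, so an input such as `" , "` yields a table with no rows, whereas `parse_models` above also falls back in that case.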
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio==5.49.1
+ huggingface_hub>=0.30,<1.0
+ agentfit-py>=0.1.0