tokyotech-llm

university

tokyotech-llm's collections (16)

GPT-OSS-Swallow-v0.1

tokyotech-llm/GPT-OSS-Swallow-20B-RL-v0.1

Text Generation • 21B • Updated Feb 20 • 3.1k • 20
tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1

Text Generation • 117B • Updated Feb 20 • 1.12k • 15
tokyotech-llm/GPT-OSS-Swallow-20B-SFT-v0.1

Text Generation • 21B • Updated Feb 20 • 1.78k • 5
tokyotech-llm/GPT-OSS-Swallow-120B-SFT-v0.1

Text Generation • 117B • Updated Feb 20 • 413 • 2

Apache-2.0 Open High Quality Math Corpus

tokyotech-llm/swallow-math-v2

Viewer • Updated Nov 6, 2025 • 17.4M • 6.42k • 30

Llama-3.1-Swallow-v0.5

tokyotech-llm/Llama-3.1-Swallow-8B-v0.5

8B • Updated Jul 1, 2025 • 1.38k • 9
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5

Text Generation • 8B • Updated Jun 25, 2025 • 5.95k • 18

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

tokyotech-llm/swallow-code

Viewer • Updated Mar 1 • 129M • 1.23k • 63
tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0002500

Updated Jul 4, 2025 • 69
tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0005000

8B • Updated Jul 4, 2025 • 5
tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0007500

8B • Updated Jul 4, 2025 • 8

Llama-3.3-Swallow

tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4

Text Generation • 71B • Updated Jul 1, 2025 • 723 • 13
tokyotech-llm/Llama-3.3-Swallow-70B-v0.4

Text Generation • 71B • Updated May 31, 2025 • 470 • 4
tokyotech-llm/edu-classifier

Text Classification • Updated Jan 30, 2025 • 335 • 13

Llama-3-Swallow

tokyotech-llm/Llama-3-Swallow-8B-v0.1

Text Generation • Updated Oct 8, 2024 • 565 • 12
tokyotech-llm/Llama-3-Swallow-70B-v0.1

Text Generation • Updated Oct 8, 2024 • 10 • 6
tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1

Text Generation • 8B • Updated Oct 8, 2024 • 8.74k • 21
tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1

Text Generation • 71B • Updated Oct 8, 2024 • 67 • 7

Swallow-instruct

Swallow instruction tuning models

tokyotech-llm/Swallow-7b-instruct-hf

Text Generation • 7B • Updated Oct 8, 2024 • 268 • 44
tokyotech-llm/Swallow-13b-instruct-v0.1

Text Generation • 13B • Updated Oct 8, 2024 • 59 • 1
tokyotech-llm/Swallow-70b-instruct-v0.1

Text Generation • 69B • Updated Oct 8, 2024 • 33
tokyotech-llm/Swallow-7b-instruct-v0.1

Text Generation • 7B • Updated Oct 8, 2024 • 257 • 4

Swallow MX(Mixtral) models

tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1

Text Generation • 47B • Updated Aug 17, 2024 • 42 • 29

Qwen3-Swallow-v0.2

tokyotech-llm/Qwen3-Swallow-8B-RL-v0.2

Text Generation • 8B • Updated Feb 23 • 4.38k • 9
tokyotech-llm/Qwen3-Swallow-30B-A3B-RL-v0.2

Text Generation • 31B • Updated Feb 23 • 438 • 6
tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2

Text Generation • 33B • Updated Feb 23 • 1.34k • 1
tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2

Text Generation • 8B • Updated Feb 23 • 2.84k • 5

Apache-2.0 Open High Quality Code Corpus

tokyotech-llm/swallow-code-v2

Viewer • Updated Nov 8, 2025 • 147M • 12.9k • 35

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

tokyotech-llm/swallow-math

Viewer • Updated Mar 1 • 4.33M • 1k • 44
tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0002500

8B • Updated May 7, 2025
tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0005000

8B • Updated May 7, 2025 • 1
tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0007500

8B • Updated May 7, 2025

Gemma-2-Swallow

tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1

Text Generation • 27B • Updated May 18, 2025 • 87 • 1
tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1

Text Generation • Updated May 18, 2025 • 1.3k • 1
tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1

Text Generation • Updated May 18, 2025 • 73
tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1

Text Generation • Updated May 18, 2025 • 153 • 4

Llama-3.1-Swallow

tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5

Text Generation • 8B • Updated Jun 25, 2025 • 5.95k • 18
tokyotech-llm/Llama-3.1-Swallow-8B-v0.5

8B • Updated Jul 1, 2025 • 1.38k • 9
tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3

Text Generation • 71B • Updated Apr 2, 2025 • 310 • 13
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3

Text Generation • 8B • Updated Apr 2, 2025 • 4.96k • 24

Continual Pre-Training from Llama 2

tokyotech-llm/Swallow-7b-hf

Text Generation • 7B • Updated Oct 8, 2024 • 1.34k • 17
tokyotech-llm/Swallow-7b-instruct-hf

Text Generation • 7B • Updated Oct 8, 2024 • 268 • 44
tokyotech-llm/Swallow-7b-instruct-v0.1

Text Generation • 7B • Updated Oct 8, 2024 • 257 • 4
tokyotech-llm/Swallow-7b-plus-hf

Text Generation • Updated Oct 8, 2024 • 20 • 8

Swallow MS/MX (Mistral/Mixtral) models

tokyotech-llm/Swallow-MS-7b-v0.1

Text Generation • 7B • Updated Aug 17, 2024 • 432 • 28
tokyotech-llm/Swallow-MS-7b-instruct-v0.1

Text Generation • 7B • Updated Aug 17, 2024 • 29 • 14
tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1

Text Generation • 47B • Updated Aug 17, 2024 • 42 • 29

Swallow-MS-instruct

tokyotech-llm/Swallow-MS-7b-instruct-v0.1

Text Generation • 7B • Updated Aug 17, 2024 • 29 • 14
