---
license: apache-2.0
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- small
- tiny
- story
- tinystories
- roneneldan
- cpu
- free
- open-source
---
# StorySupra 10M
## Config
- Parameters: 12,587,264 (~10M)
- Hidden Size: 256
- Intermediate Size: 1024
- Hidden Layers: 8
- Attention Heads: 8
- Max Position Embeddings: 256
- Vocab Size: 8192
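
The parameter total above can be reproduced from the other config values. The sketch below is a back-of-the-envelope count, not code from this repo: the helper name `llama_param_count` is hypothetical, and it assumes a standard Llama layout (bias-free q/k/v/o projections with as many key/value heads as attention heads, a gate/up/down MLP, two RMSNorm weights per layer, one final norm). Under those assumptions, the listed total matches when the output head is counted separately from the input embedding.

```python
def llama_param_count(vocab: int, hidden: int, intermediate: int,
                      layers: int, tie_embeddings: bool = False) -> int:
    """Count weights in an assumed standard Llama-style decoder (no biases)."""
    embed = vocab * hidden                  # token embedding matrix
    attn = 4 * hidden * hidden              # q, k, v and o projections
    mlp = 3 * hidden * intermediate         # gate, up and down projections
    norms = 2 * hidden                      # two RMSNorm weight vectors per layer
    final_norm = hidden                     # RMSNorm before the output head
    lm_head = 0 if tie_embeddings else vocab * hidden
    return embed + layers * (attn + mlp + norms) + final_norm + lm_head


# Values taken from the Config list above.
print(llama_param_count(8192, 256, 1024, 8))                       # 12587264
print(llama_param_count(8192, 256, 1024, 8, tie_embeddings=True))  # 10490112
```

With a separate output head the count is 12,587,264, the figure listed above. If the head were tied to the embedding, it would be 10,490,112.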
## Samples
Once upon a time , a small bird was flying in the sky . It saw a big tree and wanted to rest under it . But the tree was too high for the bird to reach . The bird tried to fly up , but it could not . Then , a wise old owl flew by and saw the bird struggling . The owl said , " Don ' t worry little bird , I can help you ." The owl used its strong beak to climb the tree and get the bird down . The bird was
<br><br>
Once upon a time , there was a little boy named Timmy . He loved to play with his toys and run around outside . One day , he found a shiny penny on the ground . It was so pretty that he picked it up and showed it to his mom . " Look , Mommy ! I found a penny !" he said . His mom smiled and said , " That ' s great , Timmy . But be careful , it ' s very special ." Timmy didn ' t understand what " valuable " meant , but he knew it meant something important . So
<br><br>
Once upon a time , there was a lovely princess . She had long , blonde hair and a sparkly crown . One day , she wanted to go for a walk in the forest . She put on her dress and started walking . As she walked , she saw something strange . It was a big , scary bear ! The princess was scared , but she didn ' t want to get away . So she just kept walking until she reached the forest . When she got there , she saw a little rabbit . He was wearing a bright red bow and he looked very friendly .
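
The samples above have spaces around punctuation and apostrophes, which looks like a side effect of how the decoded tokens are joined. If you want cleaner text for display, a small post-processing step can help. The helper below is a heuristic sketch, not part of this repo: `tidy_spacing` is a hypothetical name, and it assumes double quotes come in balanced pairs and that a spaced apostrophe between two letters is always a contraction.

```python
import re


def tidy_spacing(text: str) -> str:
    """Heuristically remove the extra spaces seen in the decoded samples."""
    # Drop whitespace before punctuation: "time ," -> "time,"
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)
    # Rejoin contractions split around an apostrophe: "Don ' t" -> "Don't"
    text = re.sub(r"(\w) ' (\w)", r"\1'\2", text)
    # Trim spaces just inside each pair of double quotes: '" hi "' -> '"hi"'
    text = re.sub(r'"\s*([^"]*?)\s*"', r'"\1"', text)
    return text


sample = 'The owl said , " Don \' t worry little bird , I can help you ."'
print(tidy_spacing(sample))
# The owl said, "Don't worry little bird, I can help you."
```

An unpaired quote, such as one cut off by the token limit, is left untouched.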
## Training
- GPU: single RTX 5060 Ti 16GB
- Time: ~20 minutes
- Epochs: 3
- Samples of the dataset: 200k
## Dataset
200k samples from `roneneldan/TinyStories`.
## Code
You can find the full code in this repo as `train.py` and `inference.py`. Have fun :-)
## Usage
Use this to run the model:
```python3
"""
StorySupra-10M: Interactive Story Generator
Loads model weights directly from HuggingFace: SupraLabs/StorySupra-10M
"""
import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

# ──────────────────────────────────────────────
# Configuration
# ──────────────────────────────────────────────
MODEL_ID = "SupraLabs/StorySupra-10M"

GENERATION_DEFAULTS = {
    "max_new_tokens": 100,
    "temperature": 0.55,
    "top_k": 25,
    "top_p": 0.85,
    "repetition_penalty": 1.1,
    "do_sample": True,
}

EXIT_COMMANDS = {"exit", "quit", "leave"}


# ──────────────────────────────────────────────
# Model loading
# ──────────────────────────────────────────────
def load_model(model_id: str):
    """Download and return the tokenizer and model from HuggingFace Hub."""
    print(f"Downloading model from HuggingFace: {model_id}")
    print("(This may take a moment on first run; weights will be cached locally.)\n")

    tokenizer = PreTrainedTokenizerFast.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(model_id)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}\n")

    model.to(device)
    model.eval()
    return tokenizer, model, device


# ──────────────────────────────────────────────
# Text generation
# ──────────────────────────────────────────────
def generate_text(
    prompt: str,
    tokenizer,
    model,
    device: str,
    max_new_tokens: int = GENERATION_DEFAULTS["max_new_tokens"],
    temperature: float = GENERATION_DEFAULTS["temperature"],
    top_k: int = GENERATION_DEFAULTS["top_k"],
    top_p: float = GENERATION_DEFAULTS["top_p"],
    repetition_penalty: float = GENERATION_DEFAULTS["repetition_penalty"],
) -> str:
    """Generate a story continuation from the given prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        output_tokens = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            repetition_penalty=repetition_penalty,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    return tokenizer.decode(output_tokens[0], skip_special_tokens=True)


# ──────────────────────────────────────────────
# Interactive loop
# ──────────────────────────────────────────────
def run():
    print("=" * 50)
    print(" StorySupra-10M: Interactive Story Generator")
    print("=" * 50)

    tokenizer, model, device = load_model(MODEL_ID)

    print("-" * 50)
    print("Model ready! Type a prompt to generate a story.")
    print(f"Type {' / '.join(EXIT_COMMANDS)} to quit.")
    print("-" * 50)

    while True:
        try:
            user_prompt = input("\nYour prompt: ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nExiting. Goodbye!")
            break

        if not user_prompt:
            print("Please enter a prompt.")
            continue

        if user_prompt.lower() in EXIT_COMMANDS:
            print("Goodbye!")
            break

        print("\nGenerating...\n")
        story = generate_text(user_prompt, tokenizer, model, device)

        print("Generated story:")
        print("-" * 20)
        print(story)
        print("-" * 20)


# ──────────────────────────────────────────────
# Entry point
# ──────────────────────────────────────────────
if __name__ == "__main__":
    run()
```
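
The Config section lists 256 max position embeddings, so a long prompt plus a large `max_new_tokens` can run past the window the model was configured for. The sketch below is one way to cap the request, assuming you want generation to stay inside that window. `clamp_new_tokens` and `MAX_POSITIONS` are hypothetical names for illustration and are not in the repo's scripts.

```python
MAX_POSITIONS = 256  # "Max Position Embeddings" from the Config section


def clamp_new_tokens(prompt_len: int, requested: int,
                     max_positions: int = MAX_POSITIONS) -> int:
    """Return how many new tokens fit after a prompt of `prompt_len` tokens."""
    if prompt_len >= max_positions:
        raise ValueError(
            f"Prompt uses {prompt_len} tokens; the window holds {max_positions}."
        )
    # Never ask for more tokens than the remaining positions allow.
    return min(requested, max_positions - prompt_len)


print(clamp_new_tokens(10, 100))   # 100 (the request already fits)
print(clamp_new_tokens(200, 100))  # 56  (only 56 positions remain)
```

In a script like the one above, the prompt length would be `inputs["input_ids"].shape[1]`, and the clamped value would be passed as `max_new_tokens` to `model.generate`.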