Ezekiel999 committed 30 days ago

Commit cfacedb (verified) · 1 Parent(s): 88202f1

Add corpus-build monitoring runbook (SSH, tmux, GCS commands)

Files changed (1)

MONITORING.md ADDED (+192 -0)

	@@ -0,0 +1,192 @@

+# AksaraLLM 20B — Monitoring Runbook
+The corpus build is running in a **tmux session on `aksara-20b-v6e-8`** and will keep running after this Devin session ends. Use this runbook to check progress, diagnose issues, or stop or restart the build.
+## Quick status (from your laptop)
+```bash
+# 1. Check bucket growth (how much corpus is in GCS)
+gcloud storage du -s gs://aksarallm20b-eu/pretrain/v1/
+# Per-source breakdown
+gcloud storage du -s \
+  gs://aksarallm20b-eu/pretrain/v1/fineweb \
+  gs://aksarallm20b-eu/pretrain/v1/fineweb2_id \
+  gs://aksarallm20b-eu/pretrain/v1/culturax_id \
+  gs://aksarallm20b-eu/pretrain/v1/culturax_ms \
+  gs://aksarallm20b-eu/pretrain/v1/fineweb2_jv \
+  gs://aksarallm20b-eu/pretrain/v1/fineweb2_su \
+  gs://aksarallm20b-eu/pretrain/v1/culturax_jv \
+  gs://aksarallm20b-eu/pretrain/v1/culturax_su \
+  gs://aksarallm20b-eu/pretrain/v1/code_search_net \
+  gs://aksarallm20b-eu/pretrain/v1/wikipedia_id \
+  gs://aksarallm20b-eu/pretrain/v1/wikipedia_jv \
+  gs://aksarallm20b-eu/pretrain/v1/wikipedia_en
+# 2. Read the latest manifest
+gcloud storage cp gs://aksarallm20b-eu/pretrain/v1/manifest.json - | jq '.per_source_stats'
+```
+## Detailed status (SSH into TPU)
+```bash
+gcloud compute tpus tpu-vm ssh aksara-20b-v6e-8 --zone=europe-west4-a --worker=0
+```
+Once inside:
+```bash
+# See tmux sessions
+tmux ls
+# → corpus: 2 windows (created ...)
+# Attach to live view (Ctrl-b n = next window, Ctrl-b d = detach)
+tmux attach -t corpus
+# Or just tail logs without attaching
+tail -f ~/corpus_build.log      # producer
+tail -f ~/corpus_upload.log     # uploader
+# Local tmpfs state
+du -sh /dev/shm/corpus_work/
+find /dev/shm/corpus_work -type f | head
+```
+## Expected progress
+| What | Now |
+|---|---|
+| Sources being processed | fineweb, fineweb2_id, culturax_id, culturax_ms, fineweb2_jv, fineweb2_su, culturax_jv, culturax_su, code_search_net, wikipedia_id, wikipedia_jv, wikipedia_en |
+| Target per full run | 100B tokens (budget distributed by mix %) |
+| Throughput | ~100k tokens/sec (CPU-bound on one VM) |
+| First shard → GCS | already landed (see bucket) |
+| Each new shard | ~6 min per 500 MB (~250M tokens) |
+| Wall-clock for 100B | ~10–12 days single-threaded |
+**If you need it faster:** start more parallel producers on different source subsets (see "Scaling out" below).
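The wall-clock estimate in the table follows from the throughput figure by simple division. A minimal sketch of that arithmetic, where `eta_days` is a hypothetical helper (not part of the repo) and the rate is assumed to stay constant and to scale linearly across producers, which is optimistic on a CPU-bound VM:

```shell
# eta_days TOKENS TOKENS_PER_SEC [PRODUCERS]
# Prints the estimated wall-clock days for TOKENS at a constant rate.
# Assumption: throughput scales linearly with the number of producers.
eta_days() {
  awk -v t="$1" -v r="$2" -v n="${3:-1}" \
    'BEGIN { printf "%.1f\n", t / (r * n) / 86400 }'
}

eta_days 100000000000 100000      # 100B tokens at ~100k tokens/sec → 11.6
eta_days 400000000000 100000 3    # 400B tokens, 3 producers        → 15.4
```

The first figure lands inside the ~10–12 day range quoted in the table. The "~6 min per 500 MB (~250M tokens)" row corresponds to a noticeably higher rate than 100k tokens/sec, so it is worth timing a few real shards from the producer log before relying on either number.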
+## Key files on the TPU VM
+| File | Purpose |
+|---|---|
+| `~/corpus_build_runner.sh` | Launches producer, auto-restarts on crash up to 20 times |
+| `~/corpus_upload_loop.sh` | Wrapper; invokes `corpus_upload_loop.py` |
+| `~/corpus_upload_loop.py` | Python+gcsfs mirror of tmpfs → GCS every 10 min, deletes local shards older than 20 min |
+| `~/.aksara_env` | Holds `HF_TOKEN`; sourced by runner |
+| `~/AksaraLLM/scripts/build_pretrain_corpus_v2.py` | The actual producer |
+| `/dev/shm/corpus_work/` | Producer's output (RAM-backed 709 GB tmpfs) |
+| `/home/ubuntu/corpus_build.log` | Producer log |
+| `/home/ubuntu/corpus_upload.log` | Uploader log |
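The contents of `corpus_build_runner.sh` are not shown in this runbook. A wrapper that "auto-restarts on crash up to 20 times" could look roughly like the sketch below; `run_with_restarts`, `RESTART_DELAY`, and the log messages are hypothetical and not taken from the real script.

```shell
# run_with_restarts MAX_RESTARTS COMMAND [ARGS...]
# Runs COMMAND; on a non-zero exit, restarts it up to MAX_RESTARTS times.
# Returns 0 as soon as one attempt succeeds, 1 if every attempt failed.
# RESTART_DELAY (seconds between attempts, default 30) is an assumed knob.
run_with_restarts() {
  local max="$1" restarts=0
  shift
  while :; do
    "$@" && return 0
    if [ "$restarts" -ge "$max" ]; then
      echo "giving up after $restarts restarts" >&2
      return 1
    fi
    restarts=$((restarts + 1))
    echo "command failed; restart $restarts/$max" >&2
    sleep "${RESTART_DELAY:-30}"
  done
}

# Hypothetical usage (commented out; the real invocation lives in the runner):
# run_with_restarts 20 python3 -u scripts/build_pretrain_corpus_v2.py build ...
```

Capping the restart count keeps a persistent failure, such as a dataset schema change, from looping forever and filling the log.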
+## Stop the build
+```bash
+# On TPU VM:
+tmux kill-session -t corpus
+# Any shards already uploaded stay in GCS. Local tmpfs shards not yet uploaded are lost on VM reboot.
+```
+## Restart the build (fresh)
+```bash
+# On TPU VM:
+tmux kill-session -t corpus 2>/dev/null
+rm -rf /dev/shm/corpus_work/*
+tmux new-session -d -s corpus -n producer 'bash ~/corpus_build_runner.sh'
+tmux new-window -t corpus -n uploader 'bash ~/corpus_upload_loop.sh'
+```
+Note: the producer deduplicates only against its in-memory state. A restart discards that state, so each restart lets a small number of extra near-duplicates through. Exact dedup (`sha256`) is reset too, so cross-run dedup should be redone at consolidation time, before the real pretrain run.
+## Scaling out (speed up 10×)
+The current producer is single-threaded per-source, single-process per-VM. To produce 400B tokens in 2–3 weeks you need ~8–10 parallel producers. Easiest scheme: **partition sources across tmux windows**.
+```bash
+# Example: 3 parallel producers, each owning different sources
+tmux kill-session -t corpus 2>/dev/null
+# Producer 1: English (highest volume source)
+tmux new-session -d -s corpus -n p1 \
+    'source ~/.aksara_env && cd ~/AksaraLLM && \
+     python3 -u scripts/build_pretrain_corpus_v2.py build \
+       --config configs/aksara_20b_dense.json \
+       --output-dir /dev/shm/corpus_work \
+       --assets-dir /dev/shm/pretrain_assets \
+       --target-total-tokens 400000000000 \
+       --shard-target-bytes 524288000 \
+       --no-decontam \
+       --sources fineweb,wikipedia_en 2>&1 | tee -a ~/corpus_p1.log'
+# Producer 2: Indonesian bulk
+tmux new-window -t corpus -n p2 \
+    'source ~/.aksara_env && cd ~/AksaraLLM && \
+     python3 -u scripts/build_pretrain_corpus_v2.py build \
+       --config configs/aksara_20b_dense.json \
+       --output-dir /dev/shm/corpus_work \
+       --assets-dir /dev/shm/pretrain_assets \
+       --target-total-tokens 400000000000 \
+       --shard-target-bytes 524288000 \
+       --no-decontam \
+       --sources fineweb2_id,culturax_id,wikipedia_id 2>&1 | tee -a ~/corpus_p2.log'
+# Producer 3: Malay + JV + SU
+tmux new-window -t corpus -n p3 \
+    'source ~/.aksara_env && cd ~/AksaraLLM && \
+     python3 -u scripts/build_pretrain_corpus_v2.py build \
+       --config configs/aksara_20b_dense.json \
+       --output-dir /dev/shm/corpus_work \
+       --assets-dir /dev/shm/pretrain_assets \
+       --target-total-tokens 400000000000 \
+       --shard-target-bytes 524288000 \
+       --no-decontam \
+       --sources culturax_ms,fineweb2_jv,fineweb2_su,culturax_jv,culturax_su,wikipedia_jv 2>&1 | tee -a ~/corpus_p3.log'
+# Keep the uploader running
+tmux new-window -t corpus -n uploader 'bash ~/corpus_upload_loop.sh'
+```
+⚠️ Each producer writes to `/dev/shm/corpus_work/{source}/shard-*.parquet`. As long as no two producers share a source, their outputs do not collide. The uploader mirrors everything to GCS.
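Before launching, the `--sources` lists can be checked against the full source list. The sketch below uses a hypothetical helper, `check_partition`, which reports any source claimed by more than one producer and any source that no producer claims. Run against the three lists in the example above, it shows that `code_search_net` is not assigned to any producer.

```shell
# check_partition ALL_SOURCES LIST1 [LIST2...]
# Every argument is a comma-separated list of source names.
# Prints "duplicate: X" for a source claimed by more than one producer and
# "unassigned: X" for a source in ALL_SOURCES that no producer claims.
# Returns non-zero if it printed anything.
check_partition() {
  local all="$1" status=0 assigned src
  shift
  # One source name per line, across all producers.
  assigned=$(printf '%s\n' "$@" | tr ',' '\n' | sort)
  for src in $(printf '%s\n' "$assigned" | uniq -d); do
    echo "duplicate: $src"
    status=1
  done
  for src in $(printf '%s\n' "$all" | tr ',' '\n'); do
    if ! printf '%s\n' "$assigned" | grep -qxF -- "$src"; then
      echo "unassigned: $src"
      status=1
    fi
  done
  return "$status"
}

ALL=fineweb,fineweb2_id,culturax_id,culturax_ms,fineweb2_jv,fineweb2_su,culturax_jv,culturax_su,code_search_net,wikipedia_id,wikipedia_jv,wikipedia_en
check_partition "$ALL" \
  fineweb,wikipedia_en \
  fineweb2_id,culturax_id,wikipedia_id \
  culturax_ms,fineweb2_jv,fineweb2_su,culturax_jv,culturax_su,wikipedia_jv \
  || echo "partition needs fixing" >&2
# → unassigned: code_search_net
```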
+## Troubleshooting
+### "bucket size: 0.00 GB" stays zero
+The uploader is running but has nothing to sync because no shard has been flushed yet. Shards flush after every ~500 MB of text (~250M tokens). Wait 6–10 min after the producer starts.
+### Producer keeps restarting
+Check `tail -50 ~/corpus_build.log` for the actual error. Common causes:
+- Rate limiting from HF Hub → wait a few minutes, it retries automatically
+- Dataset schema change → lock `datasets` version in requirements
+- OOM → `/dev/shm` is filling up. The uploader should purge old shards every 10 min, but if the uploader has crashed, local files accumulate.
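To tell whether the uploader has stopped purging, it can help to list shards that have outlived the 20-minute retention window. A sketch, where `stale_shards` is a hypothetical helper and the `shard-*.parquet` naming is taken from the layout described under "Scaling out"; it relies on `find -mmin`, which GNU and BSD `find` provide:

```shell
# stale_shards DIR [MINUTES]
# Lists shard files under DIR last modified more than MINUTES minutes ago
# (default 20, the uploader's stated retention window).
stale_shards() {
  find "$1" -type f -name 'shard-*.parquet' -mmin "+${2:-20}"
}

# On the TPU VM (commented out so the sketch runs anywhere):
# df -h /dev/shm
# stale_shards /dev/shm/corpus_work | wc -l
```

A count that keeps growing between checks suggests the uploader is not running; `tail ~/corpus_upload.log` should show why.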
+### `gcloud storage du` shows data, but `manifest.json` is missing
+The manifest is written after each source completes. Early in the run, shards exist without a manifest; it appears once fineweb (or any other source) finishes.
+### tmux session disappears after VM reboot
+This is expected: tmux state does not survive reboots, and preemptible TPU nodes can reboot. To make the producer survive reboots, configure a systemd service (see `Auto-restart on reboot` below).
+## Auto-restart on reboot (optional)
+Create `/etc/systemd/system/aksara-corpus.service` on the TPU VM:
+```ini
+[Unit]
+Description=AksaraLLM corpus build producer
+After=network-online.target
+[Service]
+Type=simple
+User=ubuntu
+ExecStart=/bin/bash /home/ubuntu/corpus_build_runner.sh
+Restart=always
+RestartSec=60
+StandardOutput=append:/home/ubuntu/corpus_build.log
+StandardError=append:/home/ubuntu/corpus_build.log
+[Install]
+WantedBy=multi-user.target
+```
+Then: `sudo systemctl daemon-reload && sudo systemctl enable --now aksara-corpus`.
+Same pattern for `aksara-corpus-uploader.service`.
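For reference, the uploader unit might look like the fragment below. It is a sketch that mirrors the producer unit, assuming the wrapper and log paths from the "Key files" table; nothing here is taken from an existing unit file.

```ini
# /etc/systemd/system/aksara-corpus-uploader.service (assumed path)
[Unit]
Description=AksaraLLM corpus build uploader
After=network-online.target
[Service]
Type=simple
User=ubuntu
ExecStart=/bin/bash /home/ubuntu/corpus_upload_loop.sh
Restart=always
RestartSec=60
StandardOutput=append:/home/ubuntu/corpus_upload.log
StandardError=append:/home/ubuntu/corpus_upload.log
[Install]
WantedBy=multi-user.target
```

If the systemd units are enabled, avoid also starting the tmux session by hand: two producers working on the same sources would write into the same `/dev/shm/corpus_work/{source}/` directories.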