Add corpus-build monitoring runbook (SSH, tmux, GCS commands)
MONITORING.md ADDED (+192 -0)
@@ -0,0 +1,192 @@
# AksaraLLM 20B: Monitoring Runbook

Corpus build is running in a **tmux session on `aksara-20b-v6e-8`**. It will keep running even after this Devin session ends. Use this runbook to check progress, diagnose issues, or stop / restart the build.

## Quick status (from your laptop)

```bash
# 1. Check bucket growth (how much corpus is in GCS)
gcloud storage du -s gs://aksarallm20b-eu/pretrain/v1/

# Per-source breakdown
gcloud storage du -s \
  gs://aksarallm20b-eu/pretrain/v1/fineweb \
  gs://aksarallm20b-eu/pretrain/v1/fineweb2_id \
  gs://aksarallm20b-eu/pretrain/v1/culturax_id \
  gs://aksarallm20b-eu/pretrain/v1/culturax_ms \
  gs://aksarallm20b-eu/pretrain/v1/fineweb2_jv \
  gs://aksarallm20b-eu/pretrain/v1/fineweb2_su \
  gs://aksarallm20b-eu/pretrain/v1/culturax_jv \
  gs://aksarallm20b-eu/pretrain/v1/culturax_su \
  gs://aksarallm20b-eu/pretrain/v1/code_search_net \
  gs://aksarallm20b-eu/pretrain/v1/wikipedia_id \
  gs://aksarallm20b-eu/pretrain/v1/wikipedia_jv \
  gs://aksarallm20b-eu/pretrain/v1/wikipedia_en

# 2. Read the latest manifest
gcloud storage cp gs://aksarallm20b-eu/pretrain/v1/manifest.json - | jq '.per_source_stats'
```

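The byte counts printed by `gcloud storage du -s` are hard to compare at a glance. The sketch below converts them to GiB; it assumes each output line has the form `<bytes> <gs:// URL>`, and `du_to_gib` is an illustrative helper name, not something that exists in the repository.

```bash
#!/usr/bin/env bash
# du_to_gib: read "<bytes> <url>" lines on stdin and print "<size> GiB  <url>".
# Assumption: this is the line format `gcloud storage du -s` emits.
du_to_gib() {
  awk 'NF >= 2 && $1 ~ /^[0-9]+$/ {
    printf "%8.2f GiB  %s\n", $1 / (1024 * 1024 * 1024), $2
  }'
}

# Intended use (not executed here, needs gcloud credentials):
#   gcloud storage du -s gs://aksarallm20b-eu/pretrain/v1/fineweb | du_to_gib
```

Lines that do not start with a number are skipped, so warnings mixed into the output do not break the table.
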
## Detailed status (SSH into TPU)

```bash
gcloud compute tpus tpu-vm ssh aksara-20b-v6e-8 --zone=europe-west4-a --worker=0
```

Once inside:

```bash
# See tmux sessions
tmux ls
# → corpus: 2 windows (created ...)

# Attach to live view (Ctrl-b n = next window, Ctrl-b d = detach)
tmux attach -t corpus

# Or just tail logs without attaching
tail -f ~/corpus_build.log     # producer
tail -f ~/corpus_upload.log    # uploader

# Local tmpfs state
du -sh /dev/shm/corpus_work/
find /dev/shm/corpus_work -type f | head
```

## Expected progress

| What | Now |
|---|---|
| Sources being processed | fineweb, fineweb2_id, culturax_id, culturax_ms, fineweb2_jv, fineweb2_su, culturax_jv, culturax_su, code_search_net, wikipedia_id, wikipedia_jv, wikipedia_en |
| Target per full run | 100B tokens (budget distributed by mix %) |
| Throughput | ~100k tokens/sec (CPU-bound on one VM) |
| First shard → GCS | already landed (see bucket) |
| Each new shard | ~6 min per 500 MB (~250M tokens) |
| Wall-clock for 100B | ~10–12 days single-threaded |

**If you want faster:** start more parallel producers on different source subsets; see "Scaling out" below.

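The wall-clock row follows from the throughput row: 100B tokens at ~100k tokens/sec is 1,000,000 seconds. A minimal sketch of that arithmetic, with `eta_days` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# eta_days TOKENS TOKENS_PER_SEC
# Prints the wall-clock time in days (two decimals) at a constant rate.
eta_days() {
  awk -v tokens="$1" -v rate="$2" \
    'BEGIN { printf "%.2f\n", tokens / rate / 86400 }'
}

eta_days 100000000000 100000   # 100B tokens at 100k tokens/sec -> 11.57
```

At the same rate a 250M-token shard works out to 250,000,000 / 100,000 = 2,500 s, or about 42 minutes, so the ~6 min per-shard row looks like a separate measurement. Plug in the rate you actually observe in `~/corpus_build.log` before relying on either figure.
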
## Key files on the TPU VM

| File | Purpose |
|---|---|
| `~/corpus_build_runner.sh` | Launches producer, auto-restarts on crash up to 20 times |
| `~/corpus_upload_loop.sh` | Wrapper; invokes `corpus_upload_loop.py` |
| `~/corpus_upload_loop.py` | Python+gcsfs mirror of tmpfs → GCS every 10 min, deletes local shards older than 20 min |
| `~/.aksara_env` | Holds `HF_TOKEN`; sourced by runner |
| `~/AksaraLLM/scripts/build_pretrain_corpus_v2.py` | The actual producer |
| `/dev/shm/corpus_work/` | Producer's output (RAM-backed 709 GB tmpfs) |
| `/home/ubuntu/corpus_build.log` | Producer log |
| `/home/ubuntu/corpus_upload.log` | Uploader log |

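The table describes `~/corpus_build_runner.sh` as restarting the producer after a crash, up to 20 times. The script itself is not shown in this runbook, so the following is only a sketch of that pattern. The function name, the delay argument, and counting "20" as restarts after the first attempt are all assumptions.

```bash
#!/usr/bin/env bash
# run_with_restarts MAX_RESTARTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND; if it exits non-zero, waits DELAY_SECONDS and runs it again,
# giving up after MAX_RESTARTS restarts. Returns the last exit code.
run_with_restarts() {
  local max_restarts=$1 delay=$2 attempt=0 status
  shift 2
  while true; do
    "$@" && return 0            # clean exit: nothing left to supervise
    status=$?
    if (( attempt >= max_restarts )); then
      echo "giving up after $attempt restarts (last exit code $status)" >&2
      return "$status"
    fi
    attempt=$(( attempt + 1 ))
    echo "exit code $status; restart $attempt/$max_restarts in ${delay}s" >&2
    sleep "$delay"
  done
}

# Hypothetical use inside a runner script (arguments elided):
#   source ~/.aksara_env
#   run_with_restarts 20 30 python3 -u scripts/build_pretrain_corpus_v2.py build ...
```

A bounded restart count keeps a persistent failure, such as a bad token or a changed dataset schema, from looping forever and flooding the log.
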
## Stop the build

```bash
# On TPU VM:
tmux kill-session -t corpus
# Any shards already uploaded stay in GCS. Local tmpfs shards not yet uploaded are lost on VM reboot.
```

## Restart the build (fresh)

```bash
# On TPU VM:
tmux kill-session -t corpus 2>/dev/null
rm -rf /dev/shm/corpus_work/*
tmux new-session -d -s corpus -n producer 'bash ~/corpus_build_runner.sh'
tmux new-window -t corpus -n uploader 'bash ~/corpus_upload_loop.sh'
```

Note: the producer keeps its deduplication state in memory only. A restart forgets the near-duplicates seen so far, so each restart lets a small number of extra duplicates through. The exact-dedup (`sha256`) state is reset too, so cross-run dedup should be redone at consolidation time, before the real pretrain run.

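To illustrate why in-memory dedup state is lost on restart and what cross-run exact dedup involves, here is a small sketch that persists SHA-256 hashes to a file between runs. It treats each input line as one document, which is a simplification: the real producer works on parquet shards and also does near-duplicate detection, neither of which is modelled here. `dedup_stream` and the hash-file layout are hypothetical.

```bash
#!/usr/bin/env bash
# dedup_stream SEEN_FILE
# Reads one document per line on stdin and prints only documents whose
# SHA-256 is not yet in SEEN_FILE. New hashes are appended to SEEN_FILE,
# so a later run (or a restarted process) skips what earlier runs emitted.
dedup_stream() {
  local seen_file=$1 line hash
  touch "$seen_file"
  while IFS= read -r line; do
    hash=$(printf '%s' "$line" | sha256sum | cut -d' ' -f1)
    if ! grep -qxF "$hash" "$seen_file"; then
      printf '%s\n' "$hash" >> "$seen_file"
      printf '%s\n' "$line"
    fi
  done
}
```

The linear `grep` per document is fine for a demonstration but far too slow at corpus scale; a consolidation job would more likely sort and unique the hashes or use a hash table.
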
## Scaling out (speed up 10×)

The current producer is single-threaded per-source, single-process per-VM. To produce 400B tokens in 2–3 weeks you need ~8–10 parallel producers. Easiest scheme: **partition sources across tmux windows**.

```bash
# Example: 3 parallel producers, each owning different sources
tmux kill-session -t corpus 2>/dev/null

# Producer 1: English (highest volume source)
tmux new-session -d -s corpus -n p1 \
  'source ~/.aksara_env && cd ~/AksaraLLM && \
   python3 -u scripts/build_pretrain_corpus_v2.py build \
     --config configs/aksara_20b_dense.json \
     --output-dir /dev/shm/corpus_work \
     --assets-dir /dev/shm/pretrain_assets \
     --target-total-tokens 400000000000 \
     --shard-target-bytes 524288000 \
     --no-decontam \
     --sources fineweb,wikipedia_en 2>&1 | tee -a ~/corpus_p1.log'

# Producer 2: Indonesian bulk
tmux new-window -t corpus -n p2 \
  'source ~/.aksara_env && cd ~/AksaraLLM && \
   python3 -u scripts/build_pretrain_corpus_v2.py build \
     --config configs/aksara_20b_dense.json \
     --output-dir /dev/shm/corpus_work \
     --assets-dir /dev/shm/pretrain_assets \
     --target-total-tokens 400000000000 \
     --shard-target-bytes 524288000 \
     --no-decontam \
     --sources fineweb2_id,culturax_id,wikipedia_id 2>&1 | tee -a ~/corpus_p2.log'

# Producer 3: Malay + JV + SU
tmux new-window -t corpus -n p3 \
  'source ~/.aksara_env && cd ~/AksaraLLM && \
   python3 -u scripts/build_pretrain_corpus_v2.py build \
     --config configs/aksara_20b_dense.json \
     --output-dir /dev/shm/corpus_work \
     --assets-dir /dev/shm/pretrain_assets \
     --target-total-tokens 400000000000 \
     --shard-target-bytes 524288000 \
     --no-decontam \
     --sources culturax_ms,fineweb2_jv,fineweb2_su,culturax_jv,culturax_su,wikipedia_jv 2>&1 | tee -a ~/corpus_p3.log'

# Keep the uploader running
tmux new-window -t corpus -n uploader 'bash ~/corpus_upload_loop.sh'
```

⚠️ Each producer writes to `/dev/shm/corpus_work/{source}/shard-*.parquet`. Since no two producers share a source (if you partition correctly), they don't collide. The uploader mirrors everything to GCS.

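Because the no-collision guarantee depends on partitioning correctly, it can help to check the `--sources` lists before launching. The sketch below verifies that every source is owned by exactly one producer. `check_partition` is an illustrative helper, not part of the repository, and the source names are copied from the "Expected progress" table.

```bash
#!/usr/bin/env bash
# check_partition ALL_SOURCES_CSV PRODUCER1_CSV [PRODUCER2_CSV ...]
# Prints DUPLICATE for a source claimed by two producers and UNASSIGNED for
# a source no producer claims. Returns 0 only if the partition is clean.
check_partition() {
  local all=$1
  shift
  local -A owner=()
  local bad=0 i=0 group src
  for group in "$@"; do
    i=$(( i + 1 ))
    for src in ${group//,/ }; do
      if [[ -n ${owner[$src]:-} ]]; then
        echo "DUPLICATE $src (producers ${owner[$src]} and $i)"
        bad=1
      else
        owner[$src]=$i
      fi
    done
  done
  for src in ${all//,/ }; do
    if [[ -z ${owner[$src]:-} ]]; then
      echo "UNASSIGNED $src"
      bad=1
    fi
  done
  return "$bad"
}

ALL_SOURCES=fineweb,fineweb2_id,culturax_id,culturax_ms,fineweb2_jv,fineweb2_su,culturax_jv,culturax_su,code_search_net,wikipedia_id,wikipedia_jv,wikipedia_en

# The three --sources lists from the example above:
check_partition "$ALL_SOURCES" \
  fineweb,wikipedia_en \
  fineweb2_id,culturax_id,wikipedia_id \
  culturax_ms,fineweb2_jv,fineweb2_su,culturax_jv,culturax_su,wikipedia_jv \
  || echo "partition needs attention before launching"
```

Run against the three-producer example, this reports `UNASSIGNED code_search_net`: none of the three `--sources` lists includes it, so add it to one of the producers if that source should still be built.
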
## Troubleshooting

### "bucket size: 0.00 GB" stays zero
The uploader is running but has nothing to sync because no shard has flushed yet. Shards flush every ~500 MB of text (~250M tokens). Wait 6–10 min after the producer starts.

### Producer keeps restarting
Check `tail -50 ~/corpus_build.log` for the actual error. Common causes:
- Rate limiting from HF Hub: wait a few minutes, it retries automatically
- Dataset schema change: lock the `datasets` version in requirements
- OOM: `/dev/shm` is filling up. The uploader should purge old shards every 10 min, but if the uploader has crashed, local files accumulate

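If the uploader has crashed and `/dev/shm` is filling up, the age-based purge it normally performs can be approximated by hand. This is a shell stand-in for the policy described in "Key files" (delete local shards older than 20 min), not the uploader's actual Python+gcsfs code. `purge_old_shards` is a hypothetical name, and the sketch deletes purely by age, so only run it on shards you have confirmed are already in GCS.

```bash
#!/usr/bin/env bash
# purge_old_shards DIR MAX_AGE_MINUTES
# Deletes *.parquet files under DIR whose modification time is more than
# MAX_AGE_MINUTES ago, printing each path as it is removed.
purge_old_shards() {
  local dir=$1 max_age_min=$2
  find "$dir" -type f -name '*.parquet' -mmin "+$max_age_min" -print -delete
}

# Intended use on the TPU VM (not executed here):
#   df -h /dev/shm                              # how full is the tmpfs?
#   purge_old_shards /dev/shm/corpus_work 20
```
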
### `gcloud storage du` shows it working, but manifest.json missing
The manifest is written after each source completes. Early in the run, shards exist in the bucket but no manifest does. It appears once fineweb (or any other source) finishes.

### tmux session disappears after VM reboot
This is expected: tmux state doesn't survive reboots, and preemptible TPU nodes can reboot. To make the producer survive reboots, configure a systemd service (see "Auto-restart on reboot" below).

## Auto-restart on reboot (optional)

Create `/etc/systemd/system/aksara-corpus.service` on TPU VM:

```ini
[Unit]
Description=AksaraLLM corpus build producer
After=network-online.target

[Service]
Type=simple
User=ubuntu
ExecStart=/bin/bash /home/ubuntu/corpus_build_runner.sh
Restart=always
RestartSec=60
StandardOutput=append:/home/ubuntu/corpus_build.log
StandardError=append:/home/ubuntu/corpus_build.log

[Install]
WantedBy=multi-user.target
```

Then: `sudo systemctl daemon-reload && sudo systemctl enable --now aksara-corpus`.

Same pattern for `aksara-corpus-uploader.service`.

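Applying the same pattern to the uploader might look like the unit below. It is a sketch that assumes the wrapper lives at `/home/ubuntu/corpus_upload_loop.sh` and should log to `/home/ubuntu/corpus_upload.log`, matching the paths in "Key files"; adjust both if your layout differs.

```ini
# /etc/systemd/system/aksara-corpus-uploader.service (assumed path)
[Unit]
Description=AksaraLLM corpus uploader (tmpfs to GCS)
After=network-online.target

[Service]
Type=simple
User=ubuntu
ExecStart=/bin/bash /home/ubuntu/corpus_upload_loop.sh
Restart=always
RestartSec=60
StandardOutput=append:/home/ubuntu/corpus_upload.log
StandardError=append:/home/ubuntu/corpus_upload.log

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl daemon-reload && sudo systemctl enable --now aksara-corpus-uploader`. If the tmux session is still running, stop it first so that two producers or two uploaders do not work on `/dev/shm/corpus_work/` at the same time.
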