Instructions for using AksaraLLM/aksarallm-1.5b-native with libraries and local apps.

Libraries

How to use AksaraLLM/aksarallm-1.5b-native with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AksaraLLM/aksarallm-1.5b-native")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AksaraLLM/aksarallm-1.5b-native")
model = AutoModelForCausalLM.from_pretrained("AksaraLLM/aksarallm-1.5b-native")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use AksaraLLM/aksarallm-1.5b-native with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AksaraLLM/aksarallm-1.5b-native"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AksaraLLM/aksarallm-1.5b-native",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/AksaraLLM/aksarallm-1.5b-native

SGLang

How to use AksaraLLM/aksarallm-1.5b-native with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AksaraLLM/aksarallm-1.5b-native" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AksaraLLM/aksarallm-1.5b-native",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AksaraLLM/aksarallm-1.5b-native" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AksaraLLM/aksarallm-1.5b-native",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AksaraLLM/aksarallm-1.5b-native with Docker Model Runner:
```
docker model run hf.co/AksaraLLM/aksarallm-1.5b-native
```



---
language:
- id
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- indonesian
- aksarallm
- llama
- pretrained-from-scratch
- experimental
---
# aksarallm-1.5b-native

The first AksaraLLM model trained entirely from scratch. It is nominally a
1.5B model (2.04B actual parameters) with a LLaMA-style architecture. Whereas
the `AksaraLLM-Qwen-1.5B*` line is descended from Qwen2, this checkpoint
contains no inherited weights: it was trained from random initialization on
AksaraLLM's own corpus and tokenizer.

## Measured baseline (Devin audit, CPU bf16, 50 short Indonesian sentences)

| Metric | Value |
|---|---|
| Perplexity | 113.5 (much higher than Qwen-derived models, see below) |
| English-stopword ratio in ID-prompted output | 0.0% |
| Indonesian-stopword ratio in ID-prompted output | 31.3% (highest of any AksaraLLM model; the most Indonesian-saturated) |
| Parameters | 2,039.0 M |
| Architecture | LlamaForCausalLM |
| Vocabulary | 151,665 |
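
For reference, perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below shows only that arithmetic on a plain list of token log-probabilities. It is not the audit's script, and the `perplexity` helper and its input format are assumptions made for illustration.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

Read the other way round, a perplexity of 113.5 corresponds to a mean loss of about 4.73 nats per token.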

## Why the high perplexity?

This model started from random initialization and has been trained on a smaller
corpus than the Qwen2-derived models, which began with ~5T tokens of pretraining
already baked in. A perplexity of about 113 indicates that the model is
converging on the Indonesian distribution but has not fully reached it. The very
high Indonesian-stopword ratio (31%) and zero English leakage suggest that the
model produces Indonesian-only output even when uncertain. This is a useful
signal that the language identity has been learned correctly, but the lexical and
factual quality is below that of the Qwen-derived models.
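
The stopword ratios in the table above can be read as the share of output words that appear in a language's stopword list. A minimal sketch follows, assuming simple letter-run tokenization and a tiny hand-picked Indonesian list. The audit's actual tokenizer and stopword lists are not given here, so `ID_STOPWORDS` and `stopword_ratio` are illustrative only.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
ID_STOPWORDS = {"yang", "dan", "di", "ini", "dengan", "untuk", "pada", "ke", "juga", "tidak"}


def stopword_ratio(text: str, stopwords: set[str]) -> float:
    """Fraction of word tokens in `text` that are in `stopwords`."""
    # Runs of letters only; digits, punctuation, and underscores split words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in stopwords)
    return hits / len(words)


sample = "Kota ini juga menawarkan kota yang lebih banyak"
print(f"{stopword_ratio(sample, ID_STOPWORDS):.1%}")
```

Running the same function with an English list over Indonesian-prompted output would give the English-leak figure.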

This is the honest from-scratch baseline for the AksaraLLM project. It is
the right reference point for measuring how much value continued
pretraining, or from-scratch training on a larger corpus, delivers (which is
exactly what the planned 20B model aims to demonstrate).

## Loading notes

The checkpoint contains legacy `rope.sin_cached` and `rope.cos_cached`
keys that HF's `LlamaForCausalLM` does not expect; HF silently drops
them on load, which is benign. The checkpoint also has the same
`tie_word_embeddings` config / checkpoint mismatch as the Qwen variants;
setting `tie_word_embeddings: false` in `config.json` is recommended.
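
One way to apply this recommendation is to patch a locally downloaded `config.json` before loading. The sketch below does that with the standard library and also shows how the legacy RoPE keys could be filtered out of a state dict by hand. The helper names and matching keys by suffix are assumptions, since the full key names in the checkpoint are not listed here.

```python
import json
from pathlib import Path

# Assumption: the legacy buffers can be recognized by these key suffixes.
LEGACY_SUFFIXES = ("rope.sin_cached", "rope.cos_cached")


def untie_embeddings(config_path: Path) -> dict:
    """Set tie_word_embeddings to false in a local config.json; return the new config."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["tie_word_embeddings"] = False
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def drop_legacy_rope_keys(state_dict: dict) -> dict:
    """Return a copy of state_dict without the cached RoPE sin/cos entries."""
    return {k: v for k, v in state_dict.items() if not k.endswith(LEGACY_SUFFIXES)}
```

`untie_embeddings` would be called with the path to `config.json` inside a local copy of the repository. Filtering the state dict is optional, since the extra keys are dropped on load anyway.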

## Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("AksaraLLM/aksarallm-1.5b-native")
model = AutoModelForCausalLM.from_pretrained(
    "AksaraLLM/aksarallm-1.5b-native",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Plain-text continuation; sampling makes each run different.
inp = tok("Indonesia adalah negara", return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=120, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```

## License

Apache 2.0

## Sample outputs (CPU inference, greedy, repeat_penalty=1.15)

Real generations captured during the Devin audit on 2026-05-02. Decoding was greedy (`do_sample=False`), so these outputs are reproducible.
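
To produce outputs in this style, the Quickstart's sampling arguments would be swapped for deterministic ones. The sketch below only builds the keyword arguments. It assumes that the heading's `repeat_penalty` maps to the `repetition_penalty` argument of Transformers' `generate`, and `max_new_tokens=60` is a placeholder because the audit's value is not stated.

```python
def greedy_generate_kwargs(max_new_tokens: int = 60, repetition_penalty: float = 1.15) -> dict:
    """Keyword arguments for deterministic (greedy) decoding with model.generate."""
    return {
        "do_sample": False,  # greedy: always pick the most likely next token
        "repetition_penalty": repetition_penalty,  # assumed equivalent of repeat_penalty
        "max_new_tokens": max_new_tokens,  # placeholder length, not from the audit
    }


kwargs = greedy_generate_kwargs()
# Usage (requires the loaded `model` and tokenized `inp` from the Quickstart):
# output_ids = model.generate(**inp, **kwargs)
```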

### Continuations (no chat template)

Prompt: Indonesia adalah negara kepulauan terbesar di dunia,
Continuation: dan berada di kawasan Indonesia. Kebantahannya juga mengalami perubahan kecil yang memang tidak menentu. Pada tahun 1930, seorang pemerintat diperlakukan dengan pengelolaan yang lebih baik. Dalam permainan ini, perlu berbentuk kota dan kembali ke khus

Prompt: Resep sederhana untuk membuat nasi goreng adalah
Continuation: kandungan yang tidak akan bermanfaat dengan baik. Kota ini berikut dihadirkan pada kecilnya jelajah kembali dan menghasilkan khusus yang terbaik, dan memperhatikan perawatan dan pemeras kesehatan.
Kembang kota juga menawarkan kota yang lebih banyak dipermainkan oleh

Prompt: Ibukota Republik Indonesia adalah
Continuation: kembali menggantikan pemerintahan Kepala Paling Jaya di Indonesia. Dalam Kualifikasi Tengku Sosial (KAN) 1029, Bapak M. Kusyari
Pada tahun 1354 Kepala KAN berbagai permainan kecil yang terjadi dengan memperbaiki k

Prompt: Bahasa Indonesia berasal dari rumpun bahasa
Continuation: Kakik, dan permainan ini berbeda-beda. Dalam pemerintahan Jepang, kota Pekanba adalah kekuatan yang terlalu banyak di seorang penghiper 1930-2050. Tidak tidak mengbihnya untuk memperhatikan kembali kekayaan dan kegagalan ke