Text Generation
PEFT
Safetensors
Transformers
qwen3
lora
sft
trl
lm-eval
bakat
indonesian
conversational
text-generation-inference
Update README.md content
README.md
CHANGED
|
@@ -1,76 +1,88 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
- **library_name:** transformers
|
| 4 |
-
- **base_model:** Qwen/Qwen3-8B
|
| 5 |
-
- **tags:** qwen, qwen3, causal-lm, continued-pretraining, indonesian, id, prd, dtp
|
| 6 |
-
- **license:** apache-2.0
|
| 7 |
-
- **language:** id, en
|
| 8 |
|
| 9 |
---
|
| 10 |
|
| 11 |
-
#
|
| 12 |
|
| 13 |
-
|
| 14 |
|
| 15 |
-
* **
|
| 16 |
-
* **Digital Talent Pool (DTP)** – Workforce and digital capability development
|
| 17 |
|
| 18 |
-
|
| 19 |
|
| 20 |
---
|
| 21 |
|
| 22 |
-
##
|
| 23 |
-
|
| 24 |
-
### Model Description
|
| 25 |
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
|
| 33 |
-
|
| 34 |
|
| 35 |
-
|
| 36 |
|
| 37 |
-
|
| 38 |
-
* Digital workforce & skill landscape (DTP)
|
| 39 |
|
| 40 |
-
-
|
| 41 |
|
| 42 |
-
|
| 43 |
|
| 44 |
-
|
| 45 |
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
| **PRD** | Cybersecurity, PDP Law, content moderation, hoax prevention | 92.0 | ~42.9% |
|
| 50 |
-
| **Wikipedia ID** | General knowledge anchor & grammar stability | 28.2 | ~13.2% |
|
| 51 |
-
| **Total** | — | **214.2** | **100%** |
|
| 52 |
|
| 53 |
---
|
| 54 |
|
| 55 |
-
##
|
| 56 |
|
| 57 |
-
|
| 58 |
|
| 59 |
-
|
| 60 |
|
| 61 |
-
|
| 62 |
-
* Misinformation pattern detection
|
| 63 |
-
* Understanding legal terminology (UU ITE, UU PDP)
|
| 64 |
|
| 65 |
-
##
|
| 66 |
|
| 67 |
-
*
|
| 68 |
-
*
|
| 69 |
-
*
|
| 70 |
|
| 71 |
---
|
| 72 |
|
| 73 |
-
##
|
| 74 |
|
| 75 |
Load the model using **HuggingFace Transformers**:
|
| 76 |
|
|
@@ -79,7 +91,7 @@ import torch
|
|
| 79 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 80 |
|
| 81 |
# 1. Configuration
|
| 82 |
-
model_id = "
|
| 83 |
|
| 84 |
# 2. Load Model
|
| 85 |
# Use bfloat16 for A100/A10G, float16 for T4
|
|
@@ -106,41 +118,65 @@ with torch.no_grad():
|
|
| 106 |
|
| 107 |
---
|
| 108 |
|
| 109 |
-
##
|
| 110 |
|
| 111 |
-
### Training
|
| 112 |
|
| 113 |
-
|
| 114 |
|
| 115 |
-
###
|
| 116 |
|
| 117 |
-
|
| 118 |
-
* **Training Duration:** ~36 hours
|
| 119 |
-
* **Frameworks:** PyTorch, Transformers, Accelerate
|
| 120 |
|
| 121 |
-
###
|
| 122 |
|
| 123 |
-
*
|
| 124 |
-
*
|
| 125 |
-
*
|
| 126 |
-
*
|
| 127 |
|
| 128 |
---
|
| 129 |
|
| 130 |
-
##
|
| 131 |
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
* **
|
| 135 |
|
| 136 |
---
|
| 137 |
|
| 138 |
-
##
|
| 139 |
|
| 140 |
-
|
| 141 |
|
| 142 |
-
|
| 143 |
-
* Add **high‑quality instruction datasets**
|
| 144 |
-
* Apply **evaluation benchmarks** before deployment
|
| 145 |
|
| 146 |
-
|
| 1 |
+
---
|
| 2 |
|
| 3 |
+
base_model: aitfindonesia/Bakat-8B-Base
|
| 4 |
+
library_name: peft
|
| 5 |
+
pipeline_tag: text-generation
|
| 6 |
+
language:
|
| 7 |
+
- id
|
| 8 |
+
tags:
|
| 9 |
+
- base_model:Qwen/Qwen3-8B
|
| 10 |
+
- lora
|
| 11 |
+
- sft
|
| 12 |
+
- transformers
|
| 13 |
+
- trl
|
| 14 |
+
- lm-eval
|
| 15 |
+
- biawak
|
| 16 |
+
- indonesian
|
| 17 |
+
license: apache-2.0
|
| 18 |
+
datasets:
|
| 19 |
+
- internal-curated
|
| 20 |
---
|
| 21 |
|
| 22 |
+
# Bakti-8B-Base
|
| 23 |
+
|
| 24 |
+
## Model Details
|
| 25 |
|
| 26 |
+
### Model Description
|
| 27 |
|
| 28 |
+
**Bakti-8B-Base** is an Indonesian-language base model designed for **Continued Pre-Training (CPT)** on the domain of digital-space policy and oversight. It is derived from **Biawak-8B-Base** and built on the **Qwen3-8B** architecture, using **LoRA (Low-Rank Adaptation)** and **4-bit quantization** for memory and compute efficiency.
|
| 29 |
|
| 30 |
+
* **Developed by**: Tim 1 AITF
|
| 31 |
+
* **Model type**: Causal Language Model (LoRA Adapter)
|
| 32 |
+
* **Base architecture**: Qwen3-8B
|
| 33 |
+
* **Primary language**: Indonesian (id)
|
| 34 |
+
* **License**: Apache-2.0
|
| 35 |
|
| 36 |
---
|
| 37 |
|
| 38 |
+
## Training Data Composition
|
| 39 |
|
| 40 |
+
| Category | Elements | Tokens (M) | Share |
|
| 41 |
+
| ---------------- | ----------------------------------------------------------------------------------------------------- | ---------------- | ---------- |
|
| 42 |
+
| **DTP** | PON TIK occupations, job trends, competencies & human resources, DTP policy & regulation, Digital Talent technology | 94 | 43.9% |
|
| 43 |
+
| **PRD** | Online gambling, hoaxes, child protection, educational content, PRD policy & regulation, community violence | 92 | 42.9% |
|
| 44 |
+
| **Wikipedia ID** | General knowledge & regional languages from across Indonesia | 28.2 | 13.2% |
|
| 45 |
+
| **Total** | – | **214.2** | **100%** |
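The share column can be rechecked from the token counts. A minimal sketch, assuming the counts are in millions of tokens as the header states; the variable names are illustrative and not part of the model card:

```python
# Token counts in millions, copied from the table above.
token_counts_m = {"DTP": 94.0, "PRD": 92.0, "Wikipedia ID": 28.2}

total_m = sum(token_counts_m.values())

# Share of each category in percent, rounded to one decimal place.
shares = {
    name: round(100 * count / total_m, 1)
    for name, count in token_counts_m.items()
}

print(f"total = {total_m:.1f}M tokens")
for name, share in shares.items():
    print(f"{name}: {share}%")
```

Plain rounding can differ from the listed figures by 0.1 percentage point (the PRD share sits very close to a rounding boundary), which is likely why the table's column still sums to exactly 100%.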
|
| 46 |
|
| 47 |
+
---
|
| 48 |
|
| 49 |
+
## Intended Use
|
| 50 |
|
| 51 |
+
### Direct Use (Recommended)
|
| 52 |
|
| 53 |
+
This model is **intended for Continued Pre-Training**, specifically for:
|
| 54 |
|
| 55 |
+
* Domain adaptation for public policy and digital regulation
|
| 56 |
+
* Enrichment with Indonesia-specific knowledge
|
| 57 |
+
* Pre-adaptation before Instruction Tuning or SFT
|
| 58 |
|
| 59 |
+
### Out-of-Scope Use
|
| 60 |
|
| 61 |
+
* **Long-context conversations** (not yet optimized)
|
| 62 |
+
* **High-stakes decision making** (legal, medical, financial)
|
| 63 |
+
* **Chat-oriented instruction following** without further fine-tuning
|
| 64 |
|
| 65 |
---
|
| 66 |
|
| 67 |
+
## Bias, Risks, and Limitations
|
| 68 |
|
| 69 |
+
* The dataset is dominated by the digital-space policy and oversight domain, so topical bias may appear in unrelated domains.
|
| 70 |
+
* The model has not gone through preference alignment (RLHF/DPO).
|
| 71 |
+
* Wikipedia content is used as a counterbalance, but it does not guarantee full neutrality.
|
| 72 |
|
| 73 |
+
Users are advised to run additional evaluation before production use.
|
| 74 |
|
| 75 |
+
---
|
| 76 |
|
| 77 |
+
## Recommendations
|
| 78 |
|
| 79 |
+
* Use the **Qwen3 chat template** for the best generation results.
|
| 80 |
+
* Apply **Instruction Fine-Tuning** or **Preference Tuning** before deploying to end users.
|
| 81 |
+
* Verify model outputs for critical information.
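To illustrate the chat-template recommendation: Qwen models use a ChatML-style layout built from `<|im_start|>` and `<|im_end|>` markers. The sketch below is a simplified approximation, not the template shipped with the tokenizer, which also handles system prompts, tool calls, and thinking tags. The helper name `render_chatml` is hypothetical. In practice, calling `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` on the model's own tokenizer is the safer route.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in a simplified ChatML layout (illustrative helper only)."""
    # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Apa itu UU PDP?"}])
print(prompt)
```

Because the model has not been instruction-tuned, a well-formed prompt does not by itself guarantee chat-style answers.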
|
| 82 |
|
| 83 |
---
|
| 84 |
|
| 85 |
+
## How to Get Started
|
| 86 |
|
| 87 |
Load the model using **HuggingFace Transformers**:
|
| 88 |
|
|
|
|
| 91 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 92 |
|
| 93 |
# 1. Configuration
|
| 94 |
+
model_id = "aitfindonesia/Bakat-8B-Base" # Replace with your actual Hub ID
|
| 95 |
|
| 96 |
# 2. Load Model
|
| 97 |
# Use bfloat16 for A100/A10G, float16 for T4
|
|
|
|
| 118 |
|
| 119 |
---
|
| 120 |
|
| 121 |
+
## Training Details
|
| 122 |
|
| 123 |
+
### Training Data
|
| 124 |
|
| 125 |
+
* **Total size**: ~214M tokens
|
| 126 |
+
* **Domains**: Digital Talent Policy (DTP), Digital Space Oversight (Pengawasan Ruang Digital, PRD), Wikipedia Indonesia
|
| 127 |
+
* **Split**: Train (90%) / Validation (10%)
|
| 128 |
|
| 129 |
+
### Training Procedure
|
| 130 |
|
| 131 |
+
The model was trained with **Continued Pre-Training (CPT)** using LoRA on HuggingFace Transformers.
|
| 132 |
|
| 133 |
+
#### Hyperparameters
|
| 134 |
|
| 135 |
+
* **Precision**: bf16 (mixed precision)
|
| 136 |
+
* **Quantization**: 4-bit (nf4)
|
| 137 |
+
* **LoRA Rank (r)**: 8
|
| 138 |
+
* **LoRA Alpha**: 16
|
| 139 |
+
* **Target modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
|
| 140 |
+
* **Batch size**: 4 / device
|
| 141 |
+
* **Gradient accumulation**: 16 (effective batch size = 32)
|
| 142 |
+
* **Learning rate**: 2e-4 (linear schedule)
|
| 143 |
+
* **Warmup ratio**: 0.03
|
| 144 |
+
* **Epochs**: 1
|
| 145 |
+
* **Optimizer**: adamw_8bit
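A few quantities follow directly from the hyperparameters listed above. A minimal sketch: the `CptHyperparams` class and the `num_devices` and `total_steps` arguments are illustrative assumptions, since the card states neither the device count nor the number of optimizer steps.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CptHyperparams:
    """Values copied from the hyperparameter list; the class itself is hypothetical."""

    lora_r: int = 8
    lora_alpha: int = 16
    per_device_batch_size: int = 4
    gradient_accumulation_steps: int = 16
    learning_rate: float = 2e-4
    warmup_ratio: float = 0.03

    def lora_scale(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Sequences consumed per optimizer step.
        return self.per_device_batch_size * self.gradient_accumulation_steps * num_devices

    def warmup_steps(self, total_steps: int) -> int:
        # Optimizer steps spent ramping the learning rate up.
        return math.ceil(total_steps * self.warmup_ratio)


hp = CptHyperparams()
print("LoRA scale:", hp.lora_scale())
print("Effective batch size (1 device):", hp.effective_batch_size())
print("Warmup steps for 1000 total steps:", hp.warmup_steps(1000))
```

With the listed values, one device processes 4 × 16 = 64 sequences per optimizer step, which differs from the effective batch size of 32 stated in the list. The actual run may have used a per-device batch size or accumulation setting other than the one recorded.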
|
| 146 |
|
| 147 |
---
|
| 148 |
|
| 149 |
+
## Evaluation
|
| 150 |
+
|
| 151 |
+
### Results
|
| 152 |
+
|
| 153 |
+
* **Final Training Loss**: ~1.2685
|
| 154 |
+
* **Final Validation Loss**: ~1.264
|
| 155 |
+
* **Training Perplexity**: ~3.56
|
| 156 |
+
* **Validation Perplexity**: ~3.55
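The perplexity figures can be related to the losses, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for PyTorch-based trainers, though the card does not state it). Under that assumption, perplexity is the exponential of the loss:

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_cross_entropy_nats)


train_ppl = perplexity(1.2685)  # final training loss from the list above
val_ppl = perplexity(1.264)     # final validation loss from the list above

print(f"training perplexity   = {train_ppl:.2f}")
print(f"validation perplexity = {val_ppl:.2f}")
```

Because the listed losses are themselves rounded, the recomputed values may differ from the reported perplexities in the second decimal place.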
|
| 157 |
|
| 158 |
+
### Benchmark (General)
|
| 159 |
+
|
| 160 |
+
* **MMLU**: ~74.20
|
| 161 |
+
* **IndoMMLU**: ~65.66
|
| 162 |
+
* **XCOPA-ID**: ~75.80
|
| 163 |
|
| 164 |
---
|
| 165 |
|
| 166 |
+
## Environmental Impact
|
| 167 |
+
|
| 168 |
+
Carbon emission estimates follow the methodology of Lacoste et al. (2019).
|
| 169 |
|
| 170 |
+
* **Hardware**: NVIDIA A100 80GB
|
| 171 |
+
* **Training time**: ~36 hours
|
| 172 |
+
* **Compute region**: Indonesia
|
| 173 |
+
* **Infrastructure**: University / Private Server
|
| 174 |
+
|
| 175 |
+
---
|
| 176 |
|
| 177 |
+
## Framework Versions
|
| 178 |
|
| 179 |
+
* Transformers: 4.x
|
| 180 |
+
* PyTorch: 2.x
|
| 181 |
+
* Datasets: 2.x
|
| 182 |
+
* Tokenizers: 0.x
|