TOBA LLM
Model description
TOBA LLM is a language model built upon the TOBA (Tokenisasi Optimal Berbasis Aglutinasi) tokenization scheme. This approach is inspired by the Gasing Literacy Learning System (https://gasingacademy.org/), an educational framework designed to teach Indonesian by integrating reading, writing, and pronunciation while addressing the local characteristics of the language.
TOBA tokenization is optimized for the agglutinative nature of Indonesian, in which words are formed by attaching affixes to a root. By combining principles from human literacy education with computational optimization, TOBA LLM aims for tokenization that is efficient and respects Indonesian word structure. This makes it suited to tasks that depend on Indonesian-language understanding, such as educational tools, natural language processing applications, and content generation.
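To make "agglutinative" concrete, the sketch below splits an Indonesian word into prefixes, root, and suffixes by greedy affix stripping. It is a toy illustration only and is not the TOBA algorithm: the affix lists, the `MIN_ROOT` threshold, and the `split_affixes` function are all hypothetical, and a heuristic this simple will mis-split many real words.

```python
# Toy affix splitter for illustration only; this is NOT the TOBA algorithm.
# The affix lists are a small hypothetical subset of Indonesian affixes.
PREFIXES = ("mem", "per", "di", "ber", "ter", "ke")
SUFFIXES = ("kan", "nya", "an", "i")
MIN_ROOT = 3  # assumed minimum root length, to avoid stripping too much


def split_affixes(word):
    """Greedily strip known prefixes, then known suffixes, from a word."""
    prefixes, suffixes = [], []

    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= MIN_ROOT:
                prefixes.append(p)
                word = word[len(p):]
                stripped = True
                break

    stripped = True
    while stripped:
        stripped = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= MIN_ROOT:
                suffixes.insert(0, s)  # keep suffixes in surface order
                word = word[:-len(s)]
                stripped = True
                break

    return prefixes + [word] + suffixes


print(split_affixes("mempermainkan"))  # ['mem', 'per', 'main', 'kan']
print(split_affixes("bermain"))        # ['ber', 'main']
```

Both example words share the root "main" ("play"), so a tokenizer that segments along affix boundaries can reuse a single root token across many surface forms.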
Usage
The inference script, infer.py, supports two modes: completion and chat.
Setup
Python 3.8 or higher is required. To install the necessary dependencies:
pip install -r requirements.txt
Completion Mode
Generates a continuation of a single input prompt.
python infer.py completion
Once the script is running, enter a prompt in the terminal. The model generates a completion for it.
Chat Mode
Enables multi-turn interaction with the model in a conversational format.
python infer.py chat
The model maintains conversational context across turns. Press Ctrl+C to exit the session.
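A multi-turn loop of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of infer.py: `chat_loop` and the `generate` callable are hypothetical names, and `generate` stands in for whatever function turns the conversation history into a model reply.

```python
def chat_loop(generate, read_input=input, write=print):
    """Run a multi-turn chat until Ctrl+C and return the accumulated history.

    `generate` is a hypothetical callable that takes the history, a list of
    (role, text) pairs, and returns the model's reply as a string.
    """
    history = []  # kept across turns so the model sees earlier context
    try:
        while True:
            user_text = read_input("You: ")
            history.append(("user", user_text))
            reply = generate(history)  # the full history is passed each turn
            history.append(("assistant", reply))
            write(f"Model: {reply}")
    except (KeyboardInterrupt, EOFError):
        # Ctrl+C (or end of input) ends the session cleanly
        write("\nSession ended.")
    return history


if __name__ == "__main__":
    # Stand-in for a real model: reports how many turns it has seen.
    chat_loop(lambda history: f"(seen {len(history)} turns)")
```

Passing the whole history on every turn is the simplest way to preserve context. A real implementation would also need to truncate or summarize old turns once the history exceeds the model's context window.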