gpt-oss-20b-balitanlp-cpt

This model is continuously pretrained model version of openai/gpt-oss-20b on an unknown dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 1
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 16
gradient_accumulation_steps: 16
total_train_batch_size: 256
total_eval_batch_size: 128
optimizer: Use OptimizerNames.ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 10
training_steps: 100

TL;DR

This is pre-trained base model, it's not quite useful but you can see it learned some Filipino.

TODO: Serving examples, evals, instruction tuning.

Framework versions

Transformers 4.57.3
Pytorch 2.9.0a0+145a3a7bda.nv25.10
Datasets 4.4.2
Tokenizers 0.22.1

GPT-OSS 20B BalitaNLP - Continuous Pre-Training Validation

Model Information

Model: sapinsapin/gpt-oss-20b-balitanlp-cpt
Base: GPT-OSS 20B
Training: Continued pre-training on BalitaNLP dataset
GPUs: 4x NVIDIA GB200
API: OpenAI-compatible endpoint

Request

curl -X POST "http://localhost:8001/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sapinsapin/gpt-oss-20b-balitanlp-cpt",
    "messages": [
      {
        "role": "user",
        "content": "Ang sabi nga nila, kapag bumagyo"
      }
    ]
  }'

Response

{
  "id": "chatcmpl-3ec05bed2618486eba9932752923ac74",
  "object": "chat.completion",
  "created": 1766881744,
  "model": "sapinsapin/gpt-oss-20b-balitanlp-cpt",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Ang sabi nga nila, habang umuulan…peke! Kaawa-awa kapag walang payong at, puro kamay lang ang magpapaihi. Ang siste, kapag naghatid yan ng baha at iligal na droga, iyong mga taong binalot ng mahiwagang double rainbow na lumilitaw after ng ulan? Mas marami, at hindi lang racial 'yan, fantasy straight and diss; K-Pop, drama addicts, ra-rambilli, furrys, chuchichu, pathetic pius, ppl w oarryng, TV-radio, screenplay trainees, and another set of naysayers na tara eh! Yah. Kasi, walang gustong mag bayad ng kabilisan bar matapos yan. At mga ballesteros, tambay lang. 😉 So, nasa dulo ng dulo, uulan din tayo ng pagmamahal. Na walang tuldok tsaka ng uso sa news! 😝 Huwag mong paihi ang sumpa at magpalinlabon na lang. Sunny days welcome pa rin. 🙌",
        "refusal": null,
        "reasoning_content": "User says: \"Ang sabi nga nila, kapag bumagyo\". Spanish? The user is likely trying to say \"The other night, it rained\" or referencing old sentimental slogan: \"Ang sabi nga nila, kapag bumagyo may hangin. Ang sabi nga nila, kapag umuulan, may baho. Ang sabi nga nila, kapag umuulan, may buhos. Ang sabi nga nila, kapag bumara, may umuupa.\" So maybe the user is praising this old tagline. Possibly they want a completion to the line..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 78,
    "completion_tokens": 689,
    "total_tokens": 767
  }
}

Purpose: Sense-Checking Continuous Pre-Training

This example demonstrates a quick validation that the model has successfully adapted to Filipino/Tagalog language through continuous pre-training (CPT). The response shows:

✅ Language adaptation: Model generates fluent Tagalog text
✅ Cultural context: Uses Filipino colloquialisms, code-switching, and emojis naturally
✅ Vocabulary expansion: Handles Tagalog-specific words and phrases

What is Continuous Pre-Training (CPT)?

Continuous pre-training extends a foundation model's knowledge by training it on new domain-specific or language-specific data using the same next-token prediction objective as the original pre-training.

What CPT does:

Adapts the model to new languages, domains, or time periods
Expands vocabulary and linguistic patterns
Maintains general capabilities while adding specialized knowledge
Uses causal language modeling (predicting next tokens)

What CPT does NOT do:

Does not teach the model to follow instructions
Does not align the model with human preferences
Does not make the model conversational or helpful
Does not reduce hallucinations or improve safety

In this case:

Base model: GPT-OSS 20B (English-dominant)
CPT dataset: BalitaNLP (Filipino news articles)
Result: Model now understands and generates Filipino text

Next Steps: Post-Training

After CPT, the model needs post-training to become useful for real applications:

1. Supervised Fine-Tuning (SFT)

Purpose: Teach the model to follow instructions and respond helpfully

Process:

Train on instruction-response pairs
Examples: "Translate this to English: [text]" → "[translation]"
Dataset: High-quality Q&A, conversations, task completions
Objective: Cross-entropy loss on response tokens

Result: Model learns to be an assistant rather than just a text generator

2. Reinforcement Learning from Human Feedback (RLHF)

Purpose: Align model outputs with human preferences

Process:

Collect human preference data (A vs B comparisons)
Train a reward model to predict human preferences
Use PPO or DPO to optimize model outputs
Focus on: helpfulness, harmlessness, honesty

Result: Model produces more preferred, safer, and more aligned responses

3. Direct Preference Optimization (DPO)

Alternative to RLHF: Simpler approach that directly optimizes preferences

Process:

Train on preference pairs without a separate reward model
More stable and efficient than RLHF
Increasingly popular for post-training

Typical Training Pipeline

1. Pre-training (Foundation)
   ↓ (Large-scale, general text)
   Base Model (e.g., GPT-OSS 20B)
   
2. Continuous Pre-Training (CPT) ← WE ARE HERE
   ↓ (Domain/language-specific text)
   Adapted Model (e.g., Filipino GPT-OSS)
   
3. Supervised Fine-Tuning (SFT)
   ↓ (Instruction-response pairs)
   Instruction-Following Model
   
4. RLHF/DPO
   ↓ (Human preference data)
   Aligned Assistant Model

Token Usage

Prompt: 78 tokens
Completion: 689 tokens
Total: 767 tokens
Response time: ~6 seconds

Downloads last month: 224

Safetensors

Model size

0.7B params

Tensor type

F32

Model tree for sapinsapin/gpt-oss-20b-balitanlp-cpt

Base model

openai/gpt-oss-20b

Finetuned

(501)

this model

sapinsapin
/

gpt-oss-20b-balitanlp-cpt

gpt-oss-20b-balitanlp-cpt

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

TL;DR

Framework versions

GPT-OSS 20B BalitaNLP - Continuous Pre-Training Validation

Model Information

Request

Response

Purpose: Sense-Checking Continuous Pre-Training

What is Continuous Pre-Training (CPT)?

Next Steps: Post-Training

1. Supervised Fine-Tuning (SFT)

2. Reinforcement Learning from Human Feedback (RLHF)

3. Direct Preference Optimization (DPO)

Typical Training Pipeline

Token Usage

Model tree for sapinsapin/gpt-oss-20b-balitanlp-cpt

Dataset used to train sapinsapin/gpt-oss-20b-balitanlp-cpt