gpt-oss-20b-balitanlp-cpt

This model is continuously pretrained model version of openai/gpt-oss-20b on an unknown dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 256
  • total_eval_batch_size: 128
  • optimizer: Use OptimizerNames.ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10
  • training_steps: 100

TL;DR

This is pre-trained base model, it's not quite useful but you can see it learned some Filipino.

TODO: Serving examples, evals, instruction tuning.

Framework versions

  • Transformers 4.57.3
  • Pytorch 2.9.0a0+145a3a7bda.nv25.10
  • Datasets 4.4.2
  • Tokenizers 0.22.1

GPT-OSS 20B BalitaNLP - Continuous Pre-Training Validation

Model Information

  • Model: sapinsapin/gpt-oss-20b-balitanlp-cpt
  • Base: GPT-OSS 20B
  • Training: Continued pre-training on BalitaNLP dataset
  • GPUs: 4x NVIDIA GB200
  • API: OpenAI-compatible endpoint

Request

curl -X POST "http://localhost:8001/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sapinsapin/gpt-oss-20b-balitanlp-cpt",
    "messages": [
      {
        "role": "user",
        "content": "Ang sabi nga nila, kapag bumagyo"
      }
    ]
  }'

Response

{
  "id": "chatcmpl-3ec05bed2618486eba9932752923ac74",
  "object": "chat.completion",
  "created": 1766881744,
  "model": "sapinsapin/gpt-oss-20b-balitanlp-cpt",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Ang sabi nga nila, habang umuulan…peke! Kaawa-awa kapag walang payong at, puro kamay lang ang magpapaihi. Ang siste, kapag naghatid yan ng baha at iligal na droga, iyong mga taong binalot ng mahiwagang double rainbow na lumilitaw after ng ulan? Mas marami, at hindi lang racial 'yan, fantasy straight and diss; K-Pop, drama addicts, ra-rambilli, furrys, chuchichu, pathetic pius, ppl w oarryng, TV-radio, screenplay trainees, and another set of naysayers na tara eh! Yah. Kasi, walang gustong mag bayad ng kabilisan bar matapos yan. At mga ballesteros, tambay lang. 😉 So, nasa dulo ng dulo, uulan din tayo ng pagmamahal. Na walang tuldok tsaka ng uso sa news! 😝 Huwag mong paihi ang sumpa at magpalinlabon na lang. Sunny days welcome pa rin. 🙌",
        "refusal": null,
        "reasoning_content": "User says: \"Ang sabi nga nila, kapag bumagyo\". Spanish? The user is likely trying to say \"The other night, it rained\" or referencing old sentimental slogan: \"Ang sabi nga nila, kapag bumagyo may hangin. Ang sabi nga nila, kapag umuulan, may baho. Ang sabi nga nila, kapag umuulan, may buhos. Ang sabi nga nila, kapag bumara, may umuupa.\" So maybe the user is praising this old tagline. Possibly they want a completion to the line..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 78,
    "completion_tokens": 689,
    "total_tokens": 767
  }
}

Purpose: Sense-Checking Continuous Pre-Training

This example demonstrates a quick validation that the model has successfully adapted to Filipino/Tagalog language through continuous pre-training (CPT). The response shows:

  • Language adaptation: Model generates fluent Tagalog text
  • Cultural context: Uses Filipino colloquialisms, code-switching, and emojis naturally
  • Vocabulary expansion: Handles Tagalog-specific words and phrases

What is Continuous Pre-Training (CPT)?

Continuous pre-training extends a foundation model's knowledge by training it on new domain-specific or language-specific data using the same next-token prediction objective as the original pre-training.

What CPT does:

  • Adapts the model to new languages, domains, or time periods
  • Expands vocabulary and linguistic patterns
  • Maintains general capabilities while adding specialized knowledge
  • Uses causal language modeling (predicting next tokens)

What CPT does NOT do:

  • Does not teach the model to follow instructions
  • Does not align the model with human preferences
  • Does not make the model conversational or helpful
  • Does not reduce hallucinations or improve safety

In this case:

  • Base model: GPT-OSS 20B (English-dominant)
  • CPT dataset: BalitaNLP (Filipino news articles)
  • Result: Model now understands and generates Filipino text

Next Steps: Post-Training

After CPT, the model needs post-training to become useful for real applications:

1. Supervised Fine-Tuning (SFT)

Purpose: Teach the model to follow instructions and respond helpfully

Process:

  • Train on instruction-response pairs
  • Examples: "Translate this to English: [text]" → "[translation]"
  • Dataset: High-quality Q&A, conversations, task completions
  • Objective: Cross-entropy loss on response tokens

Result: Model learns to be an assistant rather than just a text generator

2. Reinforcement Learning from Human Feedback (RLHF)

Purpose: Align model outputs with human preferences

Process:

  • Collect human preference data (A vs B comparisons)
  • Train a reward model to predict human preferences
  • Use PPO or DPO to optimize model outputs
  • Focus on: helpfulness, harmlessness, honesty

Result: Model produces more preferred, safer, and more aligned responses

3. Direct Preference Optimization (DPO)

Alternative to RLHF: Simpler approach that directly optimizes preferences

Process:

  • Train on preference pairs without a separate reward model
  • More stable and efficient than RLHF
  • Increasingly popular for post-training

Typical Training Pipeline

1. Pre-training (Foundation)
   ↓ (Large-scale, general text)
   Base Model (e.g., GPT-OSS 20B)
   
2. Continuous Pre-Training (CPT) ← WE ARE HERE
   ↓ (Domain/language-specific text)
   Adapted Model (e.g., Filipino GPT-OSS)
   
3. Supervised Fine-Tuning (SFT)
   ↓ (Instruction-response pairs)
   Instruction-Following Model
   
4. RLHF/DPO
   ↓ (Human preference data)
   Aligned Assistant Model

Token Usage

  • Prompt: 78 tokens
  • Completion: 689 tokens
  • Total: 767 tokens
  • Response time: ~6 seconds
Downloads last month
224
Safetensors
Model size
0.7B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sapinsapin/gpt-oss-20b-balitanlp-cpt

Finetuned
(501)
this model

Dataset used to train sapinsapin/gpt-oss-20b-balitanlp-cpt