NLLB Multi LoRA Adapter v2

  • NOTE: Version 1 was initially trained on 5 million parallel sentences per pair, which led to a dramatic underestimation of compute unit cost and training time. Adapter v2 uses a downsampled version with 300k parallel sentences per pair to fit training within the allowed budget and time constraints.

This repository contains a LoRA adapter fine-tuned from facebook/nllb-200-distilled-600M on 300K bidirectional parallel sentences per language pair.

Datasets

The following datasets were used:

  • OpenSubtitles --> en<->es
  • NLLB --> en<->am; en<->uz

Each direction used 300K parallel sentences, for a total of 1.8 million sentences across all six directions.
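The per-direction downsampling described in the note above can be sketched as follows. This is a hypothetical reconstruction: the actual sampling code and seed used for v2 are not published.

```python
import random

def downsample_pairs(pairs, n=300_000, seed=42):
    """Randomly sample n parallel sentence pairs (or keep all if fewer).

    `pairs` is a list of (source, target) tuples for one direction.
    The seed is illustrative; v2's actual sampling setup is not published.
    """
    if len(pairs) <= n:
        return list(pairs)
    return random.Random(seed).sample(pairs, n)

# Toy corpus standing in for one direction's full parallel data.
corpus = [(f"src {i}", f"tgt {i}") for i in range(10_000)]
subset = downsample_pairs(corpus, n=3_000)
print(len(subset))  # 3000
```

Applying the same helper to each of the six directions with n=300_000 yields the 1.8 million-sentence training set described above.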

Evaluation

Post-training BLEU was computed with SacreBLEU, generating with the correct NLLB forced_bos_token_id for each target language.

BLEU results (1,000 sentences per target language; 4,000 overall):

  • eng_Latn: 20.14
  • amh_Ethi: 7.84
  • spa_Latn: 35.81
  • uzb_Latn: 3.82
  • overall: 16.36
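The per-target-language generation setup can be sketched like this. `StubTokenizer` is a hypothetical stand-in (with made-up token ids) for the real NLLB tokenizer, whose `convert_tokens_to_ids` resolves a FLORES-200 language code to the id passed as `forced_bos_token_id`:

```python
# FLORES-200 codes this card reports BLEU for.
TARGET_CODES = ["eng_Latn", "spa_Latn", "amh_Ethi", "uzb_Latn"]

def generation_kwargs(tokenizer, tgt_code, max_new_tokens=128):
    """Build generate() kwargs that force decoding into tgt_code."""
    if tgt_code not in TARGET_CODES:
        raise ValueError(f"unexpected target language: {tgt_code}")
    return {
        "forced_bos_token_id": tokenizer.convert_tokens_to_ids(tgt_code),
        "max_new_tokens": max_new_tokens,
    }

class StubTokenizer:
    """Minimal stand-in mapping language codes to fake token ids."""
    _ids = {code: 256000 + i for i, code in enumerate(TARGET_CODES)}
    def convert_tokens_to_ids(self, token):
        return self._ids[token]

kwargs = generation_kwargs(StubTokenizer(), "spa_Latn")
print(kwargs["forced_bos_token_id"])  # 256001
```

With the real NLLB tokenizer, the same `generation_kwargs` output is passed to `model.generate(...)` for each target language before scoring with SacreBLEU.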

Carbon

co2_eq_emissions:
emissions: 0.28
source: Calculated after training with the ML CO2 Impact calculator
training_type: LoRA adapter fine-tuning
geographical_location: Google Colab Provider - US Central
hardware_used: Google Colab Pro A100

Compute and Training Statistics

COMPUTE REPORT

✓ Training Statistics:
Total steps: 15,000
Total epochs: 2.00
Training time: 7107.43 seconds (1.97 hours)

✓ Compute Efficiency:
Samples/second: 67.53
Steps/second: 2.11
Total FLOPs: 7.97e+16
FLOPs per step: 5.32e+12

Loading

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

# Load the base NLLB model and tokenizer, then attach the LoRA adapter.
base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
model = PeftModel.from_pretrained(base_model, "rob-wav/nllb-multi-lora-adapter-v2")
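Once the adapter is attached, a translation helper might look like the sketch below. The helper name, the default spa_Latn target, and the generation settings are illustrative; language selection relies on NLLB's forced_bos_token_id mechanism:

```python
# Hypothetical helper; `model` and `tokenizer` come from the snippet above.
def translate(model, tokenizer, text, src_code="eng_Latn", tgt_code="spa_Latn"):
    tokenizer.src_lang = src_code                 # set the source language
    inputs = tokenizer(text, return_tensors="pt")
    out = model.generate(
        **inputs,
        # Force the decoder to start in the target language.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(out, skip_special_tokens=True)[0]

# e.g. translate(model, tokenizer, "Hello, world!")
```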