NLLB Multi LoRA Adapter v2
- NOTE: Version 1 was trained on 5 million parallel sentences per pair, a setup whose compute-unit cost and training time were dramatically underestimated. Adapter v2 downsamples to 300K parallel sentences per pair so that training fits within the available compute budget and time constraints.
This repository contains a LoRA adapter fine-tuned from facebook/nllb-200-distilled-600M on 300K bidirectional parallel sentences per language pair.
Datasets
The following datasets were used:
- OpenSubtitles --> en<->es
- NLLB --> en<->am; en<->uz
Each direction used 300K parallel sentences, for a total of 1.8 million sentences across all six directions (three language pairs, both directions).
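The per-direction downsampling can be sketched as below. This is a minimal illustration, not the actual preprocessing script: `downsample` is a hypothetical helper, and seeded random sampling without replacement is an assumption about how the 300K subset was drawn.

```python
import random

def downsample(pairs, k=300_000, seed=42):
    """Sample k parallel sentence pairs without replacement.

    Returns the whole corpus unchanged if it has k pairs or fewer.
    A fixed seed keeps the subset reproducible across runs.
    """
    rng = random.Random(seed)
    if len(pairs) <= k:
        return list(pairs)
    return rng.sample(pairs, k)

# Toy corpus standing in for one translation direction.
corpus = [(f"src {i}", f"tgt {i}") for i in range(1_000_000)]
subset = downsample(corpus, k=300_000)
print(len(subset))  # 300000
```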
Evaluation
Post-training BLEU was computed with SacreBLEU using generation with the correct NLLB forced_bos_token_id per target language.
BLEU results:
| Target language | BLEU  | Sentence count |
|-----------------|-------|----------------|
| eng_Latn        | 20.14 | 1000           |
| amh_Ethi        | 7.84  | 1000           |
| spa_Latn        | 35.81 | 1000           |
| uzb_Latn        | 3.82  | 1000           |
| Overall         | 16.36 | 4000           |
Carbon
co2_eq_emissions:
- emissions: 0.28
- source: Calculated after training with the ML CO2 Impact calculator
- training_type: LoRA adapter fine-tuning
- geographical_location: Google Colab provider, US Central
- hardware_used: Google Colab Pro A100
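For context, an ML CO2 Impact-style estimate is just power × time × grid carbon intensity. The constants below are assumptions for illustration, not the values behind the figure above:

```python
# Back-of-envelope CO2 estimate; all constants are assumptions.
gpu_power_kw = 0.400      # assumed average A100 draw (~400 W TDP)
hours = 1.97              # training time from the compute report
carbon_intensity = 0.37   # assumed kg CO2eq per kWh for the US Central grid

energy_kwh = gpu_power_kw * hours
emissions_kg = energy_kwh * carbon_intensity
print(f"{emissions_kg:.2f} kg CO2eq")  # 0.29 kg CO2eq
```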
Compute and Training Statistics
Training statistics:
- Total steps: 15,000
- Total epochs: 2.00
- Training time: 7,107.43 seconds (1.97 hours)

Compute efficiency:
- Samples/second: 67.53
- Steps/second: 2.11
- Total FLOPs: 7.97e+16
- FLOPs per step: 5.32e+12
Loading
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

# Load the frozen NLLB base model and its tokenizer.
base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, "rob-wav/nllb-multi-lora-adapter-v2")
```