# BART Summarizer – LoRA Fine-tuned on Lesson Texts
A LoRA fine-tuned version of facebook/bart-large-cnn for abstractive summarization of educational and lesson texts. Only ~2.08% of parameters were trained using PEFT/LoRA, resulting in a lightweight adapter on top of the already summarization-capable BART model.
## Usage
```python
import torch
from transformers import AutoTokenizer, BartForConditionalGeneration
from peft import PeftModel

model_id = "SeifElden2342532/children_educational_summarizer"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(model_id)
base = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn").to(device)
model = PeftModel.from_pretrained(base, model_id)

# Merge the adapter weights into the base model for faster inference.
model = model.merge_and_unload()
model.eval()

text = "your lesson text here..."
inputs = tokenizer(
    text,
    max_length=1024,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
).to(device)

with torch.no_grad():
    summary_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        max_length=256,
        early_stopping=True,
    )

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
## Model Details
| Setting | Value |
|---|---|
| Base model | facebook/bart-large-cnn |
| Fine-tuning | LoRA (PEFT) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Trainable params | 8,650,752 of 414,941,184 (~2.08%) |
| Task | Abstractive summarization |
| Dataset | Custom lesson descriptions (~2000 samples) |
| Max input length | 1024 tokens |
| Max output length | 256 tokens |
| Training epochs | 8 |
| Effective batch size | 16 (batch 4 × grad accum 4) |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Precision | fp16 |
| GPU | NVIDIA H100 80GB |
| Framework | Hugging Face Transformers + PEFT |
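As a sanity check, the trainable-parameter count above can be reproduced from the LoRA rank and the standard BART-large dimensions. This is a sketch, not code from the training run; the layer dimensions (d_model 1024, FFN dim 4096, 12 encoder + 12 decoder layers, cross-attention in each decoder layer) are the published BART-large values, not taken from this card:

```python
RANK = 16
D_MODEL = 1024
D_FFN = 4096

# Attention blocks targeted by LoRA: 12 encoder self-attention,
# 12 decoder self-attention, 12 decoder cross-attention.
attn_blocks = 12 + 12 + 12

# Each block has q_proj, k_proj, v_proj, out_proj (all d_model -> d_model);
# a LoRA A/B pair adds rank * (in_features + out_features) parameters.
attn_params = attn_blocks * 4 * RANK * (D_MODEL + D_MODEL)

# fc1 (d_model -> d_ffn) and fc2 (d_ffn -> d_model) in all 24 layers.
ffn_params = 24 * 2 * RANK * (D_MODEL + D_FFN)

total = attn_params + ffn_params
print(total)  # prints 8650752, matching the table
print(round(100 * total / 414_941_184, 2))  # prints 2.08
```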
## Training Details
- **Architecture:** BART-large pre-trained on CNN/DailyMail, adapted with LoRA for educational text summarization
- **Target modules:** `q_proj`, `v_proj`, `k_proj`, `out_proj`, `fc1`, `fc2`
- **Loss:** Cross-entropy, with padding tokens masked as `-100`
- **Evaluation metrics:** ROUGE-1, ROUGE-2, ROUGE-L, computed on the validation set each epoch
- **Best checkpoint:** selected automatically via `load_best_model_at_end=True`
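The `-100` masking works because PyTorch's cross-entropy loss ignores targets equal to its `ignore_index` (default `-100`), so padding positions contribute nothing to the loss. A minimal, dependency-free sketch of that preprocessing step (the pad token id `1` is BART's; the other token ids are illustrative):

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_TOKEN_ID = 1     # BART's pad token id

def mask_labels(label_ids):
    """Replace pad positions so they are ignored by the loss."""
    return [IGNORE_INDEX if t == PAD_TOKEN_ID else t for t in label_ids]

# A padded target sequence: real tokens followed by padding.
labels = [0, 9226, 16, 10, 4819, 2, 1, 1, 1]
print(mask_labels(labels))  # [0, 9226, 16, 10, 4819, 2, -100, -100, -100]
```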
### Dataset splits
| Split | Samples |
|---|---|
| Train | ~1697 |
| Validation | ~189 |
| Test | 100 |
### Training progress
| Epoch | Eval Loss | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|---|
| 1 | 2.040 | 44.14 | 15.65 | 26.80 |
| 2 | 1.947 | 46.10 | 17.03 | 28.25 |
| 3 | 1.908 | 46.82 | 17.63 | 28.54 |
| 4 | 1.885 | 47.30 | 18.13 | 28.87 |
| 5 | 1.873 | 47.42 | 18.13 | 29.18 |
| 6 | 1.866 | 47.96 | 18.44 | 29.40 |
| 7 | 1.864 | 47.84 | 18.20 | 29.32 |
| 8 | 1.935 | 47.41 | 17.51 | 28.51 |
## Evaluation Results (Test Set)
| Metric | Score |
|---|---|
| ROUGE-1 | 47.41 |
| ROUGE-2 | 17.51 |
| ROUGE-L | 28.51 |
| Eval Loss | 1.935 |
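ROUGE scores measure n-gram overlap between a generated summary and a reference. The sketch below is a simplified, dependency-free version of ROUGE-1 F1, for intuition only; real evaluations typically use the `rouge_score` package, which adds stemming and stricter tokenization, so its numbers will differ slightly:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat", "the cat sat on the mat"), 4))  # 0.6667
```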
## Limitations
- Optimized for educational/lesson text; may underperform on other domains
- Best results with inputs between 128 and 1024 tokens
- Output length is capped at 256 tokens
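Given the 1024-token input cap, longer lessons must be truncated or chunked before summarization. One common workaround (not part of this model card; the overlap value is illustrative) is to split the token ids into overlapping windows, summarize each window, and then concatenate or re-summarize the pieces:

```python
def chunk_token_ids(ids, max_len=1024, overlap=128):
    """Split a token-id list into overlapping windows of at most max_len.

    Consecutive windows share `overlap` tokens so sentences cut at a
    boundary still appear whole in one of the chunks.
    """
    step = max_len - overlap
    return [ids[i:i + max_len] for i in range(0, max(len(ids) - overlap, 1), step)]

# A 2000-token input becomes three windows; short inputs stay as one chunk.
chunks = chunk_token_ids(list(range(2000)))
print([len(c) for c in chunks])  # [1024, 1024, 208]
```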
## License
Apache 2.0, the same license as the base model.