A plT5 model fine-tuned from allegro/plt5-base that generates paraphrases in Polish.

This model may produce lower-quality output than Wojtekb30/plt5-paraphraser-pl, but it was trained only on my fully synthetic dataset, so it has fewer licence restrictions on allowed usage.

https://huggingface.co/Wojtekb30/plt5-paraphraser-pl

Original model:

https://huggingface.co/allegro/plt5-base

Dataset used:

https://huggingface.co/datasets/Wojtekb30/Polish-paraphrases-12K-synthetic

Inference code example:

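Important note: model inputs must start with "Parafrazuj: " ("Paraphrase: "), for example "Parafrazuj: Dzisiaj jest ładna pogoda.". The paraphrase() function in the example adds this prefix itself, so pass it the bare sentence.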

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hugging Face Hub ID of this model
MODEL_NAME = "Wojtekb30/plt5-paraphraser-pl-synthetic"
#MODEL_NAME = "plt5-paraphraser-pl"

print("Loading model...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def paraphrase(text, num_return_sequences=3):
    """
    Generate paraphrases for a given Polish input sentence.
    """

    # The model requires this prefix
    input_text = f"Parafrazuj: {text}"

    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        max_length=256,
        truncation=True
    ).to(device)

    outputs = model.generate(
        **inputs,
        max_length=256,
        num_return_sequences=num_return_sequences,
        num_beams=5,
        do_sample=True,
        temperature=1.0,
        top_k=50,
        top_p=0.95,
    )

    paraphrases = [
        tokenizer.decode(output, skip_special_tokens=True)
        for output in outputs
    ]

    return paraphrases


if __name__ == "__main__":

    test_sentences = [
        "W nocy zapowiadane są bardzo silne opady deszczu, dlatego lepiej nie wychodzić z domu.",
        "Pomimo zmęczenia po ciężkim dniu pracy, Janek zdecydował się pójść na długi spacer z psem do lasu."
    ]

    for sentence in test_sentences:
        print("\nOriginal:", sentence)
        print("Paraphrases:")
        for i, p in enumerate(paraphrase(sentence), 1):
            print(f"{i}. {p}")
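
Two small helpers may be useful around the script above. It always prepends "Parafrazuj: ", so text that already carries the prefix would end up with it twice, and sampling can return repeated candidates or a copy of the input. The sketch below is an assumption about how one might handle both cases; the names with_prefix and unique_paraphrases are hypothetical and not part of the model or of transformers.

```python
PREFIX = "Parafrazuj: "  # task prefix required by the model


def with_prefix(text: str) -> str:
    """Return text with the task prefix present exactly once."""
    stripped = text.strip()
    marker = PREFIX.strip()  # "Parafrazuj:" without the trailing space
    if stripped.startswith(marker):
        # Already prefixed: drop the marker so it is not duplicated.
        stripped = stripped[len(marker):].lstrip()
    return PREFIX + stripped


def unique_paraphrases(source: str, candidates: list[str]) -> list[str]:
    """Drop empty outputs, copies of the source, and repeats, keeping order."""

    def key(s: str) -> str:
        # Compare case-insensitively and ignore differences in whitespace.
        return " ".join(s.split()).casefold()

    seen = {key(source)}
    result = []
    for candidate in candidates:
        k = key(candidate)
        if k and k not in seen:
            seen.add(k)
            result.append(candidate.strip())
    return result
```

In the script above, input_text = with_prefix(text) could replace the f-string, and unique_paraphrases(sentence, paraphrase(sentence)) could filter the results before printing. Filtering may return fewer than num_return_sequences items.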

Model repository:

Wojtekb30/plt5-paraphraser-pl-synthetic

Model size:

0.4B params (Safetensors, tensor type F32)