cive202/humanize-ai-text-bart-base

Fine-tuned BART-base (facebook/bart-base) for AI → Human rewriting (“humanization”) via prefix-based conditional generation.

  • Architecture: encoder–decoder (seq2seq)
  • Parameters: ~139M
  • Task format: input humanize: {ai_text} → target {human_text}
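The prefix-based pair construction can be sketched in a few lines; `build_pair` is an illustrative helper (not part of the released code) showing how an encoder input and decoder target are derived from one AI/human pair:

```python
def build_pair(ai_text: str, human_text: str) -> dict:
    """Map one parallel pair into the prefix-based seq2seq format."""
    return {
        "input": "humanize: " + ai_text,  # encoder input with task prefix
        "target": human_text,             # decoder target during training
    }

pair = build_pair(
    "The aforementioned considerations necessitate further analysis.",
    "We still need to look into this.",
)
print(pair["input"])
```

At inference time only the prefixed input is supplied; the model generates the target side.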

📄 Paper

“Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer”
Author: Utsav Paneru
arXiv: https://arxiv.org/abs/2604.11687v1
Status: Preprint (2026)

Citation

@misc{paneru2026makesoundlikehuman,
      title={Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer}, 
      author={Utsav Paneru},
      year={2026},
      eprint={2604.11687},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.11687}, 
}

Quickstart

pip install -U "transformers>=4.40.0" torch

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "cive202/humanize-ai-text-bart-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

ai_text = "Large language models often produce fluent, structured prose with recognizable regularities..."

# Prepend the task prefix the model was fine-tuned with.
inputs = tokenizer("humanize: " + ai_text, return_tensors="pt", truncation=True)

# Beam search tends to give more stable rewrites than greedy decoding.
out = model.generate(
    **inputs,
    max_new_tokens=256,
    num_beams=4,
)

print(tokenizer.decode(out[0], skip_special_tokens=True))

Training note (important)

This checkpoint corresponds to a smoke-test / pipeline validation run, not a full training run.

Saved config characteristics:

  • max_steps = 10
  • max_train_samples = 128
  • num_train_epochs = 1

⚠️ Interpret results below as a lower-bound baseline, not a fully optimized model.


Dataset

Parallel chunk pairs created via sentence-aware chunking:

  • Train: 25,140 pairs
  • Validation: 1,390
  • Test: 1,390

Preprocessing

  • Sentence tokenization (NLTK)
  • Greedy token packing (≤200 tokens)
  • Filtering short pairs (<10 words)
  • Document-disjoint splits
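The chunking steps above can be sketched as follows. This is an illustrative reconstruction, not the released preprocessing code: the NLTK sentence tokenizer is replaced by a naive regex splitter and the whitespace word count stands in for the model tokenizer, so the ≤200-token budget here is approximate.

```python
import re

MAX_TOKENS = 200  # packing budget from this card
MIN_WORDS = 10    # pairs shorter than this are filtered out

def sentences(text: str) -> list[str]:
    # Naive stand-in for nltk.sent_tokenize.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def pack_chunks(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack whole sentences into chunks of <= max_tokens tokens."""
    chunks, current, count = [], [], 0
    for sent in sentences(text):
        n = len(sent.split())  # whitespace tokens as a proxy
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    # Drop chunks below the minimum-length filter.
    return [c for c in chunks if len(c.split()) >= MIN_WORDS]
```

Sentence boundaries are never split across chunks, which keeps each chunk locally coherent for the style-transfer task.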

Evaluation (test n = 1,390)

Reference similarity

  • BERTScore F1: 0.9088
  • ROUGE-L: 0.4448
  • chrF++: 46.4131

Fluency proxy

  • GPT-2 PPL (output): 26.6919
  • GPT-2 PPL (human): 23.6912

Style shift

  • Mean marker shift: 0.6513

This baseline partially shifts text toward human-like distributions but is limited by minimal training.
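The exact marker inventory and normalization behind the mean-marker-shift figure are not specified in this card. A minimal sketch under one plausible reading, where the metric averages the relative reduction in per-word frequency of AI-typical markers between input and output; the marker list below is hypothetical:

```python
# Hypothetical marker list -- the inventory behind the reported
# 0.6513 figure is not specified in this card.
AI_MARKERS = {"furthermore", "moreover", "delve", "consequently"}

def marker_rate(text: str) -> float:
    """AI-typical markers per word (lower-cased, punctuation-stripped)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,;:") in AI_MARKERS for w in words) / len(words)

def relative_shift(ai_text: str, out_text: str) -> float:
    """Fractional reduction in marker rate from input to output."""
    r_in, r_out = marker_rate(ai_text), marker_rate(out_text)
    if r_in == 0:
        return 0.0
    return (r_in - r_out) / r_in

def mean_marker_shift(ai_texts, outputs) -> float:
    shifts = [relative_shift(a, o) for a, o in zip(ai_texts, outputs)]
    return sum(shifts) / len(shifts)
```

Under this reading, a value of 1.0 would mean all tracked markers were removed, and 0.0 would mean no change.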


Limitations

  • Not a fully trained model (smoke-test configuration)
  • Limited style transformation strength
  • No guarantee of bypassing AI detectors
  • Lower performance compared to larger/full runs

Research context

Part of the 2026 preprint:

“Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer”

  • Status: Preprint (not peer-reviewed)
  • Link: https://arxiv.org/abs/2604.11687

License

MIT (placeholder). Verify compatibility with the license of the upstream facebook/bart-base checkpoint before redistribution.

