Manuscript-Mimic
AI Style Transfer for Scientific Writing
An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.
Architecture
```text
┌──────────────────────────────────────────────────────────┐
│                    Gradio UI (app.py)                    │
│  ┌────────────────┐    ┌──────────────────────────────┐  │
│  │ Reference PDF  │    │ Target Draft (paste)         │  │
│  │ or Text        │    │                              │  │
│  └───────┬────────┘    └──────────────┬───────────────┘  │
│          │                            │                  │
│          ▼                            ▼                  │
│  ┌────────────────────────────────────────────────────┐  │
│  │ rewrite_agent.py (CodeAgent)                       │  │
│  │                                                    │  │
│  │ Step 1: style_extractor(reference) → ref_metrics   │  │
│  │ Step 2: style_extractor(target)    → tgt_metrics   │  │
│  │ Step 3: Rewrite target to match ref_metrics        │  │
│  │ Step 4: style_extractor(rewritten) → verify        │  │
│  │                                                    │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │ style_extractor.py (Tool)                    │  │  │
│  │  │                                              │  │  │
│  │  │ • Sentence Length Variance (σ)               │  │  │
│  │  │ • Hedging Density (per sentence)             │  │  │
│  │  │ • Passive Voice Density (per sentence)       │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └─────────────────────────┬──────────────────────────┘  │
│                            │                             │
│                            ▼                             │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Rewritten Text + Metrics                           │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```
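The four steps above amount to an extract, rewrite, verify loop. The sketch below shows that control flow in plain Python, with the metric extractor and the LLM rewrite passed in as callables so it runs without any model. The names `mimic_pipeline` and `relative_gap`, the 0.25 tolerance, and the optional retry rounds are assumptions for illustration; the diagram shows a single pass and does not say how the agent decides that verification succeeded.

```python
from typing import Callable

Metrics = dict[str, float]


def relative_gap(ref: Metrics, other: Metrics) -> float:
    """Largest per-metric relative difference from the reference (0.0 = identical).

    Hypothetical helper; assumes both profiles share the same keys.
    """
    gaps = [abs(ref[k] - other[k]) / max(abs(ref[k]), 1e-9) for k in ref]
    return max(gaps, default=0.0)


def mimic_pipeline(
    reference: str,
    target: str,
    extract: Callable[[str], Metrics],
    rewrite: Callable[[str, Metrics, Metrics], str],
    tolerance: float = 0.25,
    max_rounds: int = 1,
) -> tuple[str, Metrics, bool]:
    """Hypothetical sketch of Steps 1-4; max_rounds > 1 adds retries (an assumption)."""
    ref_metrics = extract(reference)  # Step 1: profile the reference
    cur_metrics = extract(target)     # Step 2: profile the draft
    current = target
    for _ in range(max_rounds):
        current = rewrite(current, ref_metrics, cur_metrics)  # Step 3: rewrite
        cur_metrics = extract(current)                        # Step 4: verify
        if relative_gap(ref_metrics, cur_metrics) <= tolerance:
            break
    # Return the final text, its metrics, and whether verification passed.
    return current, cur_metrics, relative_gap(ref_metrics, cur_metrics) <= tolerance
```

A relative gap is used so that metrics on different scales (σ in words versus densities per sentence) can share one tolerance; plugging in a real metric function and an LLM call as `extract` and `rewrite` would reproduce the diagram's flow.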
Three Stylometric Metrics
| Metric | Description | Academic Signature |
|---|---|---|
| Sentence Length Variance | Standard deviation (σ) of word counts per sentence | High variance: a mix of short sentences and long, multi-clause ones |
| Hedging Density | Hedge words per sentence (suggest, may, putative, indicate, could...) | Pre-2022 manuscripts hedge heavily in Results/Discussion |
| Passive Voice Density | Academic passive constructions per sentence (was performed, were analyzed...) | Methods sections are dominated by passive voice |
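The table does not pin down exact formulas, so the following is only a minimal sketch of how the three metrics could be computed with the standard library. It assumes a naive regex sentence splitter, the population standard deviation for σ, a small hypothetical hedge lexicon built from the table's examples, and a crude "form of *be* + past participle" pattern for passives; it is not taken from style_extractor.py.

```python
import re
import statistics

# Hypothetical hedge lexicon built from the table's examples; extend as needed.
HEDGES = {
    "suggest", "suggests", "suggested", "may", "might", "could", "putative",
    "putatively", "indicate", "indicates", "indicated", "possibly", "likely",
}

# Crude passive pattern: a form of "be", an optional -ly adverb, then a word
# ending in -ed or one of a few irregular participles ("were analyzed",
# "was carefully performed", "is shown"). Other irregular forms are missed.
PASSIVE_RE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?"
    r"(?:\w+ed|shown|seen|taken|given|known|found|made|done)\b",
    re.IGNORECASE,
)
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def split_sentences(text: str) -> list[str]:
    """Naive split after ., ! or ? plus whitespace (mis-splits "e.g." etc.)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def style_metrics(text: str) -> dict[str, float]:
    sentences = split_sentences(text)
    if not sentences:
        return {"sentence_length_sigma": 0.0, "hedging_density": 0.0,
                "passive_density": 0.0}
    words = [WORD_RE.findall(s) for s in sentences]
    lengths = [len(w) for w in words]
    hedges = sum(1 for w in words for tok in w if tok.lower() in HEDGES)
    passives = sum(len(PASSIVE_RE.findall(s)) for s in sentences)
    n = len(sentences)
    return {
        "sentence_length_sigma": statistics.pstdev(lengths),  # σ of words per sentence
        "hedging_density": hedges / n,    # hedge words per sentence
        "passive_density": passives / n,  # passive constructions per sentence
    }
```

For example, "Samples were analyzed. The results suggest a putative role for X. We may see more." yields three hedges and one passive over three sentences, so a hedging density of 1.0 and a passive density of about 0.33.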
Quick Start
```bash
pip install -r requirements.txt
export HF_TOKEN="hf_..."   # your Hugging Face token
python app.py              # launches Gradio on http://localhost:7860
```
File Structure
```text
manuscript_mimic/
├── __init__.py          # Package marker
├── style_extractor.py   # StyleExtractorTool + metric functions
├── rewrite_agent.py     # CodeAgent orchestrator + run_mimic()
├── app.py               # Gradio web UI
├── requirements.txt     # Dependencies
└── README.md            # This file
```
Usage
Via Gradio UI
- Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt)
- Paste your AI-generated draft
- Select a model and click "Rewrite to Match Style"
- Review the rewritten text and compare metrics
Via Python API
```python
from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
```
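`run_mimic` hides Step 3 of the pipeline, and the README does not say how the reference metrics reach the model. One plausible approach, shown here purely as a hypothetical sketch, is to state each metric as an explicit numeric target in the rewrite prompt. The function name `build_rewrite_prompt`, the metric keys, and the prompt wording are all assumptions.

```python
def build_rewrite_prompt(draft: str, ref: dict[str, float], cur: dict[str, float]) -> str:
    """Hypothetical Step 3 prompt: turn metric gaps into explicit rewrite targets.

    Assumes `ref` and `cur` share the same metric keys.
    """
    lines = [
        "Rewrite the draft so its style matches the targets below.",
        "Keep every factual claim, number, and citation unchanged.",
    ]
    for key in sorted(ref):
        # Treat values equal at the displayed precision as already matched.
        if round(ref[key], 2) == round(cur[key], 2):
            lines.append(f"- {key}: keep at about {ref[key]:.2f}")
        else:
            direction = "increase" if ref[key] > cur[key] else "decrease"
            lines.append(f"- {key}: {direction} from {cur[key]:.2f} to about {ref[key]:.2f}")
    lines += ["", "Draft:", draft]
    return "\n".join(lines)


prompt = build_rewrite_prompt(
    "We found X causes Y.",
    ref={"hedging_density": 0.8, "passive_density": 0.5},
    cur={"hedging_density": 0.1, "passive_density": 0.5},
)
print(prompt)
```

Stating numeric targets and a direction of change gives the model something concrete to aim for, and the same numbers can then be checked in the verification step.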
Models
The agent works with any model available on the Hugging Face Inference API, for example:
- Qwen/Qwen2.5-Coder-32B-Instruct (default; best for code-generation agents)
- meta-llama/Llama-3.3-70B-Instruct
- mistralai/Mixtral-8x7B-Instruct-v0.1
License
MIT