🖋️ Manuscript-Mimic

AI Style Transfer for Scientific Writing

An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Gradio UI (app.py)                    │
│  ┌──────────────┐  ┌─────────────────────────────────┐  │
│  │ Reference PDF │  │      Target Draft (paste)       │  │
│  │   or Text     │  │                                 │  │
│  └──────┬───────┘  └────────────┬────────────────────┘  │
│         │                       │                        │
│         ▼                       ▼                        │
│  ┌──────────────────────────────────────────────────┐   │
│  │         rewrite_agent.py — CodeAgent             │   │
│  │                                                   │   │
│  │  Step 1: style_extractor(reference) → ref_metrics│   │
│  │  Step 2: style_extractor(target)    → tgt_metrics│   │
│  │  Step 3: Rewrite target to match ref_metrics     │   │
│  │  Step 4: style_extractor(rewritten) → verify     │   │
│  │                                                   │   │
│  │  ┌─────────────────────────────────────────────┐ │   │
│  │  │     style_extractor.py — Tool               │ │   │
│  │  │                                             │ │   │
│  │  │  • Sentence Length Variance (σ)             │ │   │
│  │  │  • Hedging Density (per sentence)           │ │   │
│  │  │  • Passive Voice Density (per sentence)     │ │   │
│  │  └─────────────────────────────────────────────┘ │   │
│  └──────────────────────────────────────────────────┘   │
│         │                                                │
│         ▼                                                │
│  ┌──────────────────────────────────────────────────┐   │
│  │              Rewritten Text + Metrics             │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
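
The four steps in the diagram can be sketched as a plain function. This is only an illustration of the data flow, not the code in rewrite_agent.py: the real system delegates the steps to a CodeAgent, whereas here `mimic_pipeline`, `measure`, and `rewrite` are hypothetical names, and the rewriting step is passed in as an ordinary callable.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]


def mimic_pipeline(
    reference: str,
    target: str,
    measure: Callable[[str], Metrics],
    rewrite: Callable[[str, Metrics, Metrics], str],
) -> dict:
    """Hypothetical outline of the extract -> rewrite -> verify loop."""
    ref_metrics = measure(reference)                        # Step 1
    tgt_metrics = measure(target)                           # Step 2
    rewritten = rewrite(target, ref_metrics, tgt_metrics)   # Step 3
    new_metrics = measure(rewritten)                        # Step 4 (verify)
    return {
        "rewritten": rewritten,
        "reference": ref_metrics,
        "before": tgt_metrics,
        "after": new_metrics,
    }


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a model or the real tool.
    toy_measure = lambda text: {"words": float(len(text.split()))}
    toy_rewrite = lambda text, ref, tgt: text + " indeed"
    print(mimic_pipeline("a b c", "a b", toy_measure, toy_rewrite))
```

Passing `measure` and `rewrite` as parameters keeps the outline independent of any particular model or metric implementation.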

Three Stylometric Metrics

Metric	Description	Academic Signature
Sentence Length Variance	Standard deviation (σ) of word counts per sentence	High variance = mix of short and long multi-clause sentences
Hedging Density	Hedge words per sentence (suggest, may, putative, indicate, could...)	Pre-2022 manuscripts hedge heavily in Results/Discussion
Passive Voice Density	Academic passive constructions per sentence (was performed, were analyzed...)	Methods sections are dominated by passive voice
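
As a rough illustration of how the three metrics in the table could be computed, here is a minimal sketch. It is not the code in style_extractor.py: the function name `sketch_style_metrics`, the hedge word list, the passive-voice regex, and the naive sentence splitter are all assumptions made for this example. The regex is deliberately crude and will produce false positives (for instance "was green").

```python
import re
import statistics

# Illustrative lexicon only; the project's actual hedge list may differ.
HEDGE_WORDS = {
    "suggest", "suggests", "suggested", "may", "might", "could",
    "putative", "indicate", "indicates", "indicated", "possibly", "likely",
}

# Crude passive detector: a form of "to be", an optional adverb,
# then a word ending in -ed or -en ("was performed", "were carefully chosen").
PASSIVE_RE = re.compile(
    r"\b(?:is|are|was|were|been|being|be)\s+(?:\w+ly\s+)?\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sketch_style_metrics(text):
    """Return sentence-length sigma plus per-sentence hedge and passive counts."""
    sentences = split_sentences(text)
    if not sentences:
        return {"sentence_length_sigma": 0.0,
                "hedging_density": 0.0,
                "passive_density": 0.0}
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z]+", text.lower())
    hedges = sum(1 for w in words if w in HEDGE_WORDS)
    passives = len(PASSIVE_RE.findall(text))
    n = len(sentences)
    return {
        # Population standard deviation of words per sentence.
        "sentence_length_sigma": statistics.pstdev(lengths),
        "hedging_density": hedges / n,
        "passive_density": passives / n,
    }


if __name__ == "__main__":
    sample = ("Samples were analyzed by mass spectrometry. "
              "These data suggest that the protein may bind DNA. It binds.")
    print(sketch_style_metrics(sample))
```

A production version would likely need a proper sentence tokenizer and a part-of-speech tagger to separate true passives from adjectives.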

Quick Start

pip install -r requirements.txt
export HF_TOKEN="hf_..."   # your Hugging Face token
python app.py               # launches Gradio on http://localhost:7860

File Structure

manuscript_mimic/
├── __init__.py           # Package marker
├── style_extractor.py    # StyleExtractorTool + metric functions
├── rewrite_agent.py      # CodeAgent orchestrator + run_mimic()
├── app.py                # Gradio web UI
├── requirements.txt      # Dependencies
└── README.md             # This file

Usage

Via Gradio UI

1. Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt).
2. Paste your AI-generated draft.
3. Select a model and click "Rewrite to Match Style".
4. Review the rewritten text and compare metrics.

Via Python API

from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
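
To judge whether a rewrite actually moved closer to the reference, the metrics of the reference, the original draft, and the rewritten text can be compared side by side. The sketch below assumes each set of metrics is available as a plain dict mapping metric names to floats; the actual return type of extract_style_metrics is not documented here, and `closeness_report` is a hypothetical helper, not part of the package.

```python
def closeness_report(reference, before, after):
    """For each metric, report its distance to the reference before and after rewriting."""
    report = {}
    for name, ref_value in reference.items():
        gap_before = abs(before[name] - ref_value)
        gap_after = abs(after[name] - ref_value)
        report[name] = {
            "gap_before": gap_before,
            "gap_after": gap_after,
            "improved": gap_after < gap_before,
        }
    return report


if __name__ == "__main__":
    # Made-up numbers purely to exercise the helper.
    ref = {"hedging_density": 0.8}
    draft = {"hedging_density": 0.2}
    rewritten = {"hedging_density": 0.7}
    for metric, row in closeness_report(ref, draft, rewritten).items():
        print(metric, row)
```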

Models

The agent can use models served through the Hugging Face (HF) Inference API, for example:

Qwen/Qwen2.5-Coder-32B-Instruct (default — best for code-generation agents)
meta-llama/Llama-3.3-70B-Instruct
mistralai/Mixtral-8x7B-Instruct-v0.1

License

MIT