# 🖋️ Manuscript-Mimic

**AI Style Transfer for Scientific Writing**

An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.

## Architecture

```
┌───────────────────────────────────────────────────────────┐
│                    Gradio UI (app.py)                     │
│  ┌────────────────┐    ┌──────────────────────────────┐   │
│  │ Reference PDF  │    │ Target Draft (paste)         │   │
│  │ or Text        │    │                              │   │
│  └───────┬────────┘    └──────────────┬───────────────┘   │
│          │                            │                   │
│          ▼                            ▼                   │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  rewrite_agent.py: CodeAgent                        │  │
│  │                                                     │  │
│  │  Step 1: style_extractor(reference) → ref_metrics   │  │
│  │  Step 2: style_extractor(target)    → tgt_metrics   │  │
│  │  Step 3: Rewrite target to match ref_metrics        │  │
│  │  Step 4: style_extractor(rewritten) → verify        │  │
│  │                                                     │  │
│  │  ┌───────────────────────────────────────────────┐  │  │
│  │  │  style_extractor.py: Tool                     │  │  │
│  │  │                                               │  │  │
│  │  │  • Sentence Length Variance (σ)               │  │  │
│  │  │  • Hedging Density (per sentence)             │  │  │
│  │  │  • Passive Voice Density (per sentence)       │  │  │
│  │  └───────────────────────────────────────────────┘  │  │
│  └──────────────────────────┬──────────────────────────┘  │
│                             │                             │
│                             ▼                             │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Rewritten Text + Metrics                           │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘
```

## Three Stylometric Metrics

| Metric | Description | Academic Signature |
|--------|-------------|-------------------|
| **Sentence Length Variance** | σ of word counts per sentence | High variance = mix of short and long multi-clause sentences |
| **Hedging Density** | Hedge words per sentence (*suggest, may, putative, indicate, could*, ...) | Pre-2022 manuscripts hedge heavily in Results/Discussion |
| **Passive Voice Density** | Academic passive constructions per sentence (*was performed, were analyzed*, ...) | Methods sections are dominated by passive voice |

## Quick Start

```bash
pip install -r requirements.txt
export HF_TOKEN="hf_..."   # your Hugging Face token
python app.py              # launches Gradio on http://localhost:7860
```

## File Structure

```
manuscript_mimic/
├── __init__.py          # Package marker
├── style_extractor.py   # StyleExtractorTool + metric functions
├── rewrite_agent.py     # CodeAgent orchestrator + run_mimic()
├── app.py               # Gradio web UI
├── requirements.txt     # Dependencies
└── README.md            # This file
```

## Usage

### Via Gradio UI

1. Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt)
2. Paste your AI-generated draft
3. Select a model and click "Rewrite to Match Style"
4. Review the rewritten text and compare metrics

### Via Python API

```python
from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
```

## Models

The agent works with any model available on the HF Inference API:

- `Qwen/Qwen2.5-Coder-32B-Instruct` (default; best for code-generation agents)
- `meta-llama/Llama-3.3-70B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`

## License

MIT
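
## Appendix: Metric Sketch

To make the three stylometric metrics more concrete, here is a minimal, standard-library-only sketch of how they could be computed. It is not the code in `style_extractor.py`. The function name `sketch_metrics`, the hedge list, the passive-voice regex, the naive sentence splitter, and the use of population standard deviation for σ are all assumptions made for illustration.

```python
import re
import statistics

# Hypothetical hedge list, seeded from the examples in the metrics table.
HEDGE_WORDS = {"suggest", "suggests", "may", "putative",
               "indicate", "indicates", "could"}

# Crude passive heuristic (an assumption): a form of "to be" followed by
# a word ending in -ed or -en, e.g. "was performed", "were analyzed".
PASSIVE_PATTERN = re.compile(
    r"\b(?:was|were|is|are|been|being|be)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sketch_metrics(text):
    """Return illustrative versions of the three metrics for `text`."""
    sentences = split_sentences(text)
    if not sentences:
        return {"sentence_length_std": 0.0,
                "hedging_density": 0.0,
                "passive_density": 0.0}

    # Word count per sentence, then population standard deviation (sigma).
    lengths = [len(s.split()) for s in sentences]

    # Count hedge words across the whole text, case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    hedges = sum(1 for w in words if w in HEDGE_WORDS)

    # Count passive-looking constructions across the whole text.
    passives = len(PASSIVE_PATTERN.findall(text))

    n = len(sentences)
    return {
        "sentence_length_std": statistics.pstdev(lengths),
        "hedging_density": hedges / n,
        "passive_density": passives / n,
    }


if __name__ == "__main__":
    sample = ("Samples were analyzed by mass spectrometry. "
              "These data suggest that the effect may be real.")
    print(sketch_metrics(sample))
```

A regex-based passive detector will miss irregular participles ("was found") and can produce false positives, so a production extractor would likely need a longer pattern list or a part-of-speech tagger.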