# Manuscript-Mimic

**AI Style Transfer for Scientific Writing**

An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   Gradio UI (app.py)                    │
│  ┌────────────────┐  ┌───────────────────────────────┐  │
│  │ Reference PDF  │  │ Target Draft (paste)          │  │
│  │ or Text        │  │                               │  │
│  └───────┬────────┘  └───────────────┬───────────────┘  │
│          │                           │                  │
│          ▼                           ▼                  │
│  ┌───────────────────────────────────────────────────┐  │
│  │ rewrite_agent.py (CodeAgent)                      │  │
│  │                                                   │  │
│  │ Step 1: style_extractor(reference) → ref_metrics  │  │
│  │ Step 2: style_extractor(target) → tgt_metrics     │  │
│  │ Step 3: Rewrite target to match ref_metrics       │  │
│  │ Step 4: style_extractor(rewritten) → verify       │  │
│  │                                                   │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │ style_extractor.py (Tool)                   │  │  │
│  │  │                                             │  │  │
│  │  │ • Sentence Length Variance (σ)              │  │  │
│  │  │ • Hedging Density (per sentence)            │  │  │
│  │  │ • Passive Voice Density (per sentence)      │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  └─────────────────────────┬─────────────────────────┘  │
│                            │                            │
│                            ▼                            │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Rewritten Text + Metrics                          │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```

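The four agent steps can be summarized as a plain-Python pipeline. This is a minimal sketch, not the code in `rewrite_agent.py` (where a `CodeAgent` decides how to call the tool). Here `extract` and `rewrite` are hypothetical callables standing in for `style_extractor` and the LLM rewrite, and judging Step 4 by a 25% relative gap per metric is an assumed acceptance rule.

```python
from typing import Callable, Dict

# Hypothetical metric record: metric name -> value.
Metrics = Dict[str, float]


def relative_gaps(ref: Metrics, out: Metrics, eps: float = 1e-9) -> Metrics:
    """|ref - out| / |ref| for every metric in the reference profile."""
    return {k: abs(ref[k] - out.get(k, 0.0)) / max(abs(ref[k]), eps) for k in ref}


def mimic_pipeline(
    reference: str,
    target: str,
    extract: Callable[[str], Metrics],
    rewrite: Callable[[str, Metrics, Metrics], str],
    tolerance: float = 0.25,  # assumed acceptance threshold, not taken from the project
) -> dict:
    ref_metrics = extract(reference)                       # Step 1: profile the reference
    tgt_metrics = extract(target)                          # Step 2: profile the draft
    rewritten = rewrite(target, ref_metrics, tgt_metrics)  # Step 3: an LLM call in the real agent
    out_metrics = extract(rewritten)                       # Step 4: re-measure to verify
    gaps = relative_gaps(ref_metrics, out_metrics)
    return {
        "text": rewritten,
        "metrics": out_metrics,
        "gaps": gaps,
        "within_tolerance": all(g <= tolerance for g in gaps.values()),
    }


# Toy stand-ins so the sketch runs without a model (hypothetical, for illustration only).
def toy_extract(text: str) -> Metrics:
    return {"words": float(len(text.split()))}


def toy_rewrite(target: str, ref: Metrics, tgt: Metrics) -> str:
    n = int(ref["words"])
    words = target.split() or ["placeholder"]
    return " ".join((words * n)[:n])  # repeat or trim the draft to the reference length


if __name__ == "__main__":
    result = mimic_pipeline("a b c d", "x y", toy_extract, toy_rewrite)
    print(result["text"], result["within_tolerance"])
```

Keeping measurement separate from rewriting means Step 4 re-scores the output with the same extractor that profiled the reference, so the gaps are directly comparable.
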
## Three Stylometric Metrics

| Metric | Description | Academic Signature |
|--------|-------------|-------------------|
| **Sentence Length Variance** | Standard deviation (σ) of per-sentence word counts | High σ: short sentences mixed with long, multi-clause ones |
| **Hedging Density** | Hedge words per sentence (*suggest, may, putative, indicate, could*, ...) | Pre-2022 manuscripts hedge heavily in Results/Discussion |
| **Passive Voice Density** | Academic passive constructions per sentence (*was performed, were analyzed*, ...) | Methods sections are dominated by passive voice |

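The three metrics can be approximated with the standard library alone. This is a hedged sketch, not the project's `StyleExtractorTool`. Several choices here are assumptions: the naive sentence splitter, the hedge lexicon (the table's examples plus a few inflections), the regular-expression passive detector, the use of population σ, and the dictionary keys.

```python
import re
import statistics
from typing import Dict, List

# Assumed hedge lexicon: the README's examples plus a few common inflections.
HEDGES = {
    "suggest", "suggests", "suggested", "may", "might", "could",
    "putative", "putatively", "indicate", "indicates", "indicated",
    "possibly", "likely", "appears",
}

# Crude passive pattern: a form of "to be", an optional -ly adverb, then a regular -ed participle.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?\w+ed\b", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    """Naive split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sketch_style_metrics(text: str) -> Dict[str, float]:
    sentences = split_sentences(text)
    if not sentences:
        return {"sentence_length_sigma": 0.0, "hedging_density": 0.0, "passive_density": 0.0}
    lengths = [len(re.findall(r"\b\w+\b", s)) for s in sentences]  # words per sentence
    tokens = [w.lower() for w in re.findall(r"\b\w+\b", text)]
    n = len(sentences)
    return {
        # Population standard deviation of words per sentence.
        "sentence_length_sigma": statistics.pstdev(lengths),
        # Hedge tokens per sentence.
        "hedging_density": sum(t in HEDGES for t in tokens) / n,
        # Passive matches per sentence.
        "passive_density": len(PASSIVE.findall(text)) / n,
    }


if __name__ == "__main__":
    sample = "Samples were analyzed by PCR. These data suggest a putative role for X. We may need more work."
    print(sketch_style_metrics(sample))
```

The passive pattern only recognizes regular *-ed* participles after a form of *to be*, so irregular forms such as *were shown* are missed; a part-of-speech tagger would be the natural upgrade.
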
## Quick Start

```bash
pip install -r requirements.txt
export HF_TOKEN="hf_..."   # your Hugging Face token
python app.py              # launches Gradio on http://localhost:7860
```

## File Structure

```
manuscript_mimic/
├── __init__.py           # Package marker
├── style_extractor.py    # StyleExtractorTool + metric functions
├── rewrite_agent.py      # CodeAgent orchestrator + run_mimic()
├── app.py                # Gradio web UI
├── requirements.txt      # Dependencies
└── README.md             # This file
```

## Usage

### Via Gradio UI

1. Upload a reference PDF or paste reference text (a pre-2022 manuscript excerpt).
2. Paste your AI-generated draft.
3. Select a model and click "Rewrite to Match Style".
4. Review the rewritten text and compare the metrics.

### Via Python API

```python
# Assumes the working directory is manuscript_mimic/, so the modules import directly
from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference (calls the HF Inference API, so HF_TOKEN must be set)
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
```

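To compare metrics outside the UI, one option is to tabulate the reference, draft, and rewritten profiles side by side. The helper below is hypothetical. It assumes that `extract_style_metrics` returns a flat mapping of metric names to floats with the same keys for every text; the README does not specify its return type.

```python
from typing import Dict

Metrics = Dict[str, float]  # assumed shape of extract_style_metrics() output


def comparison_table(reference: Metrics, draft: Metrics, rewritten: Metrics) -> str:
    """Markdown table flagging whether each metric moved toward the reference."""
    lines = [
        "| Metric | Reference | Draft | Rewritten | Closer to reference? |",
        "|---|---|---|---|---|",
    ]
    for name, ref in reference.items():
        before, after = draft[name], rewritten[name]
        closer = abs(after - ref) < abs(before - ref)  # did the rewrite shrink the gap?
        lines.append(
            f"| {name} | {ref:.2f} | {before:.2f} | {after:.2f} | {'yes' if closer else 'no'} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(comparison_table({"hedging_density": 0.8}, {"hedging_density": 0.1}, {"hedging_density": 0.6}))
```

With the package importable, you would pass `extract_style_metrics(...)` applied to the reference excerpt, the original draft, and the string returned by `run_mimic` as the three arguments.
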
## Models

The agent should work with any instruction-tuned chat model available through the HF Inference API, including:
- `Qwen/Qwen2.5-Coder-32B-Instruct` (default; well suited to agents that act by writing code)
- `meta-llama/Llama-3.3-70B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`

## License

MIT