
🖋️ Manuscript-Mimic

AI Style Transfer for Scientific Writing

An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.

Architecture

┌──────────────────────────────────────────────────────────┐
│                    Gradio UI (app.py)                    │
│  ┌───────────────┐  ┌─────────────────────────────────┐  │
│  │ Reference PDF │  │      Target Draft (paste)       │  │
│  │   or Text     │  │                                 │  │
│  └──────┬────────┘  └───────────┬─────────────────────┘  │
│         │                       │                        │
│         ▼                       ▼                        │
│  ┌───────────────────────────────────────────────────┐   │
│  │            rewrite_agent.py: CodeAgent            │   │
│  │                                                   │   │
│  │  Step 1: style_extractor(reference) → ref_metrics │   │
│  │  Step 2: style_extractor(target)    → tgt_metrics │   │
│  │  Step 3: Rewrite target to match ref_metrics      │   │
│  │  Step 4: style_extractor(rewritten) → verify      │   │
│  │                                                   │   │
│  │  ┌──────────────────────────────────────────────┐ │   │
│  │  │           style_extractor.py: Tool           │ │   │
│  │  │                                              │ │   │
│  │  │  • Sentence Length Variance (σ)              │ │   │
│  │  │  • Hedging Density (per sentence)            │ │   │
│  │  │  • Passive Voice Density (per sentence)      │ │   │
│  │  └──────────────────────────────────────────────┘ │   │
│  └───────────────────────────────────────────────────┘   │
│         │                                                │
│         ▼                                                │
│  ┌───────────────────────────────────────────────────┐   │
│  │              Rewritten Text + Metrics             │   │
│  └───────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────┘
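
The four steps in the diagram can be read as a measure, rewrite, re-measure loop. The sketch below is only an illustration of that control flow, not the code in rewrite_agent.py: the names `mimic_pipeline`, `metric_gap`, `tolerance`, and `max_rounds` are hypothetical, the retry loop is an assumption (the diagram shows a single verify step), and the `extract` and `rewrite` callables stand in for the style extractor tool and the LLM rewrite.

```python
from typing import Callable, Dict, Tuple

Metrics = Dict[str, float]


def metric_gap(ref: Metrics, cand: Metrics) -> float:
    """Largest absolute difference over the metrics present in `ref`."""
    return max(abs(ref[name] - cand[name]) for name in ref)


def mimic_pipeline(
    reference: str,
    target: str,
    extract: Callable[[str], Metrics],
    rewrite: Callable[[str, Metrics, Metrics], str],
    tolerance: float = 0.25,   # hypothetical acceptance threshold
    max_rounds: int = 3,       # hypothetical retry budget
) -> Tuple[str, Metrics]:
    ref_metrics = extract(reference)        # Step 1: profile the reference
    draft = target
    draft_metrics = extract(draft)          # Step 2: profile the target draft
    for _ in range(max_rounds):
        if metric_gap(ref_metrics, draft_metrics) <= tolerance:
            break                           # close enough to the reference profile
        draft = rewrite(draft, ref_metrics, draft_metrics)  # Step 3: rewrite
        draft_metrics = extract(draft)      # Step 4: re-measure to verify
    return draft, draft_metrics


if __name__ == "__main__":
    # Toy stand-ins: the "metric" is word count, the "rewrite" appends one word.
    toy_extract = lambda text: {"words": float(len(text.split()))}
    toy_rewrite = lambda text, ref, cur: text + " x"
    print(mimic_pipeline("a b c d", "a b", toy_extract, toy_rewrite, tolerance=0.5))
```

Passing the extractor and rewriter in as callables keeps the loop testable without a model or an API token.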

Three Stylometric Metrics

| Metric | Description | Academic Signature |
|---|---|---|
| Sentence Length Variance | Standard deviation (σ) of word counts per sentence | High variance = mix of short and long multi-clause sentences |
| Hedging Density | Hedge words per sentence (suggest, may, putative, indicate, could...) | Pre-2022 manuscripts hedge heavily in Results/Discussion |
| Passive Voice Density | Academic passive constructions per sentence (was performed, were analyzed...) | Methods sections are dominated by passive voice |
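
As a rough illustration of how these three metrics could be computed, here is a minimal sketch using only the standard library. It is not the code in style_extractor.py: the function name `sketch_metrics`, the dictionary keys, the hedge word list, the naive sentence splitter, and the regex-based passive heuristic are all assumptions made for the example.

```python
import re
import statistics

# Hypothetical hedge lexicon; the actual tool's word list may differ.
HEDGES = {
    "suggest", "suggests", "suggested", "may", "might", "could",
    "putative", "indicate", "indicates", "indicated", "likely", "possibly",
}

# Heuristic passive pattern: a form of "to be" followed by a word ending in -ed or -en.
# It misses irregular participles and can over-match, so treat counts as approximate.
PASSIVE_RE = re.compile(
    r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b", re.IGNORECASE
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sketch_metrics(text):
    """Return the three stylometric metrics for `text` as a dict of floats."""
    sentences = split_sentences(text)
    if not sentences:
        return {"sentence_length_std": 0.0, "hedging_density": 0.0, "passive_density": 0.0}
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z]+", text.lower())
    hedge_count = sum(1 for w in words if w in HEDGES)
    passive_count = len(PASSIVE_RE.findall(text))
    n = len(sentences)
    return {
        # Population standard deviation of words per sentence.
        "sentence_length_std": statistics.pstdev(lengths),
        "hedging_density": hedge_count / n,
        "passive_density": passive_count / n,
    }


if __name__ == "__main__":
    sample = (
        "Samples were analyzed by mass spectrometry. "
        "These results suggest that the protein may bind DNA. It works."
    )
    print(sketch_metrics(sample))
```

A production extractor would more likely rely on a proper sentence tokenizer and a part-of-speech tagger for passives; the regex version is kept here only because it runs without extra dependencies.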

Quick Start

pip install -r requirements.txt
export HF_TOKEN="hf_..."   # your Hugging Face token
python app.py               # launches Gradio on http://localhost:7860

File Structure

manuscript_mimic/
├── __init__.py           # Package marker
├── style_extractor.py    # StyleExtractorTool + metric functions
├── rewrite_agent.py      # CodeAgent orchestrator + run_mimic()
├── app.py                # Gradio web UI
├── requirements.txt      # Dependencies
└── README.md             # This file

Usage

Via Gradio UI

  1. Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt)
  2. Paste your AI-generated draft
  3. Select a model and click "Rewrite to Match Style"
  4. Review the rewritten text and compare metrics

Via Python API

from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
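
To check whether a rewrite actually moved toward the reference profile, the metrics for the reference, the original draft, and the rewritten text can be compared side by side. The helper below is a hypothetical sketch: `compare_metrics` is not part of the package, and it assumes that `extract_style_metrics` returns a flat dict mapping metric names to floats, which this README does not confirm. The numbers in the example are made-up placeholders, not measured results.

```python
def compare_metrics(reference, before, after):
    """For each metric in `reference`, report the gap to the reference
    before and after rewriting, and whether the rewrite moved closer."""
    report = {}
    for name, ref_value in reference.items():
        gap_before = abs(before[name] - ref_value)
        gap_after = abs(after[name] - ref_value)
        report[name] = {
            "gap_before": gap_before,
            "gap_after": gap_after,
            "moved_closer": gap_after < gap_before,
        }
    return report


if __name__ == "__main__":
    # Placeholder values for illustration only.
    ref = {"hedging_density": 0.8, "passive_density": 0.5}
    draft = {"hedging_density": 0.2, "passive_density": 0.1}
    rewritten = {"hedging_density": 0.7, "passive_density": 0.4}
    for metric, row in compare_metrics(ref, draft, rewritten).items():
        print(f"{metric:<20} {row['gap_before']:.2f} -> {row['gap_after']:.2f} "
              f"closer={row['moved_closer']}")
```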

Models

The agent works with models served through the Hugging Face Inference API, including:

  • Qwen/Qwen2.5-Coder-32B-Instruct (default; best for code-generation agents)
  • meta-llama/Llama-3.3-70B-Instruct
  • mistralai/Mixtral-8x7B-Instruct-v0.1

License

MIT