# Manuscript-Mimic
**AI Style Transfer for Scientific Writing**
An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     Gradio UI (app.py)                      │
│  ┌────────────────┐  ┌───────────────────────────────────┐  │
│  │ Reference PDF  │  │ Target Draft (paste)              │  │
│  │ or Text        │  │                                   │  │
│  └───────┬────────┘  └─────────────────┬─────────────────┘  │
│          │                             │                    │
│          ▼                             ▼                    │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ rewrite_agent.py: CodeAgent                           │  │
│  │                                                       │  │
│  │ Step 1: style_extractor(reference) → ref_metrics      │  │
│  │ Step 2: style_extractor(target)    → tgt_metrics      │  │
│  │ Step 3: Rewrite target to match ref_metrics           │  │
│  │ Step 4: style_extractor(rewritten) → verify           │  │
│  │                                                       │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │ style_extractor.py: Tool                        │  │  │
│  │  │                                                 │  │  │
│  │  │ • Sentence Length Variance (σ)                  │  │  │
│  │  │ • Hedging Density (per sentence)                │  │  │
│  │  │ • Passive Voice Density (per sentence)          │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │               Rewritten Text + Metrics                │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
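The four steps in the diagram can be sketched as a single pass over two callables. This is a minimal sketch under stated assumptions, not the code in `rewrite_agent.py`: the names `mimic_once`, `extract`, and `rewrite` are hypothetical, and the metrics are assumed to be a flat dict of floats. The toy stand-ins at the bottom exist only so the sketch runs without a model.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]


def mimic_once(
    reference: str,
    target: str,
    extract: Callable[[str], Metrics],
    rewrite: Callable[[str, Metrics, Metrics], str],
) -> dict:
    """Run the four diagram steps once (hypothetical helper, not the project's API)."""
    ref_metrics = extract(reference)                       # Step 1
    tgt_metrics = extract(target)                          # Step 2
    rewritten = rewrite(target, ref_metrics, tgt_metrics)  # Step 3
    new_metrics = extract(rewritten)                       # Step 4
    # Absolute distance from the reference profile, per metric.
    gaps = {k: abs(new_metrics[k] - ref_metrics[k]) for k in ref_metrics}
    return {"rewritten": rewritten, "metrics": new_metrics, "gaps": gaps}


# Toy stand-ins: the "metric" is a word count and the "rewriter" pads or
# trims the draft until it has as many words as the reference.
def toy_extract(text: str) -> Metrics:
    return {"word_count": float(len(text.split()))}


def toy_rewrite(text: str, ref: Metrics, tgt: Metrics) -> str:
    words = text.split()
    wanted = int(ref["word_count"])
    padded = words + ["indeed"] * max(0, wanted - len(words))
    return " ".join(padded[:wanted])


result = mimic_once("one two three four", "one two", toy_extract, toy_rewrite)
print(result["gaps"])
```

In the real system Step 3 is performed by the agent's language model; passing the rewriter in as a callable simply keeps the control flow visible.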
## Three Stylometric Metrics
| Metric | Description | Academic Signature |
|--------|-------------|-------------------|
| **Sentence Length Variance** | Standard deviation (σ) of word counts per sentence | High σ = a mix of short sentences and long multi-clause sentences |
| **Hedging Density** | Hedge words per sentence (*suggest, may, putative, indicate, could*...) | Pre-2022 manuscripts hedge heavily in Results/Discussion |
| **Passive Voice Density** | Academic passive constructions per sentence (*was performed, were analyzed*...) | Methods sections are dominated by passive voice |
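
To make the three definitions concrete, here is a rough sketch of how such metrics could be computed with regular expressions. It is an illustration, not the implementation in `style_extractor.py`: the function name `sketch_metrics`, the dictionary keys, the hedge list (limited to the five words named in the table), the passive pattern, and the choice of population standard deviation are all assumptions.

```python
import re
import statistics

# Illustrative patterns only; the project's actual lexicon may differ.
HEDGE_RE = re.compile(r"\b(?:suggests?|may|putative|indicates?|could)\b", re.IGNORECASE)
# Crude passive detector: a form of "to be" followed by a word ending in -ed/-en.
# It will produce false positives (for example "are indeed").
PASSIVE_RE = re.compile(
    r"\b(?:was|were|is|are|been|being|be)\s+\w+(?:ed|en)\b", re.IGNORECASE
)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sketch_metrics(text):
    """Return the three metrics for a text (hypothetical key names)."""
    sentences = split_sentences(text)
    if not sentences:
        return {"sentence_length_std": 0.0, "hedging_density": 0.0, "passive_density": 0.0}
    lengths = [len(s.split()) for s in sentences]
    n = len(sentences)
    return {
        # Population standard deviation of words per sentence (an assumption).
        "sentence_length_std": statistics.pstdev(lengths),
        "hedging_density": len(HEDGE_RE.findall(text)) / n,
        "passive_density": len(PASSIVE_RE.findall(text)) / n,
    }


sample = (
    "The samples were analyzed. "
    "These results suggest that the protein may bind DNA. "
    "It works."
)
print(sketch_metrics(sample))
```

For the three-sentence sample, the hedge pattern matches twice and the passive pattern once, so the densities come out to 2/3 and 1/3 per sentence.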
## Quick Start
```bash
pip install -r requirements.txt
export HF_TOKEN="hf_..." # your Hugging Face token
python app.py # launches Gradio on http://localhost:7860
```
## File Structure
```
manuscript_mimic/
├── __init__.py          # Package marker
├── style_extractor.py   # StyleExtractorTool + metric functions
├── rewrite_agent.py     # CodeAgent orchestrator + run_mimic()
├── app.py               # Gradio web UI
├── requirements.txt     # Dependencies
└── README.md            # This file
```
## Usage
### Via Gradio UI
1. Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt)
2. Paste your AI-generated draft
3. Select a model and click "Rewrite to Match Style"
4. Review the rewritten text and compare metrics
### Via Python API
```python
from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
```
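
To compare metrics before and after a rewrite outside the UI, a small formatting helper may be enough. The sketch below assumes that the metrics are a flat dict mapping metric names to floats, which this README does not confirm; `format_comparison` is a hypothetical helper, and the numbers are placeholders rather than measured output.

```python
def format_comparison(reference, before, after):
    """Render a plain-text table of metric values and the remaining gap.

    All three arguments are assumed to be dicts with the same keys and
    float values. The gap is after minus reference, so 0 means a match.
    """
    header = f"{'metric':<24}{'reference':>10}{'before':>10}{'after':>10}{'gap':>10}"
    rows = [header]
    for name in sorted(reference):
        gap = after[name] - reference[name]
        rows.append(
            f"{name:<24}{reference[name]:>10.2f}{before[name]:>10.2f}"
            f"{after[name]:>10.2f}{gap:>+10.2f}"
        )
    return "\n".join(rows)


# Placeholder values for illustration only.
reference = {"hedging_density": 0.50, "passive_density": 0.30}
before = {"hedging_density": 0.10, "passive_density": 0.05}
after = {"hedging_density": 0.40, "passive_density": 0.30}

report = format_comparison(reference, before, after)
print(report)
```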
## Models
The agent works with any model available on the HF Inference API:
- `Qwen/Qwen2.5-Coder-32B-Instruct` (default; best for code-generation agents)
- `meta-llama/Llama-3.3-70B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`
## License
MIT