# 🖋️ Manuscript-Mimic

**AI Style Transfer for Scientific Writing**

An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│                    Gradio UI (app.py)                    │
│  ┌───────────────┐  ┌─────────────────────────────────┐  │
│  │ Reference PDF │  │      Target Draft (paste)       │  │
│  │   or Text     │  │                                 │  │
│  └──────┬────────┘  └───────────┬─────────────────────┘  │
│         │                       │                        │
│         ▼                       ▼                        │
│  ┌───────────────────────────────────────────────────┐   │
│  │           rewrite_agent.py - CodeAgent            │   │
│  │                                                   │   │
│  │  Step 1: style_extractor(reference) → ref_metrics │   │
│  │  Step 2: style_extractor(target)    → tgt_metrics │   │
│  │  Step 3: Rewrite target to match ref_metrics      │   │
│  │  Step 4: style_extractor(rewritten) → verify      │   │
│  │                                                   │   │
│  │  ┌─────────────────────────────────────────────┐  │   │
│  │  │          style_extractor.py - Tool          │  │   │
│  │  │                                             │  │   │
│  │  │  • Sentence Length Variance (σ)             │  │   │
│  │  │  • Hedging Density (per sentence)           │  │   │
│  │  │  • Passive Voice Density (per sentence)     │  │   │
│  │  └─────────────────────────────────────────────┘  │   │
│  └───────────────────────────────────────────────────┘   │
│         │                                                │
│         ▼                                                │
│  ┌───────────────────────────────────────────────────┐   │
│  │              Rewritten Text + Metrics             │   │
│  └───────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────┘
```
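
The four steps in the diagram can be read as a measure, rewrite, re-measure loop. The following is a minimal sketch of that control flow, not the code in `rewrite_agent.py`: `mimic_loop`, `metric_gap`, the `tolerance` threshold, and the `max_rounds` retry limit are hypothetical names and assumptions introduced here, and the LLM call is abstracted as an injected `rewrite` callable.

```python
from typing import Callable, Dict, Tuple

Metrics = Dict[str, float]


def metric_gap(ref: Metrics, other: Metrics) -> float:
    """Largest relative difference across the metrics both dicts share."""
    gaps = []
    for name in ref.keys() & other.keys():
        scale = max(abs(ref[name]), 1e-9)  # guard against division by zero
        gaps.append(abs(ref[name] - other[name]) / scale)
    return max(gaps, default=0.0)


def mimic_loop(
    reference: str,
    target: str,
    extract: Callable[[str], Metrics],
    rewrite: Callable[[str, Metrics, Metrics], str],
    tolerance: float = 0.15,  # assumed threshold, not taken from the project
    max_rounds: int = 3,      # assumed retry limit, not shown in the diagram
) -> Tuple[str, Metrics]:
    """Steps 1-4 of the diagram, with the model call abstracted as `rewrite`."""
    ref_metrics = extract(reference)           # Step 1: profile the reference
    text, metrics = target, extract(target)    # Step 2: profile the draft
    for _ in range(max_rounds):
        if metric_gap(ref_metrics, metrics) <= tolerance:
            break                              # close enough to the reference
        text = rewrite(text, ref_metrics, metrics)  # Step 3: rewrite
        metrics = extract(text)                     # Step 4: verify
    return text, metrics
```

Passing `extract` and `rewrite` as callables keeps the loop testable without a model or network access; in the real system both roles are filled by the agent and its tool.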

## Three Stylometric Metrics

| Metric | Description | Academic Signature |
|--------|-------------|-------------------|
| **Sentence Length Variance** | σ of word counts per sentence | High variance = mix of short and long multi-clause sentences |
| **Hedging Density** | Hedge words per sentence (*suggest, may, putative, indicate, could*...) | Pre-2022 manuscripts hedge heavily in Results/Discussion |
| **Passive Voice Density** | Academic passive constructions per sentence (*was performed, were analyzed*...) | Methods sections are dominated by passive voice |
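
As a rough illustration of what the three metrics measure, the sketch below computes them with regular expressions and the standard library. It is an approximation under stated assumptions, not the implementation in `style_extractor.py`: the function name `sketch_style_metrics`, the output key names, the naive sentence splitter, the hedge list (only the five words named in the table, plus simple inflections), and the passive-voice pattern are all illustrative.

```python
import re
import statistics

# Assumed hedge list: the five words from the table, with simple inflections.
HEDGE_RE = re.compile(
    r"\b(?:suggest(?:s|ed)?|indicate[sd]?|may|could|putative)\b",
    re.IGNORECASE,
)
# Heuristic passive pattern: a form of "to be" followed by an -ed/-en word.
PASSIVE_RE = re.compile(
    r"\b(?:was|were|is|are|been|being|be)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def sketch_style_metrics(text: str) -> dict:
    """Approximate the three stylometric metrics for a block of text."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return {
            "sentence_length_sigma": 0.0,
            "hedging_density": 0.0,
            "passive_density": 0.0,
        }
    lengths = [len(s.split()) for s in sentences]
    n = len(sentences)
    return {
        # Population standard deviation of words per sentence.
        "sentence_length_sigma": statistics.pstdev(lengths),
        "hedging_density": len(HEDGE_RE.findall(text)) / n,
        "passive_density": len(PASSIVE_RE.findall(text)) / n,
    }
```

The passive pattern will miss irregular participles (for example "was shown") and can over-count adjectives after "to be", so treat its output as a coarse signal rather than a parse.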

## Quick Start

```bash
pip install -r requirements.txt
export HF_TOKEN="hf_..."   # your Hugging Face token
python app.py               # launches Gradio on http://localhost:7860
```

## File Structure

```
manuscript_mimic/
├── __init__.py           # Package marker
├── style_extractor.py    # StyleExtractorTool + metric functions
├── rewrite_agent.py      # CodeAgent orchestrator + run_mimic()
├── app.py                # Gradio web UI
├── requirements.txt      # Dependencies
└── README.md             # This file
```

## Usage

### Via Gradio UI
1. Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt)
2. Paste your AI-generated draft
3. Select a model and click "Rewrite to Match Style"
4. Review the rewritten text and compare metrics

### Via Python API
```python
from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
```
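
To compare metrics before and after a rewrite, a side-by-side table can be easier to read than raw printed output. The helper below is a hypothetical convenience, assuming the metrics are available as flat name-to-number mappings; this README does not state what type `extract_style_metrics` returns, so adapt the inputs accordingly.

```python
from typing import Mapping


def format_comparison(
    reference: Mapping[str, float],
    draft: Mapping[str, float],
    rewritten: Mapping[str, float],
) -> str:
    """Render three metric mappings as a Markdown table, one row per metric."""
    rows = [
        "| Metric | Reference | Draft | Rewritten |",
        "|--------|-----------|-------|-----------|",
    ]
    for name in sorted(reference):
        # Metrics missing from a mapping are shown as nan rather than raising.
        rows.append(
            f"| {name} | {reference[name]:.2f} | "
            f"{draft.get(name, float('nan')):.2f} | "
            f"{rewritten.get(name, float('nan')):.2f} |"
        )
    return "\n".join(rows)
```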

## Models

The agent works with any model available on the HF Inference API:
- `Qwen/Qwen2.5-Coder-32B-Instruct` (default; best for code-generation agents)
- `meta-llama/Llama-3.3-70B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`

## License

MIT