# 🖋️ Manuscript-Mimic

**AI Style Transfer for Scientific Writing**

An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│                    Gradio UI (app.py)                    │
│  ┌───────────────┐  ┌─────────────────────────────────┐  │
│  │ Reference PDF │  │      Target Draft (paste)       │  │
│  │    or Text    │  │                                 │  │
│  └───────┬───────┘  └────────────────┬────────────────┘  │
│          │                           │                   │
│          ▼                           ▼                   │
│  ┌────────────────────────────────────────────────────┐  │
│  │            rewrite_agent.py — CodeAgent            │  │
│  │                                                    │  │
│  │  Step 1: style_extractor(reference) → ref_metrics  │  │
│  │  Step 2: style_extractor(target)    → tgt_metrics  │  │
│  │  Step 3: Rewrite target to match ref_metrics       │  │
│  │  Step 4: style_extractor(rewritten) → verify       │  │
│  │                                                    │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │     style_extractor.py — Tool                │  │  │
│  │  │                                              │  │  │
│  │  │  • Sentence Length Variance (σ)              │  │  │
│  │  │  • Hedging Density (per sentence)            │  │  │
│  │  │  • Passive Voice Density (per sentence)      │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └───────┬────────────────────────────────────────────┘  │
│          │                                               │
│          ▼                                               │
│  ┌────────────────────────────────────────────────────┐  │
│  │              Rewritten Text + Metrics              │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```
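The four agent steps amount to a measure, rewrite, re-measure loop. Below is a minimal sketch of that control flow, assuming a single metric (σ of words per sentence) stands in for `style_extractor`, and assuming a failed verification triggers another rewrite round. The diagram shows only one verification pass, so the retry loop, `mimic`, `tolerance`, and `max_rounds` are all hypothetical, as is the toy rewriter used in place of the LLM.

```python
import re
import statistics
from typing import Callable


def sentence_length_sd(text: str) -> float:
    """Hypothetical stand-in for style_extractor: one metric only (σ of words per sentence)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if lengths else 0.0


def mimic(reference: str, target: str,
          rewrite: Callable[[str, float], str],
          tolerance: float = 1.0, max_rounds: int = 3) -> tuple[str, float]:
    """Hypothetical loop: rewrite until the draft's σ is within `tolerance` of the reference."""
    ref_sd = sentence_length_sd(reference)      # Step 1: profile the reference
    draft = target
    draft_sd = sentence_length_sd(draft)        # Step 2: profile the target
    for _ in range(max_rounds):
        if abs(draft_sd - ref_sd) <= tolerance:
            break                               # close enough; stop rewriting
        draft = rewrite(draft, ref_sd)          # Step 3: in the real app, an LLM call
        draft_sd = sentence_length_sd(draft)    # Step 4: verify the new draft
    return draft, draft_sd


# Usage with a toy "rewriter" that simply returns the reference text.
reference = "Short one. This sentence has exactly six words."
draft, sd = mimic(reference, "Two words. Two more.", lambda text, goal: reference)
print(sd)  # 2.0 (the draft's σ moved from 0.0 to the reference's 2.0)
```

Capping the number of rounds keeps a poorly behaved model from looping indefinitely when it cannot hit the target profile.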

## Three Stylometric Metrics

| Metric | Description | Academic Signature |
|--------|-------------|--------------------|
| **Sentence Length Variance** | Standard deviation (σ) of word counts per sentence | High σ: short sentences interleaved with long, multi-clause ones |
| **Hedging Density** | Hedge words per sentence (*suggest, may, putative, indicate, could*, …) | Pre-2022 manuscripts hedge heavily in Results and Discussion sections |
| **Passive Voice Density** | Academic passive constructions per sentence (*was performed, were analyzed*, …) | Methods sections are dominated by the passive voice |
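A minimal sketch of how the three metrics could be computed with regular expressions alone. The sentence splitter, the hedge words beyond the five named in the table, and the passive-voice pattern (a form of *to be* followed by a word ending in *-ed* or *-en*) are assumptions; `style_extractor.py` may use different rules, and `style_metrics` is a hypothetical name rather than the package's `extract_style_metrics`.

```python
import re
import statistics

# Hedge lexicon: the first five words come from the metrics table; the rest
# are illustrative additions (assumption), not the project's actual list.
HEDGES = {"suggest", "may", "putative", "indicate", "could",
          "suggests", "indicates", "might", "likely", "possibly"}

# Heuristic passive pattern (assumption): a form of "to be" followed by a
# word ending in -ed or -en, e.g. "was performed", "have been reported".
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
                     re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def style_metrics(text: str) -> dict[str, float]:
    """Hypothetical re-implementation of the three metrics in the table."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return {"sentence_length_sd": 0.0, "hedging_density": 0.0,
                "passive_density": 0.0}
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Population standard deviation of words per sentence.
        "sentence_length_sd": statistics.pstdev(lengths),
        # Hedge words and passive constructions, each divided by sentence count.
        "hedging_density": sum(w in HEDGES for w in words) / n,
        "passive_density": len(PASSIVE.findall(text)) / n,
    }


print(style_metrics("Samples were analyzed. These data suggest effects may be real."))
# {'sentence_length_sd': 2.0, 'hedging_density': 1.0, 'passive_density': 0.5}
```

In the example, the sentences have 3 and 7 words (mean 5, σ = 2.0), contain two hedges (*suggest*, *may*), and contain one passive (*were analyzed*). Regex heuristics like these will misfire on cases such as "is green" or the month "May", so treat the numbers as rough stylometric signals.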

## Quick Start

```bash
pip install -r requirements.txt
export HF_TOKEN="hf_..."   # your Hugging Face token
python app.py               # launches Gradio on http://localhost:7860
```

## File Structure

```
manuscript_mimic/
├── __init__.py           # Package marker
├── style_extractor.py    # StyleExtractorTool + metric functions
├── rewrite_agent.py      # CodeAgent orchestrator + run_mimic()
├── app.py                # Gradio web UI
├── requirements.txt      # Dependencies
└── README.md             # This file
```

## Usage

### Via Gradio UI
1. Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt)
2. Paste your AI-generated draft
3. Select a model and click "Rewrite to Match Style"
4. Review the rewritten text and compare metrics
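One way to make the metric comparison in step 4 concrete is a per-metric table that shows how much of the draft's distance from the reference the rewrite closed. This is a sketch only: `metrics_table` and the "Gap closed" column are hypothetical, and the three input dicts are assumed to share the same keys.

```python
def metrics_table(reference: dict[str, float], draft: dict[str, float],
                  rewritten: dict[str, float]) -> str:
    """Hypothetical helper: render reference/draft/rewritten metrics as Markdown."""
    lines = [
        "| Metric | Reference | Draft | Rewritten | Gap closed |",
        "|--------|-----------|-------|-----------|------------|",
    ]
    for key, ref in reference.items():
        before = abs(draft[key] - ref)      # distance to reference before rewriting
        after = abs(rewritten[key] - ref)   # distance after rewriting
        # Fraction of the original gap removed; negative means the rewrite drifted away.
        closed = "n/a" if before == 0 else f"{(before - after) / before:.0%}"
        lines.append(
            f"| {key} | {ref:.2f} | {draft[key]:.2f} | {rewritten[key]:.2f} | {closed} |"
        )
    return "\n".join(lines)


print(metrics_table({"hedging_density": 1.0},
                    {"hedging_density": 0.0},
                    {"hedging_density": 0.5}))
# last row: | hedging_density | 1.00 | 0.00 | 0.50 | 50% |
```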

### Via Python API
```python
# Run from inside manuscript_mimic/ (next to app.py) so these modules import directly
from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
```

## Models

The agent should work with any chat model served through the Hugging Face Inference API, for example:
- `Qwen/Qwen2.5-Coder-32B-Instruct` (default; a code-tuned model, a good fit for an agent that writes its steps as code)
- `meta-llama/Llama-3.3-70B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`

## License

MIT