# πŸ–‹οΈ Manuscript-Mimic
**AI Style Transfer for Scientific Writing**
An agentic system that rewrites AI-generated scientific text to statistically match the stylometric profile of pre-2022 human-authored academic manuscripts.
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│                    Gradio UI (app.py)                    │
│  ┌───────────────┐   ┌────────────────────────────────┐  │
│  │ Reference PDF │   │      Target Draft (paste)      │  │
│  │    or Text    │   │                                │  │
│  └───────┬───────┘   └───────────────┬────────────────┘  │
│          │                           │                   │
│          ▼                           ▼                   │
│  ┌────────────────────────────────────────────────────┐  │
│  │  rewrite_agent.py: CodeAgent                       │  │
│  │                                                    │  │
│  │  Step 1: style_extractor(reference) → ref_metrics  │  │
│  │  Step 2: style_extractor(target)    → tgt_metrics  │  │
│  │  Step 3: Rewrite target to match ref_metrics       │  │
│  │  Step 4: style_extractor(rewritten) → verify       │  │
│  │                                                    │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │  style_extractor.py: Tool                    │  │  │
│  │  │                                              │  │  │
│  │  │  • Sentence Length Variance (σ)              │  │  │
│  │  │  • Hedging Density (per sentence)            │  │  │
│  │  │  • Passive Voice Density (per sentence)      │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────┘  │
│                            │                             │
│                            ▼                             │
│  ┌────────────────────────────────────────────────────┐  │
│  │              Rewritten Text + Metrics              │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```
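The four steps in the diagram can be sketched as a plain function. This is only an illustration of the data flow, not the project's `run_mimic()` or its CodeAgent: `mimic_pipeline`, `metric_gap`, and the `extract` / `rewrite` callables are hypothetical names, and the rewrite step (an LLM call in the real system) is passed in as an ordinary function.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]


def metric_gap(ref: Metrics, other: Metrics) -> Metrics:
    """Absolute per-metric distance between the reference and another text."""
    return {name: abs(ref[name] - other[name]) for name in ref}


def mimic_pipeline(
    reference: str,
    target: str,
    extract: Callable[[str], Metrics],
    rewrite: Callable[[str, Metrics, Metrics], str],
) -> dict:
    """Hypothetical stand-in for the agent loop; names are assumptions."""
    ref_metrics = extract(reference)                       # Step 1
    tgt_metrics = extract(target)                          # Step 2
    rewritten = rewrite(target, ref_metrics, tgt_metrics)  # Step 3
    new_metrics = extract(rewritten)                       # Step 4: verify
    return {
        "rewritten": rewritten,
        "gap_before": metric_gap(ref_metrics, tgt_metrics),
        "gap_after": metric_gap(ref_metrics, new_metrics),
    }
```

Comparing `gap_before` with `gap_after` is one simple way to read the verification step: a successful rewrite should shrink the gap on each metric.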
## Three Stylometric Metrics
| Metric | Description | Academic Signature |
|--------|-------------|-------------------|
| **Sentence Length Variance** | Standard deviation (σ) of word counts per sentence | High variance reflects a mix of short sentences and long multi-clause sentences |
| **Hedging Density** | Hedge words per sentence (*suggest, may, putative, indicate, could*...) | Pre-2022 manuscripts hedge heavily in Results/Discussion |
| **Passive Voice Density** | Academic passive constructions per sentence (*was performed, were analyzed*...) | Methods sections are dominated by passive voice |
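
As a rough illustration of how the three metrics in the table could be computed, here is a minimal sketch using only the standard library. It is not the code in `style_extractor.py`: the function name `sketch_style_metrics`, the short hedge list, the regex sentence splitter, and the "be-verb + word ending in -ed/-en" passive heuristic are all assumptions made for this example, and a real extractor would likely use a larger lexicon and a proper parser.

```python
import re
import statistics

# Illustrative hedge lexicon (assumption); the real tool's list may differ.
HEDGES = {"suggest", "suggests", "may", "might", "putative",
          "indicate", "indicates", "could"}

# Crude passive heuristic (assumption): a form of "to be" followed by a
# word ending in -ed or -en, e.g. "was performed", "were taken".
PASSIVE = re.compile(
    r"\b(?:was|were|is|are|been|being|be)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def split_sentences(text: str) -> list:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sketch_style_metrics(text: str) -> dict:
    """Return the three metrics for a text, each normalised per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return {"sentence_length_std": 0.0,
                "hedging_density": 0.0,
                "passive_density": 0.0}

    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z]+", text.lower())
    hedge_count = sum(1 for w in words if w in HEDGES)
    passive_count = len(PASSIVE.findall(text))
    n = len(sentences)

    return {
        # Population standard deviation of words per sentence.
        "sentence_length_std": statistics.pstdev(lengths),
        "hedging_density": hedge_count / n,
        "passive_density": passive_count / n,
    }
```

The heuristics trade accuracy for simplicity: the passive pattern misses irregular participles such as "was found" and can fire on adjectives such as "is limited", so the numbers are best used for relative comparison between two texts rather than as absolute measurements.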
## Quick Start
```bash
pip install -r requirements.txt
export HF_TOKEN="hf_..." # your Hugging Face token
python app.py # launches Gradio on http://localhost:7860
```
## File Structure
```
manuscript_mimic/
├── __init__.py          # Package marker
├── style_extractor.py   # StyleExtractorTool + metric functions
├── rewrite_agent.py     # CodeAgent orchestrator + run_mimic()
├── app.py               # Gradio web UI
├── requirements.txt     # Dependencies
└── README.md            # This file
```
## Usage
### Via Gradio UI
1. Upload a reference PDF or paste reference text (pre-2022 manuscript excerpt)
2. Paste your AI-generated draft
3. Select a model and click "Rewrite to Match Style"
4. Review the rewritten text and compare metrics
### Via Python API
```python
from style_extractor import extract_style_metrics
from rewrite_agent import run_mimic

# Analyze a text
metrics = extract_style_metrics("Your academic text here...")
print(metrics)

# Rewrite to match a reference
rewritten = run_mimic(
    reference_text="Pre-2022 manuscript excerpt...",
    target_text="AI-generated draft...",
)
print(rewritten)
```
## Models
The agent works with any model available on the HF Inference API:
- `Qwen/Qwen2.5-Coder-32B-Instruct` (default; best for code-generation agents)
- `meta-llama/Llama-3.3-70B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`
## License
MIT