# Centralized Prompt Configuration

All prompts for question generation and solution generation are centralized in `src/config/prompts.py`.

## Why Centralized?

- **Consistency**: Same prompts across SFT training, GRPO, PPO, and inference
- **Maintainability**: Single source of truth for all prompt text
- **Flexibility**: Easy to tune prompts without hunting through multiple files

## Usage

### Question Generation

```python
from src.config.prompts import create_generator_messages

instruction = "Generate a problem about fractions in a shopping context"
messages = create_generator_messages(instruction)
# Returns:
# [
#   {"role": "system", "content": GENERATOR_SYSTEM_PROMPT},
#   {"role": "user", "content": "### Task: Generate Question\n{instruction}"}
# ]
```

### Solution Generation

```python
from src.config.prompts import create_solver_messages

question = "If John has 5 apples and gives 2 away, how many does he have?"
messages = create_solver_messages(question)
# Returns:
# [
#   {"role": "system", "content": SOLVER_SYSTEM_PROMPT},
#   {"role": "user", "content": "### Task: Solve Problem\nProblem: {question}\nSolution:"}
# ]
```
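
For orientation, the two helpers above could be as small as the following sketch. It is inferred only from the return shapes documented in this section, not copied from `src/config/prompts.py`; the system prompt strings are placeholders and the real module may differ.

```python
# Hypothetical sketch of the helpers in src/config/prompts.py.
# The prompt strings below are placeholders, not the project's actual prompts.
from typing import Dict, List

Message = Dict[str, str]

GENERATOR_SYSTEM_PROMPT = "<generator system prompt>"  # placeholder text
SOLVER_SYSTEM_PROMPT = "<solver system prompt>"  # placeholder text


def create_generator_messages(instruction: str) -> List[Message]:
    """Build the chat messages for question generation."""
    return [
        {"role": "system", "content": GENERATOR_SYSTEM_PROMPT},
        {"role": "user", "content": f"### Task: Generate Question\n{instruction}"},
    ]


def create_solver_messages(question: str) -> List[Message]:
    """Build the chat messages for solution generation."""
    return [
        {"role": "system", "content": SOLVER_SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"### Task: Solve Problem\nProblem: {question}\nSolution:",
        },
    ]
```

Keeping both the system prompts and the user-message templates in one module is what lets training and inference code share identical text.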

## Files Using Centralized Prompts

- `scripts/run_grpo_training.py` - GRPO question generation
- `scripts/dual_task_sft_pipeline.py` - SFT training
- `scripts/create_dual_task_dataset.py` - Dataset creation
- `src/rl/math_environment.py` - PPO environment
- `src/rl/triple_verifier.py` - Consensus verification

## Prompt Design Principles

### Question Generation
- **No explicit step constraints**: Let the model decide complexity naturally
- Focus on **realistic scenarios** and **simple operations** (grade-school level)
- Output **only the problem statement**, no solutions
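
The "problem statement only" rule can be spot-checked with a small heuristic like the one below. This is an illustrative sketch, not part of the repository: `looks_like_problem_only` is a hypothetical helper, and it only looks for the "Step N:" and "Final Answer:" markers that the solution format uses.

```python
import re

# Solution markers taken from the solution format described in this document.
_SOLUTION_MARKERS = re.compile(r"^\s*(Step\s+\d+:|Final Answer:)", re.MULTILINE)


def looks_like_problem_only(text: str) -> bool:
    """Return True if text is non-empty and contains no solution markers."""
    return bool(text.strip()) and _SOLUTION_MARKERS.search(text) is None
```

A generated question that fails this check has probably leaked a worked solution; passing it does not guarantee the question is well formed.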

### Solution Generation
- **Step-by-step format**: Each step on its own line starting with "Step N:"
- **Final Answer format**: Line starting with "Final Answer:"
- **Python/SymPy syntax**: All math expressions are written so they can be verified programmatically
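
The format above is simple enough to parse with the standard library. The sketch below is an assumption about how a consumer might read it, not the project's verifier: `parse_solution` and `is_python_expression` are hypothetical names, and the syntax check assumes a step body is a bare expression with no surrounding prose. It checks Python syntax only and does not evaluate anything with SymPy.

```python
import ast
import re
from typing import List, NamedTuple, Optional


class ParsedSolution(NamedTuple):
    steps: List[str]  # text after each "Step N:" prefix, in order
    final_answer: Optional[str]  # text after "Final Answer:", if present


_STEP = re.compile(r"^Step\s+\d+:\s*(.*)$")
_FINAL = re.compile(r"^Final Answer:\s*(.*)$")


def parse_solution(text: str) -> ParsedSolution:
    """Collect "Step N:" lines and the last "Final Answer:" line."""
    steps: List[str] = []
    final_answer: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        step_match = _STEP.match(line)
        if step_match:
            steps.append(step_match.group(1))
            continue
        final_match = _FINAL.match(line)
        if final_match:
            final_answer = final_match.group(1)
    return ParsedSolution(steps, final_answer)


def is_python_expression(expr: str) -> bool:
    """Syntax-only check: does expr parse as a single Python expression?"""
    try:
        ast.parse(expr, mode="eval")
    except (SyntaxError, ValueError):
        return False
    return True


parsed = parse_solution("Step 1: 5 - 2\nFinal Answer: 3")
```

Here `parsed.steps` holds the step bodies and `parsed.final_answer` holds the answer text; a missing "Final Answer:" line leaves `final_answer` as `None`, which callers can treat as a format violation.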