# SalesPath: Business Rules (R01–R09)

The environment enforces these 9 business rules at every step.  
After three violations, the episode terminates with a heavy penalty.

---

## R01: Qualify Before Present

> **Must QUALIFY before PRESENT**

The agent cannot pitch the product until it has asked qualifying questions about the prospect's needs, budget, and situation.

## R02: Demo Before Negotiate

> **Must OFFER_DEMO before NEGOTIATE**

No discount or price negotiation is allowed unless a product demo has been offered and scheduled.

## R03: Budget Known Before Negotiate

> **Budget must be known before NEGOTIATE**

The prospect's budget must be revealed (via QUALIFY action) before the agent can enter negotiations.
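The ordering prerequisites in R01 through R03 can be sketched as a single check. This is a minimal illustration, not the environment's actual implementation: the function name `check_prerequisites` and the `budget_known` flag are assumptions, and only the action names and the `steps_completed` list come from this document.

```python
from typing import List


def check_prerequisites(
    action: str, steps_completed: List[str], budget_known: bool
) -> List[str]:
    """Return the IDs of rules R01-R03 that `action` would violate.

    Hypothetical helper: `budget_known` is an assumed flag that the
    environment would set once a QUALIFY action reveals the budget.
    """
    violations = []
    # R01: PRESENT requires an earlier QUALIFY.
    if action == "PRESENT" and "QUALIFY" not in steps_completed:
        violations.append("R01")
    if action == "NEGOTIATE":
        # R02: NEGOTIATE requires an earlier OFFER_DEMO.
        if "OFFER_DEMO" not in steps_completed:
            violations.append("R02")
        # R03: NEGOTIATE requires the budget to be known.
        if not budget_known:
            violations.append("R03")
    return violations
```

Note that a single premature NEGOTIATE can trigger both R02 and R03 in this sketch, which would use up two of the three allowed violations at once.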

## R04: Discount After Objections

> **Discount only after 2 objections handled**

If the agent mentions a discount during NEGOTIATE, at least 2 prospect objections must have been successfully handled first.
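As a rough sketch, R04 reduces to a counter comparison. The names `violates_r04`, `mentions_discount`, and `objections_handled` are hypothetical; how the environment detects a discount mention or counts handled objections is not specified here.

```python
def violates_r04(
    action: str, mentions_discount: bool, objections_handled: int
) -> bool:
    """R04: a discount during NEGOTIATE needs at least 2 handled objections.

    All parameter names are assumptions made for illustration.
    """
    return (
        action == "NEGOTIATE"
        and mentions_discount
        and objections_handled < 2
    )
```

Under this reading, a NEGOTIATE action that does not mention a discount never triggers R04, regardless of the objection count.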

## R05: No Repeat Action

> **Cannot repeat same action consecutively**

The agent cannot use the same action type twice in a row. A QUALIFY cannot follow a QUALIFY, a PRESENT cannot follow a PRESENT, etc.

## R06: First Action Must Be PROSPECT

> **First action must always be PROSPECT**

Every episode must begin with the PROSPECT action. Any other first action is invalid.
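R05 and R06 both depend only on the action history, so they can be illustrated together. The sketch below assumes a hypothetical `history` list holding every action taken so far in the episode, in order; the function name is likewise an assumption.

```python
from typing import List


def check_sequencing(action: str, history: List[str]) -> List[str]:
    """Return the IDs of R05/R06 that `action` would violate.

    `history` is an assumed list of prior actions in this episode.
    """
    violations = []
    # R05: the same action type may not be used twice in a row.
    if history and history[-1] == action:
        violations.append("R05")
    # R06: the very first action must be PROSPECT.
    if not history and action != "PROSPECT":
        violations.append("R06")
    return violations
```

Because R05 only looks at the immediately preceding action, a sequence such as QUALIFY, PRESENT, QUALIFY passes this check.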

## R07: Follow-Up Only After Silence

> **FOLLOW_UP only after prospect goes silent**

FOLLOW_UP is only valid when the prospect has disengaged (returned a `silence` response). If the prospect just responded with actual content, FOLLOW_UP is a violation.
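A minimal sketch of the R07 check follows. It assumes the most recent prospect response is available as a string (or `None` if the prospect has not responded yet) and treats a missing response as "not silent"; both the parameter and that edge-case choice are assumptions, not documented behavior.

```python
from typing import Optional


def violates_r07(action: str, last_prospect_response: Optional[str]) -> bool:
    """R07: FOLLOW_UP is valid only right after a `silence` response.

    `last_prospect_response` is a hypothetical field; None (no response
    yet) is treated as a violation here, which is an assumption.
    """
    return action == "FOLLOW_UP" and last_prospect_response != "silence"
```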

## R08: Disqualify Logic

> **DISQUALIFY only if prospect is genuinely unqualified**

DISQUALIFY is a violation if the prospect is actually closable (true budget ≥ close threshold AND decision maker is present). Use it only when the deal is truly unwinnable.
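The closability condition above translates directly into a boolean expression. In this sketch the names `true_budget`, `close_threshold`, and `decision_maker_present` are hypothetical stand-ins for hidden prospect state that the environment would hold internally.

```python
def violates_r08(
    action: str,
    true_budget: float,
    close_threshold: float,
    decision_maker_present: bool,
) -> bool:
    """R08: DISQUALIFY is a violation when the prospect is closable.

    Parameter names are assumptions; the agent would not normally see
    the true budget, only the environment would.
    """
    closable = true_budget >= close_threshold and decision_maker_present
    return action == "DISQUALIFY" and closable
```

Since the check depends on the true budget rather than anything the agent has observed, the agent can only avoid this violation by inferring closability from the conversation.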

## R09: Close Requires Demo

> **Must OFFER_DEMO before CLOSE (difficulty 2+)**

On difficulty 2 and above, the agent must have completed OFFER_DEMO before attempting to CLOSE the deal.
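R09 is the only rule gated on difficulty. A possible sketch, assuming difficulty is exposed as an integer and reusing the `steps_completed` list from the observation (the function name is hypothetical):

```python
from typing import List


def violates_r09(action: str, difficulty: int, steps_completed: List[str]) -> bool:
    """R09: on difficulty 2+, CLOSE requires an earlier OFFER_DEMO.

    `difficulty` as an integer level is an assumption for illustration.
    """
    return (
        action == "CLOSE"
        and difficulty >= 2
        and "OFFER_DEMO" not in steps_completed
    )
```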

---

## How Rules Are Enforced

Rules are checked **before** the prospect responds to an action. Violations are accumulated in `constraints_violated` and returned in the observation:

```python
# Observation schema (excerpt)
observation = {
    "constraints_violated": ["R01", "R05"],  # New violations this turn
    "steps_completed": ["PROSPECT", "QUALIFY"],
    # ... (remaining fields omitted)
}
```

When `len(constraints_violated) >= 3`, the episode terminates with:
- `r_outcome = -0.5` (terminal penalty)
- `r_compliance = -0.2 × violations` (per-turn penalty)
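
The termination arithmetic can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and `violations` in the formula is read as the length of `constraints_violated`, which the document does not state explicitly.

```python
from typing import Dict, List, Optional


def termination_penalties(constraints_violated: List[str]) -> Optional[Dict[str, float]]:
    """Return the penalty terms if the episode should terminate, else None.

    Assumes `violations` in the r_compliance formula means
    len(constraints_violated).
    """
    n = len(constraints_violated)
    if n < 3:
        return None  # below the three-violation threshold
    return {
        "r_outcome": -0.5,            # terminal penalty
        "r_compliance": -0.2 * n,     # -0.2 per violation
    }
```

With exactly three violations, this reading gives `r_outcome = -0.5` and an `r_compliance` of about `-0.6`.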