# SalesPath — Business Rules (R01–R09)

The environment enforces these nine business rules at every step.  
Once three violations have accumulated, the episode terminates with a terminal penalty (`r_outcome = -0.5`).

---

## R01 — Qualify Before Present

> **Must QUALIFY before PRESENT**

The agent cannot pitch the product until it has asked qualifying questions about the prospect's needs, budget, and situation.

## R02 — Demo Before Negotiate

> **Must OFFER_DEMO before NEGOTIATE**

No discount or price negotiation is allowed unless a product demo has been offered and scheduled.

## R03 — Budget Known Before Negotiate

> **Budget must be known before NEGOTIATE**

The prospect's budget must be revealed (via a QUALIFY action) before the agent can enter negotiations.
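
The three ordering rules so far (R01 to R03) can be sketched as a single check. This is a minimal illustration, not the environment's actual code: the function name `check_ordering` and the `budget_known` flag are assumptions, and only the action names and rule IDs come from this document.

```python
def check_ordering(action, steps_completed, budget_known):
    """Return the IDs of R01-R03 that `action` would violate.

    `steps_completed` is the list of actions taken so far;
    `budget_known` is a hypothetical flag set once QUALIFY reveals the budget.
    """
    done = set(steps_completed)
    violations = []
    if action == "PRESENT" and "QUALIFY" not in done:
        violations.append("R01")  # pitched before qualifying
    if action == "NEGOTIATE":
        if "OFFER_DEMO" not in done:
            violations.append("R02")  # negotiated before offering a demo
        if not budget_known:
            violations.append("R03")  # negotiated without a known budget
    return violations
```

A single NEGOTIATE can trip both R02 and R03 in this sketch, which is why it returns a list rather than a single rule ID.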

## R04 — Discount After Objections

> **Discount only after 2 objections handled**

If the agent mentions a discount during NEGOTIATE, at least 2 prospect objections must have been successfully handled first.
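
As a rough sketch, R04 reduces to a counter comparison. The parameters `mentions_discount` and `objections_handled` are hypothetical; how the environment detects a discount mention or counts a handled objection is not specified here.

```python
MIN_OBJECTIONS_FOR_DISCOUNT = 2  # threshold stated in R04

def violates_r04(action, mentions_discount, objections_handled):
    """True if a discount is mentioned during NEGOTIATE too early."""
    return (
        action == "NEGOTIATE"
        and mentions_discount
        and objections_handled < MIN_OBJECTIONS_FOR_DISCOUNT
    )
```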

## R05 — No Repeat Action

> **Cannot repeat same action consecutively**

The agent cannot use the same action type twice in a row. A QUALIFY cannot follow a QUALIFY, a PRESENT cannot follow a PRESENT, etc.

## R06 — First Action Must Be PROSPECT

> **First action must always be PROSPECT**

Every episode must begin with the PROSPECT action. Any other first action is invalid.
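
R05 and R06 both depend only on the action history, so one illustrative helper can cover them. The name `check_sequence` and the `history` list are assumptions made for this sketch.

```python
def check_sequence(action, history):
    """Return the IDs of R05/R06 that `action` would violate.

    `history` is the list of actions already taken in the episode.
    """
    violations = []
    if history and history[-1] == action:
        violations.append("R05")  # same action twice in a row
    if not history and action != "PROSPECT":
        violations.append("R06")  # episode did not open with PROSPECT
    return violations
```

For example, `check_sequence("QUALIFY", [])` flags R06, while `check_sequence("QUALIFY", ["PROSPECT", "QUALIFY"])` flags R05.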

## R07 — Follow-Up Only After Silence

> **FOLLOW_UP only after prospect goes silent**

FOLLOW_UP is only valid when the prospect has disengaged (returned a `silence` response). If the prospect just responded with actual content, FOLLOW_UP is a violation.
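
One way to express R07, assuming the environment exposes the type of the prospect's most recent response as a string (the `last_response_type` parameter is hypothetical; only the `silence` value comes from the rule text):

```python
def violates_r07(action, last_response_type):
    """True if FOLLOW_UP is used when the prospect has not gone silent."""
    return action == "FOLLOW_UP" and last_response_type != "silence"
```

In this sketch a FOLLOW_UP with no prior response at all (for instance `None`) also counts as a violation.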

## R08 — Disqualify Logic

> **DISQUALIFY only if prospect is genuinely unqualified**

DISQUALIFY is a violation if the prospect is actually closable (true budget ≥ close threshold AND decision maker is present). Use it only when the deal is truly unwinnable.
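
The closability condition in R08 translates directly into a boolean expression. The parameter names below are illustrative stand-ins for the hidden prospect state, not the environment's actual field names.

```python
def violates_r08(action, true_budget, close_threshold, decision_maker_present):
    """True if DISQUALIFY is used on a prospect that is actually closable."""
    closable = true_budget >= close_threshold and decision_maker_present
    return action == "DISQUALIFY" and closable
```

Because both conditions must hold for the prospect to be closable, disqualifying is allowed as soon as either the budget falls short or the decision maker is absent.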

## R09 — Close Requires Demo

> **Must OFFER_DEMO before CLOSE (difficulty 2+)**

On difficulty 2 and above, the agent must have completed OFFER_DEMO before attempting to CLOSE the deal.
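
R09 differs from the other ordering rules in that it is gated on difficulty. A minimal sketch, assuming `difficulty` is an integer and `steps_completed` lists the actions taken so far:

```python
def violates_r09(action, steps_completed, difficulty):
    """True if CLOSE is attempted without a prior OFFER_DEMO on difficulty 2+."""
    return (
        action == "CLOSE"
        and difficulty >= 2
        and "OFFER_DEMO" not in steps_completed
    )
```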

---

## How Rules Are Enforced

Rules are checked **before** the prospect responds to an action. Violations are accumulated in `constraints_violated` and returned in the observation:

```python
# Observation schema
{
    "constraints_violated": ["R01", "R05"],  # Rule IDs accumulated so far this episode
    "steps_completed": ["PROSPECT", "QUALIFY"],
    ...
}
```

When `len(constraints_violated) >= 3`, the episode terminates with:
- `r_outcome = -0.5` (terminal penalty)
- `r_compliance = -0.2 × violations` (per-turn penalty)
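
The termination check and penalty terms above can be sketched as follows. The constant and function names are assumptions. The text does not say whether `violations` in `r_compliance` is a per-turn or a cumulative count, so the helper simply takes the count as given.

```python
MAX_VIOLATIONS = 3            # termination threshold
TERMINAL_PENALTY = -0.5       # r_outcome applied on termination
PENALTY_PER_VIOLATION = -0.2  # coefficient in r_compliance

def is_terminated(constraints_violated):
    """True once the accumulated violation list reaches the threshold."""
    return len(constraints_violated) >= MAX_VIOLATIONS

def compliance_reward(violations):
    """r_compliance for a given violation count."""
    return PENALTY_PER_VIOLATION * violations
```

Under these assumptions, an episode with `["R01", "R05", "R02"]` in `constraints_violated` terminates and receives `TERMINAL_PENALTY` as its `r_outcome`.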