SalesPath: Business Rules (R01-R09)
The environment enforces these 9 business rules at every step.
After three violations, the episode terminates with a heavy penalty.
R01: Qualify Before Present
Must QUALIFY before PRESENT
The agent cannot pitch the product until it has asked qualifying questions about the prospect's needs, budget, and situation.
R02: Demo Before Negotiate
Must OFFER_DEMO before NEGOTIATE
No discount or price negotiation is allowed unless a product demo has been offered and scheduled.
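R01 and R02 are both "step X must precede action Y" rules, so they can be expressed as a lookup table. The sketch below is only an illustration: the `PREREQUISITES` table and `check_prerequisite` function are hypothetical names, and it assumes that an OFFER_DEMO entry in `steps_completed` means the demo was both offered and scheduled.

```python
# Hypothetical table: action -> (rule id, step that must already be completed).
PREREQUISITES = {
    "PRESENT": ("R01", "QUALIFY"),
    "NEGOTIATE": ("R02", "OFFER_DEMO"),
}


def check_prerequisite(action, steps_completed):
    """Return the violated rule id, or None if the action is allowed."""
    if action not in PREREQUISITES:
        return None
    rule_id, required_step = PREREQUISITES[action]
    return None if required_step in steps_completed else rule_id


print(check_prerequisite("PRESENT", ["PROSPECT"]))             # R01
print(check_prerequisite("PRESENT", ["PROSPECT", "QUALIFY"]))  # None
```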
R03: Budget Known Before Negotiate
Budget must be known before NEGOTIATE
The prospect's budget must be revealed (via QUALIFY action) before the agent can enter negotiations.
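Unlike R01 and R02, R03 depends on what the prospect has revealed, not only on which actions were taken. A minimal sketch, assuming the environment keeps a hypothetical `revealed_budget` field that stays `None` until a QUALIFY action uncovers the budget:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProspectState:
    # Hypothetical field: None until a QUALIFY action reveals the budget.
    revealed_budget: Optional[float] = None


def violates_r03(action: str, state: ProspectState) -> bool:
    """True if NEGOTIATE is attempted while the budget is still unknown."""
    return action == "NEGOTIATE" and state.revealed_budget is None
```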
R04: Discount After Objections
Discount only after 2 objections handled
If the agent mentions a discount during NEGOTIATE, at least 2 prospect objections must have been successfully handled first.
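The R04 condition could be checked roughly as follows. How the environment detects a discount mention is not specified here, so the sketch takes it as a boolean flag; `mentions_discount` and `objections_handled` are assumed names.

```python
REQUIRED_OBJECTIONS = 2  # threshold stated by R04


def violates_r04(action: str, mentions_discount: bool, objections_handled: int) -> bool:
    """True if a discount is mentioned in NEGOTIATE before 2 objections were handled."""
    if action != "NEGOTIATE" or not mentions_discount:
        return False
    return objections_handled < REQUIRED_OBJECTIONS
```

Under this reading, a NEGOTIATE action that does not mention a discount never triggers R04, regardless of how many objections have been handled.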
R05: No Repeat Action
Cannot repeat same action consecutively
The agent cannot use the same action type twice in a row. A QUALIFY cannot follow a QUALIFY, a PRESENT cannot follow a PRESENT, etc.
R06: First Action Must Be PROSPECT
First action must always be PROSPECT
Every episode must begin with the PROSPECT action. Any other first action is invalid.
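R05 and R06 both depend only on the sequence of actions taken so far. The following sketch checks them together; `history` is an assumed list of the action types already taken in the episode, oldest first, and the function name is hypothetical.

```python
def check_history_rules(action, history):
    """Return the ids of history-based rules (R05, R06) that the action violates."""
    violations = []
    # R06: an empty history means this is the first action of the episode.
    if not history and action != "PROSPECT":
        violations.append("R06")
    # R05: the same action type may not be used twice in a row.
    if history and history[-1] == action:
        violations.append("R05")
    return violations


print(check_history_rules("QUALIFY", []))                       # ['R06']
print(check_history_rules("QUALIFY", ["PROSPECT", "QUALIFY"]))  # ['R05']
```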
R07: Follow-Up Only After Silence
FOLLOW_UP only after prospect goes silent
FOLLOW_UP is only valid when the prospect has disengaged (returned a silence response). If the prospect just responded with actual content, FOLLOW_UP is a violation.
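As a rough illustration of R07, assume the environment records whether the prospect's most recent response was a silence response; the flag name `last_response_was_silence` is hypothetical.

```python
def violates_r07(action: str, last_response_was_silence: bool) -> bool:
    """True if FOLLOW_UP is used while the prospect is still engaged."""
    return action == "FOLLOW_UP" and not last_response_was_silence
```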
R08: Disqualify Logic
DISQUALIFY only if prospect is genuinely unqualified
DISQUALIFY is a violation if the prospect is actually closable (true budget ≥ close threshold AND decision maker is present). Use it only when the deal is truly unwinnable.
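The closability condition in R08 is a conjunction, which the sketch below spells out. The parameter names (`true_budget`, `close_threshold`, `decision_maker_present`) are assumptions chosen to mirror the wording of the rule, not confirmed field names.

```python
def is_closable(true_budget: float, close_threshold: float,
                decision_maker_present: bool) -> bool:
    """A prospect is closable only if BOTH conditions from R08 hold."""
    return true_budget >= close_threshold and decision_maker_present


def violates_r08(action: str, true_budget: float, close_threshold: float,
                 decision_maker_present: bool) -> bool:
    """True if the agent disqualifies a prospect that could have been closed."""
    return action == "DISQUALIFY" and is_closable(
        true_budget, close_threshold, decision_maker_present
    )
```

Because both conditions are required, disqualifying a prospect with sufficient budget but no decision maker present would not count as a violation under this reading.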
R09: Close Requires Demo
Must OFFER_DEMO before CLOSE (difficulty 2+)
On difficulty 2 and above, the agent must have completed OFFER_DEMO before attempting to CLOSE the deal.
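R09 is the only rule gated on difficulty. A minimal sketch, assuming `difficulty` is an integer level and that a completed demo appears as OFFER_DEMO in `steps_completed`:

```python
def violates_r09(action, steps_completed, difficulty):
    """True if CLOSE is attempted without a prior OFFER_DEMO on difficulty 2+."""
    if action != "CLOSE" or difficulty < 2:
        return False  # R09 does not apply below difficulty 2
    return "OFFER_DEMO" not in steps_completed
```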
How Rules Are Enforced
Rules are checked before the prospect responds to an action. Violations are accumulated in constraints_violated and returned in the observation:
# Observation schema
{
    "constraints_violated": ["R01", "R05"],  # New violations this turn
    "steps_completed": ["PROSPECT", "QUALIFY"],
    ...
}
When len(constraints_violated) >= 3, the episode terminates with:
r_outcome = -0.5 (terminal penalty)
r_compliance = -0.2 × violations (per-turn penalty)
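The penalty arithmetic can be sketched as below. Two assumptions are made that the text does not settle: `violations` in r_compliance is taken to be the number of new violations on the current turn, and the termination check is taken to use the running total across the episode. The function and constant names are hypothetical.

```python
from typing import Optional

TERMINAL_PENALTY = -0.5       # r_outcome when the violation limit is reached
PER_VIOLATION_PENALTY = -0.2  # multiplier used in r_compliance
MAX_VIOLATIONS = 3            # episode terminates at this many violations


def compliance_reward(violations_this_turn: int) -> float:
    """r_compliance for one turn (assumes a per-turn violation count)."""
    return PER_VIOLATION_PENALTY * violations_this_turn


def terminal_outcome(total_violations: int) -> Optional[float]:
    """Return r_outcome if the violation limit ends the episode, else None."""
    if total_violations >= MAX_VIOLATIONS:
        return TERMINAL_PENALTY
    return None
```

For example, a turn with two new violations would give r_compliance = -0.2 × 2 = -0.4 under these assumptions.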