SalesPath: Business Rules (R01–R09)

The environment enforces these 9 business rules at every step.
Three violations → episode terminates with a heavy penalty.


R01: Qualify Before Present

Must QUALIFY before PRESENT

The agent cannot pitch the product until it has asked qualifying questions about the prospect's needs, budget, and situation.

R02: Demo Before Negotiate

Must OFFER_DEMO before NEGOTIATE

No discount or price negotiation is allowed unless a product demo has been offered and scheduled.

R03: Budget Known Before Negotiate

Budget must be known before NEGOTIATE

The prospect's budget must be revealed (via QUALIFY action) before the agent can enter negotiations.
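
The ordering rules R01 to R03 can be read as prerequisite checks against the episode history. The following is a minimal sketch, not the environment's actual implementation: it assumes the history is the `steps_completed` list from the observation, and `check_ordering` and the `budget_known` flag are hypothetical names introduced for illustration.

```python
from typing import List

# Action that must already appear in the history before the keyed action.
# Maps action -> (rule ID, required earlier action).
PREREQUISITES = {
    "PRESENT": ("R01", "QUALIFY"),
    "NEGOTIATE": ("R02", "OFFER_DEMO"),
}


def check_ordering(action: str, steps_completed: List[str], budget_known: bool) -> List[str]:
    """Return the IDs of the ordering rules (R01-R03) that `action` would violate."""
    violations = []
    if action in PREREQUISITES:
        rule_id, required = PREREQUISITES[action]
        if required not in steps_completed:
            violations.append(rule_id)
    # R03: negotiation also needs a known budget (hypothetical flag).
    if action == "NEGOTIATE" and not budget_known:
        violations.append("R03")
    return violations
```

Note that a single NEGOTIATE action can break R02 and R03 at once, so one turn may contribute more than one violation.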

R04: Discount After Objections

Discount only after 2 objections handled

If the agent mentions a discount during NEGOTIATE, at least 2 prospect objections must have been successfully handled first.
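
R04 only applies when a NEGOTIATE action actually mentions a discount. As a rough sketch, assuming the environment tracks a count of handled objections and can flag a discount mention (`violates_r04`, `mentions_discount`, and `objections_handled` are hypothetical names):

```python
MIN_OBJECTIONS_FOR_DISCOUNT = 2  # threshold stated in R04


def violates_r04(action: str, mentions_discount: bool, objections_handled: int) -> bool:
    """True when a NEGOTIATE action mentions a discount before enough objections are handled."""
    return (
        action == "NEGOTIATE"
        and mentions_discount
        and objections_handled < MIN_OBJECTIONS_FOR_DISCOUNT
    )
```

Under this reading, negotiating without mentioning a discount never triggers R04, regardless of how many objections have been handled.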

R05: No Repeat Action

Cannot repeat same action consecutively

The agent cannot use the same action type twice in a row. A QUALIFY cannot follow a QUALIFY, a PRESENT cannot follow a PRESENT, etc.

R06: First Action Must Be PROSPECT

First action must always be PROSPECT

Every episode must begin with the PROSPECT action. Any other first action is invalid.
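
R05 and R06 depend only on the sequence of actions taken so far. A minimal sketch, assuming `history` is the ordered list of action types already taken in the episode (the function name is hypothetical):

```python
from typing import List


def check_history_rules(action: str, history: List[str]) -> List[str]:
    """Return the IDs of R05/R06 if `action` would violate them given the prior actions."""
    violations = []
    # R05: the same action type may not be used twice in a row.
    if history and history[-1] == action:
        violations.append("R05")
    # R06: an empty history means this is the first action, which must be PROSPECT.
    if not history and action != "PROSPECT":
        violations.append("R06")
    return violations
```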

R07: Follow-Up Only After Silence

FOLLOW_UP only after prospect goes silent

FOLLOW_UP is only valid when the prospect has disengaged (returned a silence response). If the prospect just responded with actual content, FOLLOW_UP is a violation.
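
In code, R07 reduces to a single condition on the prospect's most recent response. This sketch assumes the environment exposes whether that response was a silence response; the `prospect_silent` flag and the function name are hypothetical.

```python
def violates_r07(action: str, prospect_silent: bool) -> bool:
    """True when FOLLOW_UP is used although the prospect's last response had content."""
    return action == "FOLLOW_UP" and not prospect_silent
```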

R08: Disqualify Logic

DISQUALIFY only if prospect is genuinely unqualified

DISQUALIFY is a violation if the prospect is actually closable (true budget ≥ close threshold AND decision maker is present). Use it only when the deal is truly unwinnable.
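
The closability condition is a conjunction, so a prospect with enough budget but no decision maker present can be disqualified without a violation. A small sketch, assuming the hidden prospect state carries the three fields below (the `ProspectState` class and its field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ProspectState:
    """Hypothetical hidden state of the prospect; field names are assumptions."""
    true_budget: float
    close_threshold: float
    decision_maker_present: bool


def is_closable(prospect: ProspectState) -> bool:
    """Closable means budget at or above the threshold AND a decision maker present."""
    return prospect.true_budget >= prospect.close_threshold and prospect.decision_maker_present


def violates_r08(action: str, prospect: ProspectState) -> bool:
    """True when DISQUALIFY is used on a prospect that is actually closable."""
    return action == "DISQUALIFY" and is_closable(prospect)
```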

R09: Close Requires Demo

Must OFFER_DEMO before CLOSE (difficulty 2+)

On difficulty 2 and above, the agent must have completed OFFER_DEMO before attempting to CLOSE the deal.
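
Unlike the other rules, R09 is gated on difficulty. A minimal sketch, assuming difficulty is an integer level and that completed actions are available as the `steps_completed` list (the function name is hypothetical):

```python
from typing import List

R09_MIN_DIFFICULTY = 2  # R09 applies on difficulty 2 and above


def violates_r09(action: str, steps_completed: List[str], difficulty: int) -> bool:
    """True when CLOSE is attempted without a prior OFFER_DEMO on difficulty 2+."""
    return (
        action == "CLOSE"
        and difficulty >= R09_MIN_DIFFICULTY
        and "OFFER_DEMO" not in steps_completed
    )
```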


How Rules Are Enforced

Rules are checked before the prospect responds to an action. Violations are accumulated in constraints_violated and returned in the observation:

# Observation schema
{
    "constraints_violated": ["R01", "R05"],  # New violations this turn
    "steps_completed": ["PROSPECT", "QUALIFY"],
    ...
}

When len(constraints_violated) >= 3, the episode terminates with:

  • r_outcome = -0.5 (terminal penalty)
  • r_compliance = -0.2 × violations (per-turn penalty)
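
The termination condition and the two penalties can be sketched as follows. This is an illustration under stated assumptions, not the environment's reward code: it takes the violation count as an input, treats `violations` in the compliance formula as that same count, and leaves `r_outcome` unset when the episode has not terminated, since the outcome reward for other cases is not described here. The function name and return shape are hypothetical.

```python
from typing import Dict, Optional, Union

MAX_VIOLATIONS = 3            # termination threshold
TERMINAL_PENALTY = -0.5       # r_outcome on termination
PER_VIOLATION_PENALTY = -0.2  # multiplier for r_compliance


def violation_penalties(violation_count: int) -> Dict[str, Union[bool, float, None]]:
    """Compute the termination flag and penalties for a given violation count."""
    terminated = violation_count >= MAX_VIOLATIONS
    # r_outcome is only defined by this document for the termination case.
    r_outcome: Optional[float] = TERMINAL_PENALTY if terminated else None
    r_compliance = PER_VIOLATION_PENALTY * violation_count
    return {
        "terminated": terminated,
        "r_outcome": r_outcome,
        "r_compliance": r_compliance,
    }
```

For example, `violation_penalties(3)` reports termination with `r_outcome` of -0.5 and an `r_compliance` of roughly -0.6 (subject to floating-point rounding).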