Spaces:
Sleeping
Sleeping
File size: 2,423 Bytes
d6a76d5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 |
"""
// Testing, conceptually
Test What it verifies
ask_info info collection logic
resolve (after) success path
resolve (before) penalty logic
reward values correctness of shaping
done flag termination logic
> Detailed test flow
- ask_info
Conceptually, checks whether agent can reduce uncerainiy by asking the correct question
-- The environment is partially observable — the agent doesn’t know everything upfront --
Real-world analogy:
Support agent asking the client of their email
- resolve (after)
Conceptually, checks:
“Can the agent complete the task after gathering required info?”
This is goal completion
- resolve (before)
Conceptually, checks:
“Does the system penalize shortcut / lazy behavior?”
Without this:
Agent would always jump to resolve
- Reward values
Conceptually, checks:
“Is the agent receiving useful learning signals?”
With the reward-mechanism implemented:
Behavior Reward
correct info +0.3
correct resolution +1.0
final score +0.0 → +1.0
wrong action negative
technically, we validate:
reward accumulation works
no random jumps
consistent scaling
This is critical, because:
. Bad reward = bad agent/system
. Good reward = learnable system
- done flag
Conceptually, checks:
“Does the environment know when the episode ends?”
- no score field in /reset, since at reset:
Episode has not happened yet
→ No performance → No score
These tests collectively validate:
MDP (Markov Decision Process) -> (State, Action, Reward, Transition, Termination) -> Thorough RL Environment
Component Verified by
State reset
Action ask_info / resolve
Reward reward tests
Transition state updates
Termination done flag
// Expected behavior
Good Agent Flow:
Reset
→ ask_info (+0.3)
→ resolve (+1.0 + bonus)
Bad Agent Flow:
Reset
→ resolve (-0.3)
→ ask random info (-0.1)
→ timeout (-1.0)
"""
import requests
BASE = "http://127.0.0.1:8001"
# Reset
r = requests.get(f"{BASE}/reset")
print(f"\nRESET: \n\n{r.json()}")
# Ask info
r = requests.post(f"{BASE}/step", json={
"type": "ask_info",
"field": "account_email"
})
#print("ASK INFO:", r.json())
print(f"\nASK INFO: \n\n{r.json()}")
# Resolve
r = requests.post(f"{BASE}/step", json={
"type": "resolve"
})
print(f"\nRESOLVE: \n\n{r.json()}")
#print(f"\n"RESOLVE:", {r.json()}) |