Commit 93fd30a
Parent(s): a1cbdb1
Added latest svg pics
- README.md +1 -1
- assets/how_it_works_v2.svg +1 -0
README.md
CHANGED
@@ -4,7 +4,7 @@ Sieve is a reinforcement learning environment that simulates a real-world custom

## How It Works

-

The agent calls `/reset` to start an episode, then loops — reading the current email from the `Observation`, posting an `Action` to `/step`, and receiving a `Reward` and next `Observation` — until `done=true`. Each step reward reflects immediate quality. A `-0.005` step penalty discourages unnecessary actions. The final grader score from `/grader` is a holistic metric computed over the full episode.
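The episode loop described in the README paragraph above can be sketched roughly as follows. This is a minimal illustration, not the project's client code: the `run_episode` helper, the injected `call` transport, and the JSON field names (`"observation"`, `"reward"`, `"done"`) are all assumptions, since the diff only names the `/reset`, `/step`, and `/grader` endpoints and the `Observation`, `Action`, and `Reward` objects.

```python
from typing import Any, Callable, Dict, Tuple

Json = Dict[str, Any]


def run_episode(
    call: Callable[[str, Json], Json],
    policy: Callable[[Json], Json],
    max_steps: int = 100,
) -> Tuple[float, int]:
    """Drive one episode and return (sum of step rewards, steps taken).

    `call(path, payload)` is a hypothetical transport that sends `payload`
    to the environment endpoint `path` and returns the decoded JSON reply.
    `policy(observation)` returns the Action payload for the current email.
    """
    # Assumption: /reset replies with the first Observation under "observation".
    observation = call("/reset", {})["observation"]
    total_reward, steps, done = 0.0, 0, False

    # max_steps is a client-side safety cap, not something the source specifies.
    while not done and steps < max_steps:
        action = policy(observation)
        # Assumption: /step replies with "observation", "reward", and "done".
        result = call("/step", action)
        total_reward += float(result["reward"])
        observation = result["observation"]
        done = bool(result["done"])
        steps += 1

    return total_reward, steps
```

Passing the transport in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise against an in-memory stub. The sketch sums whatever reward `/step` returns and does not apply the `-0.005` step penalty itself, because the text does not say whether that penalty is already folded into the step reward. The final `/grader` score would be fetched separately once `done` is true; the request shape for that endpoint is not given here.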
assets/how_it_works_v2.svg
ADDED