jampuramprem committed on
Commit 93fd30a · 1 Parent(s): a1cbdb1

Added latest svg pics

Files changed (2)
  1. README.md +1 -1
  2. assets/how_it_works_v2.svg +1 -0
README.md CHANGED
@@ -4,7 +4,7 @@ Sieve is a reinforcement learning environment that simulates a real-world custom
 
 ## How It Works
 
-![How It Works](assets/how_it_works.svg)
+![How It Works](assets/how_it_works_v2.svg)
 
 The agent calls `/reset` to start an episode, then loops — reading the current email from the `Observation`, posting an `Action` to `/step`, and receiving a `Reward` and next `Observation` — until `done=true`. Each step reward reflects immediate quality. A `-0.005` step penalty discourages unnecessary actions. The final grader score from `/grader` is a holistic metric computed over the full episode.
 
assets/how_it_works_v2.svg ADDED
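
Note on the README paragraph in this diff: the `/reset`, `/step`, `/grader` loop it describes can be sketched roughly as below. This is a minimal illustration, not the project's client. The response keys (`observation`, `reward`, `done`, `score`), the empty payloads for `/reset` and `/grader`, the `max_steps` cap, and the `make_fake_post` stand-in server are all assumptions made for the example. Only the endpoint names and the `done` flag come from the README. The sketch does not apply the `-0.005` step penalty itself, since the README does not say whether the server already folds it into each step reward.

```python
from typing import Any, Callable, Dict

# Hypothetical transport signature: (endpoint path, JSON payload) -> JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]
Policy = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_episode(post: Post, policy: Policy, max_steps: int = 100) -> Dict[str, Any]:
    """Drive one episode: /reset, then /step until done, then /grader."""
    observation = post("/reset", {})["observation"]  # assumed response key
    total_reward = 0.0
    steps = 0
    done = False
    while not done and steps < max_steps:
        action = policy(observation)          # choose an Action from the Observation
        result = post("/step", action)        # assumed keys: observation, reward, done
        total_reward += result["reward"]
        observation = result["observation"]
        done = result["done"]
        steps += 1
    grader_score = post("/grader", {})["score"]  # assumed response key
    return {"steps": steps, "total_reward": total_reward, "grader_score": grader_score}


def make_fake_post(num_emails: int) -> Post:
    """In-memory stand-in for the server, for illustration only."""
    state = {"index": 0}

    def post(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        if path == "/reset":
            state["index"] = 0
            return {"observation": {"email_id": 0}}
        if path == "/step":
            state["index"] += 1
            done = state["index"] >= num_emails
            return {"observation": {"email_id": state["index"]}, "reward": 1.0, "done": done}
        if path == "/grader":
            return {"score": 0.5}
        raise ValueError(f"unknown endpoint: {path}")

    return post


# Run one episode of three emails against the fake server with a fixed policy.
summary = run_episode(make_fake_post(3), policy=lambda obs: {"label": "support"})
```

Passing the transport in as a function keeps the loop independent of any particular HTTP library. A real client would replace `make_fake_post` with a function that posts JSON to the running environment.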