Upload folder using huggingface_hub
Browse files- README.md +48 -2
- client.py +5 -0
- grader.py +124 -0
- inference.py +40 -17
- models.py +95 -0
- server/app.py +13 -13
- server/app_environment.py +59 -6
- utils.py +173 -11
README.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
emoji: 🔊
|
| 4 |
colorFrom: purple
|
| 5 |
colorTo: yellow
|
|
@@ -10,4 +10,50 @@ app_file: server/app.py
|
|
| 10 |
pinned: false
|
| 11 |
app_port: 8000
|
| 12 |
base_path: /web
|
| 13 |
-
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
+
title: The Sorter Project
|
| 3 |
emoji: 🔊
|
| 4 |
colorFrom: purple
|
| 5 |
colorTo: yellow
|
|
|
|
| 10 |
pinned: false
|
| 11 |
app_port: 8000
|
| 12 |
base_path: /web
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
# The Sorter Project
|
| 16 |
+
|
| 17 |
+
## The Purpose
|
| 18 |
+
Building an environment to make AI models learn on how to **identify**, **place** and **adjust** the position of things in the environment which are scattered in a *random* fashion.
|
| 19 |
+
|
| 20 |
+
## Real Life Application
|
| 21 |
+
We came up with this idea, keeping in mind its application in factories, warehouses and storage facilities. _(and even your coffee table!)_
|
| 22 |
+
|
| 23 |
+
## The Problem
|
| 24 |
+
### **The Industrial Perspective / Micro Perspective**
|
| 25 |
+
|
| 26 |
+
Companies spend millions if not billions on establishing, maintaining and organising warehouses and storage facilities, and in a densely populated country like India — with increasing demand for land and surging property prices — efficient storage and organisation becomes the ***need of the hour***, leading to the demand for an environment or an agent that can help companies and organisations and provide them with ways for the maximum efficient and logical storage of their "objects".
|
| 27 |
+
The environments and agents that specialise in full fledged identifying, sorting, stacking and organising of objects or warehouse material are few in number, and ***we are here to fill that gap***.
|
| 28 |
+
|
| 29 |
+
### **The Populational Perspective / Macro Perspective**
|
| 30 |
+
|
| 31 |
+
With increase in population causing decrease of 'Open Spaces' it becomes extremely important to **build societies and localities that can cater to a huge chunk of population** and in such a case, The Sorter Project, though being mainly built for industrial application, becomes an extremely useful tool that allows proper space utilisation to accommodate more people whilst taking minimum space. _(so in the near future we might not have to shift to mars)_
|
| 32 |
+
|
| 33 |
+
## Our Solution
|
| 34 |
+
_We have developed this environment with 'ease' thanks to OpenEnv!_ <br>
|
| 35 |
+
Our Sorter Project consists of ***_3_ different parts*** and ***_4_ different processes/tasks***:
|
| 36 |
+
|
| 37 |
+
### **Part 1**: The Segmentation Problem<br>
|
| 38 |
+
**Task 1:** Our Project has a ****Segmentation Action**** that makes agents identify objects, which is rare to find in multiple similar environments.
|
| 39 |
+
### **Part 2**: The Identification Problem<br>
|
| 40 |
+
**Task 2:** Our Project has an ****Identification Action**** which, though a part of the Segmentation task, is slightly different: it allows agents to segregate objects into **stackable** and **not stackable**, which will be of high importance while addressing the next problem.
|
| 41 |
+
|
| 42 |
+
### **Part 3**: The Placement Problem<br>
|
| 43 |
+
**Task 3:** Our Project has a ****Placement Action**** that allows agents to place things it has found.<br>
|
| 44 |
+
**Task 4:** It also provides an ****Adjust Action**** for the agent to adjust things _(because no one's good in their first try ! )_
|
| 45 |
+
|
| 46 |
+
## Technical Details
|
| 47 |
+
### Reward Logic
|
| 48 |
+
|
| 49 |
+
|
| 50 |
+
### Demonstration
|
| 51 |
+
|
| 52 |
+
## Links
|
| 53 |
+
****Huggingface Link**** (to run `inference.py`): https://huggingface.co/spaces/Jibrann/app <br>
|
| 54 |
+
****Github Link**** (this page): https://github.com/jibcamun/Reinforcement-Learning-Object-Placement
|
| 55 |
+
|
| 56 |
+
## Related Works
|
| 57 |
+
[Jumanji](https://github.com/instadeepai/jumanji)<br>
|
| 58 |
+
[miniRL](https://proxyapps.exascaleproject.org/app/minirl/)<br>
|
| 59 |
+
[BabyAI](https://arxiv.org/abs/1810.08272)<br>
|
client.py
CHANGED
|
@@ -16,6 +16,7 @@ class AppEnv(EnvClient[AppAction, AppObservation, AppState]):
|
|
| 16 |
"placement": action.placement,
|
| 17 |
"isSegmentation": action.isSegmentation,
|
| 18 |
"findObjects": action.findObjects,
|
|
|
|
| 19 |
}
|
| 20 |
|
| 21 |
def _parse_result(self, payload: Dict) -> StepResult[AppObservation]:
|
|
@@ -30,6 +31,8 @@ class AppEnv(EnvClient[AppAction, AppObservation, AppState]):
|
|
| 30 |
isDone=obs_data.get("isDone", False),
|
| 31 |
rewardFeedback=obs_data.get("rewardFeedback", []),
|
| 32 |
rewardList=obs_data.get("rewardList", []),
|
|
|
|
|
|
|
| 33 |
)
|
| 34 |
|
| 35 |
return StepResult(
|
|
@@ -52,4 +55,6 @@ class AppEnv(EnvClient[AppAction, AppObservation, AppState]):
|
|
| 52 |
ObjectsPresent=payload.get("ObjectsPresent", {}),
|
| 53 |
rewardFeedback=payload.get("rewardFeedback", []),
|
| 54 |
rewardList=payload.get("rewardList", []),
|
|
|
|
|
|
|
| 55 |
)
|
|
|
|
| 16 |
"placement": action.placement,
|
| 17 |
"isSegmentation": action.isSegmentation,
|
| 18 |
"findObjects": action.findObjects,
|
| 19 |
+
"adjust": action.adjust,
|
| 20 |
}
|
| 21 |
|
| 22 |
def _parse_result(self, payload: Dict) -> StepResult[AppObservation]:
|
|
|
|
| 31 |
isDone=obs_data.get("isDone", False),
|
| 32 |
rewardFeedback=obs_data.get("rewardFeedback", []),
|
| 33 |
rewardList=obs_data.get("rewardList", []),
|
| 34 |
+
numberPlaced=obs_data.get("numberPlaced", 0),
|
| 35 |
+
ObjectsPlaced=obs_data.get("ObjectsPlaced", {}),
|
| 36 |
)
|
| 37 |
|
| 38 |
return StepResult(
|
|
|
|
| 55 |
ObjectsPresent=payload.get("ObjectsPresent", {}),
|
| 56 |
rewardFeedback=payload.get("rewardFeedback", []),
|
| 57 |
rewardList=payload.get("rewardList", []),
|
| 58 |
+
numberPlaced=payload.get("numberPlaced", 0),
|
| 59 |
+
ObjectsPlaced=payload.get("ObjectsPlaced", {}),
|
| 60 |
)
|
grader.py
ADDED
|
@@ -0,0 +1,124 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
from sklearn.preprocessing import MinMaxScaler
import os
from dotenv import load_dotenv
from openai import OpenAI
import json
from json import JSONDecodeError
from numpy import average


load_dotenv()

# LLM endpoint configuration; API_KEY falls back to HF_TOKEN for HF Spaces.
API_URL = os.getenv("API_BASE_URL")
MODEL = os.getenv("MODEL_NAME")
API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")

# Fixed typos ("places" -> "placed", "receieve" -> "receive") and removed the
# trailing comma from the JSON example: showing the model invalid JSON invites
# invalid JSON back.
SYSTEM_PROMPT_GRADING = """
You are a professional object sorter who works at the industry level and
has a good knowledge about how and where things are to be placed. You shall receive the
list of feedbacks from an accomplice hired; you shall rate the feedback on the scale of 0.0 to 1.0 ONLY.

Rules:
- You shall rate the feedback on the scale of 0.0 to 1.0 ONLY AND also provide a one-line feedback
- You WILL STRICTLY ABIDE BY THIS JSON FORMAT:
{
    "grade": float,
    "feedback": str
}
""".strip()

# Low temperature keeps the grading output deterministic and parseable.
TEMPERATURE = 0.2
| 34 |
+
def _feed_llm(input):
    """Send *input* to the grading LLM and return its raw reply text.

    Raises:
        RuntimeError: when any required environment variable is unset.
    """
    required = {
        "API_BASE_URL": API_URL,
        "MODEL_NAME": MODEL,
        "API_KEY/HF_TOKEN": API_KEY,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )

    client = OpenAI(base_url=API_URL, api_key=API_KEY)

    completion = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT_GRADING},
            {"role": "user", "content": f"{input}"},
        ],
        temperature=TEMPERATURE,
    )

    return completion.choices[0].message.content or ""
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
def _extract_json_payload(output_str: str):
|
| 67 |
+
output_str = output_str.strip()
|
| 68 |
+
|
| 69 |
+
if output_str.startswith("```"):
|
| 70 |
+
lines = output_str.splitlines()
|
| 71 |
+
if len(lines) >= 3:
|
| 72 |
+
output_str = "\n".join(lines[1:-1]).strip()
|
| 73 |
+
|
| 74 |
+
start = output_str.find("{")
|
| 75 |
+
end = output_str.rfind("}")
|
| 76 |
+
|
| 77 |
+
if start == -1 or end == -1 or end < start:
|
| 78 |
+
raise JSONDecodeError("No JSON object found in model output", output_str, 0)
|
| 79 |
+
|
| 80 |
+
return output_str[start : end + 1]
|
| 81 |
+
|
| 82 |
+
|
| 83 |
+
def parse_output(output_str):
    """Parse the grading LLM's reply into a dict via its embedded JSON payload."""
    return json.loads(_extract_json_payload(output_str))
|
| 86 |
+
|
| 87 |
+
|
| 88 |
+
def grade_segmentation(appObs):
    """Grade the segmentation step of an episode.

    Combines a numeric grade (mean of min-max-scaled segmentation rewards)
    with an LLM-assigned grade of the reward feedback.

    Returns:
        Tuple of (grade, outputGrade, cumulativeGrade, outputFeedback).
    """
    reward = appObs.rewardListSegment
    feedback = appObs.rewardFeedbackSegment

    # MinMaxScaler requires a 2-D array; fitting the flat reward list directly
    # raises ValueError, so reshape into a single-feature column first.
    if reward:
        scaler = MinMaxScaler()
        column = [[value] for value in reward]
        grade = float(average(scaler.fit_transform(column)))
    else:
        grade = 0.0  # no rewards yet -> neutral numeric grade

    llmOutput = parse_output(_feed_llm(f"Feedback: {feedback}, Reward: {reward}"))
    outputFeedback = llmOutput.get("feedback", "")
    outputGrade = llmOutput.get("grade", 0.0)
    cumulativeGrade = (grade + outputGrade) / 2.0
    return (grade, outputGrade, cumulativeGrade, outputFeedback)
|
| 99 |
+
|
| 100 |
+
|
| 101 |
+
def grade_placement(appObs):
    """Grade the placement step of an episode.

    Combines a numeric grade (mean of min-max-scaled placement rewards)
    with an LLM-assigned grade of the reward feedback.

    Returns:
        Tuple of (grade, outputGrade, cumulativeGrade, outputFeedback).
    """
    reward = appObs.rewardListPlace
    feedback = appObs.rewardFeedbackPlace

    # MinMaxScaler requires a 2-D array; fitting the flat reward list directly
    # raises ValueError, so reshape into a single-feature column first.
    if reward:
        scaler = MinMaxScaler()
        column = [[value] for value in reward]
        grade = float(average(scaler.fit_transform(column)))
    else:
        grade = 0.0  # no rewards yet -> neutral numeric grade

    llmOutput = parse_output(_feed_llm(f"Feedback: {feedback}, Reward: {reward}"))
    outputFeedback = llmOutput.get("feedback", "")
    outputGrade = llmOutput.get("grade", 0.0)
    cumulativeGrade = (grade + outputGrade) / 2.0
    return (grade, outputGrade, cumulativeGrade, outputFeedback)
|
| 112 |
+
|
| 113 |
+
|
| 114 |
+
def grade_adjust(appObs):
    """Grade the adjustment step of an episode.

    NOTE(review): this was originally a second ``def grade_segmentation`` that
    silently shadowed the real segmentation grader; it reads the Adjust reward
    fields, so it is renamed to the clearly intended ``grade_adjust``.

    Combines a numeric grade (mean of min-max-scaled adjustment rewards)
    with an LLM-assigned grade of the reward feedback.

    Returns:
        Tuple of (grade, outputGrade, cumulativeGrade, outputFeedback).
    """
    reward = appObs.rewardListAdjust
    feedback = appObs.rewardFeedbackAdjust

    # MinMaxScaler requires a 2-D array; fitting the flat reward list directly
    # raises ValueError, so reshape into a single-feature column first.
    if reward:
        scaler = MinMaxScaler()
        column = [[value] for value in reward]
        grade = float(average(scaler.fit_transform(column)))
    else:
        grade = 0.0  # no rewards yet -> neutral numeric grade

    llmOutput = parse_output(_feed_llm(f"Feedback: {feedback}, Reward: {reward}"))
    outputFeedback = llmOutput.get("feedback", "")
    outputGrade = llmOutput.get("grade", 0.0)
    cumulativeGrade = (grade + outputGrade) / 2.0
    return (grade, outputGrade, cumulativeGrade, outputFeedback)
|
inference.py
CHANGED
|
@@ -3,7 +3,7 @@ from dotenv import load_dotenv
|
|
| 3 |
from openai import OpenAI
|
| 4 |
import json
|
| 5 |
from json import JSONDecodeError
|
| 6 |
-
import
|
| 7 |
|
| 8 |
try:
|
| 9 |
from models import AppAction, AppObservation
|
|
@@ -15,8 +15,14 @@ try:
|
|
| 15 |
except ImportError:
|
| 16 |
from app.server.app_environment import AppEnvironment
|
| 17 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
|
| 19 |
load_dotenv()
|
|
|
|
| 20 |
|
| 21 |
API_URL = os.getenv("API_BASE_URL")
|
| 22 |
MODEL = os.getenv("MODEL_NAME")
|
|
@@ -28,6 +34,7 @@ FALLBACK_ACTION = {
|
|
| 28 |
"isSegmentation": False,
|
| 29 |
"placement": {},
|
| 30 |
"findObjects": {},
|
|
|
|
| 31 |
}
|
| 32 |
|
| 33 |
DEBUG = True
|
|
@@ -38,7 +45,8 @@ SYSTEM_PROMPT = """
|
|
| 38 |
1. **Segment objects** in the environment if `isSegmentation=True`.
|
| 39 |
2. **Identify objects** and their properties (name, stackable) accurately.
|
| 40 |
3. **Place objects** in the 3D grid respecting stacking rules and dimensions.
|
| 41 |
-
4. **
|
|
|
|
| 42 |
|
| 43 |
You must strictly return actions that conform to this Pydantic schema:
|
| 44 |
|
|
@@ -47,37 +55,39 @@ SYSTEM_PROMPT = """
|
|
| 47 |
placement: Dict[str, Tuple[int, int, int, bool]]
|
| 48 |
isSegmentation: bool
|
| 49 |
findObjects: Dict[str, Tuple[int, int, int, bool]]
|
|
|
|
| 50 |
}
|
| 51 |
|
| 52 |
Rules:
|
| 53 |
- Only report objects that are found or placed; empty dicts are valid if none.
|
| 54 |
-
- Do not modify objects that are already placed unless instructed.
|
| 55 |
- Coordinates must be within the grid bounds.
|
| 56 |
- Respect stackable property: non-stackable objects cannot be placed on top of another object.
|
| 57 |
- Use previous step’s reward and rewardFeedback to adjust your strategy.
|
|
|
|
| 58 |
|
| 59 |
Output:
|
| 60 |
- Always return a valid JSON object conforming to the schema.
|
| 61 |
- Do not include any extra text, explanations, or commentary.
|
| 62 |
-
- If no action is possible, return empty dicts for `placement` and `findObjects`.
|
| 63 |
|
| 64 |
Your goal:
|
| 65 |
- Maximize cumulative reward.
|
| 66 |
- Identify all objects correctly.
|
| 67 |
-
- Place objects efficiently while respecting stacking rules.
|
| 68 |
- Learn from reward feedback to improve placement in future steps.
|
| 69 |
|
| 70 |
Always return a valid JSON that conforms exactly to the AppAction Pydantic model:
|
| 71 |
-
{"placement": Dict[str, Tuple[int,int,int,bool]] or {}, "isSegmentation": bool, "findObjects": Dict[str, Tuple[int,int,int,bool]] or {}}
|
| 72 |
|
| 73 |
Actions:
|
| 74 |
-
- To place an object: {"isSegmentation": false, "placement": {"object_name": [x, y, z, stackable]}, "findObjects": {}}
|
| 75 |
-
- To segment objects: {"isSegmentation": true, "placement": {}, "findObjects": {"object_name": [x, y, z, stackable]}}
|
|
|
|
|
|
|
| 76 |
|
| 77 |
Do not include explanations, text, or extra fields.
|
| 78 |
-
If no objects are found or
|
| 79 |
-
The output must be parseable and valid for AppAction(**json_output).
|
| 80 |
-
""".strip()
|
| 81 |
|
| 82 |
MESSAGES = [{"role": "system", "content": SYSTEM_PROMPT}]
|
| 83 |
HISTORY = []
|
|
@@ -125,7 +135,9 @@ def main() -> None:
|
|
| 125 |
)
|
| 126 |
if not value
|
| 127 |
]
|
| 128 |
-
raise RuntimeError(
|
|
|
|
|
|
|
| 129 |
|
| 130 |
env = AppEnvironment()
|
| 131 |
observation: AppObservation = env.reset()
|
|
@@ -148,22 +160,33 @@ def main() -> None:
|
|
| 148 |
|
| 149 |
llm_output = client.chat.completions.create(
|
| 150 |
model=MODEL,
|
| 151 |
-
messages=
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 152 |
temperature=TEMPERATURE,
|
| 153 |
)
|
| 154 |
|
| 155 |
message_content = llm_output.choices[0].message.content or ""
|
|
|
|
| 156 |
action: AppAction = parse_output(message_content)
|
| 157 |
-
MESSAGES.append({"role": "assistant", "content": message_content})
|
| 158 |
observation: AppObservation = env.step(action)
|
| 159 |
|
|
|
|
|
|
|
| 160 |
HISTORY.append(observation)
|
|
|
|
| 161 |
|
| 162 |
if observation.isDone:
|
| 163 |
break
|
| 164 |
-
|
| 165 |
-
time.sleep(100)
|
| 166 |
-
|
| 167 |
print(HISTORY)
|
| 168 |
|
| 169 |
|
|
|
|
| 3 |
from openai import OpenAI
|
| 4 |
import json
|
| 5 |
from json import JSONDecodeError
|
| 6 |
+
from numpy import set_printoptions
|
| 7 |
|
| 8 |
try:
|
| 9 |
from models import AppAction, AppObservation
|
|
|
|
| 15 |
except ImportError:
|
| 16 |
from app.server.app_environment import AppEnvironment
|
| 17 |
|
| 18 |
+
try:
|
| 19 |
+
from grader import *
|
| 20 |
+
except ImportError:
|
| 21 |
+
from app.grader import *
|
| 22 |
+
|
| 23 |
|
| 24 |
load_dotenv()
|
| 25 |
+
set_printoptions(precision=2, suppress=True)
|
| 26 |
|
| 27 |
API_URL = os.getenv("API_BASE_URL")
|
| 28 |
MODEL = os.getenv("MODEL_NAME")
|
|
|
|
| 34 |
"isSegmentation": False,
|
| 35 |
"placement": {},
|
| 36 |
"findObjects": {},
|
| 37 |
+
"adjust": ("", "", 0),
|
| 38 |
}
|
| 39 |
|
| 40 |
DEBUG = True
|
|
|
|
| 45 |
1. **Segment objects** in the environment if `isSegmentation=True`.
|
| 46 |
2. **Identify objects** and their properties (name, stackable) accurately.
|
| 47 |
3. **Place objects** in the 3D grid respecting stacking rules and dimensions.
|
| 48 |
+
4. **Adjust object positions** if necessary to optimize placement and maximize rewards.
|
| 49 |
+
5. **Use rewards and feedback** from previous steps to improve future actions.
|
| 50 |
|
| 51 |
You must strictly return actions that conform to this Pydantic schema:
|
| 52 |
|
|
|
|
| 55 |
placement: Dict[str, Tuple[int, int, int, bool]]
|
| 56 |
isSegmentation: bool
|
| 57 |
findObjects: Dict[str, Tuple[int, int, int, bool]]
|
| 58 |
+
adjust : Tuple[str, str, int]
|
| 59 |
}
|
| 60 |
|
| 61 |
Rules:
|
| 62 |
- Only report objects that are found or placed; empty dicts are valid if none.
|
|
|
|
| 63 |
- Coordinates must be within the grid bounds.
|
| 64 |
- Respect stackable property: non-stackable objects cannot be placed on top of another object.
|
| 65 |
- Use previous step’s reward and rewardFeedback to adjust your strategy.
|
| 66 |
+
- Directions for adjustments for an object can be "UP", "DOWN", "LEFT", "RIGHT", "FORWARD", "BACKWARD", "ROTATE" with a positive integer amount.
|
| 67 |
|
| 68 |
Output:
|
| 69 |
- Always return a valid JSON object conforming to the schema.
|
| 70 |
- Do not include any extra text, explanations, or commentary.
|
| 71 |
+
- If no action is possible, return empty dicts for `placement` and `findObjects` and an empty tuple for `adjust`.
|
| 72 |
|
| 73 |
Your goal:
|
| 74 |
- Maximize cumulative reward.
|
| 75 |
- Identify all objects correctly.
|
| 76 |
+
- Place objects efficiently while respecting stacking rules (PS: Do not place the objects in the same location as where it is originally found and use adjust function wherever required.)
|
| 77 |
- Learn from reward feedback to improve placement in future steps.
|
| 78 |
|
| 79 |
Always return a valid JSON that conforms exactly to the AppAction Pydantic model:
|
| 80 |
+
{"placement": Dict[str, Tuple[int,int,int,bool]] or {}, "isSegmentation": bool, "findObjects": Dict[str, Tuple[int,int,int,bool]] or {},"adjust": Tuple[str,str,int] or ("", "", 0)}
|
| 81 |
|
| 82 |
Actions:
|
| 83 |
+
- To place an object: {"isSegmentation": false, "placement": {"object_name": [x, y, z, stackable]}, "findObjects": {}, "adjust":("", "", 0)}
|
| 84 |
+
- To segment objects: {"isSegmentation": true, "placement": {}, "findObjects": {"object_name": [x, y, z, stackable]}, "adjust":("", "", 0)}
|
| 85 |
+
- To adjust objects: {"isSegmentation": false, "placement": {}, "findObjects": {}, "adjust":("object_name", "direction", amount)}
|
| 86 |
+
- To adjust and place objects: {"isSegmentation": false, "placement": {"object_name": [x, y, z, stackable]}, "findObjects": {}, "adjust":("object_name", "direction", amount)}
|
| 87 |
|
| 88 |
Do not include explanations, text, or extra fields.
|
| 89 |
+
If no objects are found, placed or adjusted, return empty dicts for placement and findObjects and empty tuple for adjust.
|
| 90 |
+
The output must be parseable and valid for AppAction(**json_output).""".strip()
|
|
|
|
| 91 |
|
| 92 |
MESSAGES = [{"role": "system", "content": SYSTEM_PROMPT}]
|
| 93 |
HISTORY = []
|
|
|
|
| 135 |
)
|
| 136 |
if not value
|
| 137 |
]
|
| 138 |
+
raise RuntimeError(
|
| 139 |
+
f"Missing required environment variables: {', '.join(missing)}"
|
| 140 |
+
)
|
| 141 |
|
| 142 |
env = AppEnvironment()
|
| 143 |
observation: AppObservation = env.reset()
|
|
|
|
| 160 |
|
| 161 |
llm_output = client.chat.completions.create(
|
| 162 |
model=MODEL,
|
| 163 |
+
messages=[
|
| 164 |
+
MESSAGES[0],
|
| 165 |
+
{
|
| 166 |
+
"role": "user",
|
| 167 |
+
"content": f"""Observation: {observation.model_dump_json()},
|
| 168 |
+
Previous reward: {observation.reward},
|
| 169 |
+
Previous reward list: {observation.rewardList},
|
| 170 |
+
Previous reward feedback: {observation.rewardFeedback},
|
| 171 |
+
Step: {i}""".strip(),
|
| 172 |
+
},
|
| 173 |
+
],
|
| 174 |
temperature=TEMPERATURE,
|
| 175 |
)
|
| 176 |
|
| 177 |
message_content = llm_output.choices[0].message.content or ""
|
| 178 |
+
|
| 179 |
action: AppAction = parse_output(message_content)
|
|
|
|
| 180 |
observation: AppObservation = env.step(action)
|
| 181 |
|
| 182 |
+
MESSAGES.append({"role": "assistant", "content": message_content})
|
| 183 |
+
print(message_content)
|
| 184 |
HISTORY.append(observation)
|
| 185 |
+
print(observation)
|
| 186 |
|
| 187 |
if observation.isDone:
|
| 188 |
break
|
| 189 |
+
|
|
|
|
|
|
|
| 190 |
print(HISTORY)
|
| 191 |
|
| 192 |
|
models.py
CHANGED
|
@@ -9,13 +9,20 @@ class AppAction(Action):
|
|
| 9 |
placement: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 10 |
default_factory=dict, description="Placement of the object in a 3D grid"
|
| 11 |
)
|
|
|
|
| 12 |
isSegmentation: bool = Field(
|
| 13 |
default=True, description="Whether the model is segmenting the objects"
|
| 14 |
)
|
|
|
|
| 15 |
findObjects: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 16 |
default_factory=dict, description="Dictionary of objects"
|
| 17 |
)
|
| 18 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 19 |
|
| 20 |
class AppObservation(Observation):
|
| 21 |
"""Observation from the App environment"""
|
|
@@ -24,21 +31,26 @@ class AppObservation(Observation):
|
|
| 24 |
default_factory=list,
|
| 25 |
description="Current placement of the objects in a 3D grid",
|
| 26 |
)
|
|
|
|
| 27 |
positions: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 28 |
default_factory=dict,
|
| 29 |
description="Dictionary of objects with their positions in the environment",
|
| 30 |
)
|
|
|
|
| 31 |
objectsLeft: List[str] = Field(
|
| 32 |
default_factory=list,
|
| 33 |
description="List of unorganised objects left in the environment",
|
| 34 |
)
|
|
|
|
| 35 |
objectsFound: List[str] = Field(
|
| 36 |
default_factory=list,
|
| 37 |
description="List of objects found in the environment",
|
| 38 |
)
|
|
|
|
| 39 |
reward: float = Field(
|
| 40 |
default=0.0, description="Reward received after taking the action"
|
| 41 |
)
|
|
|
|
| 42 |
isDone: bool = Field(default=False, description="Whether the episode has ended")
|
| 43 |
|
| 44 |
rewardFeedback: list[str] = Field(
|
|
@@ -51,6 +63,46 @@ class AppObservation(Observation):
|
|
| 51 |
description="List of reward values received after taking the action",
|
| 52 |
)
|
| 53 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 54 |
|
| 55 |
class AppState(State):
|
| 56 |
"""State for the App environment"""
|
|
@@ -69,13 +121,16 @@ class AppState(State):
|
|
| 69 |
default_factory=list,
|
| 70 |
description="List of unorganised objects left in the environment",
|
| 71 |
)
|
|
|
|
| 72 |
objectsFound: List[str] = Field(
|
| 73 |
default_factory=list,
|
| 74 |
description="List of objects found in the environment",
|
| 75 |
)
|
|
|
|
| 76 |
reward: float = Field(
|
| 77 |
default=0.0, description="Reward received after taking the action"
|
| 78 |
)
|
|
|
|
| 79 |
isDone: bool = Field(default=False, description="Whether the episode has ended")
|
| 80 |
|
| 81 |
ObjectsPresent: Dict[str, Tuple[int, int, int, bool]] = Field(
|
|
@@ -83,6 +138,11 @@ class AppState(State):
|
|
| 83 |
description="Placed objects and their current positions in the environment",
|
| 84 |
)
|
| 85 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
rewardFeedback: list[str] = Field(
|
| 87 |
default_factory=list,
|
| 88 |
description="List of feedback strings describing the reward received after taking the action",
|
|
@@ -92,3 +152,38 @@ class AppState(State):
|
|
| 92 |
default_factory=list,
|
| 93 |
description="List of reward values received after taking the action",
|
| 94 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 9 |
placement: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 10 |
default_factory=dict, description="Placement of the object in a 3D grid"
|
| 11 |
)
|
| 12 |
+
|
| 13 |
isSegmentation: bool = Field(
|
| 14 |
default=True, description="Whether the model is segmenting the objects"
|
| 15 |
)
|
| 16 |
+
|
| 17 |
findObjects: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 18 |
default_factory=dict, description="Dictionary of objects"
|
| 19 |
)
|
| 20 |
|
| 21 |
+
adjust: Tuple[str, str, int] = Field(
|
| 22 |
+
default=("", "", 0),
|
| 23 |
+
description="Adjustment action for moving or rotating objects. Format: (object_name, direction, amount)",
|
| 24 |
+
)
|
| 25 |
+
|
| 26 |
|
| 27 |
class AppObservation(Observation):
|
| 28 |
"""Observation from the App environment"""
|
|
|
|
| 31 |
default_factory=list,
|
| 32 |
description="Current placement of the objects in a 3D grid",
|
| 33 |
)
|
| 34 |
+
|
| 35 |
positions: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 36 |
default_factory=dict,
|
| 37 |
description="Dictionary of objects with their positions in the environment",
|
| 38 |
)
|
| 39 |
+
|
| 40 |
objectsLeft: List[str] = Field(
|
| 41 |
default_factory=list,
|
| 42 |
description="List of unorganised objects left in the environment",
|
| 43 |
)
|
| 44 |
+
|
| 45 |
objectsFound: List[str] = Field(
|
| 46 |
default_factory=list,
|
| 47 |
description="List of objects found in the environment",
|
| 48 |
)
|
| 49 |
+
|
| 50 |
reward: float = Field(
|
| 51 |
default=0.0, description="Reward received after taking the action"
|
| 52 |
)
|
| 53 |
+
|
| 54 |
isDone: bool = Field(default=False, description="Whether the episode has ended")
|
| 55 |
|
| 56 |
rewardFeedback: list[str] = Field(
|
|
|
|
| 63 |
description="List of reward values received after taking the action",
|
| 64 |
)
|
| 65 |
|
| 66 |
+
numberPlaced: int = Field(
|
| 67 |
+
default=0,
|
| 68 |
+
description="Number of objects successfully placed in the environment",
|
| 69 |
+
)
|
| 70 |
+
|
| 71 |
+
ObjectsPlaced: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 72 |
+
default_factory=dict,
|
| 73 |
+
description="Objects that have been successfully placed in the environment",
|
| 74 |
+
)
|
| 75 |
+
|
| 76 |
+
rewardListSegment: list[float] = Field(
|
| 77 |
+
default_factory=list,
|
| 78 |
+
description="List of reward values received after taking the action",
|
| 79 |
+
)
|
| 80 |
+
|
| 81 |
+
rewardFeedbackSegment: list[str] = Field(
|
| 82 |
+
default_factory=list,
|
| 83 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 84 |
+
)
|
| 85 |
+
|
| 86 |
+
rewardListPlace: list[float] = Field(
|
| 87 |
+
default_factory=list,
|
| 88 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 89 |
+
)
|
| 90 |
+
|
| 91 |
+
rewardFeedbackPlace: list[str] = Field(
|
| 92 |
+
default_factory=list,
|
| 93 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 94 |
+
)
|
| 95 |
+
|
| 96 |
+
rewardListAdjust: list[float] = Field(
|
| 97 |
+
default_factory=list,
|
| 98 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 99 |
+
)
|
| 100 |
+
|
| 101 |
+
rewardFeedbackAdjust: list[str] = Field(
|
| 102 |
+
default_factory=list,
|
| 103 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 104 |
+
)
|
| 105 |
+
|
| 106 |
|
| 107 |
class AppState(State):
|
| 108 |
"""State for the App environment"""
|
|
|
|
| 121 |
default_factory=list,
|
| 122 |
description="List of unorganised objects left in the environment",
|
| 123 |
)
|
| 124 |
+
|
| 125 |
objectsFound: List[str] = Field(
|
| 126 |
default_factory=list,
|
| 127 |
description="List of objects found in the environment",
|
| 128 |
)
|
| 129 |
+
|
| 130 |
reward: float = Field(
|
| 131 |
default=0.0, description="Reward received after taking the action"
|
| 132 |
)
|
| 133 |
+
|
| 134 |
isDone: bool = Field(default=False, description="Whether the episode has ended")
|
| 135 |
|
| 136 |
ObjectsPresent: Dict[str, Tuple[int, int, int, bool]] = Field(
|
|
|
|
| 138 |
description="Placed objects and their current positions in the environment",
|
| 139 |
)
|
| 140 |
|
| 141 |
+
ObjectsPlaced: Dict[str, Tuple[int, int, int, bool]] = Field(
|
| 142 |
+
default_factory=dict,
|
| 143 |
+
description="Objects that have been successfully placed in the environment",
|
| 144 |
+
)
|
| 145 |
+
|
| 146 |
rewardFeedback: list[str] = Field(
|
| 147 |
default_factory=list,
|
| 148 |
description="List of feedback strings describing the reward received after taking the action",
|
|
|
|
| 152 |
default_factory=list,
|
| 153 |
description="List of reward values received after taking the action",
|
| 154 |
)
|
| 155 |
+
|
| 156 |
+
numberPlaced: int = Field(
|
| 157 |
+
default=0,
|
| 158 |
+
description="Number of objects successfully placed in the environment",
|
| 159 |
+
)
|
| 160 |
+
|
| 161 |
+
rewardListSegment: list[float] = Field(
|
| 162 |
+
default_factory=list,
|
| 163 |
+
description="List of reward values received after taking the action",
|
| 164 |
+
)
|
| 165 |
+
|
| 166 |
+
rewardFeedbackSegment: list[str] = Field(
|
| 167 |
+
default_factory=list,
|
| 168 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 169 |
+
)
|
| 170 |
+
|
| 171 |
+
rewardListPlace: list[float] = Field(
|
| 172 |
+
default_factory=list,
|
| 173 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 174 |
+
)
|
| 175 |
+
|
| 176 |
+
rewardFeedbackPlace: list[str] = Field(
|
| 177 |
+
default_factory=list,
|
| 178 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 179 |
+
)
|
| 180 |
+
|
| 181 |
+
rewardListAdjust: list[float] = Field(
|
| 182 |
+
default_factory=list,
|
| 183 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 184 |
+
)
|
| 185 |
+
|
| 186 |
+
rewardFeedbackAdjust: list[str] = Field(
|
| 187 |
+
default_factory=list,
|
| 188 |
+
description="List of feedback strings describing the reward received after taking the action",
|
| 189 |
+
)
|
server/app.py
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
try:
|
| 2 |
from openenv.core.env_server.http_server import create_app
|
| 3 |
-
except Exception as e:
|
| 4 |
raise ImportError(
|
| 5 |
"openenv is required for the web interface. Install dependencies with '\n uv sync\n'"
|
| 6 |
) from e
|
|
@@ -18,21 +18,21 @@ app = create_app(
|
|
| 18 |
AppAction,
|
| 19 |
AppObservation,
|
| 20 |
env_name="app",
|
| 21 |
-
max_concurrent_envs=1,
|
| 22 |
)
|
| 23 |
|
| 24 |
|
| 25 |
-
@app.get("/health")
|
| 26 |
-
def health() -> dict[str, str]:
|
| 27 |
-
return {"status": "ok"}
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
@app.get("/")
|
| 31 |
-
def root() -> dict[str, str]:
|
| 32 |
-
return {
|
| 33 |
-
"message": "Object Placer API is running",
|
| 34 |
-
"health": "/health",
|
| 35 |
-
}
|
| 36 |
|
| 37 |
|
| 38 |
def main(host: str = "0.0.0.0", port: int = 8000):
|
|
|
|
| 1 |
try:
|
| 2 |
from openenv.core.env_server.http_server import create_app
|
| 3 |
+
except Exception as e:
|
| 4 |
raise ImportError(
|
| 5 |
"openenv is required for the web interface. Install dependencies with '\n uv sync\n'"
|
| 6 |
) from e
|
|
|
|
| 18 |
AppAction,
|
| 19 |
AppObservation,
|
| 20 |
env_name="app",
|
| 21 |
+
max_concurrent_envs=1,
|
| 22 |
)
|
| 23 |
|
| 24 |
|
| 25 |
+
# @app.get("/health")
|
| 26 |
+
# def health() -> dict[str, str]:
|
| 27 |
+
# return {"status": "ok"}
|
| 28 |
+
#
|
| 29 |
+
#
|
| 30 |
+
# @app.get("/")
|
| 31 |
+
# def root() -> dict[str, str]:
|
| 32 |
+
# return {
|
| 33 |
+
# "message": "Object Placer API is running",
|
| 34 |
+
# "health": "/health",
|
| 35 |
+
# }
|
| 36 |
|
| 37 |
|
| 38 |
def main(host: str = "0.0.0.0", port: int = 8000):
|
server/app_environment.py
CHANGED
|
@@ -22,6 +22,7 @@ class AppEnvironment(Environment):
|
|
| 22 |
self._reset_count = 0
|
| 23 |
|
| 24 |
def _coerce_state(self) -> AppState:
|
|
|
|
| 25 |
if isinstance(self._state, AppState):
|
| 26 |
return self._state
|
| 27 |
|
|
@@ -46,8 +47,16 @@ class AppEnvironment(Environment):
|
|
| 46 |
reward=0.0,
|
| 47 |
isDone=False,
|
| 48 |
ObjectsPresent=placed,
|
|
|
|
| 49 |
rewardFeedback=[],
|
| 50 |
rewardList=[],
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
)
|
| 52 |
|
| 53 |
def reset(self) -> AppObservation:
|
|
@@ -62,6 +71,14 @@ class AppEnvironment(Environment):
|
|
| 62 |
isDone=self._state.isDone,
|
| 63 |
rewardFeedback=self._state.rewardFeedback,
|
| 64 |
rewardList=self._state.rewardList,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 65 |
)
|
| 66 |
|
| 67 |
def step(self, action: AppAction) -> AppObservation:
|
|
@@ -77,6 +94,7 @@ class AppEnvironment(Environment):
|
|
| 77 |
reward -= 10.0
|
| 78 |
appendRewardFeedback(
|
| 79 |
state,
|
|
|
|
| 80 |
"No action is of invalid schema or format. Penalty applied.",
|
| 81 |
reward,
|
| 82 |
)
|
|
@@ -89,24 +107,51 @@ class AppEnvironment(Environment):
|
|
| 89 |
isDone=state.isDone,
|
| 90 |
rewardFeedback=state.rewardFeedback,
|
| 91 |
rewardList=state.rewardList,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 92 |
)
|
| 93 |
|
| 94 |
if action.isSegmentation and action is not None:
|
| 95 |
reward += 10.0
|
| 96 |
-
appendRewardFeedback(state, "Segmentation successful.", reward)
|
| 97 |
|
| 98 |
if action.placement and action is not None:
|
| 99 |
-
|
| 100 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 101 |
|
| 102 |
if action.findObjects and action is not None:
|
| 103 |
reward += findobject(action.isSegmentation, action.findObjects, state)
|
| 104 |
-
appendRewardFeedback(state, "Object found successfully.", reward)
|
| 105 |
|
| 106 |
-
if
|
|
|
|
|
|
|
|
|
|
| 107 |
state.isDone = True
|
| 108 |
reward += 10.0
|
| 109 |
-
appendRewardFeedback(
|
|
|
|
|
|
|
| 110 |
|
| 111 |
state.reward += reward / (10**state.step_count)
|
| 112 |
|
|
@@ -119,6 +164,14 @@ class AppEnvironment(Environment):
|
|
| 119 |
isDone=state.isDone,
|
| 120 |
rewardFeedback=state.rewardFeedback,
|
| 121 |
rewardList=state.rewardList,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 122 |
)
|
| 123 |
|
| 124 |
@property
|
|
|
|
| 22 |
self._reset_count = 0
|
| 23 |
|
| 24 |
def _coerce_state(self) -> AppState:
|
| 25 |
+
|
| 26 |
if isinstance(self._state, AppState):
|
| 27 |
return self._state
|
| 28 |
|
|
|
|
| 47 |
reward=0.0,
|
| 48 |
isDone=False,
|
| 49 |
ObjectsPresent=placed,
|
| 50 |
+
ObjectsPlaced={},
|
| 51 |
rewardFeedback=[],
|
| 52 |
rewardList=[],
|
| 53 |
+
numberPlaced=0,
|
| 54 |
+
rewardListSegment=[],
|
| 55 |
+
rewardFeedbackSegment=[],
|
| 56 |
+
rewardListPlace=[],
|
| 57 |
+
rewardFeedbackPlace=[],
|
| 58 |
+
rewardListAdjust=[],
|
| 59 |
+
rewardFeedbackAdjust=[],
|
| 60 |
)
|
| 61 |
|
| 62 |
def reset(self) -> AppObservation:
|
|
|
|
| 71 |
isDone=self._state.isDone,
|
| 72 |
rewardFeedback=self._state.rewardFeedback,
|
| 73 |
rewardList=self._state.rewardList,
|
| 74 |
+
numberPlaced=self._state.numberPlaced,
|
| 75 |
+
ObjectsPlaced=self._state.ObjectsPlaced,
|
| 76 |
+
rewardListSegment=self._state.rewardListSegment,
|
| 77 |
+
rewardFeedbackSegment=self._state.rewardFeedbackSegment,
|
| 78 |
+
rewardListPlace=self._state.rewardListPlace,
|
| 79 |
+
rewardFeedbackPlace=self._state.rewardFeedbackPlace,
|
| 80 |
+
rewardListAdjust=self._state.rewardListAdjust,
|
| 81 |
+
rewardFeedbackAdjust=self._state.rewardFeedbackAdjust,
|
| 82 |
)
|
| 83 |
|
| 84 |
def step(self, action: AppAction) -> AppObservation:
|
|
|
|
| 94 |
reward -= 10.0
|
| 95 |
appendRewardFeedback(
|
| 96 |
state,
|
| 97 |
+
"",
|
| 98 |
"No action is of invalid schema or format. Penalty applied.",
|
| 99 |
reward,
|
| 100 |
)
|
|
|
|
| 107 |
isDone=state.isDone,
|
| 108 |
rewardFeedback=state.rewardFeedback,
|
| 109 |
rewardList=state.rewardList,
|
| 110 |
+
numberPlaced=state.numberPlaced,
|
| 111 |
+
ObjectsPlaced=state.ObjectsPlaced,
|
| 112 |
+
rewardListSegment=state.rewardListSegment,
|
| 113 |
+
rewardFeedbackSegment=state.rewardFeedbackSegment,
|
| 114 |
+
rewardListPlace=state.rewardListPlace,
|
| 115 |
+
rewardFeedbackPlace=state.rewardFeedbackPlace,
|
| 116 |
+
rewardListAdjust=state.rewardListAdjust,
|
| 117 |
+
rewardFeedbackAdjust=state.rewardFeedbackAdjust,
|
| 118 |
)
|
| 119 |
|
| 120 |
if action.isSegmentation and action is not None:
|
| 121 |
reward += 10.0
|
| 122 |
+
appendRewardFeedback(state, "segment", "Segmentation successful.", reward)
|
| 123 |
|
| 124 |
if action.placement and action is not None:
|
| 125 |
+
placement_reward, placement_failed = place(
|
| 126 |
+
action.isSegmentation, action.placement, state
|
| 127 |
+
)
|
| 128 |
+
reward += placement_reward
|
| 129 |
+
if placement_failed:
|
| 130 |
+
appendRewardFeedback(state, "place", "Failed to place object.", reward)
|
| 131 |
+
else:
|
| 132 |
+
appendRewardFeedback(
|
| 133 |
+
state, "place", "Object placed successfully.", reward
|
| 134 |
+
)
|
| 135 |
+
|
| 136 |
+
if action.adjust and action is not None:
|
| 137 |
+
reward += adjustment(action.isSegmentation, action.adjust, state)
|
| 138 |
+
appendRewardFeedback(
|
| 139 |
+
state, "adjust", "Object adjusted successfully.", reward
|
| 140 |
+
)
|
| 141 |
|
| 142 |
if action.findObjects and action is not None:
|
| 143 |
reward += findobject(action.isSegmentation, action.findObjects, state)
|
| 144 |
+
appendRewardFeedback(state, "segment", "Object found successfully.", reward)
|
| 145 |
|
| 146 |
+
if (
|
| 147 |
+
len(state.objectsLeft) == 0
|
| 148 |
+
and len(state.ObjectsPresent) == state.numberPlaced
|
| 149 |
+
):
|
| 150 |
state.isDone = True
|
| 151 |
reward += 10.0
|
| 152 |
+
appendRewardFeedback(
|
| 153 |
+
state, "segment", "All objects found. Episode completed!", reward
|
| 154 |
+
)
|
| 155 |
|
| 156 |
state.reward += reward / (10**state.step_count)
|
| 157 |
|
|
|
|
| 164 |
isDone=state.isDone,
|
| 165 |
rewardFeedback=state.rewardFeedback,
|
| 166 |
rewardList=state.rewardList,
|
| 167 |
+
numberPlaced=state.numberPlaced,
|
| 168 |
+
ObjectsPlaced=state.ObjectsPlaced,
|
| 169 |
+
rewardListSegment=state.rewardListSegment,
|
| 170 |
+
rewardFeedbackSegment=state.rewardFeedbackSegment,
|
| 171 |
+
rewardListPlace=state.rewardListPlace,
|
| 172 |
+
rewardFeedbackPlace=state.rewardFeedbackPlace,
|
| 173 |
+
rewardListAdjust=state.rewardListAdjust,
|
| 174 |
+
rewardFeedbackAdjust=state.rewardFeedbackAdjust,
|
| 175 |
)
|
| 176 |
|
| 177 |
@property
|
utils.py
CHANGED
|
@@ -47,10 +47,29 @@ OBJECT_NAMES = [
|
|
| 47 |
"pouch",
|
| 48 |
]
|
| 49 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 50 |
|
| 51 |
-
|
| 52 |
-
|
| 53 |
state.rewardList.append(reward)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 54 |
|
| 55 |
|
| 56 |
def initDimentions(obj):
|
|
@@ -124,7 +143,7 @@ def initGrid():
|
|
| 124 |
|
| 125 |
def initWeightedGrid(shape=None):
|
| 126 |
if shape is None:
|
| 127 |
-
shape = (randint(
|
| 128 |
|
| 129 |
grid = random.uniform(0, 1, shape)
|
| 130 |
|
|
@@ -157,21 +176,27 @@ def _get_weight_value(weight, x, y, z):
|
|
| 157 |
def place(segment, objects, state):
|
| 158 |
dims = state.currentGrid
|
| 159 |
weight = state.weightedGrid
|
|
|
|
|
|
|
| 160 |
reward = 0.0
|
| 161 |
totalObjs = len(objects)
|
| 162 |
reward_per_obj_placed = 45.0 / totalObjs
|
| 163 |
|
| 164 |
-
if segment
|
| 165 |
appendRewardFeedback(
|
| 166 |
-
state, "Placing objects
|
| 167 |
)
|
| 168 |
return -60.0
|
| 169 |
|
| 170 |
for obj_name, pos in objects.items():
|
|
|
|
| 171 |
obj = OBJECTS.get(obj_name)
|
| 172 |
if obj is None:
|
| 173 |
appendRewardFeedback(
|
| 174 |
-
state,
|
|
|
|
|
|
|
|
|
|
| 175 |
)
|
| 176 |
reward -= reward_per_obj_placed
|
| 177 |
continue
|
|
@@ -190,6 +215,7 @@ def place(segment, objects, state):
|
|
| 190 |
reward -= reward_per_obj_placed
|
| 191 |
appendRewardFeedback(
|
| 192 |
state,
|
|
|
|
| 193 |
f"Object '{obj_name}' placement is out of bounds.",
|
| 194 |
-reward_per_obj_placed,
|
| 195 |
)
|
|
@@ -200,6 +226,7 @@ def place(segment, objects, state):
|
|
| 200 |
reward -= reward_per_obj_placed
|
| 201 |
appendRewardFeedback(
|
| 202 |
state,
|
|
|
|
| 203 |
f"Object '{obj_name}' placement overlaps with another object and stacking is not allowed.",
|
| 204 |
-reward_per_obj_placed,
|
| 205 |
)
|
|
@@ -223,6 +250,7 @@ def place(segment, objects, state):
|
|
| 223 |
reward += bonus
|
| 224 |
appendRewardFeedback(
|
| 225 |
state,
|
|
|
|
| 226 |
f"Object '{obj_name}' placed with stacking. Bonus: {bonus:.2f}",
|
| 227 |
bonus,
|
| 228 |
)
|
|
@@ -230,6 +258,7 @@ def place(segment, objects, state):
|
|
| 230 |
reward -= reward_per_obj_placed
|
| 231 |
appendRewardFeedback(
|
| 232 |
state,
|
|
|
|
| 233 |
f"Object '{obj_name}' placement failed. No space for stacking.",
|
| 234 |
-reward_per_obj_placed,
|
| 235 |
)
|
|
@@ -245,6 +274,7 @@ def place(segment, objects, state):
|
|
| 245 |
reward += bonus
|
| 246 |
appendRewardFeedback(
|
| 247 |
state,
|
|
|
|
| 248 |
f"Object '{obj_name}' placed successfully. Bonus: {bonus:.2f}",
|
| 249 |
bonus,
|
| 250 |
)
|
|
@@ -254,16 +284,48 @@ def place(segment, objects, state):
|
|
| 254 |
break
|
| 255 |
|
| 256 |
if not placement_failed:
|
| 257 |
-
state.
|
| 258 |
-
|
| 259 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 260 |
|
| 261 |
|
| 262 |
def findobject(segment, objects, state):
|
| 263 |
|
| 264 |
if not segment or segment is None:
|
| 265 |
appendRewardFeedback(
|
| 266 |
-
state,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 267 |
)
|
| 268 |
return -60.0
|
| 269 |
|
|
@@ -275,7 +337,10 @@ def findobject(segment, objects, state):
|
|
| 275 |
if pos_real is None:
|
| 276 |
reward -= glMetric
|
| 277 |
appendRewardFeedback(
|
| 278 |
-
state,
|
|
|
|
|
|
|
|
|
|
| 279 |
)
|
| 280 |
continue
|
| 281 |
|
|
@@ -283,6 +348,7 @@ def findobject(segment, objects, state):
|
|
| 283 |
reward += glMetric
|
| 284 |
appendRewardFeedback(
|
| 285 |
state,
|
|
|
|
| 286 |
f"Object '{obj_found}' found with correct position and stacking.",
|
| 287 |
glMetric,
|
| 288 |
)
|
|
@@ -292,6 +358,7 @@ def findobject(segment, objects, state):
|
|
| 292 |
reward -= mse
|
| 293 |
appendRewardFeedback(
|
| 294 |
state,
|
|
|
|
| 295 |
f"Object '{obj_found}' found with incorrect position. MSE: {mse:.2f}",
|
| 296 |
-mse,
|
| 297 |
)
|
|
@@ -300,6 +367,7 @@ def findobject(segment, objects, state):
|
|
| 300 |
reward -= glMetric / 4.0
|
| 301 |
appendRewardFeedback(
|
| 302 |
state,
|
|
|
|
| 303 |
f"Object '{obj_found}' found with incorrect stacking. Penalty: {glMetric / 4.0}",
|
| 304 |
-glMetric / 4.0,
|
| 305 |
)
|
|
@@ -307,6 +375,7 @@ def findobject(segment, objects, state):
|
|
| 307 |
reward += glMetric / 4.0
|
| 308 |
appendRewardFeedback(
|
| 309 |
state,
|
|
|
|
| 310 |
f"Object '{obj_found}' found with correct stacking. Bonus: {glMetric / 4.0}",
|
| 311 |
glMetric / 4.0,
|
| 312 |
)
|
|
@@ -316,3 +385,96 @@ def findobject(segment, objects, state):
|
|
| 316 |
state.objectsFound.append(obj)
|
| 317 |
|
| 318 |
return reward
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 47 |
"pouch",
|
| 48 |
]
|
| 49 |
|
| 50 |
+
ACTION_CONFIG = {
|
| 51 |
+
"RIGHT": [1, 0, 0],
|
| 52 |
+
"LEFT": [-1, 0, 0],
|
| 53 |
+
"UP": [0, 1, 0],
|
| 54 |
+
"DOWN": [0, -1, 0],
|
| 55 |
+
"FORWARD": [0, 0, 1],
|
| 56 |
+
"BACKWARD": [0, 0, -1],
|
| 57 |
+
"ROTATE": [0, 0, 0],
|
| 58 |
+
}
|
| 59 |
|
| 60 |
+
|
| 61 |
+
def appendRewardFeedback(state, choice, feedback, reward):
|
| 62 |
state.rewardList.append(reward)
|
| 63 |
+
state.rewardFeedback.append(feedback)
|
| 64 |
+
if choice == "segment":
|
| 65 |
+
state.rewardFeedbackSegment.append(feedback)
|
| 66 |
+
state.rewardListSegment.append(reward)
|
| 67 |
+
elif choice == "place":
|
| 68 |
+
state.rewardFeedbackPlace.append(feedback)
|
| 69 |
+
state.rewardListPlace.append(reward)
|
| 70 |
+
elif choice == "adjust":
|
| 71 |
+
state.rewardFeedbackAdjust.append(feedback)
|
| 72 |
+
state.rewardListAdjust.append(reward)
|
| 73 |
|
| 74 |
|
| 75 |
def initDimentions(obj):
|
|
|
|
| 143 |
|
| 144 |
def initWeightedGrid(shape=None):
|
| 145 |
if shape is None:
|
| 146 |
+
shape = (randint(8, 12), randint(8, 12), randint(8, 12))
|
| 147 |
|
| 148 |
grid = random.uniform(0, 1, shape)
|
| 149 |
|
|
|
|
| 176 |
def place(segment, objects, state):
|
| 177 |
dims = state.currentGrid
|
| 178 |
weight = state.weightedGrid
|
| 179 |
+
objsPresent = state.ObjectsPresent
|
| 180 |
+
|
| 181 |
reward = 0.0
|
| 182 |
totalObjs = len(objects)
|
| 183 |
reward_per_obj_placed = 45.0 / totalObjs
|
| 184 |
|
| 185 |
+
if segment:
|
| 186 |
appendRewardFeedback(
|
| 187 |
+
state, "place", "Placing objects with segmentation is not allowed.", -60.0
|
| 188 |
)
|
| 189 |
return -60.0
|
| 190 |
|
| 191 |
for obj_name, pos in objects.items():
|
| 192 |
+
|
| 193 |
obj = OBJECTS.get(obj_name)
|
| 194 |
if obj is None:
|
| 195 |
appendRewardFeedback(
|
| 196 |
+
state,
|
| 197 |
+
"place",
|
| 198 |
+
f"Object '{obj_name}' is not recognized.",
|
| 199 |
+
-reward_per_obj_placed,
|
| 200 |
)
|
| 201 |
reward -= reward_per_obj_placed
|
| 202 |
continue
|
|
|
|
| 215 |
reward -= reward_per_obj_placed
|
| 216 |
appendRewardFeedback(
|
| 217 |
state,
|
| 218 |
+
"place",
|
| 219 |
f"Object '{obj_name}' placement is out of bounds.",
|
| 220 |
-reward_per_obj_placed,
|
| 221 |
)
|
|
|
|
| 226 |
reward -= reward_per_obj_placed
|
| 227 |
appendRewardFeedback(
|
| 228 |
state,
|
| 229 |
+
"place",
|
| 230 |
f"Object '{obj_name}' placement overlaps with another object and stacking is not allowed.",
|
| 231 |
-reward_per_obj_placed,
|
| 232 |
)
|
|
|
|
| 250 |
reward += bonus
|
| 251 |
appendRewardFeedback(
|
| 252 |
state,
|
| 253 |
+
"place",
|
| 254 |
f"Object '{obj_name}' placed with stacking. Bonus: {bonus:.2f}",
|
| 255 |
bonus,
|
| 256 |
)
|
|
|
|
| 258 |
reward -= reward_per_obj_placed
|
| 259 |
appendRewardFeedback(
|
| 260 |
state,
|
| 261 |
+
"place",
|
| 262 |
f"Object '{obj_name}' placement failed. No space for stacking.",
|
| 263 |
-reward_per_obj_placed,
|
| 264 |
)
|
|
|
|
| 274 |
reward += bonus
|
| 275 |
appendRewardFeedback(
|
| 276 |
state,
|
| 277 |
+
"place",
|
| 278 |
f"Object '{obj_name}' placed successfully. Bonus: {bonus:.2f}",
|
| 279 |
bonus,
|
| 280 |
)
|
|
|
|
| 284 |
break
|
| 285 |
|
| 286 |
if not placement_failed:
|
| 287 |
+
state.ObjectsPlaced[obj_name] = pos
|
| 288 |
+
state.numberPlaced += 1
|
| 289 |
+
try:
|
| 290 |
+
if objsPresent[obj_name] == state.ObjectsPlaced[obj_name]:
|
| 291 |
+
reward -= 45.0 / totalObjs
|
| 292 |
+
appendRewardFeedback(
|
| 293 |
+
state,
|
| 294 |
+
"place",
|
| 295 |
+
f"Object '{obj_name}' is being placed in the same location",
|
| 296 |
+
-reward_per_obj_placed,
|
| 297 |
+
)
|
| 298 |
+
except KeyError:
|
| 299 |
+
reward -= reward_per_obj_placed
|
| 300 |
+
appendRewardFeedback(
|
| 301 |
+
state,
|
| 302 |
+
"place",
|
| 303 |
+
f"Object '{obj_name}' is present in the environment, but is placed in same location as originally found.",
|
| 304 |
+
-reward_per_obj_placed,
|
| 305 |
+
)
|
| 306 |
+
|
| 307 |
+
continue
|
| 308 |
+
|
| 309 |
+
return (reward, placement_failed)
|
| 310 |
|
| 311 |
|
| 312 |
def findobject(segment, objects, state):
|
| 313 |
|
| 314 |
if not segment or segment is None:
|
| 315 |
appendRewardFeedback(
|
| 316 |
+
state,
|
| 317 |
+
"segment",
|
| 318 |
+
"Finding objects without segmentation is not allowed.",
|
| 319 |
+
-60.0,
|
| 320 |
+
)
|
| 321 |
+
return -60.0
|
| 322 |
+
|
| 323 |
+
if state.ObjectsPresent == state.objectsFound:
|
| 324 |
+
appendRewardFeedback(
|
| 325 |
+
state,
|
| 326 |
+
"segment",
|
| 327 |
+
"No point in finding more objects as all are already found Make the IsSegement attribute false and execute the place method.",
|
| 328 |
+
-60.0,
|
| 329 |
)
|
| 330 |
return -60.0
|
| 331 |
|
|
|
|
| 337 |
if pos_real is None:
|
| 338 |
reward -= glMetric
|
| 339 |
appendRewardFeedback(
|
| 340 |
+
state,
|
| 341 |
+
"segment",
|
| 342 |
+
f"Object '{obj_found}' not found in the environment.",
|
| 343 |
+
-glMetric,
|
| 344 |
)
|
| 345 |
continue
|
| 346 |
|
|
|
|
| 348 |
reward += glMetric
|
| 349 |
appendRewardFeedback(
|
| 350 |
state,
|
| 351 |
+
"segment",
|
| 352 |
f"Object '{obj_found}' found with correct position and stacking.",
|
| 353 |
glMetric,
|
| 354 |
)
|
|
|
|
| 358 |
reward -= mse
|
| 359 |
appendRewardFeedback(
|
| 360 |
state,
|
| 361 |
+
"segment",
|
| 362 |
f"Object '{obj_found}' found with incorrect position. MSE: {mse:.2f}",
|
| 363 |
-mse,
|
| 364 |
)
|
|
|
|
| 367 |
reward -= glMetric / 4.0
|
| 368 |
appendRewardFeedback(
|
| 369 |
state,
|
| 370 |
+
"segment",
|
| 371 |
f"Object '{obj_found}' found with incorrect stacking. Penalty: {glMetric / 4.0}",
|
| 372 |
-glMetric / 4.0,
|
| 373 |
)
|
|
|
|
| 375 |
reward += glMetric / 4.0
|
| 376 |
appendRewardFeedback(
|
| 377 |
state,
|
| 378 |
+
"segment",
|
| 379 |
f"Object '{obj_found}' found with correct stacking. Bonus: {glMetric / 4.0}",
|
| 380 |
glMetric / 4.0,
|
| 381 |
)
|
|
|
|
| 385 |
state.objectsFound.append(obj)
|
| 386 |
|
| 387 |
return reward
|
| 388 |
+
|
| 389 |
+
|
| 390 |
+
def _remove_object(state, obj_name):
|
| 391 |
+
reward = 0
|
| 392 |
+
try:
|
| 393 |
+
pos = state.ObjectsPlaced.pop(obj_name)
|
| 394 |
+
except KeyError:
|
| 395 |
+
reward -= 45.0 / len(state.ObjectsPresent)
|
| 396 |
+
appendRewardFeedback(
|
| 397 |
+
state,
|
| 398 |
+
"adjust",
|
| 399 |
+
f"Object '{obj_name}' is not placed in the environment.",
|
| 400 |
+
-reward,
|
| 401 |
+
)
|
| 402 |
+
return reward
|
| 403 |
+
|
| 404 |
+
state.numberPlaced -= 1
|
| 405 |
+
dims = state.currentGrid
|
| 406 |
+
obj = OBJECTS.get(obj_name)
|
| 407 |
+
objGrid = initDimentions(obj)
|
| 408 |
+
|
| 409 |
+
for i in range(len(objGrid)):
|
| 410 |
+
for j in range(len(objGrid[0])):
|
| 411 |
+
for k in range(len(objGrid[0][0])):
|
| 412 |
+
if dims[pos[0] + i][pos[1] + j][pos[2] + k] > 0:
|
| 413 |
+
dims[pos[0] + i][pos[1] + j][pos[2] + k] -= 1
|
| 414 |
+
|
| 415 |
+
|
| 416 |
+
def _adjustment_helper(state, name, pos, change, direction):
|
| 417 |
+
_remove_object(state, name)
|
| 418 |
+
|
| 419 |
+
if direction == "ROTATE":
|
| 420 |
+
newPos = (pos[1], pos[0], pos[2], pos[3])
|
| 421 |
+
else:
|
| 422 |
+
newPos = (pos[0] + change[0], pos[1] + change[1], pos[2] + change[2], pos[3])
|
| 423 |
+
|
| 424 |
+
reward, isNotPlaced = place(False, {name: newPos}, state)
|
| 425 |
+
|
| 426 |
+
if isNotPlaced:
|
| 427 |
+
dummyReward = place(False, {name: pos}, state)[0]
|
| 428 |
+
appendRewardFeedback(
|
| 429 |
+
state,
|
| 430 |
+
"adjust",
|
| 431 |
+
f"Failed to adjust object '{name}' in direction {direction}. Reverting to original position.",
|
| 432 |
+
-dummyReward,
|
| 433 |
+
)
|
| 434 |
+
return -dummyReward
|
| 435 |
+
|
| 436 |
+
appendRewardFeedback(
|
| 437 |
+
state,
|
| 438 |
+
"adjust",
|
| 439 |
+
f"Object '{name}' moved {direction} successfully.",
|
| 440 |
+
reward,
|
| 441 |
+
)
|
| 442 |
+
return reward
|
| 443 |
+
|
| 444 |
+
|
| 445 |
+
def adjustment(segment, action, state):
|
| 446 |
+
objsPlaced = state.ObjectsPlaced
|
| 447 |
+
|
| 448 |
+
if segment:
|
| 449 |
+
appendRewardFeedback(
|
| 450 |
+
state, "adjust", "Placing objects with segmentation is not allowed.", -60.0
|
| 451 |
+
)
|
| 452 |
+
return -60.0
|
| 453 |
+
|
| 454 |
+
try:
|
| 455 |
+
initPos = objsPlaced[action[0]]
|
| 456 |
+
name = action[0]
|
| 457 |
+
except KeyError:
|
| 458 |
+
reward_per_obj_placed = 45.0 / len(state.ObjectsPresent)
|
| 459 |
+
appendRewardFeedback(
|
| 460 |
+
state,
|
| 461 |
+
"adjust",
|
| 462 |
+
f"Object '{action[0]}' is not placed in the environment, so it cannot be adjusted.",
|
| 463 |
+
-reward_per_obj_placed,
|
| 464 |
+
)
|
| 465 |
+
return -reward_per_obj_placed
|
| 466 |
+
|
| 467 |
+
if action[1] in ACTION_CONFIG:
|
| 468 |
+
reward = _adjustment_helper(
|
| 469 |
+
state, name, initPos, ACTION_CONFIG.get(action[1]), action[1]
|
| 470 |
+
)
|
| 471 |
+
return reward
|
| 472 |
+
else:
|
| 473 |
+
reward_per_obj_placed = 45.0 / len(state.ObjectsPresent)
|
| 474 |
+
appendRewardFeedback(
|
| 475 |
+
state,
|
| 476 |
+
"adjust",
|
| 477 |
+
f"Invalid adjustment direction '{action[1]}'. Valid directions are RIGHT, LEFT, UP, DOWN, FORWARD, BACKWARD, ROTATE.",
|
| 478 |
+
-reward_per_obj_placed,
|
| 479 |
+
)
|
| 480 |
+
return -reward_per_obj_placed
|