---
library_name: ml-agents
tags:
- Pyramids
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-Pyramids
- PPO
- Unity
model-index:
- name: PPO-PyramidsTraining6
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pyramids
      type: Unity-MLAgents-Env
    metrics:
    - type: mean_reward
      value: 1.381
      name: mean_reward
      verified: false
    - type: std_reward
      value: 0.0
      name: std_reward
      verified: false
---

# 🏛️ **PPO Agent on Pyramids**

This repository contains a trained **Proximal Policy Optimization (PPO)** agent that plays the **Pyramids** environment using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

---

## 📊 Model Card

**Model Name:** `ppo-PyramidsTraining`
**Environment:** `Pyramids` (Unity ML-Agents)
**Algorithm:** PPO (Proximal Policy Optimization)

**Performance:**
- Mean reward: **1.381** (± 0.0, unverified)
- Converges to a stable, effective policy for pyramid-based navigation tasks

---

## 🚀 Usage (with ML-Agents)

Documentation: [ML-Agents Toolkit Docs](https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/)

### Resume Training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```

### Load and Run

```python
# Download the trained PPO policy from the Hugging Face Hub
# (requires Unity ML-Agents and huggingface_hub to be installed)
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="KraTUZen/ppo-PyramidsTraining")
# Select the .onnx (or legacy .nn) policy file from the downloaded repo
```

---

## 🧠 Notes

- The agent is trained using **PPO**, a robust on-policy algorithm widely used in Unity ML-Agents.
- The environment involves **pyramid navigation and puzzle-solving**, requiring precision and strategy.
- The trained policy is stored as a `.nn` or `.onnx` file for direct Unity integration.

---

## 📂 Repository Structure

- `Pyramids.nn` / `Pyramids.onnx` → Trained PPO policy
- `README.md` → Documentation and usage guide

---

## ✅ Results

- The agent learns to navigate pyramid structures and solve tasks efficiently.
- Demonstrates stable training and effective policy convergence using PPO.

---

## 🔎 Environment Overview

- **Observation Space:** Continuous (agent position, pyramid state, environment features)
- **Action Space:** Continuous (movement, interaction)
- **Objective:** Solve pyramid-based tasks and maximize rewards
- **Reward:** Positive reward for successful task completion, penalties for failures

---

## 📚 Learning Highlights

- **Algorithm:** PPO (Proximal Policy Optimization)
- **Update Rule:** Clipped surrogate objective to ensure stable updates
- **Strengths:** Robust, stable, widely used in Unity ML-Agents
- **Limitations:** Requires careful tuning of hyperparameters (clip ratio, learning rate, batch size)

---

## 🎮 Watch Your Agent Play

You can watch your agent **directly in your browser**:

1. Visit [Unity ML-Agents on Hugging Face](https://huggingface.co/unity)
2. Find your model ID: `KraTUZen/ppo-PyramidsTraining`
3. Select your `.nn` or `.onnx` file
4. Click **Watch the agent play 👀**
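---

## 🧪 Appendix: The Clipped Surrogate Objective

The clipped surrogate update rule mentioned under Learning Highlights can be sketched in a few lines of NumPy. This is an illustrative sketch of the PPO objective (Schulman et al., 2017), not the ML-Agents trainer implementation; the function name, the toy ratios and advantages, and the `clip_eps=0.2` default are our assumptions.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage: estimated advantage A(s, a)
    clip_eps:  clipping range epsilon (0.2 is a common default)
    """
    unclipped = ratio * advantage
    # Clip the ratio into [1 - eps, 1 + eps] before weighting by the advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Elementwise minimum: the update never profits from pushing the
    # ratio outside the clipping range, which keeps updates stable.
    return np.minimum(unclipped, clipped).mean()

# Toy example: a ratio of 1.5 with positive advantage is clipped at 1.2
ratios = np.array([1.5, 0.9])
advantages = np.array([1.0, -1.0])
objective = ppo_clipped_objective(ratios, advantages)  # ≈ 0.15
```

The clipping is what distinguishes PPO from plain policy-gradient methods: large policy changes are discouraged per-sample, which is why the Learning Highlights list it as the source of PPO's stability.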