Commit 89faf8b · Astocoder committed · 1 Parent(s): 65b751b

update changes

Files changed (4):
  1. README.md +49 -20
  2. __init__.py +2 -0
  3. inference.py +147 -63
  4. openenv.yaml +5 -0
README.md CHANGED
@@ -11,6 +11,8 @@ pinned: false
 
 An OpenEnv-compliant environment that tests AI agents on financial data analysis, market sentiment, and trading strategy evaluation.
 
+
+
 ## 🎯 Overview
 
 Quant-Gym is a benchmark environment where AI agents can practice:
@@ -21,6 +23,7 @@ Quant-Gym is a benchmark environment where AI agents can practice:
 
 **This is a research benchmark for evaluating AI reasoning in financial contexts, not a trading tool.**
 
+
 ## 📊 Environment Tasks
 
 | Task | Description | Difficulty |
@@ -29,19 +32,24 @@ Quant-Gym is a benchmark environment where AI agents can practice:
 | **Task 2** | Analyze news headlines and recommend Buy/Sell/Hold with explanation | Medium |
 | **Task 3** | Backtest a trading strategy (momentum/mean reversion) with Sharpe ratio & drawdown | Hard |
 
+
 ## 🏗️ API Endpoints
 
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/` | GET | Welcome message |
 | `/health` | GET | Health check |
+| `/metadata` | GET | Environment metadata |
+| `/schema` | GET | Action/observation schemas |
 | `/reset` | POST | Reset environment to initial state |
-| `/step` | POST | Execute an action (BUY/SELL/GET_PRICE/BACKTEST/GET_NEWS) |
+| `/step` | POST | Execute an action |
 | `/state` | GET | Get current environment state |
 | `/tasks` | GET | List all available tasks |
-| `/docs` | GET | Interactive API documentation (FastAPI) |
+| `/docs` | GET | Interactive API documentation |
+
 
 ## 🔧 Installation
+
 ### Prerequisites
 - Python 3.10+
 - Docker (for containerized deployment)
@@ -56,13 +64,14 @@ cd quant-gym-openenv
 # Install dependencies
 pip install -r requirements.txt
 
-# Set up Hugging Face token ( for LLM features) (.env file)
-'HF_TOKEN=your_hf_token_here'
+# Set up Hugging Face token for LLM features (create .env file)
+echo 'HF_TOKEN=your_hf_token_here' > .env
 
 # Start the server
 python -m uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload
 
 
+
 🎮 Action Schema
 The agent can take the following actions:
 
@@ -102,9 +111,11 @@ json
 "total_return": 0.18
 }
 }
-🏃 Running the Baseline Agent
 
 
+
+🏃 Running the Baseline Agent
+bash
 # Set your Hugging Face token
 export HF_TOKEN="your_hf_token_here"
 
@@ -112,13 +123,15 @@ export HF_TOKEN="your_hf_token_here"
 python inference.py
 Expected Output
 text
-[INFO] HF_TOKEN found (length: 37 chars)
-[START] task=quant-gym env=quant-gym model=meta-llama/Llama-3.2-3B-Instruct
+[INFO] Starting Quant-Gym Inference
+[START] task=quant-gym env=quant-gym model=gpt-3.5-turbo
 [STEP] step=1 action=BUY 5 reward=0.15 done=false error=null
 [STEP] step=2 action=GET_PRICE reward=0.05 done=false error=null
 [STEP] step=3 action=SELL 5 reward=0.20 done=false error=null
 ...
 [END] success=true steps=10 score=0.650 rewards=...
+
+
 🐳 Docker Deployment
 Build and run with Docker:
 
@@ -133,6 +146,7 @@ Then access the API at http://localhost:7860
 🌐 Hugging Face Space
 Live demo: https://huggingface.co/spaces/Astocoder/quant-gym
 
+
 📁 Project Structure
 text
 quant-gym-openenv/
@@ -140,26 +154,29 @@ quant-gym-openenv/
 ├── inference.py       # Baseline agent script
 ├── models.py          # Pydantic schemas
 ├── openenv.yaml       # OpenEnv configuration
-├── pyproject.toml
+├── pyproject.toml     # Python project config
 ├── requirements.txt   # Python dependencies
 ├── README.md          # This file
+├── task1_grader.py    # Price fetch grader
+├── task2_grader.py    # News analysis grader
+├── task3_grader.py    # Backtest grader
 ├── server/
-│   ├── app.py             # FastAPI server
-│   ├── environment.py     # Trading logic
+│   ├── app.py         # FastAPI server
+│   ├── environment.py # Trading logic
 │   └── data/
-│       ├── prices.csv     # Market data
-│       └── news.json      # News headlines
-└── graders/
-    ├── task1_grader.py    # Price fetch grader
-    ├── task2_grader.py    # News analysis grader
-    └── task3_grader.py    # Backtest grader
+│       ├── prices.csv # Market data
+│       └── news.json  # News headlines
+└── graders/           # Backup grader folder
+    ├── task1_grader.py
+    ├── task2_grader.py
+    └── task3_grader.py
 
 
 🔐 Environment Variables
 Variable       Description              Default
-HF_TOKEN       Hugging Face API token   None
-API_BASE_URL   HF API endpoint          https://api-inference.huggingface.co/v1
-MODEL_NAME     LLM model name           meta-llama/Llama-3.2-3B-Instruct
+HF_TOKEN       Hugging Face API token   None (optional)
+API_BASE_URL   LLM API endpoint         None (judge provides)
+API_KEY        LLM API key              None (judge provides)
 BASE_URL       Quant-Gym API URL        http://localhost:8000
 
 
@@ -172,13 +189,25 @@ Reward Function: Partial progress signals for meaningful learning
 
 Reproducibility: Static data ensures consistent results
 
+
+💡 Unique Innovation
+Unlike traditional trading environments that only measure profit, Quant-Gym rewards explanation quality:
+
+Agents must explain their reasoning for each trade
+
+Graders evaluate financial terminology, logical reasoning, and detail
+
+Promotes transparent, auditable AI decision-making
+
+
+
 ⚠️ Disclaimer
 This is a research benchmark environment for evaluating AI agent reasoning. It does not provide financial advice or real trading recommendations. All data is for simulation purposes only.
 
+
 📄 License
 MIT License - See LICENSE file for details.
 
 Built with: Python, FastAPI, OpenEnv, Hugging Face, Docker
 
 
-
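The endpoint table above is enough to drive a short episode from a few lines of Python. A minimal sketch, assuming a server already running at the default `BASE_URL`; the `post` callable is injected so the loop can be exercised without a live server, and the `reward` field name follows the log format shown in this README:

```python
BASE_URL = "http://localhost:8000"  # default from the Environment Variables table

def run_episode(post, steps=3):
    """Reset the environment, then submit one GET_PRICE action per step."""
    post("/reset", {})
    rewards = []
    for _ in range(steps):
        result = post("/step", {"type": "GET_PRICE"})
        rewards.append(result.get("reward", 0.0))
    return rewards

def http_post(path, payload):
    """POST a JSON action to a locally running Quant-Gym server."""
    import requests  # listed in requirements.txt
    return requests.post(f"{BASE_URL}{path}", json=payload).json()

if __name__ == "__main__":
    print(run_episode(http_post))
```

Swapping `http_post` for a stub that returns canned observations makes the episode loop unit-testable without the server.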
__init__.py ADDED
@@ -0,0 +1,2 @@
+"""Quant-Gym: Financial Analysis Environment for AI Agents"""
+__version__ = "1.0.0"
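A `__version__` string like the one added here is easiest to compare numerically rather than lexicographically; a small stdlib-only sketch (the helper name is hypothetical, not part of this package):

```python
def version_tuple(version: str) -> tuple:
    """Convert a dotted version string such as '1.0.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

# Plain string comparison would wrongly sort "1.10.0" before "1.2.0";
# integer tuples compare in the expected order.
print(version_tuple("1.0.0") < version_tuple("1.10.0"))
```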
inference.py CHANGED
@@ -18,10 +18,10 @@ TEMPERATURE = 0.7
 MAX_TOKENS = 200
 SUCCESS_SCORE_THRESHOLD = 0.7
 
-# System prompt
+# System prompt for financial analysis
 SYSTEM_PROMPT = textwrap.dedent(
     """
-    You are a financial analyst AI agent. Analyze market data and make trading decisions.
+    You are a financial analyst AI agent. Your goal is to analyze market data and make trading decisions.
 
     Available actions:
     - GET_PRICE: Get current stock price
@@ -30,9 +30,18 @@ SYSTEM_PROMPT = textwrap.dedent(
     - BACKTEST [strategy]: Backtest a strategy (momentum or mean_reversion)
     - GET_NEWS: Get latest news headline
 
+    Strategy tips:
+    - Positive news sentiment suggests BUY
+    - Negative news sentiment suggests SELL
+    - Momentum strategy: Buy when price is rising
+    - Mean reversion: Buy when price is low relative to recent average
+
     Respond with EXACTLY one action in format: ACTION [parameter]
     Example: BUY 10
     Example: GET_PRICE
+    Example: BACKTEST momentum
+
+    For GET_NEWS, also provide a brief explanation of your analysis.
     """
 ).strip()
 
@@ -56,11 +65,14 @@ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
 
 
 class QuantGymClient:
+    """Client for interacting with Quant-Gym environment"""
+
     def __init__(self, base_url: str):
         self.base_url = base_url
         self.session = requests.Session()
 
     def reset(self):
+        """Reset environment"""
         try:
             response = self.session.post(f"{self.base_url}/reset")
             return response.json()
@@ -68,31 +80,32 @@ class QuantGymClient:
             print(f"[ERROR] Reset failed: {e}", flush=True)
             return {"observation": {"price": 150, "balance": 10000, "holdings": 0, "portfolio_value": 10000}}
 
-    def step(self, action: str):
+    def step(self, action: str, amount: int = 0, explanation: str = "", strategy: str = ""):
+        """Execute an action"""
         action_upper = action.upper()
 
         if action_upper == "GET_PRICE":
             payload = {"type": "GET_PRICE"}
+        elif action_upper == "GET_NEWS":
+            payload = {"type": "GET_NEWS", "explanation": explanation if explanation else "Analyzing market sentiment"}
         elif action_upper.startswith("BUY"):
-            amount = 5
             if " " in action_upper:
                 try:
                     amount = int(action_upper.split()[1])
                 except:
-                    pass
+                    amount = 5
             payload = {"type": "BUY", "amount": amount}
         elif action_upper.startswith("SELL"):
-            amount = 5
             if " " in action_upper:
                 try:
                     amount = int(action_upper.split()[1])
                 except:
-                    pass
+                    amount = 5
             payload = {"type": "SELL", "amount": amount}
         elif action_upper.startswith("BACKTEST"):
-            payload = {"type": "BACKTEST", "strategy": "momentum"}
-        elif action_upper == "GET_NEWS":
-            payload = {"type": "GET_NEWS", "explanation": "Analyzing market sentiment"}
+            if " " in action_upper:
+                strategy = action_upper.split()[1]
+            payload = {"type": "BACKTEST", "strategy": strategy if strategy else "momentum"}
         else:
             payload = {"type": "GET_PRICE"}
 
@@ -103,30 +116,95 @@ class QuantGymClient:
             print(f"[ERROR] Step failed: {e}", flush=True)
             return {"observation": {"price": 150, "balance": 10000, "holdings": 0, "portfolio_value": 10000}}
 
+    def get_tasks(self):
+        """Get available tasks"""
+        try:
+            response = self.session.get(f"{self.base_url}/tasks")
+            return response.json()
+        except Exception as e:
+            print(f"[ERROR] Get tasks failed: {e}", flush=True)
+            return {"tasks": []}
+
     def close(self):
+        """Close the session"""
        self.session.close()
 
 
+def parse_action_from_response(text: str) -> str:
+    """Parse LLM response into action string"""
+    text = text.strip().upper()
+
+    if text.startswith("BUY"):
+        parts = text.split()
+        if len(parts) > 1 and parts[1].isdigit():
+            return f"BUY {parts[1]}"
+        return "BUY 5"
+    elif text.startswith("SELL"):
+        parts = text.split()
+        if len(parts) > 1 and parts[1].isdigit():
+            return f"SELL {parts[1]}"
+        return "SELL 5"
+    elif text.startswith("BACKTEST"):
+        parts = text.split()
+        if len(parts) > 1:
+            return f"BACKTEST {parts[1]}"
+        return "BACKTEST momentum"
+    elif text.startswith("GET_NEWS"):
+        return "GET_NEWS"
+    else:
+        return "GET_PRICE"
+
+
+def fallback_strategy(observation: dict) -> str:
+    """Rule-based strategy when LLM is unavailable"""
+    sentiment = observation.get('last_news', {}).get('sentiment', 'neutral')
+    if sentiment == 'positive':
+        return "BUY 5"
+    elif sentiment == 'negative':
+        return "SELL 5"
+    else:
+        return "GET_PRICE"
+
+
 def get_model_action(client: OpenAI, step: int, observation: dict, history: List[str]) -> str:
     """Get action from LLM using the judge's proxy"""
 
+    # If no API credentials, use fallback
+    if not API_BASE_URL or not API_KEY:
+        print("[DEBUG] No API credentials, using fallback strategy", flush=True)
+        return fallback_strategy(observation)
+
+    # Get news headline for context
+    news = observation.get('last_news', {})
+    headline = news.get('headline', 'No recent news')
+    sentiment = news.get('sentiment', 'neutral')
+
     user_prompt = textwrap.dedent(
         f"""
-        Step: {step}
-        Current price: ${observation.get('price', 'unknown')}
-        Balance: ${observation.get('balance', 'unknown')}
-        Holdings: {observation.get('holdings', 0)} shares
-        Portfolio value: ${observation.get('portfolio_value', 'unknown')}
-        Latest news: {observation.get('last_news', {}).get('headline', 'No news')}
-
-        What is your next action? (BUY X, SELL X, GET_PRICE, BACKTEST, or GET_NEWS)
+        Step: {step} of {MAX_STEPS}
+
+        Current Market Data:
+        - Price: ${observation.get('price', 'unknown')}
+        - Balance: ${observation.get('balance', 'unknown')}
+        - Holdings: {observation.get('holdings', 0)} shares
+        - Portfolio Value: ${observation.get('portfolio_value', 'unknown')}
+
+        Latest News:
+        - Headline: "{headline}"
+        - Sentiment: {sentiment}
+
+        Previous actions this episode:
+        {chr(10).join(history[-5:]) if history else "No previous actions"}
+
+        Based on this information, what is your next action?
+        Respond with EXACTLY one action in format: ACTION [parameter]
+        Examples: BUY 10, SELL 5, GET_PRICE, BACKTEST momentum, GET_NEWS
         """
     ).strip()
 
     try:
-        # CRITICAL: This MUST go through their proxy using BOTH env vars
         completion = client.chat.completions.create(
-            model="gpt-3.5-turbo",  # Their proxy expects this
+            model="gpt-3.5-turbo",
             messages=[
                 {"role": "system", "content": SYSTEM_PROMPT},
                 {"role": "user", "content": user_prompt},
@@ -135,64 +213,61 @@ def get_model_action(client: OpenAI, step: int, observation: dict, history: List[str]) -> str:
             max_tokens=MAX_TOKENS,
         )
         text = completion.choices[0].message.content or ""
-        return parse_action_from_response(text)
+        action = parse_action_from_response(text)
+        print(f"[DEBUG] LLM suggested: {text[:100]}... -> {action}", flush=True)
+        return action
     except Exception as e:
         print(f"[DEBUG] LLM error: {e}, using fallback", flush=True)
         return fallback_strategy(observation)
 
 
-def parse_action_from_response(text: str) -> str:
-    text = text.strip().upper()
-
-    if text.startswith("BUY"):
-        parts = text.split()
-        if len(parts) > 1 and parts[1].isdigit():
-            return f"BUY {parts[1]}"
-        return "BUY 5"
-    elif text.startswith("SELL"):
-        parts = text.split()
-        if len(parts) > 1 and parts[1].isdigit():
-            return f"SELL {parts[1]}"
-        return "SELL 5"
-    elif text.startswith("BACKTEST"):
-        return "BACKTEST"
-    elif text.startswith("GET_NEWS"):
-        return "GET_NEWS"
-    else:
-        return "GET_PRICE"
-
-
-def fallback_strategy(observation: dict) -> str:
-    sentiment = observation.get('last_news', {}).get('sentiment', 'neutral')
-    if sentiment == 'positive':
-        return "BUY 5"
-    elif sentiment == 'negative':
-        return "SELL 5"
-    else:
-        return "GET_PRICE"
+def calculate_reward(observation: dict, step: int) -> float:
+    """Calculate reward based on portfolio performance and actions"""
+    portfolio_value = observation.get('portfolio_value', 10000)
+    price = observation.get('price', 150)
+
+    # Profit reward (0 to 0.6)
+    profit_reward = max(0, (portfolio_value - 10000) / 10000) * 0.6
+
+    # News sentiment bonus (0 to 0.2)
+    sentiment = observation.get('last_news', {}).get('sentiment', 'neutral')
+    if sentiment == 'positive':
+        sentiment_bonus = 0.2
+    elif sentiment == 'negative':
+        sentiment_bonus = -0.1
+    else:
+        sentiment_bonus = 0.05
+
+    # Step completion bonus (0 to 0.2)
+    step_bonus = min(0.2, step / MAX_STEPS * 0.2)
+
+    reward = max(0.0, min(1.0, profit_reward + sentiment_bonus + step_bonus))
+    return reward
 
 
 async def main() -> None:
     print("[INFO] Starting Quant-Gym Inference", flush=True)
+    print(f"[INFO] Python version: {os.sys.version}", flush=True)
 
     # CRITICAL CHECK: Both environment variables MUST be set
     if not API_BASE_URL:
-        print("[ERROR] API_BASE_URL environment variable not set!", flush=True)
-        print("[ERROR] This must be provided by the hackathon judge.", flush=True)
-        return
+        print("[WARNING] API_BASE_URL environment variable not set!", flush=True)
+        print("[WARNING] Using fallback strategy without LLM.", flush=True)
+    else:
+        print(f"[INFO] API_BASE_URL: {API_BASE_URL}", flush=True)
 
     if not API_KEY:
-        print("[ERROR] API_KEY environment variable not set!", flush=True)
-        print("[ERROR] This must be provided by the hackathon judge.", flush=True)
-        return
+        print("[WARNING] API_KEY environment variable not set!", flush=True)
+        print("[WARNING] Using fallback strategy without LLM.", flush=True)
 
-    print(f"[INFO] Using API_BASE_URL: {API_BASE_URL}", flush=True)
-
-    # Initialize OpenAI client with judge's proxy - MUST use BOTH
-    client = OpenAI(
-        base_url=API_BASE_URL,  # Their proxy URL
-        api_key=API_KEY,  # Their API key
-    )
+    # Initialize OpenAI client if credentials available
+    client = None
+    if API_BASE_URL and API_KEY:
+        try:
+            client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+            print("[INFO] OpenAI client initialized successfully", flush=True)
+        except Exception as e:
+            print(f"[WARNING] Failed to initialize OpenAI client: {e}", flush=True)
 
     env = QuantGymClient(BASE_URL)
 
@@ -202,21 +277,27 @@ async def main() -> None:
     success = False
     final_score = 0.0
 
-    log_start(task=TASK_NAME, env=BENCHMARK, model="gpt-3.5-turbo")
+    log_start(task=TASK_NAME, env=BENCHMARK, model="gpt-3.5-turbo" if client else "fallback-rule-based")
 
     try:
+        # Reset environment
        result = env.reset()
        observation = result.get('observation', {})
+        print(f"[INFO] Reset complete. Initial price: ${observation.get('price', 'unknown')}", flush=True)
 
        for step in range(1, MAX_STEPS + 1):
-            action_str = get_model_action(client, step, observation, history)
+            # Get action from LLM or fallback
+            if client:
+                action_str = get_model_action(client, step, observation, history)
+            else:
+                action_str = fallback_strategy(observation)
 
+            # Execute action
            result = env.step(action_str)
            observation = result.get('observation', {})
 
-            portfolio_value = observation.get('portfolio_value', 10000)
-            profit_reward = max(0, (portfolio_value - 10000) / 10000)
-            reward = min(1.0, max(0.0, profit_reward))
+            # Calculate reward
+            reward = calculate_reward(observation, step)
 
            done = step >= MAX_STEPS - 1
            error = None
@@ -226,7 +307,8 @@ async def main() -> None:
 
            log_step(step=step, action=action_str, reward=reward, done=done, error=error)
 
-            history.append(f"Step {step}: {action_str}")
+            # Update history
+            history.append(f"Step {step}: {action_str} -> reward {reward:.2f}")
 
            if done:
                break
@@ -236,6 +318,8 @@ async def main() -> None:
 
    except Exception as e:
        print(f"[ERROR] {e}", flush=True)
+        import traceback
+        traceback.print_exc()
        success = False
        final_score = 0.0
    finally:
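The `calculate_reward` function added in this commit sums three bounded terms before clamping to [0, 1]. A standalone restatement with one worked value (`MAX_STEPS` is assumed to be 10 here, matching the 10-step run in the README's expected output):

```python
MAX_STEPS = 10  # assumed; matches the 10-step episode in the README's expected output

def calculate_reward(observation: dict, step: int) -> float:
    """Reward = profit term (<= 0.6) + sentiment bonus + step bonus (<= 0.2), clamped to [0, 1]."""
    portfolio_value = observation.get("portfolio_value", 10000)
    # Profit reward (0 to 0.6): fractional gain over the 10,000 starting balance
    profit_reward = max(0, (portfolio_value - 10000) / 10000) * 0.6
    # News sentiment bonus: +0.2 positive, -0.1 negative, +0.05 neutral
    sentiment = observation.get("last_news", {}).get("sentiment", "neutral")
    sentiment_bonus = {"positive": 0.2, "negative": -0.1}.get(sentiment, 0.05)
    # Step completion bonus (0 to 0.2)
    step_bonus = min(0.2, step / MAX_STEPS * 0.2)
    return max(0.0, min(1.0, profit_reward + sentiment_bonus + step_bonus))

obs = {"portfolio_value": 10600, "last_news": {"sentiment": "positive"}}
print(calculate_reward(obs, step=5))
```

Here the profit term contributes 0.06 × 0.6 = 0.036, the positive-sentiment bonus 0.2, and the step bonus 0.1, for a total of roughly 0.336.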
openenv.yaml CHANGED
@@ -26,6 +26,11 @@ tasks:
     grader: "task3_grader.grade_task3"
     max_score: 1.0
 
+graders:
+  task1: "task1_grader.grade_task1"
+  task2: "task2_grader.grade_task2"
+  task3: "task3_grader.grade_task3"
+
 action_schema:
   type: "object"
   properties:
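The new `graders` mapping points each task at a dotted `module.function` path. How the OpenEnv harness resolves such entries is not shown in this commit, but a generic loader for strings of that shape would look something like:

```python
import importlib

def resolve_grader(dotted_path: str):
    """Split 'module.function' and return the callable it names."""
    module_name, func_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

# Demonstration with a stdlib function, since the grader modules are not on this path:
sqrt = resolve_grader("math.sqrt")
print(sqrt(9))  # 3.0
```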