Spaces:
Running
Running
Version 1.0
Browse files- Dockerfile +30 -0
- README.md +579 -6
- backend.py +1436 -0
- index.html +414 -0
- requirements.txt +9 -0
Dockerfile
ADDED
|
@@ -0,0 +1,30 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Slim Python 3.11 base keeps the image small; gcc is added below only for
# building any C-extension wheels in requirements.txt.
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
# (apt list cleanup in the same RUN layer keeps the layer size down)
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better caching
# (dependency layer is rebuilt only when requirements.txt changes)
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application files
COPY . .

# Create necessary directories
# (sessions/ is used at runtime for per-session state)
RUN mkdir -p sessions

# Expose port
# 7860 is the conventional Hugging Face Spaces port
EXPOSE 7860

# Set environment variables for production
ENV PRODUCTION=true
ENV PORT=7860

# Run the application
# NOTE(review): this commit adds backend.py and index.html but not app.py —
# confirm app.py exists at the repository root, otherwise the container
# will fail to start. TODO verify entry-point filename.
CMD ["python", "app.py"]
|
README.md
CHANGED
|
@@ -1,12 +1,585 @@
|
|
| 1 |
---
|
| 2 |
title: Stimuli Generator
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: docker
|
| 7 |
-
pinned:
|
| 8 |
license: apache-2.0
|
| 9 |
-
short_description:
|
| 10 |
---
|
|
|
|
| 11 |
|
| 12 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
title: Stimuli Generator
|
| 3 |
+
emoji: ⚡
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
sdk: docker
|
| 7 |
+
pinned: true
|
| 8 |
license: apache-2.0
|
| 9 |
+
short_description: A LLM-based Stimulus Material Generation Tool
|
| 10 |
---
|
| 11 |
+
# 🧠 Stimulus Generator
|
| 12 |
|
| 13 |
+
<div align="center">
|
| 14 |
+
|
| 15 |
+

|
| 16 |
+

|
| 17 |
+

|
| 18 |
+
[](https://github.com/xufengduan/Stimuli_generator)
|
| 19 |
+
|
| 20 |
+
</div>
|
| 21 |
+
|
| 22 |
+
<p align="center">
|
| 23 |
+
<b>A Large Language Model-based Stimulus Material Generation Tool for Psycholinguistic Research</b>
|
| 24 |
+
</p>
|
| 25 |
+
|
| 26 |
+
<div align="center">
|
| 27 |
+
<a href="#english">English</a> | <a href="#chinese" onclick="document.getElementById('chinese-content').open = true;">中文</a>
|
| 28 |
+
</div>
|
| 29 |
+
|
| 30 |
+
---
|
| 31 |
+
|
| 32 |
+
<a id="english"></a>
|
| 33 |
+
## 📖 Project Introduction
|
| 34 |
+
|
| 35 |
+
Stimulus Generator is a tool based on large language models designed specifically for psycholinguistic research to generate stimulus materials. It can automatically generate experimental stimulus materials that meet requirements based on the experimental design and examples defined by researchers. This tool adopts a multi-agent architecture, including Generator, Validator, and Scorer, to ensure that the generated stimulus materials meet experimental requirements and are of high quality.
|
| 36 |
+
|
| 37 |
+
## ✨ Main Features
|
| 38 |
+
|
| 39 |
+
- **🤖 Multi-agent Architecture**:
|
| 40 |
+
- **Generator**: Generates stimulus materials based on experimental design
|
| 41 |
+
- **Validator**: Verifies whether the generated materials meet experimental requirements
|
| 42 |
+
- **Scorer**: Scores materials in multiple dimensions
|
| 43 |
+
|
| 44 |
+
- **🔄 Flexible Model Selection**:
|
| 45 |
+
- Supports GPT-4 (requires OpenAI API Key)
|
| 46 |
+
- Supports Meta Llama 3.3 70B Instruct model
|
| 47 |
+
|
| 48 |
+
- **📊 Real-time Progress Monitoring**:
|
| 49 |
+
- WebSocket real-time updates of generation progress
|
| 50 |
+
- Detailed log information display
|
| 51 |
+
- Generation process can be stopped at any time
|
| 52 |
+
|
| 53 |
+
- **🎨 User-friendly Interface**:
|
| 54 |
+
- Intuitive form design
|
| 55 |
+
- Real-time validation and feedback
|
| 56 |
+
- Responsive layout design
|
| 57 |
+
- Detailed help information
|
| 58 |
+
|
| 59 |
+
## 💻 System Requirements
|
| 60 |
+
|
| 61 |
+
| Requirement | Details |
|
| 62 |
+
|-------------|---------|
|
| 63 |
+
| Python | 3.8 or higher |
|
| 64 |
+
| Browser | Modern web browser (Chrome, Firefox, Safari, etc.) |
|
| 65 |
+
| Network | Stable network connection |
|
| 66 |
+
| Socket.IO | Client version 4.x (compatible with server side) |
|
| 67 |
+
|
| 68 |
+
## 🚀 Installation Instructions
|
| 69 |
+
|
| 70 |
+
### Clone directly from GitHub repository
|
| 71 |
+
|
| 72 |
+
```bash
|
| 73 |
+
# 1. Clone the project code
|
| 74 |
+
git clone https://github.com/xufengduan/Stimuli_generator.git
|
| 75 |
+
cd Stimuli_generator
|
| 76 |
+
|
| 77 |
+
# 2. Create and activate a virtual environment (recommended)
|
| 78 |
+
python -m venv venv
|
| 79 |
+
|
| 80 |
+
# Windows
|
| 81 |
+
venv\Scripts\activate
|
| 82 |
+
# Linux/Mac
|
| 83 |
+
source venv/bin/activate
|
| 84 |
+
|
| 85 |
+
# 3. Install required dependencies
|
| 86 |
+
pip install -e .
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
## 📝 Usage Instructions
|
| 90 |
+
|
| 91 |
+
### Launch Web Interface
|
| 92 |
+
|
| 93 |
+
After installation, you can use the command-line tool to start the web interface:
|
| 94 |
+
|
| 95 |
+
```bash
|
| 96 |
+
stimulus-generator webui
|
| 97 |
+
```
|
| 98 |
+
By default, the web interface will run at http://127.0.0.1:5001.
|
| 99 |
+
|
| 100 |
+
### Command Line Arguments
|
| 101 |
+
|
| 102 |
+
```bash
|
| 103 |
+
stimulus-generator webui --port 5001
|
| 104 |
+
```
|
| 105 |
+
|
| 106 |
+
| Argument | Description |
|
| 107 |
+
|----------|-------------|
|
| 108 |
+
| `--host` | Specify host address (default: 0.0.0.0) |
|
| 109 |
+
| `--port` | Specify port number (default: 5001) |
|
| 110 |
+
| `--debug` | Enable debug mode |
|
| 111 |
+
| `--share` | Create public link (requires additional dependencies) |
|
| 112 |
+
|
| 113 |
+
## 🎯 Usage Steps
|
| 114 |
+
|
| 115 |
+
### 1. Configure Generation Parameters
|
| 116 |
+
|
| 117 |
+
#### 1.1 Select Language Model
|
| 118 |
+

|
| 119 |
+
|
| 120 |
+
Choose between:
|
| 121 |
+
- GPT-4 (requires OpenAI API Key)
|
| 122 |
+
- Meta Llama 3.3 70B Instruct model
|
| 123 |
+
|
| 124 |
+
#### 1.2 Enter API Key (if using GPT-4)
|
| 125 |
+

|
| 126 |
+
|
| 127 |
+
If you selected GPT-4, enter your OpenAI API Key in the designated field.
|
| 128 |
+
|
| 129 |
+
#### 1.3 Add Example Stimulus Materials
|
| 130 |
+

|
| 131 |
+
|
| 132 |
+
Components are the building blocks of your stimuli. For example, in a study investigating contextual predictability on word choice:
|
| 133 |
+
- A word pair (e.g., math/mathematics)
|
| 134 |
+
- Supportive context (high predictability)
|
| 135 |
+
- Neutral context
|
| 136 |
+
|
| 137 |
+
Each component should be filled with its corresponding content. For instance:
|
| 138 |
+
- Word pair: "math/mathematics"
|
| 139 |
+
- Supportive context: "The student solved the simple arithmetic problem using basic..."
|
| 140 |
+
- Neutral context: "The student was working on a problem that required..."
|
| 141 |
+
|
| 142 |
+
To add more examples:
|
| 143 |
+
1. Complete all components for the first item
|
| 144 |
+
2. Click "Add Item" in the bottom right corner
|
| 145 |
+
3. Repeat for additional examples (recommended: at least 3 examples)
|
| 146 |
+
|
| 147 |
+
#### 1.4 Fill in Experimental Design Description
|
| 148 |
+

|
| 149 |
+
|
| 150 |
+
When writing your experimental design description, include these key components:
|
| 151 |
+
|
| 152 |
+
1. **Purpose of the Stimuli**
|
| 153 |
+
- Explain the experiment's goal
|
| 154 |
+
- Describe how the stimuli support this goal
|
| 155 |
+
- Example: "We are designing stimuli for an experiment investigating whether people prefer shorter words in predictive contexts."
|
| 156 |
+
|
| 157 |
+
2. **Core Structure of Each Stimulus Item**
|
| 158 |
+
- Describe the components of each item
|
| 159 |
+
- Example: "Each stimulus item includes a word pair and two contexts."
|
| 160 |
+
|
| 161 |
+
3. **Detailed Description of Each Element**
|
| 162 |
+
For each component, specify:
|
| 163 |
+
- What it is
|
| 164 |
+
- How it's constructed
|
| 165 |
+
- What constraints apply
|
| 166 |
+
- What to avoid
|
| 167 |
+
- Example: "The word pair consists of a short and a long form of the same word... Avoid pairs where either word forms part of a fixed or common phrase."
|
| 168 |
+
|
| 169 |
+
4. **Experimental Conditions or Variants**
|
| 170 |
+
Explain:
|
| 171 |
+
- Definition of each condition
|
| 172 |
+
- Construction criteria
|
| 173 |
+
- Matching constraints
|
| 174 |
+
- Example: "The supportive context should strongly predict the missing final word... The two contexts should be matched for length."
|
| 175 |
+
|
| 176 |
+
5. **Example Item**
|
| 177 |
+
Include at least one complete example with labeled parts.
|
| 178 |
+
|
| 179 |
+
6. **Formatting Guidelines**
|
| 180 |
+
Note any specific formatting or submission requirements.
|
| 181 |
+
|
| 182 |
+
#### 1.5 Review Auto-generated Properties
|
| 183 |
+

|
| 184 |
+
|
| 185 |
+
After completing the experimental design:
|
| 186 |
+
1. Click "Auto-generate Properties"
|
| 187 |
+
2. The system will automatically set:
|
| 188 |
+
- Validation conditions
|
| 189 |
+
- Scoring dimensions
|
| 190 |
+
3. **Important**: Review and adjust these auto-generated properties as needed
|
| 191 |
+
|
| 192 |
+
### 2. Start Generation
|
| 193 |
+

|
| 194 |
+
|
| 195 |
+
1. Click the "Generate stimulus" button
|
| 196 |
+
2. Monitor progress in real-time
|
| 197 |
+
3. View detailed logs
|
| 198 |
+
4. Use "Stop" button if needed
|
| 199 |
+
|
| 200 |
+
### 3. Get Results
|
| 201 |
+

|
| 202 |
+
|
| 203 |
+
- CSV file automatically downloads upon completion
|
| 204 |
+
- Contains generated materials and scores
|
| 205 |
+
|
| 206 |
+
## 📂 File Structure
|
| 207 |
+
|
| 208 |
+
```
|
| 209 |
+
Stimulus-Generator/
|
| 210 |
+
├── stimulus_generator/ # Main Python package
|
| 211 |
+
│ ├── __init__.py # Package initialization file
|
| 212 |
+
│ ├── app.py # Flask backend server
|
| 213 |
+
│ ├── backend.py # Core backend functionality
|
| 214 |
+
│ └── cli.py # Command line interface
|
| 215 |
+
├── run.py # Quick start script
|
| 216 |
+
├── setup.py # Package installation configuration
|
| 217 |
+
├── static/
|
| 218 |
+
│ ├── script.js # Frontend JavaScript code
|
| 219 |
+
│ ├── styles.css # Page stylesheet
|
| 220 |
+
│ └── Stimulus Generator Web Logo.png # Website icon
|
| 221 |
+
├── webpage.html # Main page
|
| 222 |
+
├── requirements.txt # Python dependency list
|
| 223 |
+
└── README.md # Project documentation
|
| 224 |
+
```
|
| 225 |
+
|
| 226 |
+
## 🛠️ Command Line Tools
|
| 227 |
+
|
| 228 |
+
After installation, you can use the following command-line tools:
|
| 229 |
+
|
| 230 |
+
```bash
|
| 231 |
+
# Launch web interface
|
| 232 |
+
stimulus-generator webui [--host HOST] [--port PORT] [--debug] [--share]
|
| 233 |
+
|
| 234 |
+
# View help
|
| 235 |
+
stimulus-generator --help
|
| 236 |
+
```
|
| 237 |
+
|
| 238 |
+
If you don't want to install, you can also run directly:
|
| 239 |
+
|
| 240 |
+
```bash
|
| 241 |
+
# After cloning the repository, run in the project directory
|
| 242 |
+
python run.py webui
|
| 243 |
+
```
|
| 244 |
+
|
| 245 |
+
## ⚠️ Notes
|
| 246 |
+
|
| 247 |
+
1. **API Key Security**:
|
| 248 |
+
- Please keep your OpenAI API Key secure
|
| 249 |
+
- Do not expose API Keys in public environments
|
| 250 |
+
|
| 251 |
+
2. **Generation Process**:
|
| 252 |
+
- Generation process may take some time, please be patient
|
| 253 |
+
- You can monitor generation status in real time through the log panel
|
| 254 |
+
- You can stop generation at any time if there are issues
|
| 255 |
+
|
| 256 |
+
3. **Results Usage**:
|
| 257 |
+
- It is recommended to check if the generated materials meet experimental requirements
|
| 258 |
+
- Manual screening or modification of generated materials may be needed
|
| 259 |
+
|
| 260 |
+
## ❓ FAQ
|
| 261 |
+
|
| 262 |
+
<details>
|
| 263 |
+
<summary><b>What to do if the generation process gets stuck?</b></summary>
|
| 264 |
+
<br>
|
| 265 |
+
- Check if the network connection is normal
|
| 266 |
+
- Click the "Stop" button to stop the current generation
|
| 267 |
+
- Refresh the page to restart
|
| 268 |
+
- If the page is unresponsive for a long time, wait for 30 seconds and the system will automatically unlock the interface
|
| 269 |
+
</details>
|
| 270 |
+
|
| 271 |
+
<details>
|
| 272 |
+
<summary><b>How to solve WebSocket connection errors?</b></summary>
|
| 273 |
+
<br>
|
| 274 |
+
- Ensure that the network environment does not block WebSocket connections
|
| 275 |
+
- If you see WebSocket error messages, refresh the page to re-establish the connection
|
| 276 |
+
- Restart the server or try using a different browser
|
| 277 |
+
- WebSocket connection issues will not affect main functionality, the system has automatic recovery mechanisms
|
| 278 |
+
</details>
|
| 279 |
+
|
| 280 |
+
<details>
|
| 281 |
+
<summary><b>How to optimize generation quality?</b></summary>
|
| 282 |
+
<br>
|
| 283 |
+
- Provide more detailed examples
|
| 284 |
+
- Improve experimental design description
|
| 285 |
+
- Set appropriate validation conditions
|
| 286 |
+
</details>
|
| 287 |
+
|
| 288 |
+
<details>
|
| 289 |
+
<summary><b>How to handle slow generation speed?</b></summary>
|
| 290 |
+
<br>
|
| 291 |
+
- Consider reducing the number of items to generate
|
| 292 |
+
- Ensure stable network connection
|
| 293 |
+
- Choose a model with faster response
|
| 294 |
+
</details>
|
| 295 |
+
|
| 296 |
+
## 📞 Technical Support
|
| 297 |
+
|
| 298 |
+
For questions or suggestions, please contact:
|
| 299 |
+
- Submit an [Issue](https://github.com/xufengduan/Stimuli_generator/issues)
|
| 300 |
+
- Send an email to: ...
|
| 301 |
+
|
| 302 |
+
## 📄 License
|
| 303 |
+
|
| 304 |
+
This project is licensed under the [Apache License 2.0](LICENSE). See the LICENSE file for details.
|
| 305 |
+
|
| 306 |
+
---
|
| 307 |
+
|
| 308 |
+
<details id="chinese-content">
|
| 309 |
+
<summary><a id="chinese"></a>中文版本 (Chinese Version)</summary>
|
| 310 |
+
|
| 311 |
+
## 📖 项目简介
|
| 312 |
+
|
| 313 |
+
Stimulus Generator 是一个基于大语言模型的刺激材料生成工具,专门为心理语言学研究设计。它能够根据研究者定义的实验设计和示例,自动生成符合要求的实验刺激材料。该工具采用多代理架构,包含生成器(Generator)、验证器(Validator)和评分器(Scorer)三个代理,确保生成的刺激材料满足实验要求并具有良好的质量。
|
| 314 |
+
|
| 315 |
+
## ✨ 主要特点
|
| 316 |
+
|
| 317 |
+
- **🤖 多代理架构**:
|
| 318 |
+
- **Generator**:根据实验设计生成刺激材料
|
| 319 |
+
- **Validator**:验证生成的材料是否符合实验要求
|
| 320 |
+
- **Scorer**:对材料进行多维度评分
|
| 321 |
+
|
| 322 |
+
- **🔄 灵活的模型选择**:
|
| 323 |
+
- 支持 GPT-4 (需要 OpenAI API Key)
|
| 324 |
+
- 支持 Meta Llama 3.3 70B Instruct 模型
|
| 325 |
+
|
| 326 |
+
- **📊 实时进度监控**:
|
| 327 |
+
- WebSocket 实时更新生成进度
|
| 328 |
+
- 详细的日志信息显示
|
| 329 |
+
- 可随时停止生成过程
|
| 330 |
+
|
| 331 |
+
- **🎨 用户友好界面**:
|
| 332 |
+
- 直观的表单设计
|
| 333 |
+
- 实时验证和反馈
|
| 334 |
+
- 响应式布局设计
|
| 335 |
+
- 详细的帮助信息提示
|
| 336 |
+
|
| 337 |
+
## 💻 系统要求
|
| 338 |
+
|
| 339 |
+
| 要求 | 详情 |
|
| 340 |
+
|------|------|
|
| 341 |
+
| Python | 3.8 或更高版本 |
|
| 342 |
+
| 浏览器 | 现代网页浏览器(Chrome、Firefox、Safari 等) |
|
| 343 |
+
| 网络 | 稳定的网络连接 |
|
| 344 |
+
| Socket.IO | 客户端版本 4.x(与服务器端兼容) |
|
| 345 |
+
|
| 346 |
+
## 🚀 安装说明
|
| 347 |
+
|
| 348 |
+
### 直接从GitHub仓库中克隆
|
| 349 |
+
|
| 350 |
+
```bash
|
| 351 |
+
# 1. 克隆项目代码
|
| 352 |
+
git clone https://github.com/xufengduan/Stimuli_generator.git
|
| 353 |
+
cd Stimuli_generator
|
| 354 |
+
|
| 355 |
+
# 2. 创建并激活虚拟环境(推荐)
|
| 356 |
+
python -m venv venv
|
| 357 |
+
|
| 358 |
+
# Windows
|
| 359 |
+
venv\Scripts\activate
|
| 360 |
+
# Linux/Mac
|
| 361 |
+
source venv/bin/activate
|
| 362 |
+
|
| 363 |
+
# 3. 安装项目所需依赖
|
| 364 |
+
pip install -e .
|
| 365 |
+
```
|
| 366 |
+
|
| 367 |
+
## 📝 使用说明
|
| 368 |
+
|
| 369 |
+
### 启动Web界面
|
| 370 |
+
|
| 371 |
+
安装完成后,可以直接使用命令行工具启动Web界面:
|
| 372 |
+
|
| 373 |
+
```bash
|
| 374 |
+
stimulus-generator webui
|
| 375 |
+
```
|
| 376 |
+
|
| 377 |
+
默认情况下,Web界面将在 http://127.0.0.1:5001 上运行。
|
| 378 |
+
|
| 379 |
+
### 命令行参数
|
| 380 |
+
|
| 381 |
+
```bash
|
| 382 |
+
stimulus-generator webui --port 5001
|
| 383 |
+
```
|
| 384 |
+
|
| 385 |
+
| 参数 | 描述 |
|
| 386 |
+
|------|------|
|
| 387 |
+
| `--host` | 指定主机地址(默认:0.0.0.0) |
|
| 388 |
+
| `--port` | 指定端口号(默认:5001) |
|
| 389 |
+
| `--debug` | 启用调试模式 |
|
| 390 |
+
| `--share` | 创建公共链接(需要安装额外依赖) |
|
| 391 |
+
|
| 392 |
+
## 🎯 使用步骤
|
| 393 |
+
|
| 394 |
+
### 1. 配置生成参数
|
| 395 |
+
|
| 396 |
+
#### 1.1 选择语言模型
|
| 397 |
+

|
| 398 |
+
|
| 399 |
+
可选择:
|
| 400 |
+
- GPT-4(需要 OpenAI API Key)
|
| 401 |
+
- Meta Llama 3.3 70B Instruct 模型
|
| 402 |
+
|
| 403 |
+
#### 1.2 输入 API Key(如果使用 GPT-4)
|
| 404 |
+

|
| 405 |
+
|
| 406 |
+
如果选择了 GPT-4,请在指定字段中输入您的 OpenAI API Key。
|
| 407 |
+
|
| 408 |
+
#### 1.3 添加示例刺激材料
|
| 409 |
+

|
| 410 |
+
|
| 411 |
+
组件(Components)是刺激材料的组成部分。例如,在研究语境可预测性对词汇选择的影响时:
|
| 412 |
+
- 词对(例如:math/mathematics)
|
| 413 |
+
- 支持性语境(高可预测性)
|
| 414 |
+
- 中性语境
|
| 415 |
+
|
| 416 |
+
每个组件都需要填写相应的内容。例如:
|
| 417 |
+
- 词对:"math/mathematics"
|
| 418 |
+
- 支持性语境:"学生使用基本的算术解决了这个简单的问题..."
|
| 419 |
+
- 中性语境:"学生正在解决一个需要..."
|
| 420 |
+
|
| 421 |
+
添加更多示例:
|
| 422 |
+
1. 完成第一个项目的所有组件
|
| 423 |
+
2. 点击右下角的"添加项目"按钮
|
| 424 |
+
3. 重复上述步骤添加更多示例(建议至少添加3个示例)
|
| 425 |
+
|
| 426 |
+
#### 1.4 填写实验设计说明
|
| 427 |
+

|
| 428 |
+
|
| 429 |
+
在编写实验设计说明时,请包含以下关键部分:
|
| 430 |
+
|
| 431 |
+
1. **刺激材料的目的**
|
| 432 |
+
- 解释实验目标
|
| 433 |
+
- 描述刺激材料如何支持这个目标
|
| 434 |
+
- 示例:"我们正在设计用于研究人们在可预测语境中是否倾向于使用较短词汇的实验刺激材料。"
|
| 435 |
+
|
| 436 |
+
2. **每个刺激项目的核心结构**
|
| 437 |
+
- 描述每个项目的组成部分
|
| 438 |
+
- 示例:"每个刺激项目包含一个词对和两个语境。"
|
| 439 |
+
|
| 440 |
+
3. **每个元素的详细描述**
|
| 441 |
+
对于每个组件,请说明:
|
| 442 |
+
- 它是什么
|
| 443 |
+
- 如何构建
|
| 444 |
+
- 适用的约束条件
|
| 445 |
+
- 需要避免的内容
|
| 446 |
+
- 示例:"词对由同一个词的短形式和长形式组成...避免使用固定搭配或常见短语中的词。"
|
| 447 |
+
|
| 448 |
+
4. **实验条件或变体**
|
| 449 |
+
说明:
|
| 450 |
+
- 每个条件的定义
|
| 451 |
+
- 构建标准
|
| 452 |
+
- 匹配约束
|
| 453 |
+
- 示例:"支持性语境应该强烈预测缺失的最后一个词...两个语境应该在长度上匹配。"
|
| 454 |
+
|
| 455 |
+
5. **示例项目**
|
| 456 |
+
包含至少一个完整的示例,并标注各个部分。
|
| 457 |
+
|
| 458 |
+
6. **格式指南**
|
| 459 |
+
注明任何特定的格式或提交要求。
|
| 460 |
+
|
| 461 |
+
#### 1.5 检查自动生成的属性
|
| 462 |
+

|
| 463 |
+
|
| 464 |
+
完成实验设计后:
|
| 465 |
+
1. 点击"自动生成属性"按钮
|
| 466 |
+
2. 系统将自动设置:
|
| 467 |
+
- 验证条件
|
| 468 |
+
- 评分维度
|
| 469 |
+
3. **重要**:请检查并根据需要调整这些自动生成的属性
|
| 470 |
+
|
| 471 |
+
### 2. 开始生成
|
| 472 |
+

|
| 473 |
+
|
| 474 |
+
1. 点击"生成刺激材料"按钮
|
| 475 |
+
2. 实时监控进度
|
| 476 |
+
3. 查看详细日志
|
| 477 |
+
4. 必要时使用"停止"按钮
|
| 478 |
+
|
| 479 |
+
### 3. 获取结果
|
| 480 |
+

|
| 481 |
+
|
| 482 |
+
- 完成后自动下载 CSV 格式的结果文件
|
| 483 |
+
- 包含生成的刺激材料及其评分
|
| 484 |
+
|
| 485 |
+
## 📂 文件结构
|
| 486 |
+
|
| 487 |
+
```
|
| 488 |
+
Stimulus-Generator/
|
| 489 |
+
├── stimulus_generator/ # 主Python包
|
| 490 |
+
│ ├── __init__.py # 包初始化文件
|
| 491 |
+
│ ├── app.py # Flask 后端服务器
|
| 492 |
+
│ ├── backend.py # 后端核心功能
|
| 493 |
+
│ └── cli.py # 命令行接口
|
| 494 |
+
├── run.py # 快速启动脚本
|
| 495 |
+
├── setup.py # 包安装配置
|
| 496 |
+
├── static/
|
| 497 |
+
│ ├── script.js # 前端 JavaScript 代码
|
| 498 |
+
│ ├── styles.css # 页面样式表
|
| 499 |
+
│ └── Stimulus Generator Web Logo.png # 网站图标
|
| 500 |
+
├── webpage.html # 主页面
|
| 501 |
+
├── requirements.txt # Python 依赖列表
|
| 502 |
+
└── README.md # 项目说明文档
|
| 503 |
+
```
|
| 504 |
+
|
| 505 |
+
## 🛠️ 命令行工具
|
| 506 |
+
|
| 507 |
+
安装后,可以使用以下命令行工具:
|
| 508 |
+
|
| 509 |
+
```bash
|
| 510 |
+
# 启动Web界面
|
| 511 |
+
stimulus-generator webui [--host HOST] [--port PORT] [--debug] [--share]
|
| 512 |
+
|
| 513 |
+
# 查看帮助
|
| 514 |
+
stimulus-generator --help
|
| 515 |
+
```
|
| 516 |
+
|
| 517 |
+
如果您不想安装,也可以直接使用以下方式运行:
|
| 518 |
+
|
| 519 |
+
```bash
|
| 520 |
+
# 克隆仓库后,在项目目录中运行
|
| 521 |
+
python run.py webui
|
| 522 |
+
```
|
| 523 |
+
|
| 524 |
+
## ⚠️ 注意事项
|
| 525 |
+
|
| 526 |
+
1. **API 密钥安全**:
|
| 527 |
+
- 请妥善保管您的 OpenAI API Key
|
| 528 |
+
- 不要在公共环境中暴露 API Key
|
| 529 |
+
|
| 530 |
+
2. **生成过程**:
|
| 531 |
+
- 生成过程可能需要一定时间,请耐心等待
|
| 532 |
+
- 可以通过日志面板实时监控生成状态
|
| 533 |
+
- 如遇到问题可以随时停止生成
|
| 534 |
+
|
| 535 |
+
3. **结果使用**:
|
| 536 |
+
- 建议检查生成的材料是否符合实验要求
|
| 537 |
+
- 可能需要对生成的材料进行人工筛选或修改
|
| 538 |
+
|
| 539 |
+
## ❓ 常见问题
|
| 540 |
+
|
| 541 |
+
<details>
|
| 542 |
+
<summary><b>生成过程卡住怎么办?</b></summary>
|
| 543 |
+
<br>
|
| 544 |
+
- 检查网络连接是否正常
|
| 545 |
+
- 点击 "Stop" 按钮停止当前生成
|
| 546 |
+
- 刷新页面重新开始
|
| 547 |
+
- 如果页面长时间无响应,可以等待30秒,系统会自动解除界面锁定
|
| 548 |
+
</details>
|
| 549 |
+
|
| 550 |
+
<details>
|
| 551 |
+
<summary><b>WebSocket连接错误如何解决?</b></summary>
|
| 552 |
+
<br>
|
| 553 |
+
- 确保网络环境没有阻止WebSocket连接
|
| 554 |
+
- 如果看到WebSocket错误信息,可以刷新页面重新建立连接
|
| 555 |
+
- 重启服务器或尝试使用不同的浏览器
|
| 556 |
+
- WebSocket连接问题不会影响主要功能,系统有自动恢复机制
|
| 557 |
+
</details>
|
| 558 |
+
|
| 559 |
+
<details>
|
| 560 |
+
<summary><b>如何优化生成质量?</b></summary>
|
| 561 |
+
<br>
|
| 562 |
+
- 提供更多详细的示例
|
| 563 |
+
- 完善实验设计说明
|
| 564 |
+
- 设置合适的验证条件
|
| 565 |
+
</details>
|
| 566 |
+
|
| 567 |
+
<details>
|
| 568 |
+
<summary><b>生成速度较慢怎么处理?</b></summary>
|
| 569 |
+
<br>
|
| 570 |
+
- 考虑减少生成数量
|
| 571 |
+
- 确保网络连接稳定
|
| 572 |
+
- 选择响应更快的模型
|
| 573 |
+
</details>
|
| 574 |
+
|
| 575 |
+
## 📞 技术支持
|
| 576 |
+
|
| 577 |
+
如有问题或建议,请通过以下方式联系:
|
| 578 |
+
- 提交 [Issue](https://github.com/xufengduan/Stimuli_generator/issues)
|
| 579 |
+
- 发送邮件至:...
|
| 580 |
+
|
| 581 |
+
## 📄 许可证
|
| 582 |
+
|
| 583 |
+
本项目采用 [Apache License 2.0](LICENSE) 许可证。详见 LICENSE 文件。
|
| 584 |
+
|
| 585 |
+
</details>
|
backend.py
ADDED
|
@@ -0,0 +1,1436 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import json
|
| 2 |
+
import os
|
| 3 |
+
import queue
|
| 4 |
+
import random
|
| 5 |
+
import time
|
| 6 |
+
import traceback
|
| 7 |
+
from abc import ABC, abstractmethod
|
| 8 |
+
from multiprocessing import Process, Queue
|
| 9 |
+
|
| 10 |
+
import openai
|
| 11 |
+
import pandas as pd
|
| 12 |
+
import requests
|
| 13 |
+
from requests.exceptions import RequestException, Timeout, ConnectionError as RequestsConnectionError
|
| 14 |
+
|
| 15 |
+
# Set OpenAI API key
|
| 16 |
+
# openai.api_key = ""
|
| 17 |
+
|
| 18 |
+
# Set Chutes AI API key (commented out)
|
| 19 |
+
|
| 20 |
+
# Use multiprocessing to implement real timeout mechanism
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
def _timeout_target(queue, func, args, kwargs):
|
| 24 |
+
"""multiprocessing target function, must be defined at module level to be pickled"""
|
| 25 |
+
try:
|
| 26 |
+
result = func(*args, **kwargs)
|
| 27 |
+
queue.put(('success', result))
|
| 28 |
+
except Exception as e:
|
| 29 |
+
tb = traceback.format_exc()
|
| 30 |
+
print(f"Exception in subprocess:\n{tb}")
|
| 31 |
+
queue.put(('error', f"{type(e).__name__}: {str(e)}\n{tb}"))
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
def call_with_timeout(func, args, kwargs, timeout_seconds=60):
    """Execute ``func(*args, **kwargs)`` in a child process with a hard timeout.

    A separate process (rather than a thread) is used so a hung call can be
    force-terminated. ``func`` and its arguments must be picklable; the call
    itself is routed through the module-level ``_timeout_target``.

    Args:
        func: Callable to run in the child process.
        args: Tuple of positional arguments for ``func``.
        kwargs: Dict of keyword arguments for ``func``.
        timeout_seconds: Hard deadline before the child is terminated.

    Returns:
        Whatever ``func`` returned on success, otherwise a ``{"error": ...}``
        dict describing the timeout or the exception raised in the child.
    """
    # BUG FIX: this local was previously named ``queue``, shadowing the stdlib
    # ``queue`` module; ``except queue.Empty`` below then looked up ``Empty``
    # on the multiprocessing Queue instance and raised AttributeError instead
    # of catching the empty-queue case. Renamed so ``queue.Empty`` resolves to
    # the module-level exception that ``get_nowait()`` actually raises.
    result_queue = Queue()
    process = Process(target=_timeout_target,
                      args=(result_queue, func, args, kwargs))
    process.start()
    process.join(timeout_seconds)

    if process.is_alive():
        # Deadline exceeded: force-terminate the child and report a timeout.
        process.terminate()
        process.join()
        print(
            f"API call timed out after {timeout_seconds} seconds and process was terminated")
        return {"error": f"API call timed out after {timeout_seconds} seconds"}

    try:
        result_type, result = result_queue.get_nowait()
        if result_type == 'success':
            return result
        else:
            return {"error": result}
    except queue.Empty:
        # Child exited without reporting anything (e.g. killed externally).
        return {"error": "Process completed but no result returned"}
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
# ======================
# 1. Configuration (Prompt + Schema)
# ======================
# The three agents share a generate -> validate -> score pipeline. Each
# template below is filled via str.format() by the corresponding
# agent_* function further down in this module.
# ---- Agent 1 Prompt ----
AGENT_1_PROMPT_TEMPLATE = """\
Please help me construct one item as stimuli for a psycholinguistic experiment based on the description:

Experimental stimuli design: {experiment_design}

Existing stimuli (DO NOT repeat any of these): {previous_stimuli}

Previously rejected stimuli with validation feedback (learn from these failures and avoid similar issues):
{rejected_stimuli}

CRITICAL REQUIREMENTS:
1. Generate a COMPLETELY NEW and UNIQUE stimulus that is DIFFERENT from ALL existing stimuli above.
2. Do NOT repeat or slightly modify any existing stimulus - create something entirely original.
3. Avoid any content that overlaps with existing or rejected stimuli.
4. Learn from the rejected stimuli above - understand why they failed validation and avoid making similar mistakes.
{generation_requirements}

Please return in JSON format.
"""

# ---- Agent 2 Prompt ----
# Used by agent_2_validate_stimulus; CustomModelClient also sniffs the
# leading "Please verify the following NEW STIMULUS" text to detect the
# validation agent for its DeepSeek-specific prompt augmentation.
AGENT_2_PROMPT_TEMPLATE = """\
Please verify the following NEW STIMULUS with utmost precision, ensuring they meet the Experimental stimuli design and following strict criteria.

NEW STIMULUS: {new_stimulus};

Experimental stimuli design: {experiment_design}

Please return in JSON format.
"""

# ---- Agent 3 Prompt ----
# Used by agent_3_score_stimulus; the "Please rate the following STIMULUS"
# prefix is likewise sniffed by CustomModelClient for DeepSeek requests.
AGENT_3_PROMPT_TEMPLATE = """\
Please rate the following STIMULUS based on the Experimental stimuli design provided for a psychological experiment:

STIMULUS: {valid_stimulus}
Experimental stimuli design: {experiment_design}

SCORING REQUIREMENTS:
{scoring_requirements}

Please return in JSON format including the score for each dimension within the specified ranges.
"""

# ---- Agent 1 Stimulus Schema ----
# NOTE(review): these schema dicts default to empty here and are presumably
# populated at runtime (e.g. from user configuration) — confirm against the
# callers elsewhere in this file.
AGENT_1_PROPERTIES = {}

# ---- Agent 2 Validation Result Schema ----
AGENT_2_PROPERTIES = {}

# ---- Agent 3 Scoring Result Schema ----
AGENT_3_PROPERTIES = {}
|
| 115 |
+
|
| 116 |
+
|
| 117 |
+
# ======================
|
| 118 |
+
# 2. Abstract Model Client Interface
|
| 119 |
+
# ======================
|
| 120 |
+
class ModelClient(ABC):
    """Abstract base class for model clients.

    A model client wraps one chat-completion backend (OpenAI, or a custom
    HTTP API) behind a uniform interface that returns parsed JSON dicts.
    """

    @abstractmethod
    def generate_completion(self, prompt, properties, params=None):
        """Generate a completion with JSON schema response format.

        Args:
            prompt: Full user prompt to send to the model.
            properties: JSON-schema ``properties`` dict describing the fields
                the model must return.
            params: Optional backend-specific overrides; implementations fall
                back to ``get_default_params()`` when this is ``None``.

        Returns:
            The parsed JSON response as a dict, or a dict with an ``"error"``
            key on failure (see the concrete implementations below).
        """
        pass

    @abstractmethod
    def get_default_params(self):
        """Get default parameters for this model (e.g. the model name)."""
        pass
|
| 132 |
+
|
| 133 |
+
|
| 134 |
+
# ======================
|
| 135 |
+
# 3. Concrete Model Client Implementations
|
| 136 |
+
# ======================
|
| 137 |
+
class OpenAIClient(ModelClient):
    """OpenAI GPT model client.

    NOTE(review): this class uses the pre-1.0 openai API surface
    (``openai.ChatCompletion`` and the ``openai.error`` exception namespace),
    which was removed in openai>=1.0 — confirm the pinned version in
    requirements.txt.
    """

    def __init__(self, api_key=None):
        # Key is stored on the instance as well so it can be re-applied in
        # the worker subprocess (module globals are not shared across
        # processes).
        self.api_key = api_key
        if api_key:
            openai.api_key = api_key
            print("OpenAI API key configured successfully")
        else:
            print("Warning: No OpenAI API key provided!")

    def _api_call(self, prompt, properties, params, api_key):
        """API call function, will be called by multiprocessing"""
        # set API key in subprocess
        openai.api_key = api_key

        # Ask for a strict JSON response: every field in ``properties`` is
        # required and no extra fields are allowed.
        return openai.ChatCompletion.create(
            model=params["model"],
            messages=[{"role": "user", "content": prompt}],
            response_format={
                "type": "json_schema",
                "json_schema": {
                    "name": "response_schema",
                    "schema": {
                        "type": "object",
                        "properties": properties,
                        "required": list(properties.keys()),
                        "additionalProperties": False
                    }
                }
            }
        )

    def generate_completion(self, prompt, properties, params=None):
        """Generate completion using OpenAI API.

        Retries up to 3 times with exponential backoff on timeouts and
        transient API errors; returns the parsed JSON dict, or a dict with
        an ``"error"`` key on failure.
        """
        if params is None:
            params = self.get_default_params()

        # retry mechanism
        for attempt in range(3):
            try:
                # Hard 60s timeout enforced by a separate process; a timeout
                # surfaces as a {"error": ...} dict rather than an exception.
                response = call_with_timeout(
                    self._api_call, (prompt, properties, params, self.api_key), {}, 60)

                if isinstance(response, dict) and "error" in response:
                    print(f"OpenAI API timeout attempt {attempt + 1}/3")
                    if attempt == 2:  # last attempt
                        return {"error": "API timeout after 3 attempts"}
                    time.sleep(2 ** attempt)  # exponential backoff
                    continue

                # NOTE(review): dict-style subscripting relies on the legacy
                # OpenAIObject response type — verify against the installed
                # openai version.
                return json.loads(response['choices'][0]['message']['content'])
            except json.JSONDecodeError as e:
                # Malformed model output is not retried.
                print(f"Failed to parse OpenAI JSON response: {e}")
                return {"error": f"Failed to parse response: {str(e)}"}
            except (openai.error.APIError, openai.error.RateLimitError) as e:
                # Transient server-side errors: retry with backoff.
                print(f"OpenAI API error attempt {attempt + 1}/3: {e}")
                if attempt == 2:
                    return {"error": f"OpenAI API error after 3 attempts: {str(e)}"}
                time.sleep(2 ** attempt)
            except openai.error.AuthenticationError as e:
                # Bad credentials will not improve on retry.
                print(f"OpenAI authentication error: {e}")
                return {"error": f"Authentication failed: {str(e)}"}
            except openai.error.InvalidRequestError as e:
                # Malformed request will not improve on retry.
                print(f"OpenAI invalid request: {e}")
                return {"error": f"Invalid request: {str(e)}"}

    def get_default_params(self):
        # Default model for all three agents unless overridden by ``params``.
        return {"model": "gpt-4o"}
|
| 206 |
+
|
| 207 |
+
|
| 208 |
+
# class HuggingFaceClient(ModelClient):
|
| 209 |
+
# """Hugging Face model client"""
|
| 210 |
+
|
| 211 |
+
# def __init__(self, api_key):
|
| 212 |
+
# self.api_key = api_key
|
| 213 |
+
|
| 214 |
+
# def _api_call(self, messages, response_format, params):
|
| 215 |
+
# """API call function that will be called by multiprocessing"""
|
| 216 |
+
# client = InferenceClient(
|
| 217 |
+
# params["model"],
|
| 218 |
+
# token=self.api_key,
|
| 219 |
+
# headers={"x-use-cache": "false"}
|
| 220 |
+
# )
|
| 221 |
+
|
| 222 |
+
# return client.chat_completion(
|
| 223 |
+
# messages=messages,
|
| 224 |
+
# response_format=response_format,
|
| 225 |
+
# max_tokens=params.get("max_tokens", 1000),
|
| 226 |
+
# temperature=params.get("temperature", 0.7)
|
| 227 |
+
# )
|
| 228 |
+
|
| 229 |
+
# def generate_completion(self, prompt, properties, params=None):
|
| 230 |
+
# """Generate completion using Hugging Face API"""
|
| 231 |
+
# if params is None:
|
| 232 |
+
# params = self.get_default_params()
|
| 233 |
+
|
| 234 |
+
# response_format = {
|
| 235 |
+
# "type": "json_schema",
|
| 236 |
+
# "json_schema": {
|
| 237 |
+
# "name": "response_schema",
|
| 238 |
+
# "schema": {
|
| 239 |
+
# "type": "object",
|
| 240 |
+
# "properties": properties,
|
| 241 |
+
# "required": list(properties.keys()),
|
| 242 |
+
# "additionalProperties": False
|
| 243 |
+
# }
|
| 244 |
+
# }
|
| 245 |
+
# }
|
| 246 |
+
|
| 247 |
+
# messages = [{"role": "user", "content": prompt}]
|
| 248 |
+
|
| 249 |
+
# # Retry mechanism
|
| 250 |
+
# for attempt in range(3):
|
| 251 |
+
# try:
|
| 252 |
+
# response = call_with_timeout(
|
| 253 |
+
# self._api_call, (messages, response_format, params), {}, 60)
|
| 254 |
+
|
| 255 |
+
# if isinstance(response, dict) and "error" in response:
|
| 256 |
+
# print(f"HuggingFace API timeout attempt {attempt + 1}/3")
|
| 257 |
+
# if attempt == 2:
|
| 258 |
+
# return {"error": "API timeout after 3 attempts"}
|
| 259 |
+
# time.sleep(2 ** attempt)
|
| 260 |
+
# continue
|
| 261 |
+
|
| 262 |
+
# content = response.choices[0].message.content
|
| 263 |
+
# return json.loads(content)
|
| 264 |
+
# except (json.JSONDecodeError, AttributeError, IndexError) as e:
|
| 265 |
+
# print(f"Failed to parse HuggingFace JSON response: {e}")
|
| 266 |
+
# return {"error": "Failed to parse response"}
|
| 267 |
+
# except Exception as e:
|
| 268 |
+
# print(f"HuggingFace API error attempt {attempt + 1}/3: {e}")
|
| 269 |
+
# if attempt == 2:
|
| 270 |
+
# return {"error": f"API error after 3 attempts: {str(e)}"}
|
| 271 |
+
# time.sleep(2 ** attempt)
|
| 272 |
+
|
| 273 |
+
# def get_default_params(self):
|
| 274 |
+
# return {
|
| 275 |
+
# "model": "meta-llama/Llama-3.3-70B-Instruct",
|
| 276 |
+
# }
|
| 277 |
+
|
| 278 |
+
|
| 279 |
+
class CustomModelClient(ModelClient):
    """Custom model client for user-defined APIs.

    Sends OpenAI-compatible chat-completion requests to an arbitrary HTTP
    endpoint, with a special code path for the DeepSeek API (which does not
    accept the ``json_schema`` response format used by the generic path).
    """

    def __init__(self, api_url, api_key, model_name):
        # Endpoint URL, bearer token and model identifier supplied by the
        # user (see create_model_client).
        self.api_url = api_url
        self.api_key = api_key
        self.model_name = model_name

    def _api_call(self, request_data, headers):
        """API call function, will be called by multiprocessing"""
        try:
            response = requests.post(
                self.api_url,
                headers=headers,
                json=request_data,
                timeout=60  # timeout for requests
            )
            response.raise_for_status()
            return response.json()
        except Timeout:
            # Re-raise with the endpoint baked into the message for clearer
            # logs in the parent process.
            raise Timeout(
                f"Request to {self.api_url} timed out after 60 seconds")
        except RequestsConnectionError as e:
            raise RequestsConnectionError(
                f"Failed to connect to {self.api_url}: {str(e)}")
        except RequestException as e:
            raise RequestException(f"Request failed: {str(e)}")

    def generate_completion(self, prompt, properties, params=None):
        """Send a chat-completion request and return the parsed JSON content.

        Retries up to 3 times with exponential backoff; returns a dict with
        an ``"error"`` key after exhausting retries.
        """

        is_deepseek = self.api_url.strip().startswith("https://api.deepseek.com")

        if is_deepseek:
            # DeepSeek only supports "json_object" output, so the schema is
            # spelled out in the prompt text instead of the request body.
            # NOTE(review): the RAND system message presumably defeats
            # provider-side response caching — confirm.
            rand_stamp = int(time.time())
            # Generate field list
            field_list = ', '.join([f'"{k}"' for k in properties.keys()])
            # Determine agent type
            # If starts with "Please verify the following NEW STIMULUS ", then return at the end of prompt, each field can only return boolean value
            if prompt.strip().startswith("Please verify the following NEW STIMULUS"):
                prompt = prompt.rstrip() + \
                    f"\nPlease return in strict JSON format, fields must include: {field_list}, requirements for each field are as follows: {properties}, each field can only return boolean values (True/False)"
            elif prompt.strip().startswith("Please rate the following STIMULUS"):
                prompt = prompt.rstrip() + \
                    f"\nPlease return in strict JSON format, fields must include: {field_list}, requirements for each field are as follows: {properties}, each field can only return numbers"
            else:
                prompt = prompt.rstrip() + \
                    f"\nPlease return in strict JSON format, fields must include: {field_list}, requirements for each field are as follows: {properties}"

            request_data = {
                "model": self.model_name,
                "messages": [
                    {"role": "system", "content": f"RAND:{rand_stamp}"},
                    {"role": "user", "content": prompt}
                ],
                "stream": False,
                "response_format": {"type": "json_object"}
            }
        else:
            # build base request
            # Generic OpenAI-compatible path: strict JSON-schema response.
            request_data = {
                "model": self.model_name,
                "messages": [{"role": "user", "content": prompt}],
                "stream": False,
                "response_format": {
                    "type": "json_schema",
                    "json_schema": {
                        "name": "response_schema",
                        "schema": {
                            "type": "object",
                            "properties": properties,
                            "required": list(properties.keys()),
                            "additionalProperties": False
                        }
                    }
                }
            }

        # Caller-supplied params (e.g. temperature) override/extend the
        # request body top-level keys.
        if params is not None:
            request_data.update(params)

        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        # retry mechanism
        for attempt in range(3):
            try:
                print("Sending request to Custom API with:",
                      json.dumps(request_data, indent=2))

                # Hard 600s timeout enforced via a worker process; a timeout
                # comes back as a {"error": ...} dict, not an exception.
                result = call_with_timeout(
                    self._api_call, (request_data, headers), {}, 600)

                # NOTE(review): a successful provider response that itself
                # contains a top-level "error" key would be misclassified as
                # a timeout here and retried.
                if isinstance(result, dict) and "error" in result:
                    print(f"Custom API timeout attempt {attempt + 1}/3")
                    if attempt == 2:
                        return {"error": "API timeout after 3 attempts"}
                    time.sleep(2 ** attempt)
                    continue

                print("Response from Custom API:",
                      json.dumps(result, indent=2))
                content = result["choices"][0]["message"]["content"]
                return json.loads(content)

            except json.JSONDecodeError as e:
                print(
                    f"Custom API JSON parsing error attempt {attempt + 1}/3: {e}")
                if attempt == 2:
                    return {"error": f"API JSON parsing error after 3 attempts: {str(e)}"}
                time.sleep(2 ** attempt)
            except KeyError as e:
                # Response shape did not match the OpenAI chat-completion
                # structure (choices/message/content).
                print(
                    f"Custom API response missing expected key attempt {attempt + 1}/3: {e}")
                if attempt == 2:
                    return {"error": f"API response missing expected key after 3 attempts: {str(e)}"}
                time.sleep(2 ** attempt)
            except (Timeout, RequestsConnectionError) as e:
                print(
                    f"Custom API connection error attempt {attempt + 1}/3: {e}")
                if attempt == 2:
                    return {"error": f"API connection error after 3 attempts: {str(e)}"}
                time.sleep(2 ** attempt)
            except RequestException as e:
                print(f"Custom API request error attempt {attempt + 1}/3: {e}")
                if attempt == 2:
                    return {"error": f"API request error after 3 attempts: {str(e)}"}
                time.sleep(2 ** attempt)

    def get_default_params(self):
        # No defaults: the request body is fully assembled in
        # generate_completion; callers may still inject e.g. temperature.
        return {
        }
|
| 412 |
+
|
| 413 |
+
|
| 414 |
+
# ======================
|
| 415 |
+
# 4. Model Client Factory
|
| 416 |
+
# ======================
|
| 417 |
+
def create_model_client(model_choice, settings=None):
    """Factory: build the model client matching *model_choice*.

    Args:
        model_choice: Either ``'GPT-4o'`` or ``'custom'``.
        settings: Optional dict of credentials/endpoint details. Required
            for ``'custom'`` (keys: ``apiUrl``, ``api_key``, ``modelName``);
            optional for ``'GPT-4o'`` (key: ``api_key``).

    Raises:
        ValueError: If *model_choice* is unknown, or ``'custom'`` is chosen
            without settings.
    """
    if model_choice == 'GPT-4o':
        key = settings.get('api_key') if settings else None
        return OpenAIClient(key)

    if model_choice == 'custom':
        if not settings:
            raise ValueError("Settings required for custom model")
        return CustomModelClient(
            api_url=settings.get('apiUrl'),
            api_key=settings.get('api_key'),
            model_name=settings.get('modelName'),
        )

    raise ValueError(f"Unsupported model choice: {model_choice}")
|
| 435 |
+
|
| 436 |
+
|
| 437 |
+
# ======================
|
| 438 |
+
# 5. Unified Agent Functions
|
| 439 |
+
# ======================
|
| 440 |
+
def check_stimulus_repetition(new_stimulus_dict, previous_stimuli_list):
    """Report whether the new stimulus duplicates any existing one.

    A stimulus counts as a repetition when ANY single dimension (key) has a
    value that matches — case-insensitively, after str() conversion — the
    same dimension of any stimulus in *previous_stimuli_list*.
    """
    for key, candidate in new_stimulus_dict.items():
        for prior in previous_stimuli_list:
            if key not in prior:
                continue
            try:
                if str(prior[key]).lower() == str(candidate).lower():
                    return True
            except (AttributeError, TypeError):
                # Value not representable as a comparable string; ignore it.
                continue
    return False
|
| 458 |
+
|
| 459 |
+
|
| 460 |
+
def agent_1_generate_stimulus(
        model_client,
        experiment_design,
        previous_stimuli,
        properties,
        rejected_stimuli=None,
        prompt_template=AGENT_1_PROMPT_TEMPLATE,
        params=None,
        stop_event=None):
    """
    Agent 1: Generate a new stimulus using the provided model client.

    Returns the model's JSON dict on success, ``{"stimulus": "STOPPED"}``
    when the user aborted, or ``{"stimulus": "ERROR/ERROR"}`` on any API or
    parsing failure.
    """
    if stop_event and stop_event.is_set():
        print("Generation stopped by user in agent_1_generate_stimulus.")
        return {"stimulus": "STOPPED"}

    prompt = prompt_template.format(
        experiment_design=experiment_design,
        previous_stimuli=previous_stimuli,
        rejected_stimuli=[] if rejected_stimuli is None else rejected_stimuli,
        # Fixed extra requirement: keep the new item structurally consistent
        # with the stimuli the model has already seen.
        generation_requirements="5. Follow the same JSON format as the existing stimuli.",
    )

    try:
        result = model_client.generate_completion(prompt, properties, params)

        # Re-check after the (potentially long) API call.
        if stop_event and stop_event.is_set():
            print(
                "Generation stopped by user after API call in agent_1_generate_stimulus.")
            return {"stimulus": "STOPPED"}

        return {"stimulus": "ERROR/ERROR"} if "error" in result else result
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        print(f"Error parsing response in agent_1_generate_stimulus: {e}")
        return {"stimulus": "ERROR/ERROR"}
    except (RequestException, Timeout) as e:
        print(f"Network error in agent_1_generate_stimulus: {e}")
        return {"stimulus": "ERROR/ERROR"}
|
| 508 |
+
|
| 509 |
+
|
| 510 |
+
def agent_2_validate_stimulus(
        model_client,
        new_stimulus,
        experiment_design,
        properties,
        prompt_template=AGENT_2_PROMPT_TEMPLATE,
        stop_event=None):
    """
    Agent 2: Validate an experimental stimulus using the provided model client.

    Returns the model's validation dict on success, or ``{"error": ...}``
    when stopped, on API failure, or on a parsing/network error.
    """
    if stop_event and stop_event.is_set():
        print("Generation stopped by user in agent_2_validate_stimulus.")
        return {"error": "Stopped by user"}

    prompt = prompt_template.format(
        experiment_design=experiment_design,
        new_stimulus=new_stimulus,
    )

    try:
        # Deterministic validation: reuse the client's defaults but pin
        # temperature to 0.
        validation_params = model_client.get_default_params()
        validation_params["temperature"] = 0
        result = model_client.generate_completion(
            prompt, properties, validation_params)

        print("Agent 2 Output:", result)

        # Re-check after the (potentially long) API call.
        if stop_event and stop_event.is_set():
            print(
                "Generation stopped by user after API call in agent_2_validate_stimulus.")
            return {"error": "Stopped by user"}

        if "error" in result:
            print(f"Agent 2 API error: {result}")
            return {"error": f"Failed to validate stimulus: {result.get('error', 'Unknown error')}"}

        return result
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        print(f"Error parsing validation response: {e}")
        return {"error": f"Failed to parse validation response: {str(e)}"}
    except (RequestException, Timeout) as e:
        print(f"Network error in validation: {e}")
        return {"error": f"Network error during validation: {str(e)}"}
|
| 555 |
+
|
| 556 |
+
|
| 557 |
+
def agent_2_validate_stimulus_individual(
        model_client,
        new_stimulus,
        experiment_design,
        properties,
        prompt_template=AGENT_2_PROMPT_TEMPLATE,
        stop_event=None,
        websocket_callback=None):
    """
    Agent 2: Validate experimental stimulus by checking each criterion individually.

    One API call is issued per entry in *properties*; validation stops early
    and returns the partial results as soon as any criterion fails. Progress
    is optionally streamed via ``websocket_callback("validator", message)``.

    Returns:
        A dict mapping each checked criterion name to its boolean result,
        or ``{"error": ...}`` when stopped or on an API/parsing/network error.
    """
    if stop_event and stop_event.is_set():
        print("Generation stopped by user in agent_2_validate_stimulus_individual.")
        return {"error": "Stopped by user"}

    validation_results = {}

    # Create individual prompt template for each criterion
    # NOTE: the provided ``prompt_template`` parameter is intentionally not
    # used here; each criterion gets its own single-field prompt instead.
    individual_prompt_template = """\
Please verify the following NEW STIMULUS with utmost precision for the specific criterion mentioned below.

NEW STIMULUS: {new_stimulus}

Experimental stimuli design: {experiment_design}

SPECIFIC CRITERION TO VALIDATE:
Property: {property_name}
Description: {property_description}

Please return in JSON format with only one field: "{property_name}" (boolean: true if criterion is met, false otherwise).
"""

    try:
        total_criteria = len(properties)
        current_criterion = 0

        for property_name, property_description in properties.items():
            current_criterion += 1

            # Allow the user to abort between per-criterion API calls.
            if stop_event and stop_event.is_set():
                print(
                    f"Generation stopped by user while validating {property_name}.")
                return {"error": "Stopped by user"}

            if websocket_callback:
                websocket_callback(
                    "validator", f"Validating criterion {current_criterion}/{total_criteria}: {property_name}")

            # Create prompt for individual criterion
            prompt = individual_prompt_template.format(
                new_stimulus=new_stimulus,
                experiment_design=experiment_design,
                property_name=property_name,
                property_description=property_description
            )

            # Create properties dict with single criterion
            single_property = {property_name: property_description}

            # Get model-specific default params and override temperature
            # (temperature 0 for deterministic validation).
            fixed_params = model_client.get_default_params()
            fixed_params["temperature"] = 0

            result = model_client.generate_completion(
                prompt, single_property, fixed_params)

            print(f"Agent 2 Individual Validation - {property_name}: {result}")

            if "error" in result:
                print(
                    f"Agent 2 Individual API error for {property_name}: {result}")
                if websocket_callback:
                    websocket_callback(
                        "validator", f"Error validating criterion {property_name}: {result.get('error', 'Unknown error')}")
                return {"error": f"Failed to validate criterion {property_name}: {result.get('error', 'Unknown error')}"}

            # Extract the validation result for this criterion
            if property_name in result:
                validation_results[property_name] = result[property_name]
                status = "PASSED" if result[property_name] else "FAILED"
                if websocket_callback:
                    websocket_callback(
                        "validator", f"Criterion {property_name}: {status}")

                # Early stop: if any criterion fails, immediately reject
                if not result[property_name]:
                    if websocket_callback:
                        websocket_callback(
                            "validator", f"Early rejection: Criterion {property_name} failed. Stopping validation.")
                    print(
                        f"Agent 2 Individual Validation - Early stop: {property_name} failed")
                    return validation_results
            else:
                # Model answered but without the requested field: treat as a
                # failed criterion and reject early, same as an explicit False.
                print(
                    f"Warning: {property_name} not found in result, assuming False")
                validation_results[property_name] = False
                if websocket_callback:
                    websocket_callback(
                        "validator", f"Criterion {property_name}: FAILED (parsing error)")
                    websocket_callback(
                        "validator", f"Early rejection: Criterion {property_name} failed. Stopping validation.")
                print(
                    f"Agent 2 Individual Validation - Early stop: {property_name} failed (parsing error)")
                return validation_results

        print("Agent 2 Individual Validation - All Results:", validation_results)
        if websocket_callback:
            websocket_callback(
                "validator", "All criteria passed successfully!")
        return validation_results

    except (json.JSONDecodeError, KeyError, TypeError) as e:
        print(f"Error parsing individual validation response: {e}")
        return {"error": f"Failed to parse validation response: {str(e)}"}
    except (RequestException, Timeout) as e:
        print(f"Network error in individual validation: {e}")
        return {"error": f"Network error during validation: {str(e)}"}
|
| 674 |
+
|
| 675 |
+
|
| 676 |
+
def generate_scoring_requirements(properties):
    """Render *properties* as the bullet list injected into Agent 3's prompt.

    Each aspect becomes one line of the form
    ``- name: description (Score range: min to max)``; ``minimum`` defaults
    to 0, ``maximum`` to 10, and ``description`` to the aspect name itself.
    Returns a placeholder sentence when *properties* is empty or falsy.
    """
    if not properties:
        return "No specific scoring requirements provided."

    bullet_lines = [
        f"- {name}: {details.get('description', name)} "
        f"(Score range: {details.get('minimum', 0)} to {details.get('maximum', 10)})"
        for name, details in properties.items()
    ]
    return "\n".join(bullet_lines)
|
| 693 |
+
|
| 694 |
+
|
| 695 |
+
def agent_3_score_stimulus(
        model_client,
        valid_stimulus,
        experiment_design,
        properties,
        prompt_template=AGENT_3_PROMPT_TEMPLATE,
        stop_event=None):
    """
    Agent 3: Score an experimental stimulus using the provided model client.

    Returns the model's per-dimension score dict on success; when stopped or
    on any API/parsing/network failure, returns all-zero scores for the
    requested fields instead.
    """
    if stop_event and stop_event.is_set():
        print("Generation stopped by user after API call in agent_3_score_stimulus.")
        return {field: 0 for field in properties.keys()} if properties else {}

    prompt = prompt_template.format(
        experiment_design=experiment_design,
        valid_stimulus=valid_stimulus,
        # Human-readable per-aspect score ranges for the prompt body.
        scoring_requirements=generate_scoring_requirements(properties),
    )

    try:
        # Deterministic scoring: reuse the client's defaults but pin
        # temperature to 0.
        scoring_params = model_client.get_default_params()
        scoring_params["temperature"] = 0
        result = model_client.generate_completion(
            prompt, properties, scoring_params)

        # Re-check after the (potentially long) API call.
        if stop_event and stop_event.is_set():
            print("Generation stopped by user after API call in agent_3_score_stimulus.")
            return {field: 0 for field in properties.keys()} if properties else {}

        if "error" in result:
            print(f"Agent 3 API error: {result}")
            return {field: 0 for field in properties.keys()}

        return result
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        print(f"Error parsing scoring response: {e}")
        return {field: 0 for field in properties.keys()}
    except (RequestException, Timeout) as e:
        print(f"Network error in scoring: {e}")
        return {field: 0 for field in properties.keys()}
|
| 740 |
+
|
| 741 |
+
|
| 742 |
+
def agent_3_score_stimulus_individual(
        model_client,
        valid_stimulus,
        experiment_design,
        properties,
        prompt_template=AGENT_3_PROMPT_TEMPLATE,
        stop_event=None,
        websocket_callback=None):
    """
    Agent 3: Score experimental stimulus by evaluating each aspect individually.

    Unlike the batch scorer, this issues one model request per scoring
    aspect so each criterion is judged in isolation.

    Args:
        model_client: Client exposing ``get_default_params()`` and
            ``generate_completion(prompt, properties, params)``.
        valid_stimulus: The (already validated) stimulus to score.
        experiment_design: Free-text design description interpolated into
            every per-aspect prompt.
        properties: Mapping of aspect name -> details dict; each details
            dict may carry 'minimum', 'maximum' and 'description' keys
            (defaults used below: 0, 10, and the aspect name).
        prompt_template: Not used by this function; kept for signature
            parity with the batch scorer.
        stop_event: Optional Event-like flag; when set, scoring aborts and
            every aspect is reported as 0.
        websocket_callback: Optional callable ``(channel, message)`` used
            for live progress messages on the "scorer" channel.

    Returns:
        dict mapping each aspect name to an integer score. Aspects that
        error out, or any run interrupted/failed as a whole, yield 0s.
    """
    if stop_event and stop_event.is_set():
        print("Generation stopped by user in agent_3_score_stimulus_individual.")
        return {field: 0 for field in properties.keys()} if properties else {}

    scoring_results = {}

    # Create individual prompt template for each aspect
    individual_prompt_template = """\
Please rate the following STIMULUS based on the specific aspect mentioned below for a psychological experiment:

STIMULUS: {valid_stimulus}
Experimental stimuli design: {experiment_design}

SPECIFIC ASPECT TO SCORE:
- Aspect Name: {aspect_name}
- Description: {aspect_description}
- Minimum Score: {min_score}
- Maximum Score: {max_score}
- Score Range: You must provide an integer score between {min_score} and {max_score} (inclusive)

SCORING INSTRUCTIONS:
Rate this stimulus on the "{aspect_name}" dimension based on the provided description. Your score should reflect how well the stimulus meets this criterion, with {min_score} being the lowest possible score and {max_score} being the highest possible score.

Please return in JSON format with only one field: "{aspect_name}" (integer score within the specified range {min_score}-{max_score}).
"""

    try:
        total_aspects = len(properties)
        current_aspect = 0

        for aspect_name, aspect_details in properties.items():
            current_aspect += 1

            # Allow cancellation between per-aspect API calls.
            if stop_event and stop_event.is_set():
                print(
                    f"Generation stopped by user while scoring {aspect_name}.")
                return {field: 0 for field in properties.keys()}

            if websocket_callback:
                websocket_callback(
                    "scorer", f"Evaluating aspect {current_aspect}/{total_aspects}: {aspect_name}")

            # Extract min and max scores from aspect details
            min_score = aspect_details.get('minimum', 0)
            max_score = aspect_details.get('maximum', 10)
            description = aspect_details.get('description', aspect_name)

            # Create prompt for individual aspect
            prompt = individual_prompt_template.format(
                valid_stimulus=valid_stimulus,
                experiment_design=experiment_design,
                aspect_name=aspect_name,
                aspect_description=description,
                min_score=min_score,
                max_score=max_score
            )

            # Create properties dict with single aspect (include all details for JSON schema)
            single_aspect = {aspect_name: {
                'type': 'integer',
                'description': description,
                'minimum': min_score,
                'maximum': max_score
            }}

            # Get model-specific default params and override temperature
            # (temperature 0 for deterministic scoring).
            fixed_params = model_client.get_default_params()
            fixed_params["temperature"] = 0

            result = model_client.generate_completion(
                prompt, single_aspect, fixed_params)

            print(f"Agent 3 Individual Scoring - {aspect_name}: {result}")

            # Per-aspect API errors score this aspect 0 and move on rather
            # than aborting the whole run.
            if "error" in result:
                print(
                    f"Agent 3 Individual API error for {aspect_name}: {result}")
                if websocket_callback:
                    websocket_callback(
                        "scorer", f"Error scoring aspect {aspect_name}: {result.get('error', 'Unknown error')}")
                scoring_results[aspect_name] = 0
                continue

            # Extract the scoring result for this aspect
            if aspect_name in result:
                score = result[aspect_name]
                # Ensure score is within valid range
                if isinstance(score, (int, float)):
                    score = max(min_score, min(max_score, int(score)))
                    scoring_results[aspect_name] = score
                    if websocket_callback:
                        websocket_callback(
                            "scorer", f"Aspect {aspect_name}: {score}/{max_score}")
                else:
                    print(
                        f"Warning: Invalid score for {aspect_name}, assuming 0")
                    scoring_results[aspect_name] = 0
                    if websocket_callback:
                        websocket_callback(
                            "scorer", f"Aspect {aspect_name}: 0/{max_score} (invalid response)")
            else:
                print(
                    f"Warning: {aspect_name} not found in result, assuming 0")
                scoring_results[aspect_name] = 0
                if websocket_callback:
                    websocket_callback(
                        "scorer", f"Aspect {aspect_name}: 0/{max_score} (parsing error)")

        print("Agent 3 Individual Scoring - All Results:", scoring_results)
        if websocket_callback:
            total_score = sum(scoring_results.values())
            max_possible = sum(aspect_details.get('maximum', 10)
                               for aspect_details in properties.values())
            websocket_callback(
                "scorer", f"Individual scoring completed! Total: {total_score}/{max_possible}")
        return scoring_results

    # NOTE(review): an exception here discards any per-aspect scores
    # already collected and returns all zeros — matches the batch scorer's
    # all-or-nothing behavior; confirm this is intended.
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        print(f"Error parsing individual scoring response: {e}")
        return {field: 0 for field in properties.keys()}
    except (RequestException, Timeout) as e:
        print(f"Network error in individual scoring: {e}")
        return {field: 0 for field in properties.keys()}
|
| 876 |
+
|
| 877 |
+
|
| 878 |
+
# ======================
|
| 879 |
+
# 6. Main Flow Function
|
| 880 |
+
# ======================
|
| 881 |
+
def generate_stimuli(settings):
    """
    Main pipeline: generate, validate, and score experimental stimuli.

    Runs up to ``settings['iteration']`` rounds of a three-agent loop
    (Agent 1 generator -> Agent 2 validator -> Agent 3 scorer) against the
    model selected by ``settings['model_choice']``, accumulating one record
    per accepted stimulus.

    Keys read from ``settings`` (as used below):
        stop_event: Event used to request cancellation at any checkpoint.
        current_iteration / total_iterations: shared values exposing
            ``.value`` and ``.get_lock()``, used for progress reporting.
        experiment_design: free-text experiment description for the agents.
        previous_stimuli: list of already-generated stimuli, or falsy.
        iteration: number of stimuli to generate.
        model_choice (default 'GPT-4o'), params, session_id,
        ablation ({'use_agent_2', 'use_agent_3'} flags),
        agent_1_properties / agent_2_properties / agent_3_properties,
        agent_2_individual_validation, agent_3_individual_scoring,
        session_update_callback, websocket_callback: optional hooks/config.

    Returns:
        (DataFrame, suggested_filename) on completion, partial completion,
        or recoverable error; (None, None) when stopped before any record
        was produced or when model-client creation fails.
    """

    stop_event = settings['stop_event']
    current_iteration = settings['current_iteration']
    total_iterations = settings['total_iterations']
    experiment_design = settings['experiment_design']
    previous_stimuli = settings['previous_stimuli'] if settings['previous_stimuli'] else [
    ]
    model_choice = settings.get('model_choice', 'GPT-4o')

    ablation = settings.get('ablation', {
        "use_agent_2": True,
        "use_agent_3": True
    })

    repetition_count = 0
    validation_fails = 0

    # Get custom parameters for custom model
    custom_params = settings.get('params', None)

    # Get session_update_callback function and websocket_callback function
    session_update_callback = settings.get('session_update_callback')
    websocket_callback = settings.get('websocket_callback')

    # Ensure progress value is correctly initialized
    with current_iteration.get_lock(), total_iterations.get_lock():
        current_iteration.value = 0
        total_iterations.value = settings['iteration']
        # Immediately send correct initial progress
        if session_update_callback:
            session_update_callback()

    # Check stop event at each critical point
    def check_stop(message="Generation stopped by user."):
        if stop_event.is_set():
            print(message)
            if websocket_callback:
                websocket_callback("all", message)
            return True
        return False

    # Helper function to create partial result when error or stop occurs
    def create_partial_result(record_list, message, is_error=True):
        nonlocal total_iterations
        if len(record_list) > 0:
            df = pd.DataFrame(record_list)
            session_id = settings.get('session_id', 'default')
            timestamp = int(time.time())
            unique_id = ''.join(random.choice('0123456789abcdef')
                                for _ in range(6))
            suffix = "_partial" if is_error else "_stopped"
            suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}{suffix}.csv"

            df['generation_timestamp'] = timestamp
            df['batch_id'] = unique_id
            df['total_iterations'] = total_iterations.value
            df['stopped_by_user'] = not is_error
            df['error_occurred'] = is_error
            df['message'] = message
            df['completed_iterations'] = len(record_list)

            os.makedirs("outputs", exist_ok=True)
            suggested_filename = os.path.join("outputs", suggested_filename)

            return df, suggested_filename
        return None, None

    # Helper function to check stop and return partial data if available
    def check_stop_and_return(message="Generation stopped by user."):
        if stop_event.is_set():
            print(message)
            if websocket_callback:
                websocket_callback("all", message)
            return True, create_partial_result(record_list, message, is_error=False)
        return False, (None, None)

    # Immediately check if stopped
    if check_stop("Generation stopped before starting."):
        return None, None

    record_list = []
    rejected_stimuli_memory = []
    agent_1_properties = settings.get('agent_1_properties', {})
    print("Agent 1 Properties:", agent_1_properties)
    if websocket_callback:
        websocket_callback(
            "setup", f"Agent 1 Properties: {agent_1_properties}")

    if check_stop():
        return None, None

    agent_2_properties = settings.get('agent_2_properties', {})
    print("Agent 2 Properties:", agent_2_properties)
    if websocket_callback:
        websocket_callback(
            "setup", f"Agent 2 Properties: {agent_2_properties}")

    if check_stop():
        return None, None

    agent_3_properties = settings.get('agent_3_properties', {})
    print("Agent 3 Properties:", agent_3_properties)
    if websocket_callback:
        websocket_callback(
            "setup", f"Agent 3 Properties: {agent_3_properties}")

    if check_stop():
        return None, None

    # Create model client using factory
    try:
        model_client = create_model_client(model_choice, settings)
        print(f"Using model: {model_choice}")
        if websocket_callback:
            websocket_callback("setup", f"Using model: {model_choice}")
    except Exception as e:
        error_msg = f"Failed to create model client: {str(e)}"
        print(error_msg)
        if websocket_callback:
            websocket_callback("setup", error_msg)
        return None, None

    if check_stop():
        return None, None

    # Create a function specifically for updating progress
    def update_progress(completed_iterations):
        if check_stop():
            return

        # Progress is monotonic: only move the counter forward.
        with current_iteration.get_lock(), total_iterations.get_lock():
            current_value = min(completed_iterations, total_iterations.value)
            if current_value > current_iteration.value:
                current_iteration.value = current_value
                if session_update_callback:
                    session_update_callback()

    # Get actual total iterations
    total_iter_value = total_iterations.value
    for iteration_num in range(total_iter_value):
        stopped, partial_result = check_stop_and_return()
        if stopped:
            return partial_result

        round_message = f"=== No. {iteration_num + 1} Round ==="
        print(round_message)
        if websocket_callback:
            websocket_callback("all", round_message)

        # Step 1: Generate stimulus
        current_retry_count = 0  # Retry counter for this iteration
        while True:
            stopped, partial_result = check_stop_and_return()
            if stopped:
                return partial_result

            try:
                stimuli = agent_1_generate_stimulus(
                    model_client=model_client,
                    experiment_design=experiment_design,
                    previous_stimuli=previous_stimuli,
                    properties=agent_1_properties,
                    rejected_stimuli=rejected_stimuli_memory,
                    prompt_template=AGENT_1_PROMPT_TEMPLATE,
                    params=custom_params,
                    stop_event=stop_event
                )

                # 'STOPPED' sentinel from Agent 1 means the user cancelled
                # mid-call; hand back whatever was completed so far.
                if isinstance(stimuli, dict) and stimuli.get('stimulus') == 'STOPPED':
                    stopped, partial_result = check_stop_and_return(
                        "Generation stopped after 'Generator'.")
                    if stopped:
                        return partial_result

                # Skip validation if Agent 1 returned an error
                if isinstance(stimuli, dict) and stimuli.get('stimulus') == 'ERROR/ERROR':
                    print("Agent 1 returned ERROR, regenerating...")
                    if websocket_callback:
                        websocket_callback(
                            "generator", "Generator returned ERROR, regenerating...")
                    continue

                print("Agent 1 Output:", stimuli)
                if websocket_callback:
                    websocket_callback(
                        "generator", f"Generator's Output: {json.dumps(stimuli, indent=2)}")

                stopped, partial_result = check_stop_and_return(
                    "Generation stopped after 'Generator'.")
                if stopped:
                    return partial_result

                # Step 1.5: Check if stimulus already exists

                if check_stimulus_repetition(stimuli, previous_stimuli):
                    repetition_count += 1
                    current_retry_count += 1

                    # Add retry limit to avoid infinite loops (but never accept duplicates)
                    max_repetition_retries = 50
                    if current_retry_count > max_repetition_retries:
                        error_msg = f"Failed to generate unique stimulus after {max_repetition_retries} attempts. Consider adjusting experiment design or reducing target count."
                        print(error_msg)
                        if websocket_callback:
                            websocket_callback("generator", error_msg)
                        # Return partial results instead of raising exception
                        return create_partial_result(record_list, error_msg)

                    if ablation["use_agent_2"]:
                        print("Detected repeated stimulus, regenerating...")

                        if websocket_callback:
                            websocket_callback(
                                "generator", "Detected repeated stimulus, regenerating...")
                        continue
                    else:
                        print(
                            "Ablation: Skipping Agent 2 (Repetition Check)")
                        if websocket_callback:
                            websocket_callback(
                                "generator", "Ablation: Skipping Agent 2 (Repetition Check)")

                stopped, partial_result = check_stop_and_return()
                if stopped:
                    return partial_result

                # Step 2: Validate stimulus
                # Check if individual validation is enabled
                individual_validation = settings.get(
                    'agent_2_individual_validation', False)

                if individual_validation:
                    if websocket_callback:
                        websocket_callback(
                            "validator", f"Using individual validation mode - checking {len(agent_2_properties)} criteria...")
                    validation_result = agent_2_validate_stimulus_individual(
                        model_client=model_client,
                        new_stimulus=stimuli,
                        experiment_design=experiment_design,
                        properties=agent_2_properties,
                        stop_event=stop_event,
                        websocket_callback=websocket_callback
                    )
                else:
                    if websocket_callback:
                        websocket_callback(
                            "validator", "Using batch validation mode...")
                    validation_result = agent_2_validate_stimulus(
                        model_client=model_client,
                        new_stimulus=stimuli,
                        experiment_design=experiment_design,
                        properties=agent_2_properties,
                        prompt_template=AGENT_2_PROMPT_TEMPLATE,
                        stop_event=stop_event
                    )

                if isinstance(validation_result, dict) and validation_result.get('error') == 'Stopped by user':
                    stopped, partial_result = check_stop_and_return(
                        "Generation stopped after 'Validator'.")
                    if stopped:
                        return partial_result

                print("Agent 2 Output:", validation_result)
                if websocket_callback:
                    websocket_callback(
                        "validator", f"Validator's Output: {json.dumps(validation_result, indent=2)}")

                stopped, partial_result = check_stop_and_return(
                    "Generation stopped after 'Validator'.")
                if stopped:
                    return partial_result

                # Check if there was an error first
                if 'error' in validation_result:
                    print(f"Validation error: {validation_result['error']}")
                    if websocket_callback:
                        websocket_callback(
                            "validator", f"Validation error: {validation_result['error']}")
                    continue  # Skip to next iteration

                # Check validation fields
                failed_fields = [
                    key for key, value in validation_result.items() if not value]

                if failed_fields:
                    # Some fields failed validation
                    validation_fails += 1
                    current_retry_count += 1

                    # Add to rejected memory (only if it's a valid stimulus, not an error)
                    is_error_stimulus = (
                        isinstance(stimuli, dict) and
                        stimuli.get('stimulus') in ['ERROR/ERROR', 'STOPPED']
                    )
                    if not is_error_stimulus:
                        rejected_item = {
                            "stimulus": stimuli,
                            "validation_result": validation_result,
                            "failed_fields": failed_fields
                        }
                        rejected_stimuli_memory.append(rejected_item)
                        # Limit memory size to prevent unbounded growth
                        MAX_REJECTED_MEMORY = 20
                        if len(rejected_stimuli_memory) > MAX_REJECTED_MEMORY:
                            rejected_stimuli_memory = rejected_stimuli_memory[-MAX_REJECTED_MEMORY:]

                    print(
                        f"Failed validation for fields: {failed_fields}, regenerating...")
                    if websocket_callback:
                        websocket_callback(
                            "validator", f"Failed validation for fields: {failed_fields}, regenerating...")

                    # Check retry limit to avoid infinite loops
                    max_retries = 50
                    if current_retry_count > max_retries:
                        error_msg = f"Failed to generate valid stimulus after {max_retries} attempts. Consider adjusting validation criteria."
                        print(error_msg)
                        if websocket_callback:
                            websocket_callback("validator", error_msg)
                        # Return partial results instead of raising exception
                        return create_partial_result(record_list, error_msg)

                    if ablation["use_agent_2"]:
                        continue  # Regenerate
                    else:
                        print("Ablation: Skipping Agent 2 (Validation)")
                        if websocket_callback:
                            websocket_callback(
                                "validator", "Ablation: Skipping Agent 2 (Validation)")
                        update_progress(iteration_num + 1)
                        break
                else:
                    # All validations passed
                    print("All validations passed, proceeding to next step...")
                    if websocket_callback:
                        websocket_callback(
                            "validator", "All validations passed, proceeding to next step...")
                    update_progress(iteration_num + 1)
                    break

            except Exception as e:
                error_msg = f"Error in generation/validation step: {str(e)}"
                print(error_msg)
                if websocket_callback:
                    websocket_callback("all", error_msg)
                if len(record_list) > 0:
                    df = pd.DataFrame(record_list)
                    session_id = settings.get('session_id', 'default')
                    timestamp = int(time.time())
                    unique_id = ''.join(random.choice(
                        '0123456789abcdef') for _ in range(6))
                    suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}_error.csv"

                    df['generation_timestamp'] = timestamp
                    df['batch_id'] = unique_id
                    df['total_iterations'] = total_iter_value
                    df['error_occurred'] = True
                    df['error_message'] = str(e)

                    os.makedirs("outputs", exist_ok=True)
                    # NOTE(review): this overwrites the "_error" filename
                    # built above with one lacking the suffix — confirm
                    # whether the "_error" marker should be kept here.
                    suggested_filename = os.path.join(
                        "outputs", f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}.csv")

                    return df, suggested_filename
                else:
                    raise e

        stopped, partial_result = check_stop_and_return(
            "Generation stopped after 'Validator'.")
        if stopped:
            return partial_result

        try:
            stopped, partial_result = check_stop_and_return(
                "Generation stopped before Scorer.")
            if stopped:
                return partial_result

            # Step 3: Score
            if ablation["use_agent_3"]:
                # Check if individual scoring is enabled
                individual_scoring = settings.get(
                    'agent_3_individual_scoring', False)

                if individual_scoring:
                    if websocket_callback:
                        websocket_callback(
                            "scorer", f"Using individual scoring mode - evaluating {len(agent_3_properties)} aspects...")
                    scores = agent_3_score_stimulus_individual(
                        model_client=model_client,
                        valid_stimulus=stimuli,
                        experiment_design=experiment_design,
                        properties=agent_3_properties,
                        stop_event=stop_event,
                        websocket_callback=websocket_callback
                    )
                else:
                    if websocket_callback:
                        websocket_callback(
                            "scorer", "Using batch scoring mode...")
                    scores = agent_3_score_stimulus(
                        model_client=model_client,
                        valid_stimulus=stimuli,
                        experiment_design=experiment_design,
                        properties=agent_3_properties,
                        prompt_template=AGENT_3_PROMPT_TEMPLATE,
                        stop_event=stop_event
                    )

                # All-zero scores may indicate the scorer was interrupted;
                # only treat it as a stop when the stop event is set.
                if isinstance(scores, dict) and all(v == 0 for v in scores.values()):
                    if stop_event.is_set():
                        stopped, partial_result = check_stop_and_return(
                            "Generation stopped after 'Scorer'.")
                        if stopped:
                            return partial_result

                print("Agent 3 Output:", scores)
                if websocket_callback:
                    websocket_callback(
                        "scorer", f"Scorer's Output: {json.dumps(scores, indent=2)}")

                stopped, partial_result = check_stop_and_return(
                    "Generation stopped after 'Scorer'.")
                if stopped:
                    return partial_result
            else:
                print("Ablation: Skipping Agent 3 (Scoring)")
                if websocket_callback:
                    websocket_callback("scorer", "Ablation: Skipping Agent 3")

            # Save results
            record = {
                "stimulus_id": iteration_num + 1,
                "stimulus_content": stimuli,
                "repetition_count": repetition_count,
                "validation_fails": validation_fails,
                "validation_failure_reasons": validation_result
            }
            if ablation["use_agent_3"]:
                record.update(scores or {})
            record_list.append(record)

            # Update previous_stimuli
            previous_stimuli.append(stimuli)

            # If some records have been generated, create intermediate results
            if (iteration_num + 1) % 5 == 0 or iteration_num + 1 == total_iter_value:
                temp_df = pd.DataFrame(record_list)
                session_id = settings.get('session_id', 'default')
                timestamp = int(time.time())
                unique_id = ''.join(random.choice('0123456789abcdef')
                                    for _ in range(6))
                suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}.csv"

                temp_df['generation_timestamp'] = timestamp
                temp_df['batch_id'] = unique_id
                temp_df['total_iterations'] = total_iter_value

                if check_stop():
                    return temp_df, suggested_filename

                if iteration_num + 1 == total_iter_value:
                    update_progress(total_iter_value)
                    return temp_df, suggested_filename

        except Exception as e:
            error_msg = f"Error in scoring step: {str(e)}"
            print(error_msg)
            if websocket_callback:
                websocket_callback("all", error_msg)
            if len(record_list) > 0:
                df = pd.DataFrame(record_list)
                session_id = settings.get('session_id', 'default')
                timestamp = int(time.time())
                unique_id = ''.join(random.choice('0123456789abcdef')
                                    for _ in range(6))
                suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}_error.csv"

                df['generation_timestamp'] = timestamp
                df['batch_id'] = unique_id
                df['total_iterations'] = total_iter_value
                df['error_occurred'] = True
                df['error_message'] = str(e)

                return df, suggested_filename
            else:
                raise e

        # Check again if stopped at final step
        if check_stop("Generation stopped at final step."):
            if len(record_list) > 0:
                df = pd.DataFrame(record_list)
                session_id = settings.get('session_id', 'default')
                timestamp = int(time.time())
                unique_id = ''.join(random.choice('0123456789abcdef')
                                    for _ in range(6))
                suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}.csv"

                df['generation_timestamp'] = timestamp
                df['batch_id'] = unique_id
                df['total_iterations'] = total_iter_value
                df['error_occurred'] = False
                df['error_message'] = ""

                completion_msg = f"Data generation completed for session {session_id}"
                print(completion_msg)
                if websocket_callback:
                    websocket_callback("all", completion_msg)
                return df, suggested_filename
            return None, None

    # Only generate DataFrame and return results after all iterations
    if len(record_list) > 0:
        update_progress(total_iter_value)

        df = pd.DataFrame(record_list)
        session_id = settings.get('session_id', 'default')
        timestamp = int(time.time())
        unique_id = ''.join(random.choice('0123456789abcdef')
                            for _ in range(6))
        suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}.csv"

        df['generation_timestamp'] = timestamp
        df['batch_id'] = unique_id
        df['total_iterations'] = total_iter_value
        df['error_occurred'] = False
        df['error_message'] = ""

        completion_msg = f"Data generation completed for session {session_id}"
        print(completion_msg)
        if websocket_callback:
            websocket_callback("all", completion_msg)
        return df, suggested_filename
    else:
        print("No records generated.")
        if websocket_callback:
            websocket_callback("all", "No records generated.")
        return None, None
|
| 1420 |
+
|
| 1421 |
+
|
| 1422 |
+
# ======================
|
| 1423 |
+
# 7. Legacy Support Function (maintain backward compatibility)
|
| 1424 |
+
# ======================
|
| 1425 |
+
def custom_model_inference_handler(session_id, prompt, model, api_url, api_key, params=None):
    """Legacy function for backward compatibility.

    Runs a single completion through a throwaway ``CustomModelClient`` and
    returns a Flask-style ``(payload, status_code)`` pair: 200 with the
    JSON-encoded model output on success, 500 with an error payload when
    the client reports an error or any exception is raised.
    ``session_id`` is accepted but not used; it is retained so existing
    callers keep working.
    """
    try:
        legacy_client = CustomModelClient(api_url, api_key, model)
        outcome = legacy_client.generate_completion(prompt, {}, params)

        # The client signals failure in-band via an "error" key.
        if "error" in outcome:
            return {'error': outcome["error"]}, 500

        return {'response': json.dumps(outcome)}, 200
    except Exception as e:
        # Broad catch is deliberate here: this is the outermost legacy
        # boundary and must always answer with an HTTP-style tuple.
        return {'error': f'Unexpected error: {str(e)}'}, 500
|
index.html
ADDED
|
@@ -0,0 +1,414 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="en">
|
| 3 |
+
|
| 4 |
+
<head>
|
| 5 |
+
<meta charset="UTF-8">
|
| 6 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 7 |
+
<title>Stimulus Generator</title>
|
| 8 |
+
<link rel="stylesheet" href="/static/styles.css">
|
| 9 |
+
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css">
|
| 10 |
+
<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.6.0/socket.io.min.js"></script>
|
| 11 |
+
</head>
|
| 12 |
+
|
| 13 |
+
<body>
|
| 14 |
+
<!-- Restart countdown timer -->
|
| 15 |
+
<div id="restart-countdown" class="restart-countdown" style="display: none;">
|
| 16 |
+
<div class="countdown-content">
|
| 17 |
+
<i class="fas fa-clock"></i>
|
| 18 |
+
<span id="countdown-time">20:00</span>
|
| 19 |
+
<span class="countdown-label">Restart countdown</span>
|
| 20 |
+
</div>
|
| 21 |
+
</div>
|
| 22 |
+
|
| 23 |
+
<div class="container">
|
| 24 |
+
<div class="header-container">
|
| 25 |
+
<h1>Stimulus Generator</h1>
|
| 26 |
+
</div>
|
| 27 |
+
<h2>Parameter Settings</h2>
|
| 28 |
+
<div class="pale-blue-section">
|
| 29 |
+
<p class="instruction-text">Fill in your experiment design and let the model generate customized stimulus
|
| 30 |
+
materials.</p>
|
| 31 |
+
<div class="form-group">
|
| 32 |
+
<div class="label-container">
|
| 33 |
+
<label for="model_choice">Model</label>
|
| 34 |
+
<div class="tooltip info-icon">
|
| 35 |
+
<span class="info-icon-inner">i</span>
|
| 36 |
+
<span class="tooltip-text">Select which language model to use for generation.</span>
|
| 37 |
+
</div>
|
| 38 |
+
</div>
|
| 39 |
+
<select id="model_choice" name="model_choice" onchange="handleModelChange()">
|
| 40 |
+
<option value="GPT-4o">GPT-4o</option>
|
| 41 |
+
<option value="custom">Custom Model</option>
|
| 42 |
+
</select>
|
| 43 |
+
</div>
|
| 44 |
+
<div id="custom_model_config"
|
| 45 |
+
style="display: none; margin-top: 10px; padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
|
| 46 |
+
<div class="form-group">
|
| 47 |
+
<div class="label-container">
|
| 48 |
+
<label for="custom_model_name">Model name</label>
|
| 49 |
+
<div class="tooltip info-icon">
|
| 50 |
+
<span class="info-icon-inner">i</span>
|
| 51 |
+
<span class="tooltip-text">Enter the name of your custom model (e.g.,
|
| 52 |
+
deepseek-ai/DeepSeek-V3-0324)</span>
|
| 53 |
+
</div>
|
| 54 |
+
</div>
|
| 55 |
+
<input type="text" id="custom_model_name" name="custom_model_name"
|
| 56 |
+
placeholder="e.g., deepseek-ai/DeepSeek-V3-0324">
|
| 57 |
+
</div>
|
| 58 |
+
<div class="form-group">
|
| 59 |
+
<div class="label-container">
|
| 60 |
+
<label for="custom_api_url">API URL</label>
|
| 61 |
+
<div class="tooltip info-icon">
|
| 62 |
+
<span class="info-icon-inner">i</span>
|
| 63 |
+
<span class="tooltip-text">Enter the API endpoint URL for your custom model</span>
|
| 64 |
+
</div>
|
| 65 |
+
</div>
|
| 66 |
+
<input type="text" id="custom_api_url" name="custom_api_url"
|
| 67 |
+
placeholder="e.g., https://api.example.com/v1/chat/completions">
|
| 68 |
+
</div>
|
| 69 |
+
<div class="form-group">
|
| 70 |
+
<div class="label-container">
|
| 71 |
+
<label for="custom_params">Custom parameters (JSON)</label>
|
| 72 |
+
<div class="tooltip info-icon">
|
| 73 |
+
<span class="info-icon-inner">i</span>
|
| 74 |
+
<span class="tooltip-text">Enter additional parameters in JSON format (e.g., {"max_tokens":
|
| 75 |
+
2000, "temperature": 0.7})</span>
|
| 76 |
+
</div>
|
| 77 |
+
</div>
|
| 78 |
+
<textarea id="custom_params" name="custom_params" rows="4"
|
| 79 |
+
placeholder='{"max_tokens": 2000, "temperature": 0.7}'></textarea>
|
| 80 |
+
</div>
|
| 81 |
+
</div>
|
| 82 |
+
<div class="form-group">
|
| 83 |
+
<div class="label-container">
|
| 84 |
+
<label for="api_key">API key</label>
|
| 85 |
+
<div class="tooltip info-icon">
|
| 86 |
+
<span class="info-icon-inner">i</span>
|
| 87 |
+
<span class="tooltip-text">Enter your API key for the selected model.</span>
|
| 88 |
+
</div>
|
| 89 |
+
</div>
|
| 90 |
+
<input type="text" id="api_key" placeholder="Enter your API Key">
|
| 91 |
+
</div>
|
| 92 |
+
<div class="spacing"></div>
|
| 93 |
+
<div class="form-group custom-example-group">
|
| 94 |
+
<div class="label-container">
|
| 95 |
+
<label>Example stimuli</label>
|
| 96 |
+
<div class="tooltip info-icon">
|
| 97 |
+
<span class="info-icon-inner">i</span>
|
| 98 |
+
<span class="tooltip-text">
|
| 99 |
+
<strong>### What is this section for?</strong><br>
|
| 100 |
+
This section provides <strong>example stimulus items</strong> for the Generator agent to
|
| 101 |
+
learn from.<br>
|
| 102 |
+
- The <strong>left column</strong> (Component) should match the component names you've
|
| 103 |
+
defined above (e.g. <em>word pair</em>, <em>supportive context</em>, <em>neutral
|
| 104 |
+
context</em>).<br>
|
| 105 |
+
- The <strong>right column</strong> (Content) provides the actual example content for that
|
| 106 |
+
component.<br>
|
| 107 |
+
💡 Each row defines <strong>one component</strong> of the current item.<br>
|
| 108 |
+
Click <strong>"Add item"</strong> to create a new example stimulus item with the same
|
| 109 |
+
components.<br>
|
| 110 |
+
---<br>
|
| 111 |
+
<strong>### Example:</strong><br>
|
| 112 |
+
<strong>Item 1:</strong><br>
|
| 113 |
+
- Component: <code>word pair</code> → Content: <code>TV / television</code><br>
|
| 114 |
+
- Component: <code>supportive context</code> → Content:
|
| 115 |
+
<code>She turned on the TV to watch the news.</code><br>
|
| 116 |
+
- Component: <code>neutral context</code> → Content:
|
| 117 |
+
<code>The TV was next to the window.</code><br>
|
| 118 |
+
<strong>Item 2:</strong><br>
|
| 119 |
+
...<br>
|
| 120 |
+
📌 You should add at least <strong>2–3 full example items</strong> for best results.
|
| 121 |
+
</span>
|
| 122 |
+
</div>
|
| 123 |
+
</div>
|
| 124 |
+
<p class="description-text">Add multiple example items to help the agent learn.</p>
|
| 125 |
+
|
| 126 |
+
<!-- Example table container -->
|
| 127 |
+
<div id="items-container">
|
| 128 |
+
<!-- Item 1 -->
|
| 129 |
+
<div class="item-container" id="item-1">
|
| 130 |
+
<div class="item-title">Item 1</div>
|
| 131 |
+
<table class="stimuli-table">
|
| 132 |
+
<thead>
|
| 133 |
+
<tr>
|
| 134 |
+
<th class="type-column">Components</th>
|
| 135 |
+
<th class="content-column">Content</th>
|
| 136 |
+
</tr>
|
| 137 |
+
</thead>
|
| 138 |
+
<tbody>
|
| 139 |
+
<tr>
|
| 140 |
+
<td class="type-column"><input type="text" placeholder="e.g. word pair"></td>
|
| 141 |
+
<td class="content-column"><input type="text" placeholder="e.g. math/mathematics">
|
| 142 |
+
</td>
|
| 143 |
+
</tr>
|
| 144 |
+
</tbody>
|
| 145 |
+
</table>
|
| 146 |
+
<div class="item-buttons-row example-buttons">
|
| 147 |
+
<div class="left-buttons">
|
| 148 |
+
<button class="add-component-btn">Add component</button>
|
| 149 |
+
<button class="delete-component-btn">Delete component</button>
|
| 150 |
+
</div>
|
| 151 |
+
<div class="right-buttons">
|
| 152 |
+
<button class="add-item-btn">Add item</button>
|
| 153 |
+
</div>
|
| 154 |
+
</div>
|
| 155 |
+
</div>
|
| 156 |
+
</div>
|
| 157 |
+
</div>
|
| 158 |
+
<div class="form-group">
|
| 159 |
+
<div class="label-container">
|
| 160 |
+
<label for="experiment_design">Stimulus design</label>
|
| 161 |
+
<div class="tooltip info-icon">
|
| 162 |
+
<span class="info-icon-inner">i</span>
|
| 163 |
+
<span class="tooltip-text">
|
| 164 |
+
<strong>### What is "Stimulus design"?</strong><br>
|
| 165 |
+
This field defines the structure and logic of your experimental stimuli.<br>
|
| 166 |
+
It helps the model understand what to generate and how to vary conditions.<br>
|
| 167 |
+
---<br>
|
| 168 |
+
<strong>### What to include:</strong><br>
|
| 169 |
+
✅ Components<br>
|
| 170 |
+
List the elements in each item (e.g. word pair, context sentence, target word)<br>
|
| 171 |
+
✅ Condition manipulation<br>
|
| 172 |
+
Describe how the stimuli differ across conditions (e.g. supportive vs neutral)<br>
|
| 173 |
+
✅ Constraints (optional)<br>
|
| 174 |
+
Mention any control rules (e.g. sentence length matching)<br>
|
| 175 |
+
---<br>
|
| 176 |
+
<strong>### Example:</strong><br>
|
| 177 |
+
- A word pair: a short word and its long form (e.g. TV – television)<br>
|
| 178 |
+
- Two context sentences:<br>
|
| 179 |
+
- Supportive context: strongly predicts the target word (e.g. She watches her favorite shows
|
| 180 |
+
on the <strong>TV</strong>.)<br>
|
| 181 |
+
- Neutral context: does not predict the target word (e.g. She placed the ball next to the
|
| 182 |
+
<strong>TV</strong>)<br>
|
| 183 |
+
Supportive and neutral contexts are matched for sentence length and structure.
|
| 184 |
+
</span>
|
| 185 |
+
</div>
|
| 186 |
+
</div>
|
| 187 |
+
<textarea id="experiment_design" placeholder="Describe the structure of each stimulus item:
|
| 188 |
+
- Component 1: ...
|
| 189 |
+
- Component 2: ...
|
| 190 |
+
- Manipulation: ...
|
| 191 |
+
💡 Click the info icon (ℹ️) to see a complete example."></textarea>
|
| 192 |
+
<!-- "AutoGenerate properties" button -->
|
| 193 |
+
<div class="button-container-spaced">
|
| 194 |
+
<button id="auto_generate_button" class="auto-generate-btn">AutoGenerate properties</button>
|
| 195 |
+
</div>
|
| 196 |
+
<!-- Restart reminder text -->
|
| 197 |
+
<div class="restart-notice">* To ensure the server operates normally, the app will auto-restart periodically.</div>
|
| 198 |
+
<div class="restart-notice">* A countdown will appear in the top-left corner of this page twenty minutes prior to each restart.</div>
|
| 199 |
+
</div>
|
| 200 |
+
</div>
|
| 201 |
+
<h2>Agent Property Settings</h2>
|
| 202 |
+
<div class="pale-blue-section">
|
| 203 |
+
<div class="form-group">
|
| 204 |
+
<div class="label-container">
|
| 205 |
+
<label><i class="fas fa-check-circle agent-icon validator-icon"></i> Validator</label>
|
| 206 |
+
<div class="tooltip info-icon">
|
| 207 |
+
<span class="info-icon-inner">i</span>
|
| 208 |
+
<span class="tooltip-text">
|
| 209 |
+
<strong>### What is a Validator?</strong><br>
|
| 210 |
+
Validators define the <strong>mandatory requirements</strong> that each generated stimulus
|
| 211 |
+
must meet.<br>
|
| 212 |
+
---<br>
|
| 213 |
+
<strong>### How to use:</strong><br>
|
| 214 |
+
- In the <strong>Properties</strong> column, define a short label (e.g.
|
| 215 |
+
<code>IsSynonym</code>, <code>ContainsTargetWord</code>)<br>
|
| 216 |
+
- In the <strong>Description</strong>, explain what this constraint means<br>
|
| 217 |
+
---<br>
|
| 218 |
+
<strong>### Example:</strong><br>
|
| 219 |
+
IsSynonym: Whether the two words in the word pair are synonyms<br>
|
| 220 |
+
Predictability: Whether the supportive context can predict target word
|
| 221 |
+
</span>
|
| 222 |
+
</div>
|
| 223 |
+
</div>
|
| 224 |
+
<p class="description-text">Define the property name (left) and its validation logic (right). Help the
|
| 225 |
+
Validator agent to filter out unacceptable items.</p>
|
| 226 |
+
<table id="agent2PropertiesTable">
|
| 227 |
+
<thead>
|
| 228 |
+
<tr>
|
| 229 |
+
<th class="agent_2_properties-column">Properties</th>
|
| 230 |
+
<th>Description</th>
|
| 231 |
+
<th style="width: 70px;">Action</th>
|
| 232 |
+
</tr>
|
| 233 |
+
</thead>
|
| 234 |
+
<tbody>
|
| 235 |
+
<tr>
|
| 236 |
+
<td class="agent_2_properties-column"><input type="text" placeholder="e.g. Synonym"></td>
|
| 237 |
+
<td class="agent_2_description-column"><input type="text"
|
| 238 |
+
placeholder="e.g. Whether the words in the word pair are synonyms."></td>
|
| 239 |
+
<td><button class="delete-row-btn delete-btn">Delete</button></td>
|
| 240 |
+
</tr>
|
| 241 |
+
</tbody>
|
| 242 |
+
</table>
|
| 243 |
+
<div class="button-container-spaced">
|
| 244 |
+
<button id="add_agent_2_property_button">Add Validator's property</button>
|
| 245 |
+
</div>
|
| 246 |
+
<div class="form-group" style="margin-top: 15px;">
|
| 247 |
+
<div class="label-container">
|
| 248 |
+
<label>
|
| 249 |
+
<input type="checkbox" id="agent2_individual_validation" style="margin-right: 8px;">
|
| 250 |
+
Individual Criteria Validation
|
| 251 |
+
</label>
|
| 252 |
+
<div class="tooltip info-icon">
|
| 253 |
+
<span class="info-icon-inner">i</span>
|
| 254 |
+
<span class="tooltip-text">
|
| 255 |
+
<strong>When this option is enabled:</strong><br>
|
| 256 |
+
• Agent 2 (Validator) will validate each criterion individually instead of all at once<br>
|
| 257 |
+
• Validation stops immediately when any criterion fails (early rejection)<br>
|
| 258 |
+
• May provide more precise validation but increases the number of API calls and processing time
|
| 259 |
+
</span>
|
| 260 |
+
</div>
|
| 261 |
+
</div>
|
| 262 |
+
</div>
|
| 263 |
+
</div>
|
| 264 |
+
<div class="spacing"></div>
|
| 265 |
+
<div class="form-group">
|
| 266 |
+
<div class="label-container">
|
| 267 |
+
<label><i class="fas fa-star agent-icon scorer-icon"></i> Scorer</label>
|
| 268 |
+
<div class="tooltip info-icon">
|
| 269 |
+
<span class="info-icon-inner">i</span>
|
| 270 |
+
<span class="tooltip-text">
|
| 271 |
+
<strong>### What is a Scorer?</strong><br>
|
| 272 |
+
Scorers assign <strong>numeric scores</strong> to each generated item based on specific
|
| 273 |
+
quality dimensions.<br>
|
| 274 |
+
These scores can be used to compare, filter, or rank items.<br>
|
| 275 |
+
---<br>
|
| 276 |
+
<strong>### What to define:</strong><br>
|
| 277 |
+
- <strong>Aspects</strong>: The dimension you're evaluating (e.g. Fluency, Frequency,
|
| 278 |
+
Informativeness)<br>
|
| 279 |
+
- <strong>Description</strong>: What this score represents<br>
|
| 280 |
+
- <strong>Min / Max score</strong>: Define the scoring scale (e.g. from 0 to 10)<br>
|
| 281 |
+
---<br>
|
| 282 |
+
<strong>### Example:</strong><br>
|
| 283 |
+
| Aspect | Description | Min score | Max score |<br>
|
| 284 |
+
|Word Pair Frequency | How frequently the word pair is used in English | 0 | 10 |<br>
|
| 285 |
+
| Predictability | How strongly the supportive context predicts the target word | 0 | 100 |
|
| 286 |
+
</span>
|
| 287 |
+
</div>
|
| 288 |
+
</div>
|
| 289 |
+
<p class="description-text">Define the dimensions along which the generated items will be rated.</p>
|
| 290 |
+
<table id="agent3PropertiesTable">
|
| 291 |
+
<thead>
|
| 292 |
+
<tr>
|
| 293 |
+
<th class="agent_3_properties-column">Aspects</th>
|
| 294 |
+
<th class="agent_3_description-column">Description</th>
|
| 295 |
+
<th class="agent_3_minimum-column">Min score</th>
|
| 296 |
+
<th class="agent_3_maximum-column">Max score</th>
|
| 297 |
+
<th style="width: 70px;">Action</th>
|
| 298 |
+
</tr>
|
| 299 |
+
</thead>
|
| 300 |
+
<tbody>
|
| 301 |
+
<tr>
|
| 302 |
+
<td class="agent_3_properties-column"><input type="text"
|
| 303 |
+
placeholder="e.g. Word Pair Frequency"></td>
|
| 304 |
+
<td class="agent_3_description-column"><input type="text"
|
| 305 |
+
placeholder="e.g. How frequently the word pair is used in English"></td>
|
| 306 |
+
<td class="agent_3_minimum-column"><input type="number" min="0" placeholder="e.g. 0"></td>
|
| 307 |
+
<td class="agent_3_maximum-column"><input type="number" min="0" placeholder="e.g. 10"></td>
|
| 308 |
+
<td><button class="delete-row-btn delete-btn">Delete</button></td>
|
| 309 |
+
</tr>
|
| 310 |
+
</tbody>
|
| 311 |
+
</table>
|
| 312 |
+
<div class="button-container-spaced">
|
| 313 |
+
<button id="add_agent_3_property_button">Add Scorer's aspect</button>
|
| 314 |
+
</div>
|
| 315 |
+
<div class="form-group" style="margin-top: 15px;">
|
| 316 |
+
<div class="label-container">
|
| 317 |
+
<label>
|
| 318 |
+
<input type="checkbox" id="agent3_individual_scoring" style="margin-right: 8px;">
|
| 319 |
+
Individual Aspect Scoring
|
| 320 |
+
</label>
|
| 321 |
+
<div class="tooltip info-icon">
|
| 322 |
+
<span class="info-icon-inner">i</span>
|
| 323 |
+
<span class="tooltip-text">
|
| 324 |
+
<strong>When this option is enabled:</strong><br>
|
| 325 |
+
• Agent 3 (Scorer) will score each aspect individually instead of all at once<br>
|
| 326 |
+
• Each aspect gets a separate API call for more focused scoring<br>
|
| 327 |
+
• May provide more accurate scores but increases the number of API calls and processing time
|
| 328 |
+
</span>
|
| 329 |
+
</div>
|
| 330 |
+
</div>
|
| 331 |
+
</div>
|
| 332 |
+
</div>
|
| 333 |
+
</div>
|
| 334 |
+
<h2>Output</h2>
|
| 335 |
+
<div class="pale-blue-section">
|
| 336 |
+
<div class="form-group">
|
| 337 |
+
<div class="label-container">
|
| 338 |
+
<label for="iteration">The number of items</label>
|
| 339 |
+
<div class="tooltip info-icon">
|
| 340 |
+
<span class="info-icon-inner">i</span>
|
| 341 |
+
<span class="tooltip-text"><strong>Positive integer.</strong><br>Rounds of stimulus generation, corresponding to
|
| 342 |
+
the number of constructed sets of stimuli.</span>
|
| 343 |
+
</div>
|
| 344 |
+
</div>
|
| 345 |
+
<input type="text" id="iteration" placeholder="e.g. 50">
|
| 346 |
+
</div>
|
| 347 |
+
<div class="button-container">
|
| 348 |
+
<button id="generate_button">Generate stimulus</button>
|
| 349 |
+
<button id="stop_button" disabled>Stop</button>
|
| 350 |
+
<button id="clear_button">Clear all</button>
|
| 351 |
+
</div>
|
| 352 |
+
<div class="generation-status-container">
|
| 353 |
+
<span id="generation_status" class="generation-status"></span>
|
| 354 |
+
</div>
|
| 355 |
+
<div class="progress-section">
|
| 356 |
+
<div class="label-container">
|
| 357 |
+
<label>Progress bar</label>
|
| 358 |
+
</div>
|
| 359 |
+
<div class="progress-container">
|
| 360 |
+
<div class="progress-bar" id="progress_bar">
|
| 361 |
+
<span class="progress-percentage" id="progress_percentage">0%</span>
|
| 362 |
+
</div>
|
| 363 |
+
</div>
|
| 364 |
+
</div>
|
| 365 |
+
</div>
|
| 366 |
+
<!-- Add output log area -->
|
| 367 |
+
<h2>Generation Log</h2>
|
| 368 |
+
<div class="pale-blue-section">
|
| 369 |
+
<div class="log-container">
|
| 370 |
+
<div class="log-panel">
|
| 371 |
+
<div class="log-header">
|
| 372 |
+
<div class="log-header-left">
|
| 373 |
+
<i class="fas fa-lightbulb agent-icon generator-icon"></i>
|
| 374 |
+
<h3>Generator</h3>
|
| 375 |
+
</div>
|
| 376 |
+
<button class="log-clear-btn" onclick="clearLog('generator-log')">
|
| 377 |
+
<i class="fas fa-trash-alt"></i> Clear
|
| 378 |
+
</button>
|
| 379 |
+
</div>
|
| 380 |
+
<div class="log-content" id="generator-log"></div>
|
| 381 |
+
</div>
|
| 382 |
+
|
| 383 |
+
<div class="log-panel">
|
| 384 |
+
<div class="log-header">
|
| 385 |
+
<div class="log-header-left">
|
| 386 |
+
<i class="fas fa-check-circle agent-icon validator-icon"></i>
|
| 387 |
+
<h3>Validator</h3>
|
| 388 |
+
</div>
|
| 389 |
+
<button class="log-clear-btn" onclick="clearLog('validator-log')">
|
| 390 |
+
<i class="fas fa-trash-alt"></i> Clear
|
| 391 |
+
</button>
|
| 392 |
+
</div>
|
| 393 |
+
<div class="log-content" id="validator-log"></div>
|
| 394 |
+
</div>
|
| 395 |
+
|
| 396 |
+
<div class="log-panel">
|
| 397 |
+
<div class="log-header">
|
| 398 |
+
<div class="log-header-left">
|
| 399 |
+
<i class="fas fa-star agent-icon scorer-icon"></i>
|
| 400 |
+
<h3>Scorer</h3>
|
| 401 |
+
</div>
|
| 402 |
+
<button class="log-clear-btn" onclick="clearLog('scorer-log')">
|
| 403 |
+
<i class="fas fa-trash-alt"></i> Clear
|
| 404 |
+
</button>
|
| 405 |
+
</div>
|
| 406 |
+
<div class="log-content" id="scorer-log"></div>
|
| 407 |
+
</div>
|
| 408 |
+
</div>
|
| 409 |
+
</div>
|
| 410 |
+
</div>
|
| 411 |
+
<script src="/static/script.js"></script>
|
| 412 |
+
</body>
|
| 413 |
+
|
| 414 |
+
</html>
|
requirements.txt
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
flask>=2.0.0
|
| 2 |
+
flask-cors>=3.0.0
|
| 3 |
+
flask-socketio>=5.0.0
|
| 4 |
+
openai==0.28.0
|
| 5 |
+
pandas>=1.0.0
|
| 6 |
+
huggingface-hub>=0.19.0
|
| 7 |
+
python-socketio>=5.0.0
|
| 8 |
+
python-engineio>=4.0.0
|
| 9 |
+
apscheduler>=3.9.0
|