kan0621 commited on
Commit
bbe4066
·
verified ·
1 Parent(s): ee1c958

Version 1.0

Files changed (5)
  1. Dockerfile +30 -0
  2. README.md +579 -6
  3. backend.py +1436 -0
  4. index.html +414 -0
  5. requirements.txt +9 -0
Dockerfile ADDED
@@ -0,0 +1,30 @@
1
+ FROM python:3.11-slim
2
+
3
+ WORKDIR /app
4
+
5
+ # Install system dependencies
6
+ RUN apt-get update && apt-get install -y \
7
+ gcc \
8
+ && rm -rf /var/lib/apt/lists/*
9
+
10
+ # Copy requirements first for better caching
11
+ COPY requirements.txt .
12
+
13
+ # Install Python dependencies
14
+ RUN pip install --no-cache-dir -r requirements.txt
15
+
16
+ # Copy application files
17
+ COPY . .
18
+
19
+ # Create necessary directories
20
+ RUN mkdir -p sessions
21
+
22
+ # Expose port
23
+ EXPOSE 7860
24
+
25
+ # Set environment variables for production
26
+ ENV PRODUCTION=true
27
+ ENV PORT=7860
28
+
29
+ # Run the application
30
+ CMD ["python", "app.py"]
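The Dockerfile above can be exercised locally with the standard Docker CLI; the image tag below is an arbitrary example, and the run succeeds only if the repository actually contains the `app.py` entry point the `CMD` refers to.

```shell
# Build the image from the directory containing the Dockerfile
docker build -t stimulus-generator .

# Run it, mapping the exposed port 7860 to the host
docker run -p 7860:7860 -e PRODUCTION=true stimulus-generator
```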
README.md CHANGED
@@ -1,12 +1,585 @@
1
  ---
2
  title: Stimuli Generator
3
- emoji: 💻
4
- colorFrom: yellow
5
- colorTo: indigo
6
  sdk: docker
7
- pinned: false
8
  license: apache-2.0
9
- short_description: Multi-Agent Stimulus Material Generation Tool
10
  ---
 
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
  title: Stimuli Generator
3
+ emoji:
4
+ colorFrom: blue
5
+ colorTo: purple
6
  sdk: docker
7
+ pinned: true
8
  license: apache-2.0
9
+ short_description: An LLM-based Stimulus Material Generation Tool
10
  ---
11
+ # 🧠 Stimulus Generator
12
 
13
+ <div align="center">
14
+
15
+ ![Version](https://img.shields.io/badge/Version-1.0.0-blue)
16
+ ![License](https://img.shields.io/badge/License-Apache%202.0-green)
17
+ ![Python Version](https://img.shields.io/badge/Python-3.8+-yellow)
18
+ [![Documentation](https://img.shields.io/badge/Documentation-Latest-brightgreen)](https://github.com/xufengduan/Stimuli_generator)
19
+
20
+ </div>
21
+
22
+ <p align="center">
23
+ <b>A Large Language Model-based Stimulus Material Generation Tool for Psycholinguistic Research</b>
24
+ </p>
25
+
26
+ <div align="center">
27
+ <a href="#english">English</a> | <a href="#chinese" onclick="document.getElementById('chinese-content').open = true;">中文</a>
28
+ </div>
29
+
30
+ ---
31
+
32
+ <a id="english"></a>
33
+ ## 📖 Project Introduction
34
+
35
+ Stimulus Generator is a large language model-based tool for creating stimulus materials, designed specifically for psycholinguistic research. Given a researcher-defined experimental design and a few example items, it automatically generates stimulus materials that satisfy the design. The tool adopts a multi-agent architecture, comprising a Generator, a Validator, and a Scorer, to ensure that the generated materials meet the experimental requirements and are of high quality.
36
+
37
+ ## ✨ Main Features
38
+
39
+ - **🤖 Multi-agent Architecture**:
40
+ - **Generator**: Generates stimulus materials based on experimental design
41
+ - **Validator**: Verifies whether the generated materials meet experimental requirements
42
+ - **Scorer**: Scores materials in multiple dimensions
43
+
44
+ - **🔄 Flexible Model Selection**:
45
+ - Supports GPT-4 (requires OpenAI API Key)
46
+ - Supports Meta Llama 3.3 70B Instruct model
47
+
48
+ - **📊 Real-time Progress Monitoring**:
49
+ - WebSocket real-time updates of generation progress
50
+ - Detailed log information display
51
+ - Generation process can be stopped at any time
52
+
53
+ - **🎨 User-friendly Interface**:
54
+ - Intuitive form design
55
+ - Real-time validation and feedback
56
+ - Responsive layout design
57
+ - Detailed help information
58
+
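The Generator/Validator/Scorer pipeline described above can be sketched as a simple generate-validate-score loop. This is a minimal illustration only; the agent functions here are hypothetical stand-ins, not the project's actual implementation (which lives in backend.py and calls an LLM at each step).

```python
# Minimal sketch of the Generator -> Validator -> Scorer loop.
# All three "agents" below are hypothetical stand-ins for LLM calls.

def generate(existing):
    # Stand-in Generator: produce a candidate stimulus not seen before
    return f"stimulus-{len(existing) + 1}"

def validate(stimulus):
    # Stand-in Validator: accept every candidate in this sketch
    return True

def score(stimulus):
    # Stand-in Scorer: return a score per dimension
    return {"plausibility": 7, "fit": 8}

def run_pipeline(n_items):
    accepted = []
    while len(accepted) < n_items:
        candidate = generate(accepted)
        if validate(candidate):
            accepted.append((candidate, score(candidate)))
    return accepted

items = run_pipeline(3)
```

Rejected candidates would simply not be appended, and in the real tool they are fed back to the Generator so it can avoid similar failures.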
59
+ ## 💻 System Requirements
60
+
61
+ | Requirement | Details |
62
+ |-------------|---------|
63
+ | Python | 3.8 or higher |
64
+ | Browser | Modern web browser (Chrome, Firefox, Safari, etc.) |
65
+ | Network | Stable network connection |
66
+ | Socket.IO | Client version 4.x (compatible with server side) |
67
+
68
+ ## 🚀 Installation Instructions
69
+
70
+ ### Clone directly from GitHub repository
71
+
72
+ ```bash
73
+ # 1. Clone the project code
74
+ git clone https://github.com/xufengduan/Stimuli_generator.git
75
+ cd Stimuli_generator
76
+
77
+ # 2. Create and activate a virtual environment (recommended)
78
+ python -m venv venv
79
+
80
+ # Windows
81
+ venv\Scripts\activate
82
+ # Linux/Mac
83
+ source venv/bin/activate
84
+
85
+ # 3. Install required dependencies
86
+ pip install -e .
87
+ ```
88
+
89
+ ## 📝 Usage Instructions
90
+
91
+ ### Launch Web Interface
92
+
93
+ After installation, you can use the command-line tool to start the web interface:
94
+
95
+ ```bash
96
+ stimulus-generator webui
97
+ ```
98
+ By default, the web interface will run at http://127.0.0.1:5001.
99
+
100
+ ### Command Line Arguments
101
+
102
+ ```bash
103
+ stimulus-generator webui --port 5001
104
+ ```
105
+
106
+ | Argument | Description |
107
+ |----------|-------------|
108
+ | `--host` | Specify host address (default: 0.0.0.0) |
109
+ | `--port` | Specify port number (default: 5001) |
110
+ | `--debug` | Enable debug mode |
111
+ | `--share` | Create public link (requires additional dependencies) |
112
+
113
+ ## 🎯 Usage Steps
114
+
115
+ ### 1. Configure Generation Parameters
116
+
117
+ #### 1.1 Select Language Model
118
+ ![Select Language Model](static/images/select_model.png)
119
+
120
+ Choose between:
121
+ - GPT-4 (requires OpenAI API Key)
122
+ - Meta Llama 3.3 70B Instruct model
123
+
124
+ #### 1.2 Enter API Key (if using GPT-4)
125
+ ![Enter API Key](static/images/enter_api_key.png)
126
+
127
+ If you selected GPT-4, enter your OpenAI API Key in the designated field.
128
+
129
+ #### 1.3 Add Example Stimulus Materials
130
+ ![Add Example Materials](static/images/add_examples.png)
131
+
132
+ Components are the building blocks of your stimuli. For example, in a study investigating the effect of contextual predictability on word choice:
133
+ - A word pair (e.g., math/mathematics)
134
+ - Supportive context (high predictability)
135
+ - Neutral context
136
+
137
+ Each component should be filled with its corresponding content. For instance:
138
+ - Word pair: "math/mathematics"
139
+ - Supportive context: "The student solved the simple arithmetic problem using basic..."
140
+ - Neutral context: "The student was working on a problem that required..."
141
+
142
+ To add more examples:
143
+ 1. Complete all components for the first item
144
+ 2. Click "Add Item" in the bottom right corner
145
+ 3. Repeat for additional examples (recommended: at least 3 examples)
146
+
147
+ #### 1.4 Fill in Experimental Design Description
148
+ ![Experimental Design](static/images/experimental_design.png)
149
+
150
+ When writing your experimental design description, include these key components:
151
+
152
+ 1. **Purpose of the Stimuli**
153
+ - Explain the experiment's goal
154
+ - Describe how the stimuli support this goal
155
+ - Example: "We are designing stimuli for an experiment investigating whether people prefer shorter words in predictive contexts."
156
+
157
+ 2. **Core Structure of Each Stimulus Item**
158
+ - Describe the components of each item
159
+ - Example: "Each stimulus item includes a word pair and two contexts."
160
+
161
+ 3. **Detailed Description of Each Element**
162
+ For each component, specify:
163
+ - What it is
164
+ - How it's constructed
165
+ - What constraints apply
166
+ - What to avoid
167
+ - Example: "The word pair consists of a short and a long form of the same word... Avoid pairs where either word forms part of a fixed or common phrase."
168
+
169
+ 4. **Experimental Conditions or Variants**
170
+ Explain:
171
+ - Definition of each condition
172
+ - Construction criteria
173
+ - Matching constraints
174
+ - Example: "The supportive context should strongly predict the missing final word... The two contexts should be matched for length."
175
+
176
+ 5. **Example Item**
177
+ Include at least one complete example with labeled parts.
178
+
179
+ 6. **Formatting Guidelines**
180
+ Note any specific formatting or submission requirements.
181
+
182
+ #### 1.5 Review Auto-generated Properties
183
+ ![Review Properties](static/images/review_properties.png)
184
+
185
+ After completing the experimental design:
186
+ 1. Click "Auto-generate Properties"
187
+ 2. The system will automatically set:
188
+ - Validation conditions
189
+ - Scoring dimensions
190
+ 3. **Important**: Review and adjust these auto-generated properties as needed
191
+
192
+ ### 2. Start Generation
193
+ ![Start Generation](static/images/Generating.gif)
194
+
195
+ 1. Click the "Generate stimulus" button
196
+ 2. Monitor progress in real-time
197
+ 3. View detailed logs
198
+ 4. Use "Stop" button if needed
199
+
200
+ ### 3. Get Results
201
+ ![Get Results](static/images/get_results.png)
202
+
203
+ - CSV file automatically downloads upon completion
204
+ - Contains generated materials and scores
205
+
206
+ ## 📂 File Structure
207
+
208
+ ```
209
+ Stimulus-Generator/
210
+ ├── stimulus_generator/ # Main Python package
211
+ │ ├── __init__.py # Package initialization file
212
+ │ ├── app.py # Flask backend server
213
+ │ ├── backend.py # Core backend functionality
214
+ │ └── cli.py # Command line interface
215
+ ├── run.py # Quick start script
216
+ ├── setup.py # Package installation configuration
217
+ ├── static/
218
+ │ ├── script.js # Frontend JavaScript code
219
+ │ ├── styles.css # Page stylesheet
220
+ │ └── Stimulus Generator Web Logo.png # Website icon
221
+ ├── webpage.html # Main page
222
+ ├── requirements.txt # Python dependency list
223
+ └── README.md # Project documentation
224
+ ```
225
+
226
+ ## 🛠️ Command Line Tools
227
+
228
+ After installation, you can use the following command-line tools:
229
+
230
+ ```bash
231
+ # Launch web interface
232
+ stimulus-generator webui [--host HOST] [--port PORT] [--debug] [--share]
233
+
234
+ # View help
235
+ stimulus-generator --help
236
+ ```
237
+
238
+ If you don't want to install, you can also run directly:
239
+
240
+ ```bash
241
+ # After cloning the repository, run in the project directory
242
+ python run.py webui
243
+ ```
244
+
245
+ ## ⚠️ Notes
246
+
247
+ 1. **API Key Security**:
248
+ - Please keep your OpenAI API Key secure
249
+ - Do not expose API Keys in public environments
250
+
251
+ 2. **Generation Process**:
252
+ - Generation process may take some time, please be patient
253
+ - You can monitor generation status in real time through the log panel
254
+ - You can stop generation at any time if there are issues
255
+
256
+ 3. **Results Usage**:
257
+ - It is recommended to check if the generated materials meet experimental requirements
258
+ - Manual screening or modification of generated materials may be needed
259
+
260
+ ## ❓ FAQ
261
+
262
+ <details>
263
+ <summary><b>What to do if the generation process gets stuck?</b></summary>
264
+ <br>
265
+ - Check if the network connection is normal
266
+ - Click the "Stop" button to stop the current generation
267
+ - Refresh the page to restart
268
+ - If the page is unresponsive for a long time, wait for 30 seconds and the system will automatically unlock the interface
269
+ </details>
270
+
271
+ <details>
272
+ <summary><b>How to solve WebSocket connection errors?</b></summary>
273
+ <br>
274
+ - Ensure that the network environment does not block WebSocket connections
275
+ - If you see WebSocket error messages, refresh the page to re-establish the connection
276
+ - Restart the server or try using a different browser
277
+ - WebSocket connection issues will not affect core functionality; the system has automatic recovery mechanisms
278
+ </details>
279
+
280
+ <details>
281
+ <summary><b>How to optimize generation quality?</b></summary>
282
+ <br>
283
+ - Provide more detailed examples
284
+ - Improve experimental design description
285
+ - Set appropriate validation conditions
286
+ </details>
287
+
288
+ <details>
289
+ <summary><b>How to handle slow generation speed?</b></summary>
290
+ <br>
291
+ - Consider reducing the number of items to generate
292
+ - Ensure stable network connection
293
+ - Choose a model with faster response
294
+ </details>
295
+
296
+ ## 📞 Technical Support
297
+
298
+ For questions or suggestions, please contact:
299
+ - Submit an [Issue](https://github.com/xufengduan/Stimuli_generator/issues)
300
+ - Send an email to: ...
301
+
302
+ ## 📄 License
303
+
304
+ This project is licensed under the [Apache License 2.0](LICENSE). See the LICENSE file for details.
305
+
306
+ ---
307
+
308
+ <details id="chinese-content">
309
+ <summary><a id="chinese"></a>中文版本 (Chinese Version)</summary>
310
+
311
+ ## 📖 项目简介
312
+
313
+ Stimulus Generator 是一个基于大语言模型的刺激材料生成工具,专门为心理语言学研究设计。它能够根据研究者定义的实验设计和示例,自动生成符合要求的实验刺激材料。该工具采用多代理架构,包含生成器(Generator)、验证器(Validator)和评分器(Scorer)三个代理,确保生成的刺激材料满足实验要求并具有良好的质量。
314
+
315
+ ## ✨ 主要特点
316
+
317
+ - **🤖 多代理架构**:
318
+ - **Generator**:根据实验设计生成刺激材料
319
+ - **Validator**:验证生成的材料是否符合实验要求
320
+ - **Scorer**:对材料进行多维度评分
321
+
322
+ - **🔄 灵活的模型选择**:
323
+ - 支持 GPT-4 (需要 OpenAI API Key)
324
+ - 支持 Meta Llama 3.3 70B Instruct 模型
325
+
326
+ - **📊 实时进度监控**:
327
+ - WebSocket 实时更新生成进度
328
+ - 详细的日志信息显示
329
+ - 可随时停止生成过程
330
+
331
+ - **🎨 用户友好界面**:
332
+ - 直观的表单设计
333
+ - 实时验证和反馈
334
+ - 响应式布局设计
335
+ - 详细的帮助信息提示
336
+
337
+ ## 💻 系统要求
338
+
339
+ | 要求 | 详情 |
340
+ |------|------|
341
+ | Python | 3.8 或更高版本 |
342
+ | 浏览器 | 现代网页浏览器(Chrome、Firefox、Safari 等) |
343
+ | 网络 | 稳定的网络连接 |
344
+ | Socket.IO | 客户端版本 4.x(与服务器端兼容) |
345
+
346
+ ## 🚀 安装说明
347
+
348
+ ### 直接从GitHub仓库中克隆
349
+
350
+ ```bash
351
+ # 1. 克隆项目代码
352
+ git clone https://github.com/xufengduan/Stimuli_generator.git
353
+ cd Stimuli_generator
354
+
355
+ # 2. 创建并激活虚拟环境(推荐)
356
+ python -m venv venv
357
+
358
+ # Windows
359
+ venv\Scripts\activate
360
+ # Linux/Mac
361
+ source venv/bin/activate
362
+
363
+ # 3. 安装项目所需依赖
364
+ pip install -e .
365
+ ```
366
+
367
+ ## 📝 使用说明
368
+
369
+ ### 启动Web界面
370
+
371
+ 安装完成后,可以直接使用命令行工具启动Web界面:
372
+
373
+ ```bash
374
+ stimulus-generator webui
375
+ ```
376
+
377
+ 默认情况下,Web界面将在 http://127.0.0.1:5001 上运行。
378
+
379
+ ### 命令行参数
380
+
381
+ ```bash
382
+ stimulus-generator webui --port 5001
383
+ ```
384
+
385
+ | 参数 | 描述 |
386
+ |------|------|
387
+ | `--host` | 指定主机地址(默认:0.0.0.0) |
388
+ | `--port` | 指定端口号(默认:5001) |
389
+ | `--debug` | 启用调试模式 |
390
+ | `--share` | 创建公共链接(需要安装额外依赖) |
391
+
392
+ ## 🎯 使用步骤
393
+
394
+ ### 1. 配置生成参数
395
+
396
+ #### 1.1 选择语言模型
397
+ ![选择语言模型](static/images/select_model.png)
398
+
399
+ 可选择:
400
+ - GPT-4(需要 OpenAI API Key)
401
+ - Meta Llama 3.3 70B Instruct 模型
402
+
403
+ #### 1.2 输入 API Key(如果使用 GPT-4)
404
+ ![输入 API Key](static/images/enter_api_key.png)
405
+
406
+ 如果选择了 GPT-4,请在指定字段中输入您的 OpenAI API Key。
407
+
408
+ #### 1.3 添加示例刺激材料
409
+ ![添加示例材料](static/images/add_examples.png)
410
+
411
+ 组件(Components)是刺激材料的组成部分。例如,在研究语境可预测性对词汇选择的影响时:
412
+ - 词对(例如:math/mathematics)
413
+ - 支持性语境(高可预测性)
414
+ - 中性语境
415
+
416
+ 每个组件都需要填写相应的内容。例如:
417
+ - 词对:"math/mathematics"
418
+ - 支持性语境:"学生使用基本的算术解决了这个简单的问题..."
419
+ - 中性语境:"学生正在解决一个需要..."
420
+
421
+ 添加更多示例:
422
+ 1. 完成第一个项目的所有组件
423
+ 2. 点击右下角的"添加项目"按钮
424
+ 3. 重复上述步骤添加更多示例(建议至少添加3个示例)
425
+
426
+ #### 1.4 填写实验设计说明
427
+ ![实验设计](static/images/experimental_design.png)
428
+
429
+ 在编写实验设计说明时,请包含以下关键部分:
430
+
431
+ 1. **刺激材料的目的**
432
+ - 解释实验目标
433
+ - 描述刺激材料如何支持这个目标
434
+ - 示例:"我们正在设计用于研究人们在可预测语境中是否倾向于使用较短词汇的实验刺激材料。"
435
+
436
+ 2. **每个刺激项目的核心结构**
437
+ - 描述每个项目的组成部分
438
+ - 示例:"每个刺激项目包含一个词对和两个语境。"
439
+
440
+ 3. **每个元素的详细描述**
441
+ 对于每个组件,请说明:
442
+ - 它是什么
443
+ - 如何构建
444
+ - 适用的约束条件
445
+ - 需要避免的内容
446
+ - 示例:"词对由同一个词的短形式和长形式组成...避免使用固定搭配或常见短语中的词。"
447
+
448
+ 4. **实验条件或变体**
449
+ 说明:
450
+ - 每个条件的定义
451
+ - 构建标准
452
+ - 匹配约束
453
+ - 示例:"支持性语境应该强烈预测缺失的最后一个词...两个语境应该在长度上匹配。"
454
+
455
+ 5. **示例项目**
456
+ 包含至少一个完整的示例,并标注各个部分。
457
+
458
+ 6. **格式指南**
459
+ 注明任何特定的格式或提交要求。
460
+
461
+ #### 1.5 检查自动生成的属性
462
+ ![检查属性](static/images/review_properties.png)
463
+
464
+ 完成实验设计后:
465
+ 1. 点击"自动生成属性"按钮
466
+ 2. 系统将自动设置:
467
+ - 验证条件
468
+ - 评分维度
469
+ 3. **重要**:请检查并根据需要调整这些自动生成的属性
470
+
471
+ ### 2. 开始生成
472
+ ![开始生成](static/images/Generating.gif)
473
+
474
+ 1. 点击"生成刺激材料"按钮
475
+ 2. 实时监控进度
476
+ 3. 查看详细日志
477
+ 4. 必要时使用"停止"按钮
478
+
479
+ ### 3. 获取结果
480
+ ![获取结果](static/images/get_results.png)
481
+
482
+ - 完成后自动下载 CSV 格式的结果文件
483
+ - 包含生成的刺激材料及其评分
484
+
485
+ ## 📂 文件结构
486
+
487
+ ```
488
+ Stimulus-Generator/
489
+ ├── stimulus_generator/ # 主Python包
490
+ │ ├── __init__.py # 包初始化文件
491
+ │ ├── app.py # Flask 后端服务器
492
+ │ ├── backend.py # 后端核心功能
493
+ │ └── cli.py # 命令行接口
494
+ ├── run.py # 快速启动脚本
495
+ ├── setup.py # 包安装配置
496
+ ├── static/
497
+ │ ├── script.js # 前端 JavaScript 代码
498
+ │ ├── styles.css # 页面样式表
499
+ │ └── Stimulus Generator Web Logo.png # 网站图标
500
+ ├── webpage.html # 主页面
501
+ ├── requirements.txt # Python 依赖列表
502
+ └── README.md # 项目说明文档
503
+ ```
504
+
505
+ ## 🛠️ 命令行工具
506
+
507
+ 安装后,可以使用以下命令行工具:
508
+
509
+ ```bash
510
+ # 启动Web界面
511
+ stimulus-generator webui [--host HOST] [--port PORT] [--debug] [--share]
512
+
513
+ # 查看帮助
514
+ stimulus-generator --help
515
+ ```
516
+
517
+ 如果您不想安装,也可以直接使用以下方式运行:
518
+
519
+ ```bash
520
+ # 克隆仓库后,在项目目录中运行
521
+ python run.py webui
522
+ ```
523
+
524
+ ## ⚠️ 注意事项
525
+
526
+ 1. **API 密钥安全**:
527
+ - 请妥善保管您的 OpenAI API Key
528
+ - 不要在公共环境中暴露 API Key
529
+
530
+ 2. **生成过程**:
531
+ - 生成过程可能需要一定时间,请耐心等待
532
+ - 可以通过日志面板实时监控生成状态
533
+ - 如遇到问题可以随时停止生成
534
+
535
+ 3. **结果使用**:
536
+ - 建议检查生成的材料是否符合实验要求
537
+ - 可能需要对生成的材料进行人工筛选或修改
538
+
539
+ ## ❓ 常见问题
540
+
541
+ <details>
542
+ <summary><b>生成过程卡住怎么办?</b></summary>
543
+ <br>
544
+ - 检查网络连接是否正常
545
+ - 点击 "Stop" 按钮停止当前生成
546
+ - 刷新页面重新开始
547
+ - 如果页面长时间无响应,可以等待30秒,系统会自动解除界面锁定
548
+ </details>
549
+
550
+ <details>
551
+ <summary><b>WebSocket连接错误如何解决?</b></summary>
552
+ <br>
553
+ - 确保网络环境没有阻止WebSocket连接
554
+ - 如果看到WebSocket错误信息,可以刷新页面重新建立连接
555
+ - 重启服务器或尝试使用不同的浏览器
556
+ - WebSocket连接问题不会影响主要功能,系统有自动恢复机制
557
+ </details>
558
+
559
+ <details>
560
+ <summary><b>如何优化生成质量?</b></summary>
561
+ <br>
562
+ - 提供更多详细的示例
563
+ - 完善实验设计说明
564
+ - 设置合适的验证条件
565
+ </details>
566
+
567
+ <details>
568
+ <summary><b>生成速度较慢怎么处理?</b></summary>
569
+ <br>
570
+ - 考虑减少生成数量
571
+ - 确保网络连接稳定
572
+ - 选择响应更快的模型
573
+ </details>
574
+
575
+ ## 📞 技术支持
576
+
577
+ 如有问题或建议,请通过以下方式联系:
578
+ - 提交 [Issue](https://github.com/xufengduan/Stimuli_generator/issues)
579
+ - 发送邮件至:...
580
+
581
+ ## 📄 许可证
582
+
583
+ 本项目采用 [Apache License 2.0](LICENSE) 许可证。详见 LICENSE 文件。
584
+
585
+ </details>
backend.py ADDED
@@ -0,0 +1,1436 @@
1
+ import json
2
+ import os
3
+ import queue
4
+ import random
5
+ import time
6
+ import traceback
7
+ from abc import ABC, abstractmethod
8
+ from multiprocessing import Process, Queue
9
+
10
+ import openai
11
+ import pandas as pd
12
+ import requests
13
+ from requests.exceptions import RequestException, Timeout, ConnectionError as RequestsConnectionError
14
+
15
+ # Set OpenAI API key
16
+ # openai.api_key = ""
17
+
18
+ # Set Chutes AI API key (commented out)
19
+
20
+ # Use multiprocessing to implement real timeout mechanism
21
+
22
+
23
+ def _timeout_target(queue, func, args, kwargs):
24
+ """multiprocessing target function, must be defined at module level to be pickled"""
25
+ try:
26
+ result = func(*args, **kwargs)
27
+ queue.put(('success', result))
28
+ except Exception as e:
29
+ tb = traceback.format_exc()
30
+ print(f"Exception in subprocess:\n{tb}")
31
+ queue.put(('error', f"{type(e).__name__}: {str(e)}\n{tb}"))
32
+
33
+
34
+ def call_with_timeout(func, args, kwargs, timeout_seconds=60):
35
+ """Run func in a subprocess with a hard timeout; the subprocess is force-terminated on expiry."""
36
+ result_queue = Queue()
37
+ process = Process(target=_timeout_target, args=(result_queue, func, args, kwargs))
38
+ process.start()
39
+ process.join(timeout_seconds)
40
+
41
+ if process.is_alive():
42
+ # force terminate process
43
+ process.terminate()
44
+ process.join()
45
+ print(
46
+ f"API call timed out after {timeout_seconds} seconds and process was terminated")
47
+ return {"error": f"API call timed out after {timeout_seconds} seconds"}
48
+
49
+ try:
50
+ result_type, result = result_queue.get_nowait()
51
+ if result_type == 'success':
52
+ return result
53
+ else:
54
+ return {"error": result}
55
+ except queue.Empty:
56
+ return {"error": "Process completed but no result returned"}
57
+
58
+
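The multiprocessing timeout pattern used by `call_with_timeout` can be tested in isolation. The sketch below is a self-contained, simplified variant of the same pattern (run the function in a child process, kill it past the deadline, read the outcome from a queue); it is not the module's own function, and the `fast`/`slow` helpers are illustrative only.

```python
import queue  # stdlib module, provides the Empty exception
import time
from multiprocessing import Process, Queue

def _target(result_queue, func, args):
    # Runs in the child process; ship the outcome back through the queue
    try:
        result_queue.put(('success', func(*args)))
    except Exception as e:
        result_queue.put(('error', str(e)))

def call_with_timeout(func, args=(), timeout_seconds=2):
    result_queue = Queue()
    process = Process(target=_target, args=(result_queue, func, args))
    process.start()
    process.join(timeout_seconds)
    if process.is_alive():
        # Deadline exceeded: kill the child instead of waiting forever
        process.terminate()
        process.join()
        return {"error": f"timed out after {timeout_seconds}s"}
    try:
        kind, value = result_queue.get(timeout=1)
    except queue.Empty:
        return {"error": "no result returned"}
    return value if kind == 'success' else {"error": value}

def fast(x):
    return x * 2

def slow(x):
    time.sleep(10)
    return x
```

Note that the local variable must not be named `queue`, or it shadows the stdlib module and `except queue.Empty` fails with an `AttributeError`.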
59
+ # ======================
60
+ # 1. Configuration (Prompt + Schema)
61
+ # ======================
62
+ # ---- Agent 1 Prompt ----
63
+ AGENT_1_PROMPT_TEMPLATE = """\
64
+ Please help me construct one item as stimuli for a psycholinguistic experiment based on the description:
65
+
66
+ Experimental stimuli design: {experiment_design}
67
+
68
+ Existing stimuli (DO NOT repeat any of these): {previous_stimuli}
69
+
70
+ Previously rejected stimuli with validation feedback (learn from these failures and avoid similar issues):
71
+ {rejected_stimuli}
72
+
73
+ CRITICAL REQUIREMENTS:
74
+ 1. Generate a COMPLETELY NEW and UNIQUE stimulus that is DIFFERENT from ALL existing stimuli above.
75
+ 2. Do NOT repeat or slightly modify any existing stimulus - create something entirely original.
76
+ 3. Avoid any content that overlaps with existing or rejected stimuli.
77
+ 4. Learn from the rejected stimuli above - understand why they failed validation and avoid making similar mistakes.
78
+ {generation_requirements}
79
+
80
+ Please return in JSON format.
81
+ """
82
+
83
+ # ---- Agent 2 Prompt ----
84
+ AGENT_2_PROMPT_TEMPLATE = """\
85
+ Please verify the following NEW STIMULUS with utmost precision, ensuring they meet the Experimental stimuli design and following strict criteria.
86
+
87
+ NEW STIMULUS: {new_stimulus};
88
+
89
+ Experimental stimuli design: {experiment_design}
90
+
91
+ Please return in JSON format.
92
+ """
93
+
94
+ # ---- Agent 3 Prompt ----
95
+ AGENT_3_PROMPT_TEMPLATE = """\
96
+ Please rate the following STIMULUS based on the Experimental stimuli design provided for a psychological experiment:
97
+
98
+ STIMULUS: {valid_stimulus}
99
+ Experimental stimuli design: {experiment_design}
100
+
101
+ SCORING REQUIREMENTS:
102
+ {scoring_requirements}
103
+
104
+ Please return in JSON format including the score for each dimension within the specified ranges.
105
+ """
106
+
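The agent prompt templates above are plain `str.format` templates: each `{name}` placeholder is filled with run-time data before the prompt is sent to the model. A sketch with an abbreviated template and toy values (the stimulus JSON and design text are made up for illustration):

```python
# Abbreviated version of the Agent 2 template, filled via str.format
TEMPLATE = """\
Please verify the following NEW STIMULUS with utmost precision.

NEW STIMULUS: {new_stimulus};

Experimental stimuli design: {experiment_design}

Please return in JSON format.
"""

prompt = TEMPLATE.format(
    new_stimulus='{"word_pair": "math/mathematics"}',
    experiment_design="Each item includes a word pair and two contexts.",
)
```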
107
+ # ---- Agent 1 Stimulus Schema ----
108
+ AGENT_1_PROPERTIES = {}
109
+
110
+ # ---- Agent 2 Validation Result Schema ----
111
+ AGENT_2_PROPERTIES = {}
112
+
113
+ # ---- Agent 3 Scoring Result Schema ----
114
+ AGENT_3_PROPERTIES = {}
115
+
116
+
117
+ # ======================
118
+ # 2. Abstract Model Client Interface
119
+ # ======================
120
+ class ModelClient(ABC):
121
+ """Abstract base class for model clients"""
122
+
123
+ @abstractmethod
124
+ def generate_completion(self, prompt, properties, params=None):
125
+ """Generate a completion with JSON schema response format"""
126
+ pass
127
+
128
+ @abstractmethod
129
+ def get_default_params(self):
130
+ """Get default parameters for this model"""
131
+ pass
132
+
133
+
134
+ # ======================
135
+ # 3. Concrete Model Client Implementations
136
+ # ======================
137
+ class OpenAIClient(ModelClient):
138
+ """OpenAI GPT model client"""
139
+
140
+ def __init__(self, api_key=None):
141
+ self.api_key = api_key
142
+ if api_key:
143
+ openai.api_key = api_key
144
+ print("OpenAI API key configured successfully")
145
+ else:
146
+ print("Warning: No OpenAI API key provided!")
147
+
148
+ def _api_call(self, prompt, properties, params, api_key):
149
+ """API call function, will be called by multiprocessing"""
150
+ # set API key in subprocess
151
+ openai.api_key = api_key
152
+
153
+ return openai.ChatCompletion.create(
154
+ model=params["model"],
155
+ messages=[{"role": "user", "content": prompt}],
156
+ response_format={
157
+ "type": "json_schema",
158
+ "json_schema": {
159
+ "name": "response_schema",
160
+ "schema": {
161
+ "type": "object",
162
+ "properties": properties,
163
+ "required": list(properties.keys()),
164
+ "additionalProperties": False
165
+ }
166
+ }
167
+ }
168
+ )
169
+
170
+ def generate_completion(self, prompt, properties, params=None):
171
+ """Generate completion using OpenAI API"""
172
+ if params is None:
173
+ params = self.get_default_params()
174
+
175
+ # retry mechanism
176
+ for attempt in range(3):
177
+ try:
178
+ response = call_with_timeout(
179
+ self._api_call, (prompt, properties, params, self.api_key), {}, 60)
180
+
181
+ if isinstance(response, dict) and "error" in response:
182
+ print(f"OpenAI API timeout attempt {attempt + 1}/3")
183
+ if attempt == 2: # last attempt
184
+ return {"error": "API timeout after 3 attempts"}
185
+ time.sleep(2 ** attempt) # exponential backoff
186
+ continue
187
+
188
+ return json.loads(response['choices'][0]['message']['content'])
189
+ except json.JSONDecodeError as e:
190
+ print(f"Failed to parse OpenAI JSON response: {e}")
191
+ return {"error": f"Failed to parse response: {str(e)}"}
192
+ except (openai.error.APIError, openai.error.RateLimitError) as e:
193
+ print(f"OpenAI API error attempt {attempt + 1}/3: {e}")
194
+ if attempt == 2:
195
+ return {"error": f"OpenAI API error after 3 attempts: {str(e)}"}
196
+ time.sleep(2 ** attempt)
197
+ except openai.error.AuthenticationError as e:
198
+ print(f"OpenAI authentication error: {e}")
199
+ return {"error": f"Authentication failed: {str(e)}"}
200
+ except openai.error.InvalidRequestError as e:
201
+ print(f"OpenAI invalid request: {e}")
202
+ return {"error": f"Invalid request: {str(e)}"}
203
+
204
+ def get_default_params(self):
205
+ return {"model": "gpt-4o"}
206
+
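The retry loop in `generate_completion` follows a standard exponential-backoff pattern: retry up to three times, sleeping `2 ** attempt` seconds between attempts. Shown here in isolation with a hypothetical flaky call (shorter delays than the real code, so the example runs quickly):

```python
import time

def with_retries(call, attempts=3, base_delay=0.01):
    # Try up to `attempts` times, sleeping base_delay * 2**attempt between tries
    for attempt in range(attempts):
        try:
            return call()
        except RuntimeError as e:
            if attempt == attempts - 1:
                return {"error": f"failed after {attempts} attempts: {e}"}
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky():
    # Hypothetical stand-in for an API call that fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"
```

The backoff keeps transient failures (rate limits, timeouts) from immediately aborting a generation run, while the attempt cap guarantees the loop terminates.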
207
+
208
+ # class HuggingFaceClient(ModelClient):
209
+ # """Hugging Face model client"""
210
+
211
+ # def __init__(self, api_key):
212
+ # self.api_key = api_key
213
+
214
+ # def _api_call(self, messages, response_format, params):
215
+ # """API call function that will be called by multiprocessing"""
216
+ # client = InferenceClient(
217
+ # params["model"],
218
+ # token=self.api_key,
219
+ # headers={"x-use-cache": "false"}
220
+ # )
221
+
222
+ # return client.chat_completion(
223
+ # messages=messages,
224
+ # response_format=response_format,
225
+ # max_tokens=params.get("max_tokens", 1000),
226
+ # temperature=params.get("temperature", 0.7)
227
+ # )
228
+
229
+ # def generate_completion(self, prompt, properties, params=None):
230
+ # """Generate completion using Hugging Face API"""
231
+ # if params is None:
232
+ # params = self.get_default_params()
233
+
234
+ # response_format = {
235
+ # "type": "json_schema",
236
+ # "json_schema": {
237
+ # "name": "response_schema",
238
+ # "schema": {
239
+ # "type": "object",
240
+ # "properties": properties,
241
+ # "required": list(properties.keys()),
242
+ # "additionalProperties": False
243
+ # }
244
+ # }
245
+ # }
246
+
247
+ # messages = [{"role": "user", "content": prompt}]
248
+
249
+ # # Retry mechanism
250
+ # for attempt in range(3):
251
+ # try:
252
+ # response = call_with_timeout(
253
+ # self._api_call, (messages, response_format, params), {}, 60)
254
+
255
+ # if isinstance(response, dict) and "error" in response:
256
+ # print(f"HuggingFace API timeout attempt {attempt + 1}/3")
257
+ # if attempt == 2:
258
+ # return {"error": "API timeout after 3 attempts"}
259
+ # time.sleep(2 ** attempt)
260
+ # continue
261
+
262
+ # content = response.choices[0].message.content
263
+ # return json.loads(content)
264
+ # except (json.JSONDecodeError, AttributeError, IndexError) as e:
265
+ # print(f"Failed to parse HuggingFace JSON response: {e}")
266
+ # return {"error": "Failed to parse response"}
267
+ # except Exception as e:
268
+ # print(f"HuggingFace API error attempt {attempt + 1}/3: {e}")
269
+ # if attempt == 2:
270
+ # return {"error": f"API error after 3 attempts: {str(e)}"}
271
+ # time.sleep(2 ** attempt)
272
+
273
+ # def get_default_params(self):
274
+ # return {
275
+ # "model": "meta-llama/Llama-3.3-70B-Instruct",
276
+ # }
277
+
278
+
279
+ class CustomModelClient(ModelClient):
280
+ """Custom model client for user-defined APIs"""
281
+
282
+ def __init__(self, api_url, api_key, model_name):
283
+ self.api_url = api_url
284
+ self.api_key = api_key
285
+ self.model_name = model_name
286
+
287
+ def _api_call(self, request_data, headers):
288
+ """API call function; invoked through the multiprocessing timeout helper"""
289
+ try:
290
+ response = requests.post(
291
+ self.api_url,
292
+ headers=headers,
293
+ json=request_data,
294
+ timeout=60 # timeout for requests
295
+ )
296
+ response.raise_for_status()
297
+ return response.json()
298
+ except Timeout:
299
+ raise Timeout(
300
+ f"Request to {self.api_url} timed out after 60 seconds")
301
+ except RequestsConnectionError as e:
302
+ raise RequestsConnectionError(
303
+ f"Failed to connect to {self.api_url}: {str(e)}")
304
+ except RequestException as e:
305
+ raise RequestException(f"Request failed: {str(e)}")
306
+
307
+ def generate_completion(self, prompt, properties, params=None):
308
+
309
+ is_deepseek = self.api_url.strip().startswith("https://api.deepseek.com")
310
+
311
+ if is_deepseek:
312
+ rand_stamp = int(time.time())
313
+ # Generate field list
314
+ field_list = ', '.join([f'"{k}"' for k in properties.keys()])
315
+ # Determine the agent type from the prompt prefix.
+ # Validation prompts ("Please verify the following NEW STIMULUS ...") must
+ # return only boolean values per field; scoring prompts return only numbers.
317
+ if prompt.strip().startswith("Please verify the following NEW STIMULUS"):
318
+ prompt = prompt.rstrip() + \
319
+ f"\nPlease return in strict JSON format, fields must include: {field_list}, requirements for each field are as follows: {properties}, each field can only return boolean values (True/False)"
320
+ elif prompt.strip().startswith("Please rate the following STIMULUS"):
321
+ prompt = prompt.rstrip() + \
322
+ f"\nPlease return in strict JSON format, fields must include: {field_list}, requirements for each field are as follows: {properties}, each field can only return numbers"
323
+ else:
324
+ prompt = prompt.rstrip() + \
325
+ f"\nPlease return in strict JSON format, fields must include: {field_list}, requirements for each field are as follows: {properties}"
326
+
327
+ request_data = {
328
+ "model": self.model_name,
329
+ "messages": [
330
+ {"role": "system", "content": f"RAND:{rand_stamp}"},
331
+ {"role": "user", "content": prompt}
332
+ ],
333
+ "stream": False,
334
+ "response_format": {"type": "json_object"}
335
+ }
336
+ else:
337
+ # build base request
338
+ request_data = {
339
+ "model": self.model_name,
340
+ "messages": [{"role": "user", "content": prompt}],
341
+ "stream": False,
342
+ "response_format": {
343
+ "type": "json_schema",
344
+ "json_schema": {
345
+ "name": "response_schema",
346
+ "schema": {
347
+ "type": "object",
348
+ "properties": properties,
349
+ "required": list(properties.keys()),
350
+ "additionalProperties": False
351
+ }
352
+ }
353
+ }
354
+ }
355
+
356
+ if params is not None:
357
+ request_data.update(params)
358
+
359
+ headers = {
360
+ "Authorization": f"Bearer {self.api_key}",
361
+ "Content-Type": "application/json"
362
+ }
363
+
364
+ # retry mechanism
365
+ for attempt in range(3):
366
+ try:
367
+ print("Sending request to Custom API with:",
368
+ json.dumps(request_data, indent=2))
369
+
370
+ result = call_with_timeout(
371
+ self._api_call, (request_data, headers), {}, 600)
372
+
373
+ if isinstance(result, dict) and "error" in result:
374
+ print(f"Custom API timeout attempt {attempt + 1}/3")
375
+ if attempt == 2:
376
+ return {"error": "API timeout after 3 attempts"}
377
+ time.sleep(2 ** attempt)
378
+ continue
379
+
380
+ print("Response from Custom API:",
381
+ json.dumps(result, indent=2))
382
+ content = result["choices"][0]["message"]["content"]
383
+ return json.loads(content)
384
+
385
+ except json.JSONDecodeError as e:
386
+ print(
387
+ f"Custom API JSON parsing error attempt {attempt + 1}/3: {e}")
388
+ if attempt == 2:
389
+ return {"error": f"API JSON parsing error after 3 attempts: {str(e)}"}
390
+ time.sleep(2 ** attempt)
391
+ except KeyError as e:
392
+ print(
393
+ f"Custom API response missing expected key attempt {attempt + 1}/3: {e}")
394
+ if attempt == 2:
395
+ return {"error": f"API response missing expected key after 3 attempts: {str(e)}"}
396
+ time.sleep(2 ** attempt)
397
+ except (Timeout, RequestsConnectionError) as e:
398
+ print(
399
+ f"Custom API connection error attempt {attempt + 1}/3: {e}")
400
+ if attempt == 2:
401
+ return {"error": f"API connection error after 3 attempts: {str(e)}"}
402
+ time.sleep(2 ** attempt)
403
+ except RequestException as e:
404
+ print(f"Custom API request error attempt {attempt + 1}/3: {e}")
405
+ if attempt == 2:
406
+ return {"error": f"API request error after 3 attempts: {str(e)}"}
407
+ time.sleep(2 ** attempt)
408
+
409
+ def get_default_params(self):
410
+ return {
411
+ }
412
+
413
+
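The clients above share one retry discipline: up to three attempts with exponential backoff (`time.sleep(2 ** attempt)`) and an `{"error": ...}` dict after the final failure. A standalone sketch of that pattern, under stated assumptions: `with_retries`, `flaky`, and the shortened `delay` base are illustrative names for this example only, not part of the codebase.

```python
import time

def with_retries(fn, attempts=3, delay=0.01):
    # Try up to `attempts` times; sleep delay * 2**attempt between failures.
    # (The clients above use a 1-second base: time.sleep(2 ** attempt).)
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            if attempt == attempts - 1:
                # Mirror the clients' convention: return an error dict, don't raise
                return {"error": f"API error after {attempts} attempts: {e}"}
            time.sleep(delay * (2 ** attempt))

calls = {"n": 0}

def flaky():
    # Fails twice, then succeeds -- simulates a transient API error
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return {"ok": True}

result = with_retries(flaky)
print(result)
```

Returning an error dict instead of raising keeps the agent functions' `"error" in result` checks uniform across all providers.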
414
+ # ======================
415
+ # 4. Model Client Factory
416
+ # ======================
417
+ def create_model_client(model_choice, settings=None):
418
+ """Factory function to create appropriate model client"""
419
+ if model_choice == 'GPT-4o':
420
+ api_key = settings.get('api_key') if settings else None
421
+ return OpenAIClient(api_key)
422
+ elif model_choice == 'custom':
423
+ if not settings:
424
+ raise ValueError("Settings required for custom model")
425
+ return CustomModelClient(
426
+ api_url=settings.get('apiUrl'),
427
+ api_key=settings.get('api_key'),
428
+ model_name=settings.get('modelName')
429
+ )
430
+ # elif model_choice == 'HuggingFace':
431
+ # api_key = settings.get('api_key')
432
+ # return HuggingFaceClient(api_key)
433
+ else:
434
+ raise ValueError(f"Unsupported model choice: {model_choice}")
435
+
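The factory dispatches on `model_choice` and, for the `'custom'` branch, reads `apiUrl`, `api_key`, and `modelName` from the settings dict. A minimal sketch of that dispatch logic with a hypothetical settings dict (the URL, key, and model name below are placeholders, and `pick_client` is an illustrative stand-in that returns labels instead of real client objects):

```python
# Hypothetical settings shape expected by the 'custom' branch of the factory
settings = {
    "apiUrl": "https://api.example.com/v1/chat/completions",
    "api_key": "sk-test",
    "modelName": "my-model",
}

def pick_client(model_choice, settings):
    # Mirrors create_model_client's dispatch, returning labels for illustration
    if model_choice == "GPT-4o":
        return "openai"
    if model_choice == "custom":
        if not settings:
            raise ValueError("Settings required for custom model")
        return "custom"
    raise ValueError(f"Unsupported model choice: {model_choice}")

print(pick_client("custom", settings))
```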
436
+
437
+ # ======================
438
+ # 5. Unified Agent Functions
439
+ # ======================
440
+ def check_stimulus_repetition(new_stimulus_dict, previous_stimuli_list):
441
+ """
442
+ Return True if, for any stimulus in previous_stimuli_list, the value of any key (dimension) in new_stimulus_dict exactly matches the corresponding value; such a match counts as a repetition.
443
+ """
444
+ for existing_stimulus in previous_stimuli_list:
445
+ for key, new_value in new_stimulus_dict.items():
446
+ # If the key exists in existing_stimulus and the values are the same, it is considered a repetition
447
+ if key in existing_stimulus:
448
+ try:
449
+ existing_val = str(existing_stimulus[key]).lower()
450
+ new_val = str(new_value).lower()
451
+ if existing_val == new_val:
452
+ return True
453
+ except (AttributeError, TypeError):
454
+ # Skip comparison if values can't be converted to string
455
+ continue
456
+
457
+ return False
458
+
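Note that a single matching dimension is enough to reject a candidate, and the comparison is case-insensitive via `str(...).lower()`. A condensed sketch of the same check with hypothetical sample stimuli (`is_repeated` and the `word`/`valence` fields are illustrative, not from the codebase):

```python
def is_repeated(new_stim, previous):
    # Any case-insensitive match on any shared key counts as a repetition
    for existing in previous:
        for key, value in new_stim.items():
            if key in existing and str(existing[key]).lower() == str(value).lower():
                return True
    return False

previous = [{"word": "Apple", "valence": "positive"}]
print(is_repeated({"word": "apple"}, previous))  # one matching dimension suffices
print(is_repeated({"word": "pear"}, previous))
```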
459
+
460
+ def agent_1_generate_stimulus(
461
+ model_client,
462
+ experiment_design,
463
+ previous_stimuli,
464
+ properties,
465
+ rejected_stimuli=None,
466
+ prompt_template=AGENT_1_PROMPT_TEMPLATE,
467
+ params=None,
468
+ stop_event=None):
469
+ """
470
+ Agent 1: Generate new stimulus using the provided model client
471
+ """
472
+ if stop_event and stop_event.is_set():
473
+ print("Generation stopped by user in agent_1_generate_stimulus.")
474
+ return {"stimulus": "STOPPED"}
475
+
476
+ # Use fixed generation_requirements
477
+ generation_requirements = "5. Follow the same JSON format as the existing stimuli."
478
+
479
+ if rejected_stimuli is None:
480
+ rejected_stimuli = []
481
+
482
+ prompt = prompt_template.format(
483
+ experiment_design=experiment_design,
484
+ previous_stimuli=previous_stimuli,
485
+ rejected_stimuli=rejected_stimuli,
486
+ generation_requirements=generation_requirements
487
+ )
488
+
489
+ try:
490
+ result = model_client.generate_completion(prompt, properties, params)
491
+
492
+ # Check stop event again
493
+ if stop_event and stop_event.is_set():
494
+ print(
495
+ "Generation stopped by user after API call in agent_1_generate_stimulus.")
496
+ return {"stimulus": "STOPPED"}
497
+
498
+ if "error" in result:
499
+ return {"stimulus": "ERROR/ERROR"}
500
+
501
+ return result
502
+ except (json.JSONDecodeError, KeyError, TypeError) as e:
503
+ print(f"Error parsing response in agent_1_generate_stimulus: {e}")
504
+ return {"stimulus": "ERROR/ERROR"}
505
+ except (RequestException, Timeout) as e:
506
+ print(f"Network error in agent_1_generate_stimulus: {e}")
507
+ return {"stimulus": "ERROR/ERROR"}
508
+
509
+
510
+ def agent_2_validate_stimulus(
511
+ model_client,
512
+ new_stimulus,
513
+ experiment_design,
514
+ properties,
515
+ prompt_template=AGENT_2_PROMPT_TEMPLATE,
516
+ stop_event=None):
517
+ """
518
+ Agent 2: Validate experimental stimulus using the provided model client
519
+ """
520
+ if stop_event and stop_event.is_set():
521
+ print("Generation stopped by user in agent_2_validate_stimulus.")
522
+ return {"error": "Stopped by user"}
523
+
524
+ prompt = prompt_template.format(
525
+ experiment_design=experiment_design,
526
+ new_stimulus=new_stimulus
527
+ )
528
+
529
+ try:
530
+ # Deterministic validation: take the model's default params and force temperature=0
531
+ fixed_params = model_client.get_default_params()
532
+ fixed_params["temperature"] = 0
533
+ result = model_client.generate_completion(
534
+ prompt, properties, fixed_params)
535
+
536
+ print("Agent 2 Output:", result)
537
+
538
+ # Check stop event again
539
+ if stop_event and stop_event.is_set():
540
+ print(
541
+ "Generation stopped by user after API call in agent_2_validate_stimulus.")
542
+ return {"error": "Stopped by user"}
543
+
544
+ if "error" in result:
545
+ print(f"Agent 2 API error: {result}")
546
+ return {"error": f"Failed to validate stimulus: {result.get('error', 'Unknown error')}"}
547
+
548
+ return result
549
+ except (json.JSONDecodeError, KeyError, TypeError) as e:
550
+ print(f"Error parsing validation response: {e}")
551
+ return {"error": f"Failed to parse validation response: {str(e)}"}
552
+ except (RequestException, Timeout) as e:
553
+ print(f"Network error in validation: {e}")
554
+ return {"error": f"Network error during validation: {str(e)}"}
555
+
556
+
557
+ def agent_2_validate_stimulus_individual(
558
+ model_client,
559
+ new_stimulus,
560
+ experiment_design,
561
+ properties,
562
+ prompt_template=AGENT_2_PROMPT_TEMPLATE,
563
+ stop_event=None,
564
+ websocket_callback=None):
565
+ """
566
+ Agent 2: Validate experimental stimulus by checking each criterion individually
567
+ """
568
+ if stop_event and stop_event.is_set():
569
+ print("Generation stopped by user in agent_2_validate_stimulus_individual.")
570
+ return {"error": "Stopped by user"}
571
+
572
+ validation_results = {}
573
+
574
+ # Create individual prompt template for each criterion
575
+ individual_prompt_template = """\
576
+ Please verify the following NEW STIMULUS with utmost precision for the specific criterion mentioned below.
577
+
578
+ NEW STIMULUS: {new_stimulus}
579
+
580
+ Experimental stimuli design: {experiment_design}
581
+
582
+ SPECIFIC CRITERION TO VALIDATE:
583
+ Property: {property_name}
584
+ Description: {property_description}
585
+
586
+ Please return in JSON format with only one field: "{property_name}" (boolean: true if criterion is met, false otherwise).
587
+ """
588
+
589
+ try:
590
+ total_criteria = len(properties)
591
+ current_criterion = 0
592
+
593
+ for property_name, property_description in properties.items():
594
+ current_criterion += 1
595
+
596
+ if stop_event and stop_event.is_set():
597
+ print(
598
+ f"Generation stopped by user while validating {property_name}.")
599
+ return {"error": "Stopped by user"}
600
+
601
+ if websocket_callback:
602
+ websocket_callback(
603
+ "validator", f"Validating criterion {current_criterion}/{total_criteria}: {property_name}")
604
+
605
+ # Create prompt for individual criterion
606
+ prompt = individual_prompt_template.format(
607
+ new_stimulus=new_stimulus,
608
+ experiment_design=experiment_design,
609
+ property_name=property_name,
610
+ property_description=property_description
611
+ )
612
+
613
+ # Create properties dict with single criterion
614
+ single_property = {property_name: property_description}
615
+
616
+ # Get model-specific default params and override temperature
617
+ fixed_params = model_client.get_default_params()
618
+ fixed_params["temperature"] = 0
619
+
620
+ result = model_client.generate_completion(
621
+ prompt, single_property, fixed_params)
622
+
623
+ print(f"Agent 2 Individual Validation - {property_name}: {result}")
624
+
625
+ if "error" in result:
626
+ print(
627
+ f"Agent 2 Individual API error for {property_name}: {result}")
628
+ if websocket_callback:
629
+ websocket_callback(
630
+ "validator", f"Error validating criterion {property_name}: {result.get('error', 'Unknown error')}")
631
+ return {"error": f"Failed to validate criterion {property_name}: {result.get('error', 'Unknown error')}"}
632
+
633
+ # Extract the validation result for this criterion
634
+ if property_name in result:
635
+ validation_results[property_name] = result[property_name]
636
+ status = "PASSED" if result[property_name] else "FAILED"
637
+ if websocket_callback:
638
+ websocket_callback(
639
+ "validator", f"Criterion {property_name}: {status}")
640
+
641
+ # Early stop: if any criterion fails, immediately reject
642
+ if not result[property_name]:
643
+ if websocket_callback:
644
+ websocket_callback(
645
+ "validator", f"Early rejection: Criterion {property_name} failed. Stopping validation.")
646
+ print(
647
+ f"Agent 2 Individual Validation - Early stop: {property_name} failed")
648
+ return validation_results
649
+ else:
650
+ print(
651
+ f"Warning: {property_name} not found in result, assuming False")
652
+ validation_results[property_name] = False
653
+ if websocket_callback:
654
+ websocket_callback(
655
+ "validator", f"Criterion {property_name}: FAILED (parsing error)")
656
+ websocket_callback(
657
+ "validator", f"Early rejection: Criterion {property_name} failed. Stopping validation.")
658
+ print(
659
+ f"Agent 2 Individual Validation - Early stop: {property_name} failed (parsing error)")
660
+ return validation_results
661
+
662
+ print("Agent 2 Individual Validation - All Results:", validation_results)
663
+ if websocket_callback:
664
+ websocket_callback(
665
+ "validator", "All criteria passed successfully!")
666
+ return validation_results
667
+
668
+ except (json.JSONDecodeError, KeyError, TypeError) as e:
669
+ print(f"Error parsing individual validation response: {e}")
670
+ return {"error": f"Failed to parse validation response: {str(e)}"}
671
+ except (RequestException, Timeout) as e:
672
+ print(f"Network error in individual validation: {e}")
673
+ return {"error": f"Network error during validation: {str(e)}"}
674
+
675
+
676
+ def generate_scoring_requirements(properties):
677
+ """
678
+ Generate scoring requirements text from properties dictionary
679
+ """
680
+ if not properties:
681
+ return "No specific scoring requirements provided."
682
+
683
+ requirements = []
684
+ for aspect_name, aspect_details in properties.items():
685
+ min_score = aspect_details.get('minimum', 0)
686
+ max_score = aspect_details.get('maximum', 10)
687
+ description = aspect_details.get('description', aspect_name)
688
+
689
+ requirements.append(
690
+ f"- {aspect_name}: {description} (Score range: {min_score} to {max_score})")
691
+
692
+ return "\n".join(requirements)
693
+
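The generated requirements text is one bullet per scoring aspect, combining the description with its score range. A condensed sketch with a hypothetical properties dict (`scoring_requirements` and the `vividness` aspect are illustrative names for this example only):

```python
def scoring_requirements(properties):
    # Mirrors generate_scoring_requirements: one "- name: description (range)" line per aspect
    if not properties:
        return "No specific scoring requirements provided."
    lines = []
    for name, details in properties.items():
        lines.append(f"- {name}: {details.get('description', name)} "
                     f"(Score range: {details.get('minimum', 0)} to {details.get('maximum', 10)})")
    return "\n".join(lines)

props = {"vividness": {"description": "How vivid the stimulus is",
                       "minimum": 1, "maximum": 7}}
print(scoring_requirements(props))
```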
694
+
695
+ def agent_3_score_stimulus(
696
+ model_client,
697
+ valid_stimulus,
698
+ experiment_design,
699
+ properties,
700
+ prompt_template=AGENT_3_PROMPT_TEMPLATE,
701
+ stop_event=None):
702
+ """
703
+ Agent 3: Score experimental stimulus using the provided model client
704
+ """
705
+ if stop_event and stop_event.is_set():
706
+ print("Generation stopped by user after API call in agent_3_score_stimulus.")
707
+ return {field: 0 for field in properties.keys()} if properties else {}
708
+
709
+ # Generate scoring requirements text
710
+ scoring_requirements = generate_scoring_requirements(properties)
711
+
712
+ prompt = prompt_template.format(
713
+ experiment_design=experiment_design,
714
+ valid_stimulus=valid_stimulus,
715
+ scoring_requirements=scoring_requirements
716
+ )
717
+
718
+ try:
719
+ # Deterministic scoring: take the model's default params and force temperature=0
720
+ fixed_params = model_client.get_default_params()
721
+ fixed_params["temperature"] = 0
722
+ result = model_client.generate_completion(
723
+ prompt, properties, fixed_params)
724
+
725
+ if stop_event and stop_event.is_set():
726
+ print("Generation stopped by user after API call in agent_3_score_stimulus.")
727
+ return {field: 0 for field in properties.keys()} if properties else {}
728
+
729
+ if "error" in result:
730
+ print(f"Agent 3 API error: {result}")
731
+ return {field: 0 for field in properties.keys()}
732
+
733
+ return result
734
+ except (json.JSONDecodeError, KeyError, TypeError) as e:
735
+ print(f"Error parsing scoring response: {e}")
736
+ return {field: 0 for field in properties.keys()}
737
+ except (RequestException, Timeout) as e:
738
+ print(f"Network error in scoring: {e}")
739
+ return {field: 0 for field in properties.keys()}
740
+
741
+
742
+ def agent_3_score_stimulus_individual(
743
+ model_client,
744
+ valid_stimulus,
745
+ experiment_design,
746
+ properties,
747
+ prompt_template=AGENT_3_PROMPT_TEMPLATE,
748
+ stop_event=None,
749
+ websocket_callback=None):
750
+ """
751
+ Agent 3: Score experimental stimulus by evaluating each aspect individually
752
+ """
753
+ if stop_event and stop_event.is_set():
754
+ print("Generation stopped by user in agent_3_score_stimulus_individual.")
755
+ return {field: 0 for field in properties.keys()} if properties else {}
756
+
757
+ scoring_results = {}
758
+
759
+ # Create individual prompt template for each aspect
760
+ individual_prompt_template = """\
761
+ Please rate the following STIMULUS based on the specific aspect mentioned below for a psychological experiment:
762
+
763
+ STIMULUS: {valid_stimulus}
764
+ Experimental stimuli design: {experiment_design}
765
+
766
+ SPECIFIC ASPECT TO SCORE:
767
+ - Aspect Name: {aspect_name}
768
+ - Description: {aspect_description}
769
+ - Minimum Score: {min_score}
770
+ - Maximum Score: {max_score}
771
+ - Score Range: You must provide an integer score between {min_score} and {max_score} (inclusive)
772
+
773
+ SCORING INSTRUCTIONS:
774
+ Rate this stimulus on the "{aspect_name}" dimension based on the provided description. Your score should reflect how well the stimulus meets this criterion, with {min_score} being the lowest possible score and {max_score} being the highest possible score.
775
+
776
+ Please return in JSON format with only one field: "{aspect_name}" (integer score within the specified range {min_score}-{max_score}).
777
+ """
778
+
779
+ try:
780
+ total_aspects = len(properties)
781
+ current_aspect = 0
782
+
783
+ for aspect_name, aspect_details in properties.items():
784
+ current_aspect += 1
785
+
786
+ if stop_event and stop_event.is_set():
787
+ print(
788
+ f"Generation stopped by user while scoring {aspect_name}.")
789
+ return {field: 0 for field in properties.keys()}
790
+
791
+ if websocket_callback:
792
+ websocket_callback(
793
+ "scorer", f"Evaluating aspect {current_aspect}/{total_aspects}: {aspect_name}")
794
+
795
+ # Extract min and max scores from aspect details
796
+ min_score = aspect_details.get('minimum', 0)
797
+ max_score = aspect_details.get('maximum', 10)
798
+ description = aspect_details.get('description', aspect_name)
799
+
800
+ # Create prompt for individual aspect
801
+ prompt = individual_prompt_template.format(
802
+ valid_stimulus=valid_stimulus,
803
+ experiment_design=experiment_design,
804
+ aspect_name=aspect_name,
805
+ aspect_description=description,
806
+ min_score=min_score,
807
+ max_score=max_score
808
+ )
809
+
810
+ # Create properties dict with single aspect (include all details for JSON schema)
811
+ single_aspect = {aspect_name: {
812
+ 'type': 'integer',
813
+ 'description': description,
814
+ 'minimum': min_score,
815
+ 'maximum': max_score
816
+ }}
817
+
818
+ # Get model-specific default params and override temperature
819
+ fixed_params = model_client.get_default_params()
820
+ fixed_params["temperature"] = 0
821
+
822
+ result = model_client.generate_completion(
823
+ prompt, single_aspect, fixed_params)
824
+
825
+ print(f"Agent 3 Individual Scoring - {aspect_name}: {result}")
826
+
827
+ if "error" in result:
828
+ print(
829
+ f"Agent 3 Individual API error for {aspect_name}: {result}")
830
+ if websocket_callback:
831
+ websocket_callback(
832
+ "scorer", f"Error scoring aspect {aspect_name}: {result.get('error', 'Unknown error')}")
833
+ scoring_results[aspect_name] = 0
834
+ continue
835
+
836
+ # Extract the scoring result for this aspect
837
+ if aspect_name in result:
838
+ score = result[aspect_name]
839
+ # Ensure score is within valid range
840
+ if isinstance(score, (int, float)):
841
+ score = max(min_score, min(max_score, int(score)))
842
+ scoring_results[aspect_name] = score
843
+ if websocket_callback:
844
+ websocket_callback(
845
+ "scorer", f"Aspect {aspect_name}: {score}/{max_score}")
846
+ else:
847
+ print(
848
+ f"Warning: Invalid score for {aspect_name}, assuming 0")
849
+ scoring_results[aspect_name] = 0
850
+ if websocket_callback:
851
+ websocket_callback(
852
+ "scorer", f"Aspect {aspect_name}: 0/{max_score} (invalid response)")
853
+ else:
854
+ print(
855
+ f"Warning: {aspect_name} not found in result, assuming 0")
856
+ scoring_results[aspect_name] = 0
857
+ if websocket_callback:
858
+ websocket_callback(
859
+ "scorer", f"Aspect {aspect_name}: 0/{max_score} (parsing error)")
860
+
861
+ print("Agent 3 Individual Scoring - All Results:", scoring_results)
862
+ if websocket_callback:
863
+ total_score = sum(scoring_results.values())
864
+ max_possible = sum(aspect_details.get('maximum', 10)
865
+ for aspect_details in properties.values())
866
+ websocket_callback(
867
+ "scorer", f"Individual scoring completed! Total: {total_score}/{max_possible}")
868
+ return scoring_results
869
+
870
+ except (json.JSONDecodeError, KeyError, TypeError) as e:
871
+ print(f"Error parsing individual scoring response: {e}")
872
+ return {field: 0 for field in properties.keys()}
873
+ except (RequestException, Timeout) as e:
874
+ print(f"Network error in individual scoring: {e}")
875
+ return {field: 0 for field in properties.keys()}
876
+
877
+
878
+ # ======================
879
+ # 6. Main Flow Function
880
+ # ======================
881
+ def generate_stimuli(settings):
882
+
883
+ stop_event = settings['stop_event']
884
+ current_iteration = settings['current_iteration']
885
+ total_iterations = settings['total_iterations']
886
+ experiment_design = settings['experiment_design']
887
+ previous_stimuli = settings['previous_stimuli'] or []
889
+ model_choice = settings.get('model_choice', 'GPT-4o')
890
+
891
+ ablation = settings.get('ablation', {
892
+ "use_agent_2": True,
893
+ "use_agent_3": True
894
+ })
895
+
896
+ repetition_count = 0
897
+ validation_fails = 0
898
+
899
+ # Get custom parameters for custom model
900
+ custom_params = settings.get('params', None)
901
+
902
+ # Get session_update_callback function and websocket_callback function
903
+ session_update_callback = settings.get('session_update_callback')
904
+ websocket_callback = settings.get('websocket_callback')
905
+
906
+ # Ensure progress value is correctly initialized
907
+ with current_iteration.get_lock(), total_iterations.get_lock():
908
+ current_iteration.value = 0
909
+ total_iterations.value = settings['iteration']
910
+ # Immediately send correct initial progress
911
+ if session_update_callback:
912
+ session_update_callback()
913
+
914
+ # Check stop event at each critical point
915
+ def check_stop(message="Generation stopped by user."):
916
+ if stop_event.is_set():
917
+ print(message)
918
+ if websocket_callback:
919
+ websocket_callback("all", message)
920
+ return True
921
+ return False
922
+
923
+ # Helper function to create partial result when error or stop occurs
924
+ def create_partial_result(record_list, message, is_error=True):
925
+ nonlocal total_iterations
926
+ if len(record_list) > 0:
927
+ df = pd.DataFrame(record_list)
928
+ session_id = settings.get('session_id', 'default')
929
+ timestamp = int(time.time())
930
+ unique_id = ''.join(random.choice('0123456789abcdef')
931
+ for _ in range(6))
932
+ suffix = "_partial" if is_error else "_stopped"
933
+ suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}{suffix}.csv"
934
+
935
+ df['generation_timestamp'] = timestamp
936
+ df['batch_id'] = unique_id
937
+ df['total_iterations'] = total_iterations.value
938
+ df['stopped_by_user'] = not is_error
939
+ df['error_occurred'] = is_error
940
+ df['message'] = message
941
+ df['completed_iterations'] = len(record_list)
942
+
943
+ os.makedirs("outputs", exist_ok=True)
944
+ suggested_filename = os.path.join("outputs", suggested_filename)
945
+
946
+ return df, suggested_filename
947
+ return None, None
948
+
949
+ # Helper function to check stop and return partial data if available
950
+ def check_stop_and_return(message="Generation stopped by user."):
951
+ if stop_event.is_set():
952
+ print(message)
953
+ if websocket_callback:
954
+ websocket_callback("all", message)
955
+ return True, create_partial_result(record_list, message, is_error=False)
956
+ return False, (None, None)
957
+
958
+ # Immediately check if stopped
959
+ if check_stop("Generation stopped before starting."):
960
+ return None, None
961
+
962
+ record_list = []
963
+ rejected_stimuli_memory = []
964
+ agent_1_properties = settings.get('agent_1_properties', {})
965
+ print("Agent 1 Properties:", agent_1_properties)
966
+ if websocket_callback:
967
+ websocket_callback(
968
+ "setup", f"Agent 1 Properties: {agent_1_properties}")
969
+
970
+ if check_stop():
971
+ return None, None
972
+
973
+ agent_2_properties = settings.get('agent_2_properties', {})
974
+ print("Agent 2 Properties:", agent_2_properties)
975
+ if websocket_callback:
976
+ websocket_callback(
977
+ "setup", f"Agent 2 Properties: {agent_2_properties}")
978
+
979
+ if check_stop():
980
+ return None, None
981
+
982
+ agent_3_properties = settings.get('agent_3_properties', {})
983
+ print("Agent 3 Properties:", agent_3_properties)
984
+ if websocket_callback:
985
+ websocket_callback(
986
+ "setup", f"Agent 3 Properties: {agent_3_properties}")
987
+
988
+ if check_stop():
989
+ return None, None
990
+
991
+ # Create model client using factory
992
+ try:
993
+ model_client = create_model_client(model_choice, settings)
994
+ print(f"Using model: {model_choice}")
995
+ if websocket_callback:
996
+ websocket_callback("setup", f"Using model: {model_choice}")
997
+ except Exception as e:
998
+ error_msg = f"Failed to create model client: {str(e)}"
999
+ print(error_msg)
1000
+ if websocket_callback:
1001
+ websocket_callback("setup", error_msg)
1002
+ return None, None
1003
+
1004
+ if check_stop():
1005
+ return None, None
1006
+
1007
+ # Create a function specifically for updating progress
1008
+ def update_progress(completed_iterations):
1009
+ if check_stop():
1010
+ return
1011
+
1012
+ with current_iteration.get_lock(), total_iterations.get_lock():
1013
+ current_value = min(completed_iterations, total_iterations.value)
1014
+ if current_value > current_iteration.value:
1015
+ current_iteration.value = current_value
1016
+ if session_update_callback:
1017
+ session_update_callback()
1018
+
1019
+ # Get actual total iterations
1020
+ total_iter_value = total_iterations.value
1021
+ for iteration_num in range(total_iter_value):
1022
+ stopped, partial_result = check_stop_and_return()
1023
+ if stopped:
1024
+ return partial_result
1025
+
1026
+ round_message = f"=== Round {iteration_num + 1} ==="
1027
+ print(round_message)
1028
+ if websocket_callback:
1029
+ websocket_callback("all", round_message)
1030
+
1031
+ # Step 1: Generate stimulus
1032
+ current_retry_count = 0 # Retry counter for this iteration
1033
+ while True:
1034
+ stopped, partial_result = check_stop_and_return()
1035
+ if stopped:
1036
+ return partial_result
1037
+
1038
+ try:
1039
+ stimuli = agent_1_generate_stimulus(
1040
+ model_client=model_client,
1041
+ experiment_design=experiment_design,
1042
+ previous_stimuli=previous_stimuli,
1043
+ properties=agent_1_properties,
1044
+ rejected_stimuli=rejected_stimuli_memory,
1045
+ prompt_template=AGENT_1_PROMPT_TEMPLATE,
1046
+ params=custom_params,
1047
+ stop_event=stop_event
1048
+ )
1049
+
1050
+ if isinstance(stimuli, dict) and stimuli.get('stimulus') == 'STOPPED':
1051
+ stopped, partial_result = check_stop_and_return(
1052
+ "Generation stopped after 'Generator'.")
1053
+ if stopped:
1054
+ return partial_result
1055
+
1056
+ # Skip validation if Agent 1 returned an error
1057
+ if isinstance(stimuli, dict) and stimuli.get('stimulus') == 'ERROR/ERROR':
1058
+ print("Agent 1 returned ERROR, regenerating...")
1059
+ if websocket_callback:
1060
+ websocket_callback(
1061
+ "generator", "Generator returned ERROR, regenerating...")
1062
+ continue
1063
+
1064
+ print("Agent 1 Output:", stimuli)
1065
+ if websocket_callback:
1066
+ websocket_callback(
1067
+ "generator", f"Generator's Output: {json.dumps(stimuli, indent=2)}")
1068
+
1069
+ stopped, partial_result = check_stop_and_return(
1070
+ "Generation stopped after 'Generator'.")
1071
+ if stopped:
1072
+ return partial_result
1073
+
1074
+ # Step 1.5: Check if stimulus already exists
1075
+
1076
+ if check_stimulus_repetition(stimuli, previous_stimuli):
1077
+ repetition_count += 1
1078
+ current_retry_count += 1
1079
+
1080
+ # Add retry limit to avoid infinite loops (but never accept duplicates)
1081
+ max_repetition_retries = 50
1082
+                     if current_retry_count > max_repetition_retries:
+                         error_msg = f"Failed to generate unique stimulus after {max_repetition_retries} attempts. Consider adjusting experiment design or reducing target count."
+                         print(error_msg)
+                         if websocket_callback:
+                             websocket_callback("generator", error_msg)
+                         # Return partial results instead of raising exception
+                         return create_partial_result(record_list, error_msg)
+
+                     if ablation["use_agent_2"]:
+                         print("Detected repeated stimulus, regenerating...")
+
+                         if websocket_callback:
+                             websocket_callback(
+                                 "generator", "Detected repeated stimulus, regenerating...")
+                         continue
+                     else:
+                         print(
+                             "Ablation: Skipping Agent 2 (Repetition Check)")
+                         if websocket_callback:
+                             websocket_callback(
+                                 "generator", "Ablation: Skipping Agent 2 (Repetition Check)")
+
+                 stopped, partial_result = check_stop_and_return()
+                 if stopped:
+                     return partial_result
+
+                 # Step 2: Validate stimulus
+                 # Check if individual validation is enabled
+                 individual_validation = settings.get(
+                     'agent_2_individual_validation', False)
+
+                 if individual_validation:
+                     if websocket_callback:
+                         websocket_callback(
+                             "validator", f"Using individual validation mode - checking {len(agent_2_properties)} criteria...")
+                     validation_result = agent_2_validate_stimulus_individual(
+                         model_client=model_client,
+                         new_stimulus=stimuli,
+                         experiment_design=experiment_design,
+                         properties=agent_2_properties,
+                         stop_event=stop_event,
+                         websocket_callback=websocket_callback
+                     )
+                 else:
+                     if websocket_callback:
+                         websocket_callback(
+                             "validator", "Using batch validation mode...")
+                     validation_result = agent_2_validate_stimulus(
+                         model_client=model_client,
+                         new_stimulus=stimuli,
+                         experiment_design=experiment_design,
+                         properties=agent_2_properties,
+                         prompt_template=AGENT_2_PROMPT_TEMPLATE,
+                         stop_event=stop_event
+                     )
+
+                 if isinstance(validation_result, dict) and validation_result.get('error') == 'Stopped by user':
+                     stopped, partial_result = check_stop_and_return(
+                         "Generation stopped after 'Validator'.")
+                     if stopped:
+                         return partial_result
+
+                 print("Agent 2 Output:", validation_result)
+                 if websocket_callback:
+                     websocket_callback(
+                         "validator", f"Validator's Output: {json.dumps(validation_result, indent=2)}")
+
+                 stopped, partial_result = check_stop_and_return(
+                     "Generation stopped after 'Validator'.")
+                 if stopped:
+                     return partial_result
+
+                 # Check if there was an error first
+                 if 'error' in validation_result:
+                     print(f"Validation error: {validation_result['error']}")
+                     if websocket_callback:
+                         websocket_callback(
+                             "validator", f"Validation error: {validation_result['error']}")
+                     continue  # Skip to next iteration
+
+                 # Check validation fields
+                 failed_fields = [
+                     key for key, value in validation_result.items() if not value]
+
+                 if failed_fields:
+                     # Some fields failed validation
+                     validation_fails += 1
+                     current_retry_count += 1
+
+                     # Add to rejected memory (only if it's a valid stimulus, not an error)
+                     is_error_stimulus = (
+                         isinstance(stimuli, dict) and
+                         stimuli.get('stimulus') in ['ERROR/ERROR', 'STOPPED']
+                     )
+                     if not is_error_stimulus:
+                         rejected_item = {
+                             "stimulus": stimuli,
+                             "validation_result": validation_result,
+                             "failed_fields": failed_fields
+                         }
+                         rejected_stimuli_memory.append(rejected_item)
+                         # Limit memory size to prevent unbounded growth
+                         MAX_REJECTED_MEMORY = 20
+                         if len(rejected_stimuli_memory) > MAX_REJECTED_MEMORY:
+                             rejected_stimuli_memory = rejected_stimuli_memory[-MAX_REJECTED_MEMORY:]
+
+                     print(
+                         f"Failed validation for fields: {failed_fields}, regenerating...")
+                     if websocket_callback:
+                         websocket_callback(
+                             "validator", f"Failed validation for fields: {failed_fields}, regenerating...")
+
+                     # Check retry limit to avoid infinite loops
+                     max_retries = 50
+                     if current_retry_count > max_retries:
+                         error_msg = f"Failed to generate valid stimulus after {max_retries} attempts. Consider adjusting validation criteria."
+                         print(error_msg)
+                         if websocket_callback:
+                             websocket_callback("validator", error_msg)
+                         # Return partial results instead of raising exception
+                         return create_partial_result(record_list, error_msg)
+
+                     if ablation["use_agent_2"]:
+                         continue  # Regenerate
+                     else:
+                         print("Ablation: Skipping Agent 2 (Validation)")
+                         if websocket_callback:
+                             websocket_callback(
+                                 "validator", "Ablation: Skipping Agent 2 (Validation)")
+                         update_progress(iteration_num + 1)
+                         break
+                 else:
+                     # All validations passed
+                     print("All validations passed, proceeding to next step...")
+                     if websocket_callback:
+                         websocket_callback(
+                             "validator", "All validations passed, proceeding to next step...")
+                     update_progress(iteration_num + 1)
+                     break
+
+             except Exception as e:
+                 error_msg = f"Error in generation/validation step: {str(e)}"
+                 print(error_msg)
+                 if websocket_callback:
+                     websocket_callback("all", error_msg)
+                 if len(record_list) > 0:
+                     df = pd.DataFrame(record_list)
+                     session_id = settings.get('session_id', 'default')
+                     timestamp = int(time.time())
+                     unique_id = ''.join(random.choice(
+                         '0123456789abcdef') for _ in range(6))
+                     suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}_error.csv"
+
+                     df['generation_timestamp'] = timestamp
+                     df['batch_id'] = unique_id
+                     df['total_iterations'] = total_iter_value
+                     df['error_occurred'] = True
+                     df['error_message'] = str(e)
+
+                     os.makedirs("outputs", exist_ok=True)
+                     # Keep the _error suffix when redirecting the file into outputs/
+                     suggested_filename = os.path.join(
+                         "outputs", suggested_filename)
+
+                     return df, suggested_filename
+                 else:
+                     raise e
+
+         stopped, partial_result = check_stop_and_return(
+             "Generation stopped after 'Validator'.")
+         if stopped:
+             return partial_result
+
+         try:
+             stopped, partial_result = check_stop_and_return(
+                 "Generation stopped before Scorer.")
+             if stopped:
+                 return partial_result
+
+             # Step 3: Score
+             if ablation["use_agent_3"]:
+                 # Check if individual scoring is enabled
+                 individual_scoring = settings.get(
+                     'agent_3_individual_scoring', False)
+
+                 if individual_scoring:
+                     if websocket_callback:
+                         websocket_callback(
+                             "scorer", f"Using individual scoring mode - evaluating {len(agent_3_properties)} aspects...")
+                     scores = agent_3_score_stimulus_individual(
+                         model_client=model_client,
+                         valid_stimulus=stimuli,
+                         experiment_design=experiment_design,
+                         properties=agent_3_properties,
+                         stop_event=stop_event,
+                         websocket_callback=websocket_callback
+                     )
+                 else:
+                     if websocket_callback:
+                         websocket_callback(
+                             "scorer", "Using batch scoring mode...")
+                     scores = agent_3_score_stimulus(
+                         model_client=model_client,
+                         valid_stimulus=stimuli,
+                         experiment_design=experiment_design,
+                         properties=agent_3_properties,
+                         prompt_template=AGENT_3_PROMPT_TEMPLATE,
+                         stop_event=stop_event
+                     )
+
+                 if isinstance(scores, dict) and all(v == 0 for v in scores.values()):
+                     if stop_event.is_set():
+                         stopped, partial_result = check_stop_and_return(
+                             "Generation stopped after 'Scorer'.")
+                         if stopped:
+                             return partial_result
+
+                 print("Agent 3 Output:", scores)
+                 if websocket_callback:
+                     websocket_callback(
+                         "scorer", f"Scorer's Output: {json.dumps(scores, indent=2)}")
+
+                 stopped, partial_result = check_stop_and_return(
+                     "Generation stopped after 'Scorer'.")
+                 if stopped:
+                     return partial_result
+             else:
+                 print("Ablation: Skipping Agent 3 (Scoring)")
+                 if websocket_callback:
+                     websocket_callback("scorer", "Ablation: Skipping Agent 3")
+
+             # Save results
+             record = {
+                 "stimulus_id": iteration_num + 1,
+                 "stimulus_content": stimuli,
+                 "repetition_count": repetition_count,
+                 "validation_fails": validation_fails,
+                 "validation_failure_reasons": validation_result
+             }
+             if ablation["use_agent_3"]:
+                 record.update(scores or {})
+             record_list.append(record)
+
+             # Update previous_stimuli
+             previous_stimuli.append(stimuli)
+
+             # If some records have been generated, create intermediate results
+             if (iteration_num + 1) % 5 == 0 or iteration_num + 1 == total_iter_value:
+                 temp_df = pd.DataFrame(record_list)
+                 session_id = settings.get('session_id', 'default')
+                 timestamp = int(time.time())
+                 unique_id = ''.join(random.choice('0123456789abcdef')
+                                     for _ in range(6))
+                 suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}.csv"
+
+                 temp_df['generation_timestamp'] = timestamp
+                 temp_df['batch_id'] = unique_id
+                 temp_df['total_iterations'] = total_iter_value
+
+                 if check_stop():
+                     return temp_df, suggested_filename
+
+                 if iteration_num + 1 == total_iter_value:
+                     update_progress(total_iter_value)
+                     return temp_df, suggested_filename
+
+         except Exception as e:
+             error_msg = f"Error in scoring step: {str(e)}"
+             print(error_msg)
+             if websocket_callback:
+                 websocket_callback("all", error_msg)
+             if len(record_list) > 0:
+                 df = pd.DataFrame(record_list)
+                 session_id = settings.get('session_id', 'default')
+                 timestamp = int(time.time())
+                 unique_id = ''.join(random.choice('0123456789abcdef')
+                                     for _ in range(6))
+                 suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}_error.csv"
+
+                 df['generation_timestamp'] = timestamp
+                 df['batch_id'] = unique_id
+                 df['total_iterations'] = total_iter_value
+                 df['error_occurred'] = True
+                 df['error_message'] = str(e)
+
+                 return df, suggested_filename
+             else:
+                 raise e
+
+     # Check again if stopped at final step
+     if check_stop("Generation stopped at final step."):
+         if len(record_list) > 0:
+             df = pd.DataFrame(record_list)
+             session_id = settings.get('session_id', 'default')
+             timestamp = int(time.time())
+             unique_id = ''.join(random.choice('0123456789abcdef')
+                                 for _ in range(6))
+             suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}.csv"
+
+             df['generation_timestamp'] = timestamp
+             df['batch_id'] = unique_id
+             df['total_iterations'] = total_iter_value
+             df['error_occurred'] = False
+             df['error_message'] = ""
+
+             completion_msg = f"Data generation completed for session {session_id}"
+             print(completion_msg)
+             if websocket_callback:
+                 websocket_callback("all", completion_msg)
+             return df, suggested_filename
+         return None, None
+
+     # Only generate DataFrame and return results after all iterations
+     if len(record_list) > 0:
+         update_progress(total_iter_value)
+
+         df = pd.DataFrame(record_list)
+         session_id = settings.get('session_id', 'default')
+         timestamp = int(time.time())
+         unique_id = ''.join(random.choice('0123456789abcdef')
+                             for _ in range(6))
+         suggested_filename = f"experiment_stimuli_results_{session_id}_{timestamp}_{unique_id}.csv"
+
+         df['generation_timestamp'] = timestamp
+         df['batch_id'] = unique_id
+         df['total_iterations'] = total_iter_value
+         df['error_occurred'] = False
+         df['error_message'] = ""
+
+         completion_msg = f"Data generation completed for session {session_id}"
+         print(completion_msg)
+         if websocket_callback:
+             websocket_callback("all", completion_msg)
+         return df, suggested_filename
+     else:
+         print("No records generated.")
+         if websocket_callback:
+             websocket_callback("all", "No records generated.")
+         return None, None
+
+
+ # ======================
+ # 7. Legacy Support Function (maintain backward compatibility)
+ # ======================
+ def custom_model_inference_handler(session_id, prompt, model, api_url, api_key, params=None):
+     """Legacy function for backward compatibility"""
+     try:
+         client = CustomModelClient(api_url, api_key, model)
+         result = client.generate_completion(prompt, {}, params)
+
+         if "error" in result:
+             return {'error': result["error"]}, 500
+
+         return {'response': json.dumps(result)}, 200
+     except Exception as e:
+         return {'error': f'Unexpected error: {str(e)}'}, 500
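The generation loop above interleaves three concerns: regenerate on repetition, regenerate on failed validation, and give up with partial results once a retry cap is exceeded, while keeping a bounded memory of rejected items. A minimal standalone sketch of that pattern (all names and limits here are illustrative, not the backend.py API):

```python
MAX_RETRIES = 5           # the real loop uses 50
MAX_REJECTED_MEMORY = 20  # cap on remembered rejections, as in backend.py


def generate_with_retries(generate, validate, target_count):
    """Generate target_count items, regenerating on failed validation.

    `validate` returns a dict mapping criterion name -> bool.
    Returns (records, error_msg); error_msg is None on full success,
    otherwise the partial records collected so far are returned.
    """
    records, rejected_memory = [], []
    for i in range(target_count):
        retries = 0
        while True:
            item = generate()
            result = validate(item)
            failed = [name for name, ok in result.items() if not ok]
            if not failed:
                records.append({"stimulus_id": i + 1, "stimulus_content": item})
                break
            # Remember the rejection but keep the memory bounded
            rejected_memory.append({"stimulus": item, "failed_fields": failed})
            rejected_memory = rejected_memory[-MAX_REJECTED_MEMORY:]
            retries += 1
            if retries > MAX_RETRIES:
                # Return partial results instead of raising, like create_partial_result
                return records, f"gave up after {MAX_RETRIES} attempts"
    return records, None
```

With an always-valid generator this yields `target_count` records and no error; with an always-failing validator it terminates after the retry cap and hands back whatever was already collected, mirroring the partial-result behavior of the real loop.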
index.html ADDED
@@ -0,0 +1,414 @@
+ <!DOCTYPE html>
+ <html lang="en">
+
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>Stimulus Generator</title>
+     <link rel="stylesheet" href="/static/styles.css">
+     <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css">
+     <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.6.0/socket.io.min.js"></script>
+ </head>
+
+ <body>
+     <!-- Restart countdown timer -->
+     <div id="restart-countdown" class="restart-countdown" style="display: none;">
+         <div class="countdown-content">
+             <i class="fas fa-clock"></i>
+             <span id="countdown-time">20:00</span>
+             <span class="countdown-label">Restart countdown</span>
+         </div>
+     </div>
+
+     <div class="container">
+         <div class="header-container">
+             <h1>Stimulus Generator</h1>
+         </div>
+         <h2>Parameter Settings</h2>
+         <div class="pale-blue-section">
+             <p class="instruction-text">Fill in your experiment design and let the model generate customized stimulus
+                 materials.</p>
+             <div class="form-group">
+                 <div class="label-container">
+                     <label for="model_choice">Model</label>
+                     <div class="tooltip info-icon">
+                         <span class="info-icon-inner">i</span>
+                         <span class="tooltip-text">Select which language model to use for generation.</span>
+                     </div>
+                 </div>
+                 <select id="model_choice" name="model_choice" onchange="handleModelChange()">
+                     <option value="GPT-4o">GPT-4o</option>
+                     <option value="custom">Custom Model</option>
+                 </select>
+             </div>
+             <div id="custom_model_config"
+                 style="display: none; margin-top: 10px; padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
+                 <div class="form-group">
+                     <div class="label-container">
+                         <label for="custom_model_name">Model name</label>
+                         <div class="tooltip info-icon">
+                             <span class="info-icon-inner">i</span>
+                             <span class="tooltip-text">Enter the name of your custom model (e.g.,
+                                 deepseek-ai/DeepSeek-V3-0324)</span>
+                         </div>
+                     </div>
+                     <input type="text" id="custom_model_name" name="custom_model_name"
+                         placeholder="e.g., deepseek-ai/DeepSeek-V3-0324">
+                 </div>
+                 <div class="form-group">
+                     <div class="label-container">
+                         <label for="custom_api_url">API URL</label>
+                         <div class="tooltip info-icon">
+                             <span class="info-icon-inner">i</span>
+                             <span class="tooltip-text">Enter the API endpoint URL for your custom model</span>
+                         </div>
+                     </div>
+                     <input type="text" id="custom_api_url" name="custom_api_url"
+                         placeholder="e.g., https://api.example.com/v1/chat/completions">
+                 </div>
+                 <div class="form-group">
+                     <div class="label-container">
+                         <label for="custom_params">Custom parameters (JSON)</label>
+                         <div class="tooltip info-icon">
+                             <span class="info-icon-inner">i</span>
+                             <span class="tooltip-text">Enter additional parameters in JSON format (e.g., {"max_tokens":
+                                 2000, "temperature": 0.7})</span>
+                         </div>
+                     </div>
+                     <textarea id="custom_params" name="custom_params" rows="4"
+                         placeholder='{"max_tokens": 2000, "temperature": 0.7}'></textarea>
+                 </div>
+             </div>
+             <div class="form-group">
+                 <div class="label-container">
+                     <label for="api_key">API key</label>
+                     <div class="tooltip info-icon">
+                         <span class="info-icon-inner">i</span>
+                         <span class="tooltip-text">Enter your API key for the selected model.</span>
+                     </div>
+                 </div>
+                 <input type="text" id="api_key" placeholder="Enter your API Key">
+             </div>
+             <div class="spacing"></div>
+             <div class="form-group custom-example-group">
+                 <div class="label-container">
+                     <label>Example stimuli</label>
+                     <div class="tooltip info-icon">
+                         <span class="info-icon-inner">i</span>
+                         <span class="tooltip-text">
+                             <strong>### What is this section for?</strong><br>
+                             This section provides <strong>example stimulus items</strong> for the Generator agent to
+                             learn from.<br>
+                             - The <strong>left column</strong> (Component) should match the component names you've
+                             defined above (e.g. <em>word pair</em>, <em>supportive context</em>, <em>neutral
+                             context</em>).<br>
+                             - The <strong>right column</strong> (Content) provides the actual example content for that
+                             component.<br>
+                             💡 Each row defines <strong>one component</strong> of the current item.<br>
+                             Click <strong>"Add item"</strong> to create a new example stimulus item with the same
+                             components.<br>
+                             ---<br>
+                             <strong>### Example:</strong><br>
+                             <strong>Item 1:</strong><br>
+                             - Component: <code>word pair</code> → Content: <code>TV / television</code><br>
+                             - Component: <code>supportive context</code> → Content:
+                             <code>She turned on the TV to watch the news.</code><br>
+                             - Component: <code>neutral context</code> → Content:
+                             <code>The TV was next to the window.</code><br>
+                             <strong>Item 2:</strong><br>
+                             ...<br>
+                             📌 You should add at least <strong>2–3 full example items</strong> for best results.
+                         </span>
+                     </div>
+                 </div>
+                 <p class="description-text">Add multiple example items to help the agent learn.</p>
+
+                 <!-- Example table container -->
+                 <div id="items-container">
+                     <!-- Item 1 -->
+                     <div class="item-container" id="item-1">
+                         <div class="item-title">Item 1</div>
+                         <table class="stimuli-table">
+                             <thead>
+                                 <tr>
+                                     <th class="type-column">Components</th>
+                                     <th class="content-column">Content</th>
+                                 </tr>
+                             </thead>
+                             <tbody>
+                                 <tr>
+                                     <td class="type-column"><input type="text" placeholder="e.g. word pair"></td>
+                                     <td class="content-column"><input type="text" placeholder="e.g. math/mathematics">
+                                     </td>
+                                 </tr>
+                             </tbody>
+                         </table>
+                         <div class="item-buttons-row example-buttons">
+                             <div class="left-buttons">
+                                 <button class="add-component-btn">Add component</button>
+                                 <button class="delete-component-btn">Delete component</button>
+                             </div>
+                             <div class="right-buttons">
+                                 <button class="add-item-btn">Add item</button>
+                             </div>
+                         </div>
+                     </div>
+                 </div>
+             </div>
+             <div class="form-group">
+                 <div class="label-container">
+                     <label for="experiment_design">Stimulus design</label>
+                     <div class="tooltip info-icon">
+                         <span class="info-icon-inner">i</span>
+                         <span class="tooltip-text">
+                             <strong>### What is "Stimulus design"?</strong><br>
+                             This field defines the structure and logic of your experimental stimuli.<br>
+                             It helps the model understand what to generate and how to vary conditions.<br>
+                             ---<br>
+                             <strong>### What to include:</strong><br>
+                             ✅ Components<br>
+                             List the elements in each item (e.g. word pair, context sentence, target word)<br>
+                             ✅ Condition manipulation<br>
+                             Describe how the stimuli differ across conditions (e.g. supportive vs neutral)<br>
+                             ✅ Constraints (optional)<br>
+                             Mention any control rules (e.g. sentence length matching)<br>
+                             ---<br>
+                             <strong>### Example:</strong><br>
+                             - A word pair: a short word and its long form (e.g. TV – television)<br>
+                             - Two context sentences:<br>
+                             - Supportive context: strongly predicts the target word (e.g. She watches her favorite shows
+                             on the <strong>TV</strong>.)<br>
+                             - Neutral context: does not predict the target word (e.g. She placed the ball next to the
+                             <strong>TV</strong>)<br>
+                             Supportive and neutral contexts are matched for sentence length and structure.
+                         </span>
+                     </div>
+                 </div>
+                 <textarea id="experiment_design" placeholder="Describe the structure of each stimulus item:
+ - Component 1: ...
+ - Component 2: ...
+ - Manipulation: ...
+ 💡 Click the info icon (ℹ️) to see a complete example."></textarea>
+                 <!-- "AutoGenerate properties" button -->
+                 <div class="button-container-spaced">
+                     <button id="auto_generate_button" class="auto-generate-btn">AutoGenerate properties</button>
+                 </div>
+                 <!-- Restart reminder text -->
+                 <div class="restart-notice">* To ensure the server operates normally, the app will auto-restart periodically.</div>
+                 <div class="restart-notice">* A countdown will appear in the top-left corner of this page twenty minutes prior to each restart.</div>
+             </div>
+         </div>
+         <h2>Agent Property Settings</h2>
+         <div class="pale-blue-section">
+             <div class="form-group">
+                 <div class="label-container">
+                     <label><i class="fas fa-check-circle agent-icon validator-icon"></i> Validator</label>
+                     <div class="tooltip info-icon">
+                         <span class="info-icon-inner">i</span>
+                         <span class="tooltip-text">
+                             <strong>### What is a Validator?</strong><br>
+                             Validators define the <strong>mandatory requirements</strong> that each generated stimulus
+                             must meet.<br>
+                             ---<br>
+                             <strong>### How to use:</strong><br>
+                             - In the <strong>Properties</strong> column, define a short label (e.g.
+                             <code>IsSynonym</code>, <code>ContainsTargetWord</code>)<br>
+                             - In the <strong>Description</strong>, explain what this constraint means<br>
+                             ---<br>
+                             <strong>### Example:</strong><br>
+                             IsSynonym: Whether the two words in the word pair are synonyms<br>
+                             Predictability: Whether the supportive context can predict the target word
+                         </span>
+                     </div>
+                 </div>
+                 <p class="description-text">Define the property name (left) and its validation logic (right) to help
+                     the Validator agent filter out unacceptable items.</p>
+                 <table id="agent2PropertiesTable">
+                     <thead>
+                         <tr>
+                             <th class="agent_2_properties-column">Properties</th>
+                             <th>Description</th>
+                             <th style="width: 70px;">Action</th>
+                         </tr>
+                     </thead>
+                     <tbody>
+                         <tr>
+                             <td class="agent_2_properties-column"><input type="text" placeholder="e.g. Synonym"></td>
+                             <td class="agent_2_description-column"><input type="text"
+                                 placeholder="e.g. Whether the words in the word pair are synonyms."></td>
+                             <td><button class="delete-row-btn delete-btn">Delete</button></td>
+                         </tr>
+                     </tbody>
+                 </table>
+                 <div class="button-container-spaced">
+                     <button id="add_agent_2_property_button">Add Validator's property</button>
+                 </div>
+                 <div class="form-group" style="margin-top: 15px;">
+                     <div class="label-container">
+                         <label>
+                             <input type="checkbox" id="agent2_individual_validation" style="margin-right: 8px;">
+                             Individual Criteria Validation
+                         </label>
+                         <div class="tooltip info-icon">
+                             <span class="info-icon-inner">i</span>
+                             <span class="tooltip-text">
+                                 <strong>When this option is enabled:</strong><br>
+                                 • Agent 2 (Validator) will validate each criterion individually instead of all at once<br>
+                                 • Validation stops immediately when any criterion fails (early rejection)<br>
+                                 • May provide more precise validation but increases the number of API calls and processing time
+                             </span>
+                         </div>
+                     </div>
+                 </div>
+             </div>
+             <div class="spacing"></div>
+             <div class="form-group">
+                 <div class="label-container">
+                     <label><i class="fas fa-star agent-icon scorer-icon"></i> Scorer</label>
+                     <div class="tooltip info-icon">
+                         <span class="info-icon-inner">i</span>
+                         <span class="tooltip-text">
+                             <strong>### What is a Scorer?</strong><br>
+                             Scorers assign <strong>numeric scores</strong> to each generated item based on specific
+                             quality dimensions.<br>
+                             These scores can be used to compare, filter, or rank items.<br>
+                             ---<br>
+                             <strong>### What to define:</strong><br>
+                             - <strong>Aspects</strong>: The dimension you're evaluating (e.g. Fluency, Frequency,
+                             Informativeness)<br>
+                             - <strong>Description</strong>: What this score represents<br>
+                             - <strong>Min / Max score</strong>: Define the scoring scale (e.g. from 0 to 10)<br>
+                             ---<br>
+                             <strong>### Example:</strong><br>
+                             | Aspect | Description | Min score | Max score |<br>
+                             | Word Pair Frequency | How frequently the word pair is used in English | 0 | 10 |<br>
+                             | Predictability | How strongly the supportive context predicts the target word | 0 | 100 |
+                         </span>
+                     </div>
+                 </div>
+                 <p class="description-text">Define the dimensions along which the generated items will be rated.</p>
+                 <table id="agent3PropertiesTable">
+                     <thead>
+                         <tr>
+                             <th class="agent_3_properties-column">Aspects</th>
+                             <th class="agent_3_description-column">Description</th>
+                             <th class="agent_3_minimum-column">Min score</th>
+                             <th class="agent_3_maximum-column">Max score</th>
+                             <th style="width: 70px;">Action</th>
+                         </tr>
+                     </thead>
+                     <tbody>
+                         <tr>
+                             <td class="agent_3_properties-column"><input type="text"
+                                 placeholder="e.g. Word Pair Frequency"></td>
+                             <td class="agent_3_description-column"><input type="text"
+                                 placeholder="e.g. How frequently the word pair is used in English"></td>
+                             <td class="agent_3_minimum-column"><input type="number" min="0" placeholder="e.g. 0"></td>
+                             <td class="agent_3_maximum-column"><input type="number" min="0" placeholder="e.g. 10"></td>
+                             <td><button class="delete-row-btn delete-btn">Delete</button></td>
+                         </tr>
+                     </tbody>
+                 </table>
+                 <div class="button-container-spaced">
+                     <button id="add_agent_3_property_button">Add Scorer's aspect</button>
+                 </div>
+                 <div class="form-group" style="margin-top: 15px;">
+                     <div class="label-container">
+                         <label>
+                             <input type="checkbox" id="agent3_individual_scoring" style="margin-right: 8px;">
+                             Individual Aspect Scoring
+                         </label>
+                         <div class="tooltip info-icon">
+                             <span class="info-icon-inner">i</span>
+                             <span class="tooltip-text">
+                                 <strong>When this option is enabled:</strong><br>
+                                 • Agent 3 (Scorer) will score each aspect individually instead of all at once<br>
+                                 • Each aspect gets a separate API call for more focused scoring<br>
+                                 • May provide more accurate scores but increases the number of API calls and processing time
+                             </span>
+                         </div>
+                     </div>
+                 </div>
+             </div>
+         </div>
+         <h2>Output</h2>
+         <div class="pale-blue-section">
+             <div class="form-group">
+                 <div class="label-container">
+                     <label for="iteration">The number of items</label>
+                     <div class="tooltip info-icon">
+                         <span class="info-icon-inner">i</span>
+                         <span class="tooltip-text"><strong>Positive integer.</strong><br>Rounds of stimulus generation, corresponding to
+                             the number of constructed sets of stimuli.</span>
+                     </div>
+                 </div>
+                 <input type="text" id="iteration" placeholder="e.g. 50">
+             </div>
+             <div class="button-container">
+                 <button id="generate_button">Generate stimulus</button>
+                 <button id="stop_button" disabled>Stop</button>
+                 <button id="clear_button">Clear all</button>
+             </div>
+             <div class="generation-status-container">
+                 <span id="generation_status" class="generation-status"></span>
+             </div>
+             <div class="progress-section">
+                 <div class="label-container">
+                     <label>Progress bar</label>
+                 </div>
+                 <div class="progress-container">
+                     <div class="progress-bar" id="progress_bar">
+                         <span class="progress-percentage" id="progress_percentage">0%</span>
+                     </div>
+                 </div>
+             </div>
+         </div>
+         <!-- Add output log area -->
+         <h2>Generation Log</h2>
+         <div class="pale-blue-section">
+             <div class="log-container">
+                 <div class="log-panel">
+                     <div class="log-header">
+                         <div class="log-header-left">
+                             <i class="fas fa-lightbulb agent-icon generator-icon"></i>
+                             <h3>Generator</h3>
+                         </div>
+                         <button class="log-clear-btn" onclick="clearLog('generator-log')">
+                             <i class="fas fa-trash-alt"></i> Clear
+                         </button>
+                     </div>
+                     <div class="log-content" id="generator-log"></div>
+                 </div>
+
+                 <div class="log-panel">
+                     <div class="log-header">
+                         <div class="log-header-left">
+                             <i class="fas fa-check-circle agent-icon validator-icon"></i>
+                             <h3>Validator</h3>
+                         </div>
+                         <button class="log-clear-btn" onclick="clearLog('validator-log')">
+                             <i class="fas fa-trash-alt"></i> Clear
+                         </button>
+                     </div>
+                     <div class="log-content" id="validator-log"></div>
+                 </div>
+
+                 <div class="log-panel">
+                     <div class="log-header">
+                         <div class="log-header-left">
+                             <i class="fas fa-star agent-icon scorer-icon"></i>
+                             <h3>Scorer</h3>
+                         </div>
+                         <button class="log-clear-btn" onclick="clearLog('scorer-log')">
+                             <i class="fas fa-trash-alt"></i> Clear
+                         </button>
+                     </div>
+                     <div class="log-content" id="scorer-log"></div>
+                 </div>
+             </div>
+         </div>
+     </div>
+     <script src="/static/script.js"></script>
+ </body>
+
+ </html>
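The three log panels above (`generator-log`, `validator-log`, `scorer-log`) are filled from the backend's `websocket_callback(agent, message)` calls, with the agent name `"all"` used to broadcast errors and completion messages to every panel. A minimal sketch of that routing logic, using an in-memory sink in place of a real Flask-SocketIO `emit` (the factory name is hypothetical):

```python
# Map the agent names used by backend.py callbacks to the panel ids in index.html.
PANEL_FOR_AGENT = {
    "generator": "generator-log",
    "validator": "validator-log",
    "scorer": "scorer-log",
}


def make_log_callback(sink):
    """Return a websocket_callback(agent, message) that routes messages.

    `sink` maps panel id -> list of messages; agent "all" fans out to
    every panel, matching how backend.py broadcasts errors.
    """
    def callback(agent, message):
        if agent == "all":
            panels = PANEL_FOR_AGENT.values()
        else:
            panels = [PANEL_FOR_AGENT[agent]]
        for panel in panels:
            sink.setdefault(panel, []).append(message)
    return callback
```

In the real app the loop body of `callback` would emit a Socket.IO event instead of appending to a dict, and the client-side `script.js` would append the message text to the matching `log-content` element.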
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ flask>=2.0.0
+ flask-cors>=3.0.0
+ flask-socketio>=5.0.0
+ openai==0.28.0
+ pandas>=1.0.0
+ huggingface-hub>=0.19.0
+ python-socketio>=5.0.0
+ python-engineio>=4.0.0
+ apscheduler>=3.9.0