# Rewards and CSV Output Test Report

Date: 2026-04-05 · Version: v2.1.0 · Author: NeerajCodz
## Overview
This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.
## Issues Fixed

- **Reward Function**: Previously showing `+0.00` for all steps except `complete`
- **CSV Output**: Returning a nested structure instead of clean CSV data
- **Memory Display**: Memory entries not visible in the frontend
## Reward Structure (Post-Fix)
| Step Type | Reward | Description |
|---|---|---|
| plugins | +0.10 | Small reward for plugin initialization |
| planner | +0.15 | Reward for planning execution |
| planner_python | +0.10 | Sandbox code execution |
| navigator | +0.05 | URL selection |
| navigator_python | +0.10 | Navigator sandbox execution |
| navigate | +0.50 | Successful page navigation |
| extract | +0.50 per item | Based on extraction count |
| complete | +1.00 | Completion bonus |
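The table above can be read as a simple per-step lookup. The sketch below is illustrative only — `STEP_REWARDS` and `step_reward` are assumed names, not the actual ScrapeRL API:

```python
# Illustrative lookup for the per-step rewards in the table above.
# STEP_REWARDS and step_reward() are assumed names, not ScrapeRL's API.
STEP_REWARDS = {
    "plugins": 0.10,
    "planner": 0.15,
    "planner_python": 0.10,
    "navigator": 0.05,
    "navigator_python": 0.10,
    "navigate": 0.50,
    "complete": 1.00,
}

def step_reward(action: str, items_extracted: int = 0) -> float:
    """Return the reward for one pipeline step."""
    if action == "extract":
        # extract is rewarded per item rather than with a flat value
        return 0.50 * items_extracted
    return STEP_REWARDS.get(action, 0.0)

print(step_reward("navigate"))    # 0.5
print(step_reward("extract", 10)) # 5.0
```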
## Test Results (15 Tests Total)
### Initial 5 Tests
| Test | URL | Output Format | Status | Reward | Duration |
|---|---|---|---|---|---|
| GitHub Trending | github.com/trending | CSV | PASS | 7.50 | 2.28s |
| HackerNews | news.ycombinator.com | JSON | PASS | 7.356 | 1.40s |
| Wikipedia | en.wikipedia.org | Text | PASS | 4.877 | 1.77s |
| PyPI | pypi.org/project/requests | JSON | PASS | 4.877 | 0.36s |
| NPM | npmjs.com/package/express | Markdown | PASS | 4.744 | 0.18s |
### Additional 10 Tests
| Test | URL | Status | Reward |
|---|---|---|---|
| Reddit | reddit.com/r/programming | PASS | 9.158 |
| MDN Docs | developer.mozilla.org | PASS | 4.877 |
| DuckDuckGo | duckduckgo.com | PASS | 7.193 |
| Kaggle | kaggle.com/datasets | PASS | 6.970 |
| DevTo | dev.to | PASS | 7.289 |
| Product Hunt | producthunt.com | PASS | 9.545 |
| HN Jobs | news.ycombinator.com/jobs | PASS | 7.356 |
| Python Docs | docs.python.org | PASS | 4.877 |
| Rust Docs | doc.rust-lang.org | PASS | 4.877 |
| Go Docs | go.dev/doc | PASS | 4.877 |
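As a quick sanity check, the rewards for the ten additional tests can be aggregated in a couple of lines (values transcribed from the table above):

```python
# Rewards for the ten additional tests, transcribed from the table above.
rewards = [9.158, 4.877, 7.193, 6.970, 7.289, 9.545, 7.356, 4.877, 4.877, 4.877]
print(f"tests={len(rewards)} mean reward={sum(rewards) / len(rewards):.3f}")
```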
## CSV Output Sample (GitHub Trending)

```csv
username,repo_name,stars,forks
google-ai-edge,gallery,"16,334","1,485"
Blaizzy,mlx-vlm,"3,753",410
block,goose,"36,003","3,389"
freeCodeCamp,freeCodeCamp,"441,088","44,069"
```
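The quoting in the sample matters: star and fork counts contain embedded commas, so an unquoted value would split into two fields. The stdlib `csv` module handles this correctly:

```python
import csv
import io

# Parse a subset of the sample above; quoted fields keep the embedded
# thousands separators from being treated as column delimiters.
sample = '''username,repo_name,stars,forks
google-ai-edge,gallery,"16,334","1,485"
block,goose,"36,003","3,389"
'''

rows = list(csv.DictReader(io.StringIO(sample)))
stars = {r["repo_name"]: int(r["stars"].replace(",", "")) for r in rows}
print(stars)  # {'gallery': 16334, 'goose': 36003}
```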
## Memory System Verification
After running 15 tests:
- Short-term memory: 22 entries
- Long-term memory: 22 entries
- Working memory: 0 entries
- Total: 44 entries
Memory correctly stores scrape requests and summaries for each session.
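The bookkeeping being verified can be sketched roughly as a tiered store; `MemoryStore` and the tier names below are illustrative stand-ins, not the actual ScrapeRL classes:

```python
# Illustrative sketch only: a three-tier store where each session's
# scrape request and summary land in the two durable tiers.
class MemoryStore:
    def __init__(self) -> None:
        self.tiers = {"short_term": [], "long_term": [], "working": []}

    def record_session(self, request: str, summary: str) -> None:
        # each session contributes one short-term and one long-term entry
        self.tiers["short_term"].append(request)
        self.tiers["long_term"].append(summary)

    def counts(self) -> dict:
        totals = {tier: len(entries) for tier, entries in self.tiers.items()}
        totals["total"] = sum(len(e) for e in self.tiers.values())
        return totals

store = MemoryStore()
for i in range(3):
    store.record_session(f"scrape request {i}", f"summary {i}")
print(store.counts())  # {'short_term': 3, 'long_term': 3, 'working': 0, 'total': 6}
```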
## Step-by-Step Reward Breakdown (GitHub Trending)
```
Step 0: plugins   → +0.10 (enabled 3 plugins)
Step 1: planner   → +0.15 (plan created)
Step 2: navigator → +0.05 (URL selected)
Step 3: navigate  → +0.00 (starting)
Step 4: navigate  → +0.50 (completed)
Step 5: extract   → +0.10 (starting)
Step 6: extract   → +6.00 (10 repos × 0.5 + bonus)
Step 7: complete  → +1.00 (completion)
─────────────────────────────
Total:              7.50
```
## Key Fixes Applied
### 1. scrape.py Reward Assignment

```python
# Before
ScrapeStep(action="plugins", reward=0.0, ...)

# After
ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
```
### 2. format_output Clean CSV

```python
# Added direct csv_output pass-through
if isinstance(data, dict) and "csv_output" in data:
    return data["csv_output"]
```
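As a standalone sketch of that pass-through (`format_output` here is an assumed name for the patched routine, and the fallback branch is illustrative):

```python
# Assumed shape of the patched routine: return csv_output directly
# instead of the nested structure that wrapped it before the fix.
def format_output(data):
    if isinstance(data, dict) and "csv_output" in data:
        return data["csv_output"]
    return data

nested = {"csv_output": "username,repo_name\nblock,goose", "meta": {"rows": 1}}
print(format_output(nested))  # username,repo_name
                              # block,goose
```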
### 3. GitHub Trending Extraction

```python
# Proper reward calculation for extraction
extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
```
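Spot-checking that rule as a standalone function: at the 10-repo threshold it yields the `+6.00` seen in the step-by-step breakdown above.

```python
# The extraction rule from the fix above: 0.5 per item, plus a 1.0
# bonus at 10+ items (0.5 otherwise).
def extraction_reward(n_repos: int) -> float:
    return n_repos * 0.5 + (1.0 if n_repos >= 10 else 0.5)

print(extraction_reward(10))  # 6.0
print(extraction_reward(4))   # 2.5
```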
## Conclusion
All tests pass with proper reward accumulation and clean output formatting:
| Metric | Result |
|---|---|
| Tests Run | 15 |
| Tests Passed | 15 |
| Tests Failed | 0 |
| Success Rate | 100% |
The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.
## Document Flow

```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```
## Related

See also: [API Reference](api-reference.md)