
# Rewards and CSV Output Test Report

**Date:** 2026-04-05
**Version:** v2.1.0
**Author:** NeerajCodz

## Overview

This test report validates the fixes made to the reward calculation system and CSV output formatting in the ScrapeRL agentic web scraper.

## Issues Fixed

1. **Reward function:** previously returned +0.00 for every step except `complete`
2. **CSV output:** returned a nested structure instead of clean CSV data
3. **Memory display:** memory entries were not visible in the frontend

## Reward Structure (Post-Fix)

| Step Type | Reward | Description |
|---|---|---|
| plugins | +0.10 | Small reward for plugin initialization |
| planner | +0.15 | Reward for planning execution |
| planner_python | +0.10 | Sandbox code execution |
| navigator | +0.05 | URL selection |
| navigator_python | +0.10 | Navigator sandbox execution |
| navigate | +0.50 | Successful page navigation |
| extract | +0.50 per item | Based on extraction count |
| complete | +1.00 | Completion bonus |
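The per-step rewards above can be sketched as a simple lookup. This is a minimal illustration only; `STEP_REWARDS` and `step_reward` are hypothetical names, not taken from the ScrapeRL codebase, and the `extract` case here shows only the base 0.5-per-item scaling.

```python
# Hypothetical sketch of the per-step reward table; names are illustrative.
STEP_REWARDS = {
    "plugins": 0.10,          # plugin initialization
    "planner": 0.15,          # planning execution
    "planner_python": 0.10,   # sandbox code execution
    "navigator": 0.05,        # URL selection
    "navigator_python": 0.10, # navigator sandbox execution
    "navigate": 0.50,         # successful page navigation
    "complete": 1.00,         # completion bonus
}

def step_reward(action: str, items_extracted: int = 0) -> float:
    """Reward for one step; 'extract' scales at 0.5 per extracted item."""
    if action == "extract":
        return 0.5 * items_extracted
    return STEP_REWARDS.get(action, 0.0)
```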

## Test Results (15 Tests Total)

### Initial 5 Tests

| Test | URL | Output Format | Status | Reward | Duration |
|---|---|---|---|---|---|
| GitHub Trending | github.com/trending | CSV | PASS | 7.50 | 2.28s |
| HackerNews | news.ycombinator.com | JSON | PASS | 7.356 | 1.40s |
| Wikipedia | en.wikipedia.org | Text | PASS | 4.877 | 1.77s |
| PyPI | pypi.org/project/requests | JSON | PASS | 4.877 | 0.36s |
| NPM | npmjs.com/package/express | Markdown | PASS | 4.744 | 0.18s |

### Additional 10 Tests

| Test | URL | Status | Reward |
|---|---|---|---|
| Reddit | reddit.com/r/programming | PASS | 9.158 |
| MDN Docs | developer.mozilla.org | PASS | 4.877 |
| DuckDuckGo | duckduckgo.com | PASS | 7.193 |
| Kaggle | kaggle.com/datasets | PASS | 6.970 |
| DevTo | dev.to | PASS | 7.289 |
| Product Hunt | producthunt.com | PASS | 9.545 |
| HN Jobs | news.ycombinator.com/jobs | PASS | 7.356 |
| Python Docs | docs.python.org | PASS | 4.877 |
| Rust Docs | doc.rust-lang.org | PASS | 4.877 |
| Go Docs | go.dev/doc | PASS | 4.877 |

## CSV Output Sample (GitHub Trending)

```csv
username,repo_name,stars,forks
google-ai-edge,gallery,"16,334","1,485"
Blaizzy,mlx-vlm,"3,753",410
block,goose,"36,003","3,389"
freeCodeCamp,freeCodeCamp,"441,088","44,069"
```
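Note the quoted fields: star and fork counts contain thousands separators, so they must be quoted to survive as single CSV fields. A quick sanity check that a couple of the sample rows parse cleanly with the standard `csv` module (the `sample` string is copied from the output above):

```python
import csv
import io

sample = (
    "username,repo_name,stars,forks\n"
    'google-ai-edge,gallery,"16,334","1,485"\n'
    'Blaizzy,mlx-vlm,"3,753",410\n'
)

rows = list(csv.DictReader(io.StringIO(sample)))
# Quoting keeps the embedded commas inside a single field,
# so stripping the separators yields usable integers.
stars = [int(r["stars"].replace(",", "")) for r in rows]
```

Here `stars` comes out as `[16334, 3753]`, confirming the quoting is correct.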

## Memory System Verification

After running 15 tests:

- Short-term memory: 22 entries
- Long-term memory: 22 entries
- Working memory: 0 entries
- **Total: 44 entries**

Memory correctly stores scrape requests and summaries for each session.

## Step-by-Step Reward Breakdown (GitHub Trending)

```text
Step 0: plugins       → +0.10 (enabled 3 plugins)
Step 2: planner       → +0.15 (plan created)
Step 3: navigator     → +0.05 (URL selected)
Step 1: navigate      → +0.00 (starting)
Step 2: navigate      → +0.50 (completed)
Step 3: extract       → +0.10 (starting)
Step 4: extract       → +6.00 (10 repos × 0.5 + bonus)
Step 5: complete      → +1.00 (completion)
─────────────────────────────
Total:                → 7.50
```

## Key Fixes Applied

### 1. scrape.py Reward Assignment

```python
# Before
ScrapeStep(action="plugins", reward=0.0, ...)

# After
ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0, ...)
```
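A self-contained way to exercise the conditional. The `ScrapeStep` dataclass below is a minimal stand-in with only the fields used here, not the real class, and the plugin list is illustrative:

```python
from dataclasses import dataclass

@dataclass
class ScrapeStep:
    # Minimal stand-in for the real ScrapeStep; only the fields used here.
    action: str
    reward: float

enabled_plugins = ["browser", "parser", "cache"]  # illustrative plugin list
step = ScrapeStep(action="plugins", reward=0.1 if enabled_plugins else 0.0)
# A non-empty plugin list yields reward 0.1; an empty list would yield 0.0.
```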

### 2. format_output: Clean CSV

```python
# Added direct csv_output pass-through
if isinstance(data, dict) and "csv_output" in data:
    return data["csv_output"]
```
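In context, the pass-through would sit at the top of the output formatter, something like the sketch below. The surrounding `format_output` function and its JSON fallback are assumptions for illustration, not the actual ScrapeRL implementation:

```python
import json

def format_output(data, fmt: str = "csv") -> str:
    # Sketch only: return pre-rendered CSV directly when present,
    # otherwise fall back to a JSON dump of the payload.
    if isinstance(data, dict) and "csv_output" in data:
        return data["csv_output"]
    return json.dumps(data)

# The nested structure collapses to the clean CSV string it carries.
clean = format_output({"csv_output": "a,b\n1,2\n", "meta": {"rows": 1}})
```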

### 3. GitHub Trending Extraction

```python
# Proper reward calculation for extraction
extraction_reward = len(trending_repos) * 0.5 + (1.0 if len(trending_repos) >= 10 else 0.5)
```
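Worked through for the GitHub Trending run: 10 repos gives 10 × 0.5 + 1.0 = 6.0, which matches the +6.00 extract step in the breakdown above. Wrapped in a hypothetical helper for illustration:

```python
def extraction_reward(n_items: int) -> float:
    # 0.5 per extracted item, plus a bonus: 1.0 at 10+ items, else 0.5.
    return n_items * 0.5 + (1.0 if n_items >= 10 else 0.5)
```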

## Conclusion

All tests pass with proper reward accumulation and clean output formatting:

| Metric | Result |
|---|---|
| Tests Run | 15 |
| Tests Passed | 15 |
| Tests Failed | 0 |
| Success Rate | 100% |

The reward system now properly tracks and displays progress for each step in the scraping pipeline, and CSV output is clean and properly formatted.

## Document Flow

```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```

## Related: API Reference

| Item | Value |
|---|---|
| api-reference | api-reference.md |