Spaces:

sh4shv4t
/

statestrike-env

Sleeping

App Files Files Community

statestrike-env / README.md

sh4shv4t

docs: add HF Space YAML frontmatter with openenv tag

5383444 about 1 month ago

preview code

raw

history blame contribute delete

5.12 kB

	---
	title: RyFlow
	emoji: "🌊"
	colorFrom: purple
	colorTo: red
	sdk: docker
	python_version: "3.11"
	app_file: app.py
	pinned: false
	tags:
	- openenv
	---

	_____ _ _ ____ _ _ _
	/ ____\| \| \| \| / __ \\| \| (_) \|
	\| (___ \| \|_ __ _\| \|_ ___\| \| \| \| \|_ _ _\| \|_
	\___ \\| __/ _` \| __/ _ \ \| \| \| \| \| \| \| \| __\|
	____) \| \|\| (_\| \| \|\| __/ \|__\| \| \| \|_\| \| \| \|_
	\|_____/ \__\__,_\|\__\___\|\___\_\_\|\__,_\|_\|\__\|

	StateStrike Security Audit Environment
	An OpenEnv-ready stateful API security environment for real-world vulnerability triage.

	![Python 3.11](https://img.shields.io/badge/Python-3.11-blue)
	![OpenEnv Compliant](https://img.shields.io/badge/OpenEnv-Compliant-00C853)
	![MIT License](https://img.shields.io/badge/License-MIT-green)
	![HF Spaces](https://img.shields.io/badge/HF%20Spaces-Ready-yellow)

	## Environment Description and Motivation
	StateStrike models a practical security engineering workflow: systematic API auditing to discover, classify, and chain exploitable behaviors in a production-like service.

	Unlike toy game environments, the agent performs genuine tasks security teams run in real engagements:
	- Endpoint reachability mapping
	- Vulnerability probing and classification
	- Stateful exploit-chain execution

	This design creates measurable operational value: better API hardening and earlier detection of latency-amplifying attack paths.

	## Action Space
	\| Field \| Type \| Description \| Values \|
	\|---\|---\|---\|---\|
	\| endpoint \| EndpointChoice \| Target API operation \| POST /users, GET /users/{id}, POST /orders, GET /orders, GET /health \|
	\| payload_strategy \| PayloadStrategy \| Payload mutation strategy \| valid, redos, oversized, malformed \|
	\| target_user_id \| Optional[int] \| User context for stateful calls \| null or integer user id \|

	## Observation Space
	\| Field \| Type \| Description \|
	\|---\|---\|---\|
	\| step \| int \| Current episode step \|
	\| endpoint_called \| str \| Executed endpoint \|
	\| http_status \| int \| HTTP response code \|
	\| latency_ms \| float \| Request latency in milliseconds \|
	\| response_body \| dict[str, Any] \| Parsed response payload \|
	\| session_order_count \| int \| Number of orders created in session \|
	\| endpoints_discovered \| list[str] \| Reachable endpoints found so far \|
	\| vulnerabilities_found \| list[str] \| Confirmed vulnerability labels \|
	\| task_progress \| float \| Normalized task completion in [0.0, 1.0] \|

	## Task Descriptions
	\| Task \| Difficulty \| Max Steps \| Success Threshold \| Description \|
	\|---\|---\|---:\|---:\|---\|
	\| endpoint_discovery \| easy \| 20 \| 0.60 \| Find all reachable API endpoints \|
	\| vulnerability_probe \| medium \| 30 \| 0.50 \| Find and classify vulnerabilities (redos, db_degradation) \|
	\| exploit_chain \| hard \| 60 \| 0.75 \| Execute full stateful exploit chain with evidence \|

	## Reward Function
	Step reward is normalized to [0.0, 1.0] and shaped by true task progress:

	R_step = clamp(Delta task_score + bonuses - penalties)

	Components:
	- Delta task score: max(0, score_t - score_t-1), capped to 0.30
	- +0.05 for a newly discovered endpoint
	- +0.10 for a newly confirmed vulnerability
	- -0.02 for repeated identical no-op action
	- +0.20 terminal completion bonus when task is solved

	Anti-hacking properties:
	- One-time vulnerability flags prevent bounty farming
	- Chain cooldown and order-growth guards prevent POST/GET cycling exploits
	- Baseline latency updated via EMA only on successful steps
	- Connection failures produce neutral reward and never corrupt baseline

	## Setup Instructions
	### Docker (single command)
	```bash
	docker build -t statestrike .
	docker run -p 7860:7860 statestrike
	```

	### Local Python
	```bash
	python -m pip install -r requirements.txt
	cp .env.example .env
	uvicorn honeypot.app:app --host 0.0.0.0 --port 8000
	HONEYPOT_URL=http://localhost:8000 uvicorn statestrike_env.environment:app --host 0.0.0.0 --port 7860
	python inference.py
	```

	### HF Space URL
	Set this to your deployed environment Space URL:
	- https://sh4shv4t-statestrike-env.hf.space

	## Baseline Scores
	\| Task \| Baseline Score \| Model \|
	\|------\|---------------:\|-------\|
	\| endpoint_discovery \| 0.600 \| Qwen/Qwen2.5-72B-Instruct \|
	\| vulnerability_probe \| 0.400 \| Qwen/Qwen2.5-72B-Instruct \|
	\| exploit_chain \| 0.000 \| Qwen/Qwen2.5-72B-Instruct \|

	## OpenEnv Compliance Checklist
	- [x] Real-world task framing (security audit)
	- [x] Typed Pydantic action/observation/state models
	- [x] reset(), step(), state(), close() implemented
	- [x] Three graded tasks (easy, medium, hard)
	- [x] Graders produce normalized scores in [0.0, 1.0]
	- [x] Partial-progress reward shaping
	- [x] Root inference.py with [START]/[STEP]/[END] format
	- [x] Root openenv.yaml manifest
	- [x] Single-container Docker runtime with /health and /reset

	## Architecture Diagram
	```text
	+-------------------------------+
	\| HF Space Container \|
	\| +-------------------------+ \|
	\| \| Honeypot API :8000 \| \|
	\| +-------------------------+ \|
	\| \| OpenEnv Server :7860 \| \|
	\| \| /reset /step /state \| \|
	\| +-------------------------+ \|
	+---------------+---------------+
	\|
	v
	inference.py (LLM agent)
	```

	## License
	MIT