---
title: Fish Farm Environment Server
emoji: 🐟
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
---
Fish Farm RL Environment
The world's first OpenEnv-compatible aquaculture farming environment
An AI agent manages a Nile Tilapia Recirculating Aquaculture System (RAS), making hourly decisions about feeding, aeration, temperature control, water exchange, disease treatment, and harvest timing. Built on real aquaculture science: bioenergetic growth models, coupled DO/ammonia/pH dynamics, SEIR disease epidemiology, stochastic economics, and realistic multi-objective trade-offs.
Built for the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology.
Why Aquaculture?
Aquaculture is a $300B global industry producing 50%+ of the world's fish. Yet:
- No Gymnasium/OpenEnv-compatible aquaculture environment exists; this is the #1 identified gap in aquaculture AI research
- Real farms lose $10B+ annually to preventable die-offs from water quality failures, disease outbreaks, and suboptimal feeding
- The biological cascade (overfeed → ammonia → DO crash → stress → disease → mass mortality) creates a naturally rich RL problem with 13 coupled state variables
- Q-learning has already achieved 79% less feed and zero mortality compared with traditional control (Chahid et al. 2021), suggesting RL can have a large practical impact here
The Biological Cascade
The core challenge: everything is connected.
```
Overfeeding → Ammonia ↑ → DO ↓ → Fish Stress ↑ → Disease → Mass Mortality
     ↓            ↓         ↓          ↓             ↓            ↓
 Feed Cost     pH shift  Growth ↓   Feeding ↓    Treatment $   Revenue = 0
                  ↑         ↑
            Evaporation  Nitrification → NO2 → NO3
            concentrates consumes O2             ↑
                                             Biofilter
```
An agent that feeds aggressively grows fish faster but risks catastrophic ammonia spikes. An agent that plays it safe grows slowly and loses money. The optimal policy must balance 6 controls (4 of them continuous) across 13 coupled state variables, a challenge that scales from easy single-concern tasks to extreme multi-crisis scenarios.
Simulation Engine Highlights
6 Deeply Coupled Subsystem Engines
| Engine | Key Features |
|---|---|
| Water Quality | 10 sub-steps/hour DO mass balance, two-stage nitrification (AOB + NOB), denitrification under anoxic conditions, Smith-Talling photosynthesis, Penman evaporation model, Beer-Lambert light attenuation, nighttime DO crash risk tracking |
| Fish Biology | FAO bioenergetic growth ODE with stochastic noise (Wiener process, σ = 2%), dual respiration model (tilapia polynomial R² = 0.99 + allometric fallback), size-dependent feeding rates, feeding response behavior |
| Disease | SEIR compartmental model with immunity waning (R→S at 1/30 per day), temperature-dependent pathogen virulence, stress-triggered outbreaks, 4 treatment options + prophylactic vaccination (works without active disease) |
| Economics | Ornstein-Uhlenbeck stochastic feed pricing, seasonal market multipliers (Christmas +15%, Lent +10%, mid-year dip -5%), marginal cost tracking, weight-dependent fish valuation with market premium curve, detailed cost breakdown (7 categories) |
| Weather | Diel temperature/solar cycle, seasonal storm probability (3× during monsoon), Beaufort wind scale, humidity-driven evaporation |
| Events | Equipment failures, power outages, algae blooms, feed shortages, market crashes, all wired to the appropriate subsystems |
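The Disease engine's SEIR model with waning immunity can be illustrated with a minimal forward-Euler step. This is a sketch, not the repository's `disease.py`: only the waning rate (1/30 per day) comes from this README. The transmission, incubation, and recovery rates are illustrative placeholders, and `virulence` is a hypothetical multiplier standing in for the temperature dependence.

```python
from dataclasses import dataclass


@dataclass
class SEIR:
    """Fish counts per compartment; S + E + I + R is conserved (mortality omitted)."""
    s: float
    e: float
    i: float
    r: float


def seir_step(state: SEIR, dt_days: float, beta: float = 0.5, incubation_rate: float = 1 / 3,
              recovery_rate: float = 1 / 7, waning_rate: float = 1 / 30,
              virulence: float = 1.0) -> SEIR:
    """One forward-Euler SEIR step with waning immunity (R -> S).

    Only waning_rate = 1/30 per day is taken from the README; the other rates and
    the `virulence` multiplier (a stand-in for temperature effects) are placeholders.
    """
    n = state.s + state.e + state.i + state.r
    exposed = virulence * beta * state.s * state.i / n  # S -> E (frequency-dependent)
    onset = incubation_rate * state.e                    # E -> I
    recovered = recovery_rate * state.i                  # I -> R
    waned = waning_rate * state.r                        # R -> S
    return SEIR(
        s=state.s + dt_days * (waned - exposed),
        e=state.e + dt_days * (exposed - onset),
        i=state.i + dt_days * (onset - recovered),
        r=state.r + dt_days * (recovered - waned),
    )


# Ten simulated days at the environment's hourly resolution.
state = SEIR(s=990.0, e=0.0, i=10.0, r=0.0)
for _ in range(24 * 10):
    state = seir_step(state, dt_days=1 / 24)
```

Because every outflow from one compartment is an inflow to another, the total population stays constant; the repository's model additionally couples infection to mortality and treatment.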
Key Equations
Growth (bioenergetic, from KAUST/FAO research):
$$\frac{dW}{dt} = \left[h\,\rho\,f\,b\,(1-a)\,\tau(T)\,\sigma(\mathrm{DO})\,v(\mathrm{UIA})\right] W^{0.6277} - \left[k_{\min}\,e^{s\,(T - T_{\min})}\right] W^{0.8373}$$
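The growth equation can be integrated numerically with hourly forward-Euler steps, as in the sketch below. The limiting factors τ(T), σ(DO), and v(UIA) are passed in as precomputed values in [0, 1]. The parameter values for h, ρ, b, a, k_min, s, and T_min are illustrative placeholders, not the values in `constants.py`, and the 2% Wiener noise term is omitted.

```python
import math


def growth_rate(w: float, temp_c: float, tau: float, sigma_do: float, v_uia: float,
                feeding: float = 0.5,
                h: float = 0.8, rho: float = 1.0, b: float = 0.62, a: float = 0.53,
                k_min: float = 0.00133, s: float = 0.0132, t_min: float = 24.0,
                m: float = 0.6277, n: float = 0.8373) -> float:
    """dW/dt in g/day; all parameter defaults are placeholders except the exponents m, n."""
    anabolism = h * rho * feeding * b * (1 - a) * tau * sigma_do * v_uia * w ** m
    catabolism = k_min * math.exp(s * (temp_c - t_min)) * w ** n
    return anabolism - catabolism


def grow(w: float, hours: int, **kwargs) -> float:
    """Forward-Euler integration with hourly steps (dt = 1/24 day)."""
    for _ in range(hours):
        w += growth_rate(w, **kwargs) / 24.0
    return w


# One week at 28 °C with no environmental limitation.
w_end = grow(100.0, 24 * 7, temp_c=28.0, tau=1.0, sigma_do=1.0, v_uia=1.0)
```

Setting any limiting factor to 0 leaves only the catabolic term, so fish lose weight; this is how a DO crash or an ammonia spike feeds back into growth.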
DO mass balance (10 sub-steps/hour for stability):
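$$\frac{d\mathrm{DO}}{dt} = P_{\mathrm{photo}} - \frac{FR \cdot \mathrm{biomass}}{V} - 4.57\,K_{NR}\,\mathrm{TAN} - \mathrm{DO}_{\mathrm{water}} + K_a\,(\mathrm{DO}_{\mathrm{sat}} - \mathrm{DO}) + A_{\mathrm{mech}} + Q_{\mathrm{ex}}\,(\mathrm{DO}_{\mathrm{in}} - \mathrm{DO})$$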
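A minimal sketch of the sub-stepped DO integration, term by term against the equation above. All input values in `DOInputs` are illustrative assumptions, not repository constants. Sub-stepping keeps each explicit Euler step small relative to the relaxation rate (K_a + Q_ex), which is what prevents overshoot when aeration or exchange is strong.

```python
from dataclasses import dataclass


@dataclass
class DOInputs:
    """Rates per hour, concentrations in mg/L. Values are illustrative placeholders."""
    p_photo: float = 0.5        # photosynthetic O2 production, mg/L/h
    fr: float = 300.0           # fish respiration, mg O2 per kg fish per hour
    biomass_kg: float = 500.0
    volume_l: float = 100_000.0
    k_nr: float = 0.05          # nitrification rate constant, 1/h
    tan: float = 1.0            # total ammonia nitrogen, mg/L
    do_water: float = 0.3       # water-column O2 demand, mg/L/h
    k_a: float = 0.1            # surface reaeration coefficient, 1/h
    do_sat: float = 7.8         # saturation DO, mg/L
    a_mech: float = 1.5         # mechanical aeration, mg/L/h
    q_ex: float = 0.02          # exchanged fraction of the volume per hour
    do_in: float = 7.5          # DO of incoming water, mg/L


def ddo_dt(do: float, p: DOInputs) -> float:
    """Right-hand side of the DO mass balance."""
    return (p.p_photo
            - p.fr * p.biomass_kg / p.volume_l   # fish respiration
            - 4.57 * p.k_nr * p.tan              # nitrification O2 demand
            - p.do_water                         # water-column demand
            + p.k_a * (p.do_sat - do)            # surface reaeration
            + p.a_mech                           # mechanical aeration
            + p.q_ex * (p.do_in - do))           # water exchange


def step_hour(do: float, p: DOInputs, substeps: int = 10) -> float:
    """Advance DO by one hour using `substeps` explicit Euler steps, clamped at 0."""
    dt = 1.0 / substeps
    for _ in range(substeps):
        do = max(0.0, do + dt * ddo_dt(do, p))
    return do
```

With these placeholder inputs DO settles near 7.5 mg/L; setting `a_mech=0.0` makes it fall every hour, which is the aeration-failure scenario in miniature.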
Tilapia respiration polynomial (R² = 0.99, valid for 20-200 g and 24-32 °C):
$$FR = 2014.45 + 2.75\,W - 165.2\,T + 0.007\,W^2 + 3.93\,T^2 - 0.21\,W T$$
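The polynomial translates directly into a function. This sketch raises an error outside the fitted range instead of reproducing the allometric fallback the simulator uses; the README does not state FR's units, so none are assumed here.

```python
def tilapia_respiration(w_g: float, temp_c: float) -> float:
    """FR from the tilapia polynomial; w_g in grams, temp_c in degrees Celsius."""
    if not (20.0 <= w_g <= 200.0 and 24.0 <= temp_c <= 32.0):
        # The simulator switches to an allometric model here; this sketch just refuses.
        raise ValueError("outside the fitted range (20-200 g, 24-32 °C)")
    return (2014.45 + 2.75 * w_g - 165.2 * temp_c
            + 0.007 * w_g ** 2 + 3.93 * temp_c ** 2 - 0.21 * w_g * temp_c)


fr = tilapia_respiration(100.0, 28.0)  # ≈ 226.97
```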
Ammonia toxicity:
$$\mathrm{UIA} = \frac{\mathrm{TAN}}{1 + 10^{\,pK_a - \mathrm{pH}}}, \qquad pK_a = 0.09018 + \frac{2729.92}{T + 273.15}$$
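The ammonia equilibrium above, written as code (temperature in °C, TAN in mg/L):

```python
def uia_fraction(temp_c: float, ph: float) -> float:
    """Fraction of TAN present as toxic un-ionized ammonia (Emerson et al. 1975 pKa)."""
    pka = 0.09018 + 2729.92 / (temp_c + 273.15)
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def uia(tan_mg_l: float, temp_c: float, ph: float) -> float:
    """Un-ionized ammonia concentration in mg/L."""
    return tan_mg_l * uia_fraction(temp_c, ph)
```

At 25 °C and pH 7 the toxic fraction is about 0.56%; at pH 8 it rises to about 5.4%, roughly tenfold. This is why the "pH shift" branch of the cascade matters even when TAN itself is unchanged.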
Stochastic feed price (Ornstein-Uhlenbeck):
$$dp = \kappa\,(\mu - p)\,dt + \sigma\,dW$$
bounded to ±40% of the mean μ.
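An Euler-Maruyama discretization of the bounded OU price process, as a sketch: κ, σ, the mean price μ, and the hourly time step are illustrative assumptions, not the values used in `economics.py`. Clipping after each step enforces the ±40% band.

```python
import math
import random


def simulate_feed_price(mu: float = 1.2, kappa: float = 0.05, sigma: float = 0.02,
                        hours: int = 24 * 90, dt: float = 1.0, seed: int = 0) -> list:
    """Mean-reverting feed price path, clipped to [0.6 * mu, 1.4 * mu]."""
    rng = random.Random(seed)
    lo, hi = 0.6 * mu, 1.4 * mu
    p = mu
    path = [p]
    for _ in range(hours):
        dw = rng.gauss(0.0, math.sqrt(dt))           # Wiener increment ~ N(0, dt)
        p += kappa * (mu - p) * dt + sigma * dw      # drift toward mu plus noise
        p = min(hi, max(lo, p))                      # enforce the +/-40% band
        path.append(p)
    return path
```

The mean-reversion term keeps prices near μ over a season, while the noise term produces the short-term swings an agent has to hedge against when timing feed purchases and harvest.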
Environment Design
Action Space (6 controls: 4 continuous, 1 boolean, 1 categorical)
```
{
  "feeding_rate": 0.0 to 1.0,          // Feed intensity (growth vs ammonia trade-off)
  "aeration_rate": 0.0 to 1.0,         // Oxygen injection (DO vs electricity cost)
  "heater_setting": -1.0 to 1.0,       // Temperature control (growth vs energy)
  "water_exchange_rate": 0.0 to 0.1,   // Fresh water (dilution vs water cost)
  "harvest_decision": true | false,    // Harvest all fish (ends episode)
  "treatment": "none" | "antibiotics" | "salt" | "probiotics" | "vaccination"
}
```
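A client-side sketch of the action schema that clamps each control to its documented range before sending. `FarmActionSketch` is a hypothetical stand-in, not the repository's `FarmAction` in `models.py`, whose validation behavior may differ (for example, rejecting out-of-range values rather than clipping them).

```python
from dataclasses import asdict, dataclass

TREATMENTS = ("none", "antibiotics", "salt", "probiotics", "vaccination")


def _clamp(x: float, lo: float, hi: float) -> float:
    return min(hi, max(lo, x))


@dataclass(frozen=True)
class FarmActionSketch:
    """Hypothetical mirror of the 6-control action schema."""
    feeding_rate: float = 0.0
    aeration_rate: float = 0.0
    heater_setting: float = 0.0
    water_exchange_rate: float = 0.0
    harvest_decision: bool = False
    treatment: str = "none"

    def clipped(self) -> "FarmActionSketch":
        """Return a copy with every continuous control clamped to its range."""
        if self.treatment not in TREATMENTS:
            raise ValueError(f"unknown treatment: {self.treatment!r}")
        return FarmActionSketch(
            feeding_rate=_clamp(self.feeding_rate, 0.0, 1.0),
            aeration_rate=_clamp(self.aeration_rate, 0.0, 1.0),
            heater_setting=_clamp(self.heater_setting, -1.0, 1.0),
            water_exchange_rate=_clamp(self.water_exchange_rate, 0.0, 0.1),
            harvest_decision=self.harvest_decision,
            treatment=self.treatment,
        )

    def to_payload(self) -> dict:
        """JSON-ready dict of the clipped action."""
        return asdict(self.clipped())
```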
Observation Space (47 fields, partial observability)
The agent sees sensor readings (temperature, DO, pH, TAN, UIA, NO2, NO3, water quality score, nighttime DO crash risk), fish status (weight, population, mortality, feeding response, stress, FCR, SGR, growth rate, stocking density, survival rate), economics (costs, fish value, profit, feed price, market multiplier, ROI, marginal cost), weather (forecast, daytime, storm, humidity, day of year), equipment status, disease behavioral signals, and event alerts.
Disease infection count is hidden: the agent must infer disease from behavioral indicators (mortality spikes + feeding refusal + elevated stress → disease_suspected flag).
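The inference rule can be sketched as a conjunction of the three behavioral signals. The field names and all thresholds below (3× baseline mortality, feeding response below 0.6, stress above 0.5) are hypothetical; the README does not give the simulator's actual cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical observable signals; not the repository's observation field names."""
    hourly_mortality: int       # deaths in the last hour
    baseline_mortality: float   # typical deaths per hour in healthy conditions
    feeding_response: float     # 0..1, share of offered feed taken
    stress: float               # 0..1


def disease_suspected(sig: Signals, mort_factor: float = 3.0,
                      feed_floor: float = 0.6, stress_ceiling: float = 0.5) -> bool:
    """Flag disease only when all three indicators point the same way."""
    mortality_spike = sig.hourly_mortality > mort_factor * sig.baseline_mortality
    feeding_refusal = sig.feeding_response < feed_floor
    elevated_stress = sig.stress > stress_ceiling
    return mortality_spike and feeding_refusal and elevated_stress
```

Requiring all three signals trades detection speed for fewer false alarms: a heat wave alone raises stress and suppresses feeding, but usually without a simultaneous mortality spike.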
Inference Agent Architecture
The LLM inference agent uses a dual-mode architecture:
- LLM mode: Domain-expert system prompt with full situational awareness (all 30+ observation fields, trend analysis, harvest advisories)
- Heuristic fallback: Rule-based agent that handles the critical cascades correctly when LLM is unavailable or time-constrained
- Adaptive call frequency: More LLM calls during crises (every step), fewer during stable periods (every 4-6 hours)
- Smart time budgeting: Proportional allocation across tasks, automatic fallback to heuristic when time runs low
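The call-frequency and budgeting rules can be sketched as follows. This is an illustration, not the code in `inference.py`: the README says the budget is allocated proportionally across tasks but not proportional to what, so splitting by episode length is an assumption, as are the 5-hour stable interval (chosen from the stated 4-6 hour range) and the per-call time reserve.

```python
from dataclasses import dataclass
from typing import Optional


def allocate_budget(total_seconds: float, task_hours: dict) -> dict:
    """Split a wall-clock budget across tasks in proportion to episode length (assumed basis)."""
    total_hours = sum(task_hours.values())
    return {task: total_seconds * hours / total_hours for task, hours in task_hours.items()}


@dataclass
class CallPolicy:
    stable_interval_h: int = 5      # from the stated 4-6 hour range for stable periods
    reserve_seconds: float = 3.0    # hypothetical minimum time needed for one LLM call

    def use_llm(self, hour: int, in_crisis: bool, last_call_hour: Optional[int],
                seconds_left: float) -> bool:
        """Return True to query the LLM this step, False to act with the heuristic."""
        if seconds_left < self.reserve_seconds:
            return False                  # budget nearly spent: heuristic only
        if in_crisis or last_call_hour is None:
            return True                   # crises (and the first step) get an LLM call
        return hour - last_call_hour >= self.stable_interval_h
```

For example, `allocate_budget(20 * 60, {"feeding_basics": 168, "oxygen_management": 72})` gives the 168-hour task 70% of a 20-minute budget.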
12 Tasks (Easy → Extreme)
Easy (3 tasks): Learn one control
| Task | Hours | Challenge |
|---|---|---|
| feeding_basics | 168 | Feed fish to 55g+ with FCR < 2.0, zero deaths |
| oxygen_management | 72 | Keep DO > 5.0 mg/L during hot weather (35°C air) |
| water_quality_balance | 168 | Maintain all water parameters simultaneously |
Medium (4 tasks): Multi-concern + events
| Task | Hours | Challenge |
|---|---|---|
| temperature_stress | 120 | Survive a 3-day heat wave (38°C) |
| ammonia_crisis | 72 | Biofilter failure: manage rising ammonia |
| disease_outbreak | 240 | Detect and treat disease before 10% mortality |
| growth_optimization | 336 | Maximize growth while maintaining water quality |
Hard (3 tasks): Full lifecycle + compound events
| Task | Hours | Challenge |
|---|---|---|
| full_growout | 1440 | 60-day grow-out: 20g → 400g market weight |
| storm_response | 120 | Severe storm + 12h power outage + biofilter recovery |
| multi_objective | 720 | Pareto-optimize profit × welfare × environment |
Extreme (2 tasks): Frontier-model difficulty
| Task | Hours | Challenge |
|---|---|---|
| catastrophe_prevention | 336 | 5 compound crises in 14 days (algae bloom → aerator failure → disease → market crash → feed shortage) |
| season_management | 2160 | Full 90-day season with random events, ROI optimization |
Quick Start
Setup
```bash
git clone https://github.com/Rahul-Rajpurohitk/Agentic-Reinforcement-Learning.git
cd Agentic-Reinforcement-Learning
pip install -r requirements.txt
pip install -e .

# Run tests (304 tests)
pytest tests/ -v

# Start environment server
uvicorn src.agentic_rl.server.app:app --port 8000
```
Docker
```bash
docker build -t fish-farm-env .
docker run -p 8000:8000 fish-farm-env
```
Run Inference
```bash
# Option 1: OpenAI-compatible API
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o
export OPENAI_API_KEY=your_key
python inference.py

# Option 2: HuggingFace Inference (auto-fallback)
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
export HF_TOKEN=hf_xxx
python inference_local.py

# Option 3: Heuristic only (no API key needed)
python inference_local.py --heuristic-only
```
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /tasks | GET | List all 12 tasks with action schema |
| /reset | POST | Start episode: `{"task_id": "feeding_basics"}` |
| /step | POST | Submit action, get observation |
| /state | GET | Internal state (ground truth for grading) |
| /grader | POST | Grade a completed episode |
| /baseline | POST | Run constant-action baseline |
| /docs | GET | Interactive Swagger docs |
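A minimal standard-library client for these endpoints, as a sketch. The base URL matches the port used in the Quick Start commands; the `/step` request body shape (`{"action": {...}}`) is an assumption, so check `/docs` on a running server for the actual schema.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # port from the uvicorn and Docker commands


def make_request(path: str, payload: Optional[dict] = None,
                 base_url: str = BASE_URL) -> urllib.request.Request:
    """GET when there is no payload, otherwise POST the payload as JSON."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo() -> dict:
    """Reset and take one step; needs a running server. Body shape for /step is assumed."""
    call(make_request("/reset", {"task_id": "feeding_basics"}))
    action = {"feeding_rate": 0.5, "aeration_rate": 0.6, "heater_setting": 0.0,
              "water_exchange_rate": 0.02, "harvest_decision": False, "treatment": "none"}
    return call(make_request("/step", {"action": action}))
```

Separating request construction from sending keeps the payload logic testable without a live server; call `demo()` once the container is up.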
Project Structure
```
├── openenv.yaml                    # OpenEnv spec (spec_version: 1, type: space)
├── inference.py                    # LLM agent + heuristic fallback (< 20 min on 2 vCPU/8GB)
├── server/app.py                   # OpenEnv multi-mode entry point (uv run server)
├── Dockerfile                      # Container spec
├── requirements.txt
├── src/agentic_rl/
│   ├── constants.py                # All biological/physical/economic constants (frozen dataclasses)
│   ├── models.py                   # FarmAction (6 controls), FarmObservation (30+ fields), FarmState
│   ├── tasks.py                    # 12 task scenarios (easy → extreme)
│   ├── rewards.py                  # Task-weighted reward function (10 component keys) + growth-stage scaling
│   ├── engine/
│   │   ├── water_quality.py        # DO mass balance, nitrification, denitrification, evaporation, photosynthesis
│   │   ├── fish_biology.py         # Bioenergetic growth, dual respiration, stress, mortality
│   │   ├── disease.py              # SEIR epidemic, immunity waning, temperature virulence, vaccination
│   │   ├── economics.py            # OU feed pricing, seasonal markets, cost breakdown, ROI
│   │   ├── weather.py              # Diel cycle, seasonal storms, wind/humidity
│   │   ├── events.py               # Event scheduler (equipment, disease, storms, prices)
│   │   └── simulator.py            # Orchestrator (9-step coupling order)
│   └── server/
│       ├── environment.py          # FishFarmEnvironment (OpenEnv interface)
│       └── app.py                  # FastAPI server + custom endpoints
├── graders/
│   ├── base_grader.py              # BaseGrader + GradeResult
│   └── farm_graders.py             # 12 task-specific graders with partial credit
├── rewards/
│   ├── base_reward.py              # BaseReward interface
│   └── example_rewards.py          # 5 reward functions (survival, WQ, growth, profit, composite)
├── training/
│   └── train_grpo.py               # GRPO fine-tuning template (requires GPU)
├── .github/workflows/ci.yml        # CI: tests + lint + Docker + OpenEnv validate
├── tests/                          # 304 tests (2.0s)
│   ├── test_water_quality.py       # DO, TAN, UIA, denitrification, evaporation, temperature
│   ├── test_fish_biology.py        # Growth, mortality, stress, respiration, size-feeding
│   ├── test_disease.py             # SEIR dynamics, treatments, vaccination, immunity, temperature
│   ├── test_economics.py           # Costs, stochastic pricing, seasonal markets, ROI, breakdown
│   ├── test_simulator.py           # Integration, observations, heuristic, stochastic growth, nighttime DO risk, vaccination prophylaxis, cost breakdown, harvest revenue
│   ├── test_constants.py           # Parameter sanity, utility functions (32 tests)
│   ├── test_tasks_grader.py        # Task definitions, all 12 graders
│   ├── test_rewards.py             # All reward component keys, delta rewards, disease/harvest, nighttime DO risk, growth-stage scaling
│   ├── test_models.py              # Action/Observation/State model validation
│   └── test_endpoints.py           # /tasks, /grader, /baseline API endpoints
└── docs/
    └── knowledge-base/             # 4,400+ lines of aquaculture research (40+ citations)
```
Heuristic Agent Scores
The built-in heuristic agent (rule-based, no LLM needed) scores between 0.362 and 1.000 across the 12 tasks (average 0.744), showing that the graders produce meaningful signal:
| Task | Difficulty | Score | Strategy |
|---|---|---|---|
| oxygen_management | Easy | 1.000 | Proactive aeration, nighttime DO crash prevention |
| disease_outbreak | Medium | 0.990 | Early vaccination before disease onset |
| storm_response | Hard | 0.981 | Pre-storm DO supersaturation + minimal feeding during outage |
| ammonia_crisis | Medium | 0.900 | Aggressive water exchange + feeding reduction |
| feeding_basics | Easy | 0.857 | Growth-stage feeding with size-dependent rates |
| water_quality_balance | Easy | 0.839 | Balanced aeration + exchange with cost awareness |
| catastrophe_prevention | Extreme | 0.817 | Survive through compound crises, harvest after engagement |
| temperature_stress | Medium | 0.703 | Active cooling + reduced metabolic load |
| growth_optimization | Medium | 0.559 | Moderate feeding for growth + FCR balance |
| multi_objective | Hard | 0.469 | Balanced management over 180+ hours |
| full_growout | Hard | 0.446 | Cost management over 60-day cycle |
| season_management | Extreme | 0.362 | Feed conservation + end-of-season harvest |
| Average | | 0.744 | |
The difficulty gradient is clear: Easy (0.899) → Medium (0.788) → Hard (0.632) → Extreme (0.589). Hard and extreme tasks are genuinely difficult optimization problems and leave substantial room for an LLM agent with multi-step reasoning to outperform rule-based heuristics. Graders enforce engagement: gaming the score through early termination is penalized.
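The per-tier and overall averages can be recomputed from the score table:

```python
from statistics import mean

# Heuristic-agent scores copied from the table above, grouped by difficulty tier.
SCORES = {
    "easy": {"oxygen_management": 1.000, "feeding_basics": 0.857,
             "water_quality_balance": 0.839},
    "medium": {"disease_outbreak": 0.990, "ammonia_crisis": 0.900,
               "temperature_stress": 0.703, "growth_optimization": 0.559},
    "hard": {"storm_response": 0.981, "multi_objective": 0.469, "full_growout": 0.446},
    "extreme": {"catastrophe_prevention": 0.817, "season_management": 0.362},
}

per_tier = {tier: mean(scores.values()) for tier, scores in SCORES.items()}
overall = mean(v for scores in SCORES.values() for v in scores.values())
```

Note that the 0.744 average is taken over all 12 tasks; averaging the four tier means instead would give about 0.727, because the three-task Easy tier has the highest mean and the four-task Medium tier is weighted more heavily in the flat average.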
Research Foundation
Built on 4,400+ lines of research across 40+ citations:
- Growth model: FAO bioenergetic equations for Nile Tilapia (Oreochromis niloticus)
- Respiration: Tilapia-specific polynomial (R² = 0.99) from controlled feeding experiments
- Water chemistry: DO mass balance, ammonia equilibrium (Emerson et al. 1975), two-stage nitrification
- Disease: SEIR compartmental model with temperature-dependent virulence and immunity waning
- Economics: Ornstein-Uhlenbeck stochastic pricing, seasonal demand curves, real industry cost structures (feed = 50-70% of OpEx)
- RL baseline: Chahid et al. 2021 (Q-learning: 79% feed reduction, zero mortality)
Author
Rahul Rajpurohit: solo entry, Meta PyTorch OpenEnv Hackathon 2026