metadata
title: Fish Farm Environment Server
emoji: 🐟
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web

Fish Farm RL Environment


The world's first OpenEnv-compatible aquaculture farming environment

An AI agent manages a Nile Tilapia Recirculating Aquaculture System (RAS), making hourly decisions about feeding, aeration, temperature control, water exchange, disease treatment, and harvest timing. The simulation is built on real aquaculture science: bioenergetic growth models, coupled DO/ammonia/pH dynamics, SEIR disease epidemiology, stochastic economics, and realistic multi-objective trade-offs.

Built for the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology.

Why Aquaculture?

Aquaculture is a $300B global industry producing 50%+ of the world's fish. Yet:

  • No Gymnasium/OpenEnv-compatible aquaculture environment exists; this is the #1 identified gap in aquaculture AI research
  • Real farms lose $10B+ annually to preventable die-offs from water quality failures, disease outbreaks, and suboptimal feeding
  • The biological cascade (overfeed → ammonia → DO crash → stress → disease → mass mortality) creates a naturally rich RL problem with 13 coupled state variables
  • Q-learning has already achieved 79% less feed use and zero mortality versus traditional control (Chahid et al. 2021), which suggests RL can have substantial real-world impact here

The Biological Cascade

The core challenge: everything is connected.

Overfeeding ──→ Ammonia ↑ ──→ DO ↓ ──→ Fish Stress ↑ ──→ Disease ──→ Mass Mortality
     ↑              ↑           ↓           ↓               ↓              ↓
  Feed Cost      pH shift    Growth ↓   Feeding ↓       Treatment $    Revenue = 0
                    ↑           ↓
              Evaporation   Nitrification ──→ NO2 ──→ NO3
              concentrates   consumes O2      ↑
                                           Biofilter

An agent that feeds aggressively grows fish faster but risks catastrophic ammonia spikes. An agent that plays it safe grows slowly and loses money. The optimal policy requires balancing 6 continuous controls across 13 coupled state variables, a challenge that scales from easy single-concern tasks to extreme multi-crisis scenarios.


Simulation Engine Highlights

6 Deeply Coupled Subsystem Engines

| Engine | Key Features |
|---|---|
| Water Quality | 10 sub-steps/hour DO mass balance, two-stage nitrification (AOB + NOB), denitrification under anoxic conditions, Smith-Talling photosynthesis, Penman evaporation model, Beer-Lambert light attenuation, nighttime DO crash risk tracking |
| Fish Biology | FAO bioenergetic growth ODE with stochastic noise (Wiener process σ=2%), dual respiration model (tilapia polynomial R²=0.99 + allometric fallback), size-dependent feeding rates, feeding response behavior |
| Disease | SEIR compartmental model with immunity waning (R→S at 1/30 per day), temperature-dependent pathogen virulence, stress-triggered outbreaks, 4 treatment options + prophylactic vaccination (works without active disease) |
| Economics | Ornstein-Uhlenbeck stochastic feed pricing, seasonal market multipliers (Christmas +15%, Lent +10%, mid-year dip -5%), marginal cost tracking, weight-dependent fish valuation with market premium curve, detailed cost breakdown (7 categories) |
| Weather | Diel temperature/solar cycle, seasonal storm probability (3× during monsoon), Beaufort wind scale, humidity-driven evaporation |
| Events | Equipment failures, power outages, algae blooms, feed shortages, market crashes, all wired to the appropriate subsystems |
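
The SEIR structure with immunity waning listed for the Disease engine can be illustrated with a minimal discrete-time sketch. Only the waning rate (1/30 per day) comes from the table above; the `beta`, `sigma`, and `gamma` values and the `SEIR`/`seir_step` names are hypothetical, and temperature-dependent virulence, stress triggers, treatments, and mortality are left out.

```python
from dataclasses import dataclass


@dataclass
class SEIR:
    s: float  # susceptible fish
    e: float  # exposed (infected, not yet infectious)
    i: float  # infectious
    r: float  # recovered (temporarily immune)


def seir_step(state, dt_days, beta=0.4, sigma=0.2, gamma=0.1, waning=1 / 30):
    """One explicit Euler step; all rates are per day (beta/sigma/gamma are placeholders)."""
    n = state.s + state.e + state.i + state.r
    new_exposed = beta * state.s * state.i / n * dt_days
    new_infectious = sigma * state.e * dt_days
    new_recovered = gamma * state.i * dt_days
    lost_immunity = waning * state.r * dt_days  # R -> S flow
    return SEIR(
        s=state.s - new_exposed + lost_immunity,
        e=state.e + new_exposed - new_infectious,
        i=state.i + new_infectious - new_recovered,
        r=state.r + new_recovered - lost_immunity,
    )


# Simulate 30 days at hourly resolution, starting with 10 infectious fish.
state = SEIR(s=9990.0, e=0.0, i=10.0, r=0.0)
for _ in range(24 * 30):
    state = seir_step(state, dt_days=1 / 24)
```

Because every flow leaves one compartment and enters another, the total population stays constant in this sketch; the real engine presumably also removes fish through disease mortality.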

Key Equations

Growth (bioenergetic, from KAUST/FAO research):

dW/dt = [h·π·f·b·(1-a)·τ(T)·σ(DO)·v(UIA)] × W^0.6277 - [k_min·e^(s·(T-T_min))] × W^0.8373
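
As a rough illustration, the growth equation can be evaluated directly in Python. The function below is a sketch rather than the project's implementation: the argument names simply mirror the symbols in the equation, the modifiers τ(T), σ(DO), and v(UIA) are passed in as precomputed multipliers, and every numeric value in the usage lines is a made-up placeholder.

```python
import math


def growth_rate(w, temp, *, h, pi_coef, f, b, a, tau, sigma, v, k_min, s, t_min):
    """Evaluate dW/dt for body weight w; symbols follow the equation above.

    tau, sigma and v stand for tau(T), sigma(DO) and v(UIA), already evaluated.
    """
    anabolism = h * pi_coef * f * b * (1 - a) * tau * sigma * v * w ** 0.6277
    catabolism = k_min * math.exp(s * (temp - t_min)) * w ** 0.8373
    return anabolism - catabolism


# Placeholder parameters (assumptions, not values from the project).
params = dict(h=0.8, pi_coef=1.0, b=0.6, a=0.5, tau=1.0, sigma=1.0, v=1.0,
              k_min=0.01, s=0.02, t_min=18.0)
fed = growth_rate(50.0, 28.0, f=1.0, **params)
fasted = growth_rate(50.0, 28.0, f=0.0, **params)
```

With feeding set to zero the anabolic term vanishes, so the fish loses weight at the catabolic rate; this is the mechanism that makes underfeeding costly in the environment.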

DO mass balance (10 sub-steps/hour for stability):

dDO/dt = P_photo - FR·biomass/V - 4.57·K_NR·TAN - DO_water + K_a·(DO_sat-DO) + A_mech + Q_ex·(DO_in-DO)
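
The sub-stepping mentioned above can be sketched as ten explicit Euler steps per simulated hour. This is an illustrative sketch under assumptions: the keyword names mirror the terms of the equation, units are taken to be consistent per-hour rates, and clamping DO at zero is an added safeguard that the source does not describe.

```python
def do_derivative(do, *, p_photo, fr, biomass, volume, k_nr, tan, do_water,
                  k_a, do_sat, a_mech, q_ex, do_in):
    """Right-hand side of the DO mass balance, term by term."""
    return (p_photo
            - fr * biomass / volume      # fish respiration
            - 4.57 * k_nr * tan          # oxygen consumed by nitrification
            - do_water                   # water-column oxygen demand
            + k_a * (do_sat - do)        # surface reaeration
            + a_mech                     # mechanical aeration
            + q_ex * (do_in - do))       # water exchange


def step_do_one_hour(do, substeps=10, **terms):
    """Advance DO by one hour using `substeps` explicit Euler steps."""
    dt = 1.0 / substeps
    for _ in range(substeps):
        do = max(0.0, do + dt * do_derivative(do, **terms))  # clamp is an assumption
    return do


# Reaeration only: DO relaxes from 4 toward saturation at 8.
after = step_do_one_hour(4.0, p_photo=0.0, fr=0.0, biomass=0.0, volume=1.0,
                         k_nr=0.0, tan=0.0, do_water=0.0, k_a=0.5, do_sat=8.0,
                         a_mech=0.0, q_ex=0.0, do_in=0.0)
```

Smaller steps keep the explicit scheme stable when the relaxation terms (`k_a`, `q_ex`) are large relative to the hourly time step.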

Tilapia respiration polynomial (R²=0.99, valid 20-200g, 24-32°C):

FR = 2014.45 + 2.75W - 165.2T + 0.007W² + 3.93T² - 0.21WT
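
A direct transcription of the polynomial, as a sketch: the coefficients and validity range are copied from the text above, while the function name and the choice to raise an error outside the range are assumptions (the engine reportedly switches to an allometric fallback instead). The source does not state the units of FR, so none are claimed here.

```python
def tilapia_respiration(weight_g, temp_c):
    """Evaluate the fitted respiration polynomial FR(W, T).

    Raises ValueError outside the fitted range (20-200 g, 24-32 degrees C).
    """
    if not (20 <= weight_g <= 200 and 24 <= temp_c <= 32):
        raise ValueError("outside the range where the polynomial was fitted")
    w, t = weight_g, temp_c
    return 2014.45 + 2.75 * w - 165.2 * t + 0.007 * w**2 + 3.93 * t**2 - 0.21 * w * t


fr_example = tilapia_respiration(100.0, 28.0)  # about 226.97
```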

Ammonia toxicity:

UIA = TAN / (1 + 10^(pKa - pH)),  pKa = 0.09018 + 2729.92/(T + 273.15)
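
The un-ionized fraction is easy to check numerically. The sketch below transcribes the two formulas above; the function name and the example inputs are illustrative assumptions.

```python
def unionized_ammonia(tan, ph, temp_c):
    """Un-ionized ammonia (same units as TAN) from total ammonia nitrogen, pH and temperature."""
    pka = 0.09018 + 2729.92 / (temp_c + 273.15)
    return tan / (1 + 10 ** (pka - ph))


uia_neutral = unionized_ammonia(1.0, 7.5, 28.0)   # roughly 2% of TAN
uia_alkaline = unionized_ammonia(1.0, 8.5, 28.0)  # noticeably larger share
```

The toxic fraction rises steeply with pH, which is why the same TAN reading can be harmless or dangerous depending on the pH reading next to it.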

Stochastic feed price (Ornstein-Uhlenbeck):

dp = κ(μ - p)dt + σ·dW,  bounded to ±40% of mean
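
A bounded Ornstein-Uhlenbeck path can be simulated with an Euler-Maruyama step. This is a sketch under assumptions: the bound is implemented as simple clipping (the source only says "bounded"), and the function name and all parameter values in the usage line are hypothetical.

```python
import random


def simulate_feed_price(mean, kappa, sigma, hours, dt=1.0, seed=0):
    """Euler-Maruyama simulation of dp = kappa*(mean - p)*dt + sigma*dW, clipped to +/-40% of mean."""
    rng = random.Random(seed)
    low, high = 0.6 * mean, 1.4 * mean
    price = mean
    path = [price]
    for _ in range(hours):
        dw = rng.gauss(0.0, dt ** 0.5)            # Wiener increment over dt
        price += kappa * (mean - price) * dt + sigma * dw
        price = min(high, max(low, price))        # clipping is an assumption
        path.append(price)
    return path


path = simulate_feed_price(mean=0.50, kappa=0.05, sigma=0.02, hours=168)
```

The mean-reversion term pulls the price back toward `mean`, so the agent sees prices that wander but do not drift away indefinitely.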

Environment Design

Action Space (6 continuous controls)

{
  "feeding_rate": 0.0-1.0,        // Feed intensity (growth vs ammonia trade-off)
  "aeration_rate": 0.0-1.0,       // Oxygen injection (DO vs electricity cost)
  "heater_setting": -1.0-1.0,     // Temperature control (growth vs energy)
  "water_exchange_rate": 0.0-0.1,  // Fresh water (dilution vs water cost)
  "harvest_decision": true/false,   // Harvest all fish (ends episode)
  "treatment": "none/antibiotics/salt/probiotics/vaccination"
}
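
One way to sanitize an action before sending it, for example when it comes from LLM output, is to clamp each control to the ranges listed above. The `ActionSketch` class below is a hypothetical stand-in, not the project's `FarmAction` model; falling back to "none" for an unknown treatment is also an assumption.

```python
from dataclasses import dataclass

TREATMENTS = ("none", "antibiotics", "salt", "probiotics", "vaccination")


def _clamp(value, low, high):
    return max(low, min(high, float(value)))


@dataclass(frozen=True)
class ActionSketch:
    feeding_rate: float = 0.0
    aeration_rate: float = 0.0
    heater_setting: float = 0.0
    water_exchange_rate: float = 0.0
    harvest_decision: bool = False
    treatment: str = "none"

    @classmethod
    def from_raw(cls, raw):
        """Build an action from a dict, clamping values to the documented ranges."""
        treatment = raw.get("treatment", "none")
        return cls(
            feeding_rate=_clamp(raw.get("feeding_rate", 0.0), 0.0, 1.0),
            aeration_rate=_clamp(raw.get("aeration_rate", 0.0), 0.0, 1.0),
            heater_setting=_clamp(raw.get("heater_setting", 0.0), -1.0, 1.0),
            water_exchange_rate=_clamp(raw.get("water_exchange_rate", 0.0), 0.0, 0.1),
            harvest_decision=raw.get("harvest_decision", False) is True,
            treatment=treatment if treatment in TREATMENTS else "none",
        )


action = ActionSketch.from_raw(
    {"feeding_rate": 1.7, "heater_setting": -3, "water_exchange_rate": 0.5, "treatment": "magic"}
)
```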

Observation Space (47 fields, partial observability)

The agent sees sensor readings (temperature, DO, pH, TAN, UIA, NO2, NO3, water quality score, nighttime DO crash risk), fish status (weight, population, mortality, feeding response, stress, FCR, SGR, growth rate, stocking density, survival rate), economics (costs, fish value, profit, feed price, market multiplier, ROI, marginal cost), weather (forecast, daytime, storm, humidity, day of year), equipment status, disease behavioral signals, and event alerts.

Disease infection count is hidden: the agent must infer disease from behavioral indicators (mortality spikes + feeding refusal + elevated stress → disease_suspected flag).
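
The inference described above might look like the following rule. This is only a sketch: the source names the three indicators but not how they are combined or thresholded, so requiring all three and the specific threshold values are assumptions.

```python
def disease_suspected(mortality_today, baseline_mortality, feeding_response, stress,
                      *, mortality_factor=3.0, feeding_threshold=0.5, stress_threshold=0.6):
    """Flag likely disease when all three behavioral indicators fire (thresholds are placeholders)."""
    mortality_spike = mortality_today > mortality_factor * baseline_mortality
    feeding_refusal = feeding_response < feeding_threshold
    elevated_stress = stress > stress_threshold
    return mortality_spike and feeding_refusal and elevated_stress


suspicious = disease_suspected(12, 2, feeding_response=0.3, stress=0.8)
healthy = disease_suspected(12, 2, feeding_response=0.9, stress=0.8)
```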

Inference Agent Architecture

The LLM inference agent uses a dual-mode architecture (items 1 and 2) governed by two scheduling policies (items 3 and 4):

  1. LLM mode: Domain-expert system prompt with full situational awareness (all 30+ observation fields, trend analysis, harvest advisories)
  2. Heuristic fallback: Rule-based agent that handles the critical cascades correctly when LLM is unavailable or time-constrained
  3. Adaptive call frequency: More LLM calls during crises (every step), fewer during stable periods (every 4-6 hours)
  4. Smart time budgeting: Proportional allocation across tasks, automatic fallback to heuristic when time runs low
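
The two scheduling ideas in the list above can be sketched in a few lines. Everything here is hypothetical: the function names, the default 4-hour stable interval (the low end of the stated 4-6 hours), and the choice to split the time budget in proportion to episode length are assumptions, not the logic in `inference.py`.

```python
def should_call_llm(step, last_call_step, in_crisis, seconds_left, seconds_per_call,
                    stable_interval=4):
    """Decide whether this step uses the LLM (True) or the heuristic fallback (False)."""
    if seconds_left < seconds_per_call:
        return False                     # out of budget: fall back to the heuristic
    if in_crisis:
        return True                      # crises get an LLM call every step
    return last_call_step is None or step - last_call_step >= stable_interval


def allocate_time(total_seconds, task_hours):
    """Split a wall-clock budget across tasks in proportion to episode length."""
    total_hours = sum(task_hours.values())
    return {task: total_seconds * hours / total_hours for task, hours in task_hours.items()}


budget = allocate_time(1200, {"oxygen_management": 72, "feeding_basics": 168})
```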

12 Tasks (Easy → Extreme)

Easy (3 tasks): Learn one control

| Task | Hours | Challenge |
|---|---|---|
| feeding_basics | 168 | Feed fish to 55g+ with FCR < 2.0, zero deaths |
| oxygen_management | 72 | Keep DO > 5.0 during hot weather (35°C air) |
| water_quality_balance | 168 | Maintain all water parameters simultaneously |

Medium (4 tasks): Multi-concern + events

| Task | Hours | Challenge |
|---|---|---|
| temperature_stress | 120 | Survive a 3-day heat wave (38°C) |
| ammonia_crisis | 72 | Biofilter failure: manage rising ammonia |
| disease_outbreak | 240 | Detect and treat disease before 10% mortality |
| growth_optimization | 336 | Maximize growth while maintaining water quality |

Hard (3 tasks): Full lifecycle + compound events

| Task | Hours | Challenge |
|---|---|---|
| full_growout | 1440 | 60-day grow-out: 20g → 400g market weight |
| storm_response | 120 | Severe storm + 12h power outage + biofilter recovery |
| multi_objective | 720 | Pareto-optimize profit × welfare × environment |

Extreme (2 tasks): Frontier-model difficulty

| Task | Hours | Challenge |
|---|---|---|
| catastrophe_prevention | 336 | 5 compound crises in 14 days (algae bloom → aerator failure → disease → market crash → feed shortage) |
| season_management | 2160 | Full 90-day season with random events, ROI optimization |

Quick Start

Setup

git clone https://github.com/Rahul-Rajpurohitk/Agentic-Reinforcement-Learning.git
cd Agentic-Reinforcement-Learning
pip install -r requirements.txt
pip install -e .

# Run tests (304 tests)
pytest tests/ -v

# Start environment server
uvicorn src.agentic_rl.server.app:app --port 8000

Docker

docker build -t fish-farm-env .
docker run -p 8000:8000 fish-farm-env

Run Inference

# Option 1: OpenAI-compatible API
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o
export OPENAI_API_KEY=your_key
python inference.py

# Option 2: HuggingFace Inference (auto-fallback)
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
export HF_TOKEN=hf_xxx
python inference_local.py

# Option 3: Heuristic only (no API key needed)
python inference_local.py --heuristic-only

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /tasks | GET | List all 12 tasks with action schema |
| /reset | POST | Start episode {"task_id": "feeding_basics"} |
| /step | POST | Submit action, get observation |
| /state | GET | Internal state (ground truth for grading) |
| /grader | POST | Grade a completed episode |
| /baseline | POST | Run constant-action baseline |
| /docs | GET | Interactive Swagger docs |
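
A minimal standard-library client for the endpoints above might look like this. It is a sketch under assumptions: the server is taken to run on `http://localhost:8000` as in the Quick Start, the `/reset` body comes from the table, and the `/step` body shape (an action dict wrapped under an `"action"` key) is a guess that should be checked against `/docs`.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumes the local server from Quick Start


def build_post(path, payload):
    """Create a JSON POST request for the given endpoint path."""
    data = json.dumps(payload).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a prepared request and decode the JSON response."""
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


reset_req = build_post("/reset", {"task_id": "feeding_basics"})
# The wrapper key below is an assumption; consult /docs for the real schema.
step_req = build_post("/step", {"action": {
    "feeding_rate": 0.5, "aeration_rate": 0.6, "heater_setting": 0.0,
    "water_exchange_rate": 0.02, "harvest_decision": False, "treatment": "none",
}})
```

With the server running, `send(reset_req)` followed by `send(step_req)` would start an episode and advance it by one hour.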

Project Structure

├── openenv.yaml              # OpenEnv spec (spec_version: 1, type: space)
├── inference.py              # LLM agent + heuristic fallback (< 20 min on 2 vCPU/8GB)
├── server/app.py             # OpenEnv multi-mode entry point (uv run server)
├── Dockerfile                # Container spec
├── requirements.txt
├── src/agentic_rl/
│   ├── constants.py          # All biological/physical/economic constants (frozen dataclasses)
│   ├── models.py             # FarmAction (6 controls), FarmObservation (30+ fields), FarmState
│   ├── tasks.py              # 12 task scenarios (easy → extreme)
│   ├── rewards.py            # Task-weighted reward function (10 component keys) + growth-stage scaling
│   ├── engine/
│   │   ├── water_quality.py  # DO mass balance, nitrification, denitrification, evaporation, photosynthesis
│   │   ├── fish_biology.py   # Bioenergetic growth, dual respiration, stress, mortality
│   │   ├── disease.py        # SEIR epidemic, immunity waning, temperature virulence, vaccination
│   │   ├── economics.py      # OU feed pricing, seasonal markets, cost breakdown, ROI
│   │   ├── weather.py        # Diel cycle, seasonal storms, wind/humidity
│   │   ├── events.py         # Event scheduler (equipment, disease, storms, prices)
│   │   └── simulator.py      # Orchestrator (9-step coupling order)
│   └── server/
│       ├── environment.py    # FishFarmEnvironment (OpenEnv interface)
│       └── app.py            # FastAPI server + custom endpoints
├── graders/
│   ├── base_grader.py        # BaseGrader + GradeResult
│   └── farm_graders.py       # 12 task-specific graders with partial credit
├── rewards/
│   ├── base_reward.py        # BaseReward interface
│   └── example_rewards.py    # 5 reward functions (survival, WQ, growth, profit, composite)
├── training/
│   └── train_grpo.py         # GRPO fine-tuning template (requires GPU)
├── .github/workflows/ci.yml  # CI: tests + lint + Docker + OpenEnv validate
├── tests/                    # 304 tests (2.0s)
│   ├── test_water_quality.py # DO, TAN, UIA, denitrification, evaporation, temperature
│   ├── test_fish_biology.py  # Growth, mortality, stress, respiration, size-feeding
│   ├── test_disease.py       # SEIR dynamics, treatments, vaccination, immunity, temperature
│   ├── test_economics.py     # Costs, stochastic pricing, seasonal markets, ROI, breakdown
│   ├── test_simulator.py     # Integration, observations, heuristic, stochastic growth, nighttime DO risk, vaccination prophylaxis, cost breakdown, harvest revenue
│   ├── test_constants.py     # Parameter sanity, utility functions (32 tests)
│   ├── test_tasks_grader.py  # Task definitions, all 12 graders
│   ├── test_rewards.py       # All reward component keys, delta rewards, disease/harvest, nighttime DO risk, growth-stage scaling
│   ├── test_models.py        # Action/Observation/State model validation
│   └── test_endpoints.py     # /tasks, /grader, /baseline API endpoints
└── docs/
    └── knowledge-base/       # 4,400+ lines of aquaculture research (40+ citations)

Heuristic Agent Scores

The built-in heuristic agent (rule-based, no LLM needed) averages 0.744 across all 12 tasks, with per-task scores ranging from 0.362 to 1.000, which indicates that the graders produce a meaningful signal:

| Task | Difficulty | Score | Strategy |
|---|---|---|---|
| oxygen_management | Easy | 1.000 | Proactive aeration, nighttime DO crash prevention |
| disease_outbreak | Medium | 0.990 | Early vaccination before disease onset |
| storm_response | Hard | 0.981 | Pre-storm DO supersaturation + minimal feeding during outage |
| ammonia_crisis | Medium | 0.900 | Aggressive water exchange + feeding reduction |
| feeding_basics | Easy | 0.857 | Growth-stage feeding with size-dependent rates |
| water_quality_balance | Easy | 0.839 | Balanced aeration + exchange with cost awareness |
| catastrophe_prevention | Extreme | 0.817 | Survive through compound crises, harvest after engagement |
| temperature_stress | Medium | 0.703 | Active cooling + reduced metabolic load |
| growth_optimization | Medium | 0.559 | Moderate feeding for growth + FCR balance |
| multi_objective | Hard | 0.469 | Balanced management over 180+ hours |
| full_growout | Hard | 0.446 | Cost management over 60-day cycle |
| season_management | Extreme | 0.362 | Feed conservation + end-of-season harvest |
| Average | | 0.744 | |

The difficulty gradient is clear: Easy (0.899) → Medium (0.788) → Hard (0.632) → Extreme (0.589). Hard and extreme tasks present genuinely difficult optimization challenges, where an LLM agent with multi-step reasoning is expected to outperform rule-based heuristics by a wide margin. Graders enforce engagement: gaming the score through early termination is penalized.
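
The per-difficulty averages quoted above can be reproduced from the score table with a few lines of Python. The scores are copied from the table; the variable and function names are illustrative.

```python
from collections import defaultdict

# (task, difficulty, heuristic score) copied from the table above.
SCORES = [
    ("oxygen_management", "Easy", 1.000),
    ("disease_outbreak", "Medium", 0.990),
    ("storm_response", "Hard", 0.981),
    ("ammonia_crisis", "Medium", 0.900),
    ("feeding_basics", "Easy", 0.857),
    ("water_quality_balance", "Easy", 0.839),
    ("catastrophe_prevention", "Extreme", 0.817),
    ("temperature_stress", "Medium", 0.703),
    ("growth_optimization", "Medium", 0.559),
    ("multi_objective", "Hard", 0.469),
    ("full_growout", "Hard", 0.446),
    ("season_management", "Extreme", 0.362),
]


def mean_by_difficulty(scores):
    """Average the scores within each difficulty tier."""
    buckets = defaultdict(list)
    for _task, difficulty, score in scores:
        buckets[difficulty].append(score)
    return {difficulty: sum(vals) / len(vals) for difficulty, vals in buckets.items()}


tier_means = mean_by_difficulty(SCORES)
overall_mean = sum(score for _, _, score in SCORES) / len(SCORES)
```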


Research Foundation

Built on 4,400+ lines of research across 40+ citations:

  • Growth model: FAO bioenergetic equations for Nile Tilapia (Oreochromis niloticus)
  • Respiration: Tilapia-specific polynomial (R²=0.99) from controlled feeding experiments
  • Water chemistry: DO mass balance, ammonia equilibrium (Emerson et al. 1975), two-stage nitrification
  • Disease: SEIR compartmental model with temperature-dependent virulence and immunity waning
  • Economics: Ornstein-Uhlenbeck stochastic pricing, seasonal demand curves, real industry cost structures (feed = 50-70% of OpEx)
  • RL baseline: Chahid et al. 2021 (Q-learning: 79% feed reduction, zero mortality)

Author

Rahul Rajpurohit, solo entry, Meta PyTorch OpenEnv Hackathon 2026