---
license: mit
base_model: Qwen/Qwen3-4B-Instruct-2507
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- hydrology
- agent
- tool-use
- grpo
- reinforcement-learning
- qwen3
- ef5
- crest
- function-calling
datasets:
- anonymousOwl/HydroAgent-dataset
---
# HydroAgent – Qwen3-4B-Instruct fine-tuned for hydrologic model calibration

**HydroAgent** is a tool-using language model that calibrates the
[EF5/CREST](https://github.com/HyDROSLab/EF5) distributed hydrologic model.
Given a USGS streamflow gage and a precipitation-driven simulation, the agent
iteratively proposes physically plausible parameter sets, runs the simulator,
inspects the resulting NSE / peak / volume metrics, and revises until the
model fits the observations.

This release is the **GRPO step-100 checkpoint** of the SFT + RL pipeline
described in [chrimerss/HydroLLM](https://github.com/chrimerss/HydroLLM).

- **Base model:** [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Training:** full fine-tuning, BF16, FSDP, no LoRA
- **RL framework:** [verl 0.5](https://github.com/volcengine/verl) GRPO with [SGLang](https://github.com/sgl-project/sglang) rollouts
- **Tool format:** Hermes-style `<tool_call>` JSON (Qwen3-Instruct native)
- **Hardware:** 4× H100, ~30 min/step, K=6 rollouts × max 50 multi-turn calls
## How the agent works

The model has access to three tools and runs a multi-turn calibration loop:

| Tool | Purpose |
|---|---|
| `set_parameters` | Set 11 tunable CREST multipliers: `wm`, `b`, `im`, `ke`, `fc`, `under`, `leaki`, `alpha`, `beta`, `alpha0`, `iwu` |
| `run_simulation` | Execute EF5 with the current parameters and produce a hydrograph |
| `evaluate` | Score the latest run vs. observations: NSE, CC, KGE, peak ratio, lag |

Each rollout typically follows `set_parameters → run_simulation → evaluate → set_parameters → …`
until NSE plateaus or the agent runs out of turns. Inputs to the agent are a
short system prompt describing the calibration task and a per-gage user
message with watershed metadata (basin area, lat/lon, time window).
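The loop above can be sketched as a plain driver function. This is an illustrative sketch, not code from this repository: `generate` stands in for your model server, and the `tools` dict stands in for an EF5 sandbox dispatching `set_parameters` / `run_simulation` / `evaluate`.

```python
import json
import re


def parse_tool_call(text):
    """Extract the first Hermes-style <tool_call> JSON block, or None."""
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    return json.loads(m.group(1)) if m else None


def calibrate(generate, tools, messages, max_turns=50):
    """Multi-turn loop: let the model call tools until it stops or runs out of turns."""
    best_nse = float("-inf")
    for _ in range(max_turns):
        reply = generate(messages)            # one assistant turn
        messages.append({"role": "assistant", "content": reply})
        call = parse_tool_call(reply)
        if call is None:                      # no tool call -> agent is done
            break
        result = tools[call["name"]](**call.get("arguments", {}))
        if call["name"] == "evaluate":
            best_nse = max(best_nse, result.get("nse", best_nse))
        messages.append({"role": "tool", "content": json.dumps(result)})
    return best_nse
```

The real rollout additionally feeds per-turn rewards back to the trainer; see the HydroLLM repository for the reference implementation.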
## Training data

Training calibrates the agent on **10 CONUS USGS gages** (basin areas
539–2401 km²), each driven by **MRMS 1 km hourly precipitation** and
**hourly USGS streamflow observations** from 60-day windows selected to
contain a clear flood event (rising + receding limbs, edge-buffered).

| Gage ID | Basin (km²) | Lat | Lon | Window (UTC) |
|---|---:|---:|---:|---|
| 11383500 | 539 | 40.0140 | -121.9483 | 2018-05-19 – 2018-07-17 |
| 11043000 | 575 | 33.4798 | -117.1439 | 2019-03-15 – 2019-05-13 |
| 11152000 | 632 | 36.2805 | -121.3227 | 2018-05-29 – 2018-07-27 |
| 02294781 | 1064 | 27.8245 | -81.8017 | 2018-04-29 – 2018-06-27 |
| 02312000 | 1476 | 28.4800 | -82.1776 | 2018-11-15 – 2019-01-13 |
| 07195430 | 1489 | 36.1086 | -94.5333 | 2018-01-04 – 2018-03-04 |
| 11179000 | 1639 | 37.5871 | -121.9608 | 2018-06-03 – 2018-08-01 |
| 14301000 | 1727 | 45.7040 | -123.7554 | 2018-09-11 – 2018-11-09 |
| 14207500 | 1828 | 45.3507 | -122.6762 | 2018-04-09 – 2018-06-07 |
| 11376000 | 2401 | 40.3871 | -122.2386 | 2018-09-21 – 2018-11-19 |

**Held-out evaluation gages** (never seen during training):

| Gage ID | Basin (km²) | Lat | Lon | Window (UTC) |
|---|---:|---:|---:|---|
| 02338660 | 329 | 33.2357 | -84.9876 | 2018-07-01 – 2018-08-31 |
| 01403060 | 2033 | 40.5511 | -74.5483 | 2018-11-11 – 2019-01-09 |
| 06279500 | 40792 | 44.7585 | -108.1816 | 2018-06-13 – 2018-08-11 |
| 07144100 | 3209 | 37.8831 | -97.4245 | 2019-03-30 – 2019-05-28 |

The full training dataset (CONUS terrain rasters, per-gage MRMS hourly
precipitation clips, USGS hourly streamflow observations, daily PET, the
EF5 control template, and the 73 GPT-4o calibration trajectories that seed
the SFT phase) is published as
[**anonymousOwl/HydroAgent-dataset**](https://huggingface.co/datasets/anonymousOwl/HydroAgent-dataset).
See that repo's README for the per-folder layout and provenance.
## Reward

Two reward layers shape the policy:

**Per-turn (returned by tools):**

| Tool call | Reward |
|---|---|
| `set_parameters` (valid) | `+0.02` |
| `run_simulation` (valid) | `+0.05` |
| `evaluate` (valid) | `ΔNSE` (this turn's NSE minus the previous best) |
| Any tool (invalid) | `-0.5` |

**Terminal (returned at end of trajectory):**

| Component | Value |
|---|---|
| Best NSE (clipped) | `[-1, 1]` |
| Target-met bonus | `+0.5` if best NSE > gage target |
| Iteration bonus | `+0.02 × n_evaluates` |
| Improvement bonus | `+0.10 × max(0, n_improvements - 1)` |
| Empty-trajectory penalty | `-1.0` |
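The terminal components combine straightforwardly. A sketch assembled from the table above; the exact clipping and ordering in the training code may differ, see the HydroLLM repository for the reference implementation:

```python
def terminal_reward(best_nse, target_nse, n_evaluates, n_improvements):
    """Terminal reward assembled from the components in the table above (sketch)."""
    if n_evaluates == 0:
        return -1.0                                # empty-trajectory penalty
    reward = max(-1.0, min(1.0, best_nse))         # best NSE, clipped to [-1, 1]
    if best_nse > target_nse:
        reward += 0.5                              # target-met bonus
    reward += 0.02 * n_evaluates                   # iteration bonus
    reward += 0.10 * max(0, n_improvements - 1)    # improvement bonus
    return reward
```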
## GRPO settings

| Setting | Value |
|---|---|
| Algorithm | GRPO (group-relative advantages) |
| K (rollouts per prompt) | 6 |
| Train batch size | 4 prompts (24 trajectories per step) |
| Max assistant turns | 50 |
| Learning rate | 1e-6 with 5% warmup |
| Entropy coefficient | 0.01 |
| KL loss coefficient | 0.05 (anchored to base policy) |
| Sampling | `temperature=1.0`, `top_p=0.95` |
| Steps in this checkpoint | **100** |
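"Group-relative advantages" means each trajectory's reward is normalized against the other rollouts of the same prompt (K=6 here) instead of a learned value baseline. A minimal sketch of the standard GRPO normalization, not verl's actual implementation:

```python
import statistics


def group_relative_advantages(rewards):
    """Normalize each trajectory reward by its K-rollout group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]   # all rollouts tied: no learning signal
    return [(r - mean) / std for r in rewards]
```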
## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "anonymousOwl/HydroAgent"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="bfloat16", device_map="auto")
```

The model emits Hermes-style tool calls, e.g.:

```
<tool_call>
{"name": "set_parameters", "arguments": {"wm": 1.0, "b": 1.0, "im": 0.5, ...}}
</tool_call>
```

Render your tool schemas into the prompt with
`tokenizer.apply_chat_template(..., tools=HYDRO_TOOLS)`, parse the emitted
`<tool_call>` blocks, and dispatch each call to your EF5 sandbox. See
[`modal_app/eval.py`](https://github.com/chrimerss/HydroLLM/blob/main/modal_app/eval.py)
for a reference SGLang loop with retry-on-parse-failure logic.

For full reproduction (image, EF5 binary, multi-turn rollout, reward
computation), use the
[HydroLLM repository](https://github.com/chrimerss/HydroLLM).
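The `HYDRO_TOOLS` schemas themselves are not reproduced in this card. As an illustration only, a hypothetical entry for `set_parameters` in the JSON-schema format that `apply_chat_template(tools=...)` accepts could look like:

```python
# Hypothetical JSON-schema definition for the `set_parameters` tool, built
# from the 11 CREST multipliers listed above. Not the repo's actual schema.
CREST_MULTIPLIERS = ["wm", "b", "im", "ke", "fc", "under",
                     "leaki", "alpha", "beta", "alpha0", "iwu"]

SET_PARAMETERS_TOOL = {
    "type": "function",
    "function": {
        "name": "set_parameters",
        "description": "Set the 11 tunable CREST parameter multipliers.",
        "parameters": {
            "type": "object",
            "properties": {name: {"type": "number"} for name in CREST_MULTIPLIERS},
            "required": CREST_MULTIPLIERS,
        },
    },
}
```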
## Limitations

- Trained on **10 small/medium CONUS basins** (≤ 2401 km²) over short flood
  windows. Generalization to large basins (> 3000 km²), arid catchments, or
  out-of-CONUS regions is unverified.
- Calibrates **CREST parameter multipliers only**; it does not modify routing,
  initial conditions, or sub-basin structure.
- The agent depends on a working EF5 toolchain; the weights alone do not
  perform calibration without the simulation environment in the loop.
- This is a research checkpoint, not a production tool. NSE on held-out
  gages varies substantially with basin and event.

## License

MIT, same as the upstream [HydroLLM repository](https://github.com/chrimerss/HydroLLM)
and the base [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
## Citation

```bibtex
@software{hydrollm2026,
  title = {HydroLLM: Reinforcement Learning Fine-Tuning of LLMs with Hydrologic Simulation Feedback},
  year  = {2026},
  url   = {https://github.com/chrimerss/HydroLLM}
}
```

## Acknowledgement

Compute for this research was sponsored by [Modal](https://modal.com).