Add images and visual enhancements to README

Files changed (9)

.gitattributes +4 -0
README.md +283 -0
config.json +81 -0
lacuna-end.png +3 -0
lacuna.png +3 -0
model.safetensors +3 -0
normalization_stats.npz +3 -0
trades.csv +3 -0
updates.csv +3 -0

.gitattributes ADDED

	@@ -0,0 +1,4 @@

+*.png filter=lfs diff=lfs merge=lfs -text
+*.csv filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,283 @@

+---
+license: mit
+tags:
+  - reinforcement-learning
+  - ppo
+  - trading
+  - prediction-markets
+  - polymarket
+  - crypto
+  - cross-market
+  - temporal-encoding
+language:
+  - en
+library_name: pytorch
+pipeline_tag: reinforcement-learning
+thumbnail: lacuna.png
+---
+<div align="center">
+![LACUNA](lacuna.png)
+# LACUNA
+**Cross-Market Data Fusion for Prediction Market Trading**
+[![Live Results](https://img.shields.io/badge/Live%20Results-humanplane.com%2Flacuna-blue)](https://humanplane.com/lacuna)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)
+</div>
+---
+An experiment in cross-market data fusion. A reinforcement learning agent trained to trade Polymarket's 15-minute crypto prediction markets by fusing Binance futures order flow with Polymarket orderbook data.
+**The thesis**: read the "fast" market (Binance) and trade the "slow" market (Polymarket) before the price adjusts.
+## Results
+| Metric | Value |
+|--------|-------|
+| Total PnL | $50,195 |
+| Return on Exposure | 2,510% |
+| Sharpe Ratio | 4.13 |
+| Profit Factor | 1.21 |
+| Total Trades | 29,176 |
+| Win Rate | 23.9% |
+| Runtime | ~10 hours |
+### Learning Progression
+The model genuinely learned profitable strategies through reinforcement learning:
+| Phase | Avg PnL/Trade | Win Rate |
+|-------|---------------|----------|
+| Early | +$1.29 | 23.6% |
+| Late | +$2.15 | 24.2% |
+**+0.6 percentage point win rate improvement** and **+$0.85 avg PnL improvement** per trade.
+### Performance by Asset
+| Asset | PnL | Trades | Win Rate |
+|-------|-----|--------|----------|
+| BTC | +$38,794 | 8,257 | 32.6% |
+| ETH | +$9,978 | 7,859 | 27.0% |
+| SOL | +$1,752 | 6,310 | 16.3% |
+| XRP | -$328 | 6,750 | 16.7% |
+## Architecture
+LACUNA (v5) uses a temporal PPO architecture:
+- **Temporal Encoder**: Sees the last 5 states instead of only the current one
+- **Asymmetric Actor-Critic**: Separate networks for policy and value
+- **Feature Normalization**: Stabilizes training across different market conditions
+### Model Constraints
+- Fixed position size: $500 per trade
+- Max exposure: $2,000 (up to 4 concurrent positions)
+- Markets: 15-minute crypto prediction markets (BTC, ETH, SOL, XRP)
+### Observation Space (18 dimensions)
+The agent fuses data from two sources (Binance futures and Polymarket) into an 18-dimensional state:
+| Category | Features |
+|----------|----------|
+| Momentum | 1m/5m/10m returns |
+| Order flow | L1/L5 imbalance, trade flow, CVD acceleration |
+| Microstructure | Spread %, trade intensity, large trade flag |
+| Volatility | 5m vol, vol expansion ratio |
+| Position | Has position, side, PnL, time remaining |
+| Regime | Vol regime, trend regime |
+## Training Evolution
+Five phases over three days. Each taught us something. Only the last earned a name.
+### Phase 1: Shaped Rewards (Failed)
+**Duration**: ~52 min | **Trades**: 1,545 | **Result**: Policy collapse
+Started with micro-bonuses to guide learning:
+- +0.002 for trading with momentum
+- +0.001 for larger positions
+- -0.001 for fighting momentum
+**What happened**: Entropy collapsed from 1.09 to 0.36. The agent learned to game the reward function: it collected bonuses while ignoring actual profitability. The buffer showed a 90% win rate while the real trade win rate was 20%.
+**Lesson**: Reward shaping is risky. When shaping rewards are gameable and of similar magnitude to the real signal, agents optimize the wrong thing.
+### Phase 2: Pure Realized PnL
+**Duration**: ~1 hour | **Trades**: 2,000+ | **Result**: 55% ROI
+Stripped everything back:
+- Reward ONLY on position close
+- Increased entropy coefficient (0.05 → 0.10)
+- Simplified actions (7 → 3)
+- Smaller buffer (2048 → 512)
+| Update | Entropy | PnL | Win Rate |
+|--------|---------|-----|----------|
+| 1 | 0.68 | $5.20 | 33.3% |
+| 36 | 1.05 | $10.93 | 21.2% |
+Win rate settled at 21%, below random (33%), but the agent was still profitable. Binary markets have asymmetric payoffs: a position entered at a low price can gain far more than it can lose. (Still using probability-based PnL at this point.)
+### Phase 3: Scaled Up ($50 trades)
+**Duration**: ~50 min | **Trades**: 4,133 | **Result**: -$64 → +$23
+First update hit -$64 drawdown. But the agent recovered:
+| Update | PnL | Win Rate |
+|--------|-----|----------|
+| 1 | -$63.75 | 29.5% |
+| 36 | +$23.10 | 15.6% |
+**Lesson**: The agent can recover from large adverse moves without policy collapse.
+### Phase 4: Share-Based PnL ($500 trades)
+**Duration**: ~1 hour | **Trades**: 4,873 | **Result**: 170% ROI
+Changed reward signal to reflect actual market economics:
+```python
+# Old: probability-based
+pnl = (exit_price - entry_price) * dollars
+# New: share-based
+shares = dollars / entry_price
+pnl = (exit_price - entry_price) * shares
+```
+| Update | PnL | Win Rate |
+|--------|-----|----------|
+| 1 | -$197 | 18.9% |
+| 20 | -$465 | 18.5% |
+| 46 | +$3,392 | 19.0% |
+**4.5x improvement** over Phase 3's reward signal.
+### Phase 5: LACUNA (Final)
+**Duration**: ~10 hours | **Trades**: 29,176 | **Result**: 2,510% ROI
+Architecture rethink:
+- **Temporal encoder**: 5-state history instead of single-frame
+- **Asymmetric actor-critic**: Separate network capacities
+- **Feature normalization**: Stable across market regimes
+It started with a big loss. Seemed broken. Left it running on New Year's Eve while counting down to midnight—not out of hope, just neglect.
+Checked back hours later. The equity curve had inflected. By morning: **+$50,195**.
+Phases 1-4 might have done the same. They just never got the time. Phase 5 got the time it needed—because it had been written off.
+---
+## Emergent Behaviors
+These weren't explicitly rewarded. The model discovered them while optimizing for profit—and we can see them evolve over time.
+| Behavior | Early → Late | What It Learned |
+|----------|--------------|-----------------|
+| **Low volatility specialist** | Consistent | $4.07/trade on calm markets vs -$1.44 on volatile |
+| **Hunts cheap outcomes** | 23% → 39% of trades | Cheap entries yield $8.63/trade vs $1.53 for expensive |
+| **Rides DOWN momentum** | Consistent 77% | Bets DOWN when prob is falling → +$97k net profit |
+| **Fat tail capture** | $5.8k → $20.5k net | Learned to position for asymmetric payoffs |
+| **Recovery after loss streaks** | 47% WR after 3+ losses | Anti-tilt behavior (vs 24% baseline) |
+| **Avg PnL per trade** | $1.62 → $4.30 | 2.7x improvement through genuine learning |
+Consistent throughout: **Cuts winners fast** (0.35x hold time vs losers)—opposite of human intuition, but it works in these markets.
+## Key Takeaways
+1. **Reward shaping is dangerous** - When shaping rewards are gameable and of similar magnitude to the real signal, agents optimize the wrong thing. Sparse but honest > dense but noisy.
+2. **Reward signal design matters** - Share-based PnL outperformed probability-based by 4.5x ROI. Match actual market economics.
+3. **Entropy coefficient matters** - 0.05 caused policy collapse; 0.10 maintained healthy exploration.
+4. **Watch buffer/trade win rate divergence** - When these diverge, the agent is optimizing the wrong objective.
+5. **Give it time** - Learning and recovery from drawdowns take time. Early performance is not indicative of final results.
+---
+## The Story
+This is our final checkpoint. We're done experimenting with LACUNA, but you don't have to be.
+## Usage
+```python
+import json
+import numpy as np
+from safetensors.torch import load_file
+# Load model weights as a dict of parameter name -> torch tensor
+weights = load_file("model.safetensors")
+# Load normalization stats (for preprocessing observations)
+stats = np.load("normalization_stats.npz")
+obs_mean = stats["obs_mean"]
+obs_std = stats["obs_std"]
+# Load config for architecture details
+with open("config.json") as f:
+    config = json.load(f)
+# Normalize observations before inference (epsilon guards against zero std)
+def normalize_obs(obs):
+    return (obs - obs_mean) / (obs_std + 1e-8)
+```
+## Files
+- `README.md` - This documentation
+- `config.json` - Model configuration and architecture details
+- `model.safetensors` - Model weights in SafeTensors format
+- `normalization_stats.npz` - Observation normalization statistics
+- `trades.csv` - All 29,176 trades with full details
+- `updates.csv` - Training updates with metrics over time
+## Links
+- [Live Results](https://humanplane.com/lacuna) - Interactive visualization
+- [Training Code](https://github.com/humanplane/cross-market-state-fusion) - GitHub repository
+---
+<div align="center">
+![LACUNA Results](lacuna-end.png)
+*Final equity curve after 10 hours of live trading*
+</div>
+---
+## License
+MIT
+## Citation
+```bibtex
+@misc{lacuna2025,
+  author = {HumanPlane},
+  title = {LACUNA: Cross-Market Data Fusion for Prediction Market Trading},
+  year = {2025},
+  publisher = {HuggingFace},
+  url = {https://huggingface.co/HumanPlane/LACUNA}
+}
+```
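
A worked example may help with the Phase 4 reward change described in the README diff above. The sketch below wraps the two formulas from that section in Python functions and adds a hold-to-resolution expected value. The function names, the example prices, and the assumption of no fees and no early exits are illustrative and are not taken from the training code.

```python
def probability_pnl(entry_price: float, exit_price: float, dollars: float) -> float:
    """Old signal: price change scaled directly by the dollar stake."""
    return (exit_price - entry_price) * dollars


def share_pnl(entry_price: float, exit_price: float, dollars: float) -> float:
    """New signal: the stake buys dollars / entry_price shares."""
    if not 0.0 < entry_price < 1.0:
        raise ValueError("entry_price must be strictly between 0 and 1")
    shares = dollars / entry_price
    return (exit_price - entry_price) * shares


def expected_value_per_trade(win_rate: float, entry_price: float, dollars: float) -> float:
    """Expected share-based PnL when every position is held to resolution.

    A win settles at 1.0 and pays (1 - p) / p per dollar staked;
    a loss settles at 0.0 and costs the full stake. Fees are ignored.
    """
    win_payoff = share_pnl(entry_price, 1.0, dollars)
    loss_payoff = share_pnl(entry_price, 0.0, dollars)
    return win_rate * win_payoff + (1.0 - win_rate) * loss_payoff


if __name__ == "__main__":
    # Hypothetical trade: $500 staked at 0.20 and closed at 0.30.
    print("probability-based:", probability_pnl(0.20, 0.30, 500))
    print("share-based:      ", share_pnl(0.20, 0.30, 500))
    # Hypothetical: the README's overall 23.9% win rate applied to 0.20 entries.
    print("expected value:   ", expected_value_per_trade(0.239, 0.20, 500))
```

Under these simplifying assumptions the break-even win rate equals the entry price, which is one way a win rate well below 50% can coexist with positive PnL when entries are cheap. Whether the agent's actual trades fit this pattern would need to be checked against `trades.csv`.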

config.json ADDED

	@@ -0,0 +1,81 @@

+{
+  "model_type": "ppo_temporal",
+  "version": "5.0",
+  "name": "LACUNA",
+  "description": "Cross-market data fusion RL agent for prediction market trading",
+  "architecture": {
+    "type": "asymmetric_actor_critic",
+    "temporal_encoder": {
+      "history_length": 5,
+      "hidden_size": 128
+    },
+    "actor": {
+      "hidden_layers": [256, 128],
+      "activation": "tanh"
+    },
+    "critic": {
+      "hidden_layers": [256, 128],
+      "activation": "tanh"
+    }
+  },
+  "observation_space": {
+    "dimensions": 18,
+    "features": [
+      {"name": "return_1m", "category": "momentum"},
+      {"name": "return_5m", "category": "momentum"},
+      {"name": "return_10m", "category": "momentum"},
+      {"name": "l1_imbalance", "category": "order_flow"},
+      {"name": "l5_imbalance", "category": "order_flow"},
+      {"name": "trade_flow", "category": "order_flow"},
+      {"name": "cvd_acceleration", "category": "order_flow"},
+      {"name": "spread_pct", "category": "microstructure"},
+      {"name": "trade_intensity", "category": "microstructure"},
+      {"name": "large_trade_flag", "category": "microstructure"},
+      {"name": "vol_5m", "category": "volatility"},
+      {"name": "vol_expansion", "category": "volatility"},
+      {"name": "has_position", "category": "position"},
+      {"name": "position_side", "category": "position"},
+      {"name": "position_pnl", "category": "position"},
+      {"name": "time_remaining", "category": "position"},
+      {"name": "vol_regime", "category": "regime"},
+      {"name": "trend_regime", "category": "regime"}
+    ]
+  },
+  "action_space": {
+    "type": "discrete",
+    "actions": ["HOLD", "BUY_UP", "BUY_DOWN"]
+  },
+  "training": {
+    "algorithm": "PPO",
+    "entropy_coefficient": 0.10,
+    "learning_rate": 0.0003,
+    "gamma": 0.99,
+    "gae_lambda": 0.95,
+    "clip_range": 0.2,
+    "n_epochs": 10,
+    "batch_size": 64,
+    "buffer_size": 2048
+  },
+  "constraints": {
+    "position_size_usd": 500,
+    "max_exposure_usd": 2000,
+    "max_concurrent_positions": 4,
+    "markets": ["BTC", "ETH", "SOL", "XRP"],
+    "market_type": "15-minute crypto prediction"
+  },
+  "performance": {
+    "total_pnl_usd": 50195,
+    "return_on_exposure_pct": 2510,
+    "sharpe_ratio": 4.13,
+    "profit_factor": 1.21,
+    "total_trades": 29176,
+    "win_rate_pct": 23.9,
+    "runtime_hours": 10
+  }
+}
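
The `history_length` and `dimensions` fields in the config above suggest the policy consumes a short window of normalized observations. Below is a minimal sketch of how such a window could be maintained. The `ObservationWindow` class, the zero padding before the window fills, and the `(history_length, obs_dim)` layout are assumptions for illustration; the actual encoder input format is defined by the training code.

```python
from collections import deque

import numpy as np

HISTORY_LENGTH = 5  # config: architecture.temporal_encoder.history_length
OBS_DIM = 18        # config: observation_space.dimensions


class ObservationWindow:
    """Keeps the most recent normalized observations (hypothetical helper)."""

    def __init__(self, obs_mean, obs_std, history_length=HISTORY_LENGTH):
        self.obs_mean = np.asarray(obs_mean, dtype=np.float32)
        self.obs_std = np.asarray(obs_std, dtype=np.float32)
        # deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=history_length)

    def push(self, obs):
        obs = np.asarray(obs, dtype=np.float32)
        if obs.shape != self.obs_mean.shape:
            raise ValueError(f"expected shape {self.obs_mean.shape}, got {obs.shape}")
        # Same normalization as the README usage snippet.
        self.frames.append((obs - self.obs_mean) / (self.obs_std + 1e-8))

    def as_array(self):
        """Return a (history_length, obs_dim) array, zero-padded at the oldest end."""
        pad = self.frames.maxlen - len(self.frames)
        rows = [np.zeros_like(self.obs_mean)] * pad + list(self.frames)
        return np.stack(rows)


# Placeholder statistics; in practice these would come from normalization_stats.npz.
window = ObservationWindow(obs_mean=np.zeros(OBS_DIM), obs_std=np.ones(OBS_DIM))
for step in range(3):
    window.push(np.full(OBS_DIM, float(step)))
history = window.as_array()
```

Here `history` has shape `(5, 18)`: two zero-padded rows followed by the three pushed observations, oldest first.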

lacuna-end.png ADDED

Git LFS Details

SHA256: 931c63d7bfbc589749b0b18b3ce3875519d8aa98b52734284870a332ac10e477
Pointer size: 132 Bytes
Size of remote file: 2.26 MB

lacuna.png ADDED

Git LFS Details

SHA256: 0cb5832a5c1a2dd0191f59f18ed6844edf33d9b63c163e7ac875b83ae914aa2b
Pointer size: 132 Bytes
Size of remote file: 1.02 MB

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0f910dfb6d82c9a2a660f8f335f976264bc78eb37537799c0306d7b773b8be3f
+size 158112

normalization_stats.npz ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:16ee9885e5d54ab1056e0ec86960823a6a8a125b8c328674afc262e3a798202d
+size 2642

trades.csv ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:52ea3afd7056bdb48d5a392b54186b47da8e28953ab06df23c4dedfde0965eef
+size 4000486

updates.csv ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aecb2a6f393edd51f7d3a0d4e316e2eba2cdd6df0c029d31b236828555145e38
+size 177966
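
The three-line entries above are Git LFS pointer files, not the binary content itself. Once the real file has been downloaded, it can be checked against the pointer's `oid` and `size`. The sketch below assumes the pointer layout shown in this diff; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if pointer["algo"] != "sha256" or path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer for model.safetensors, copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0f910dfb6d82c9a2a660f8f335f976264bc78eb37537799c0306d7b773b8be3f
size 158112
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

With the repository cloned and LFS files pulled, `matches_pointer(Path("model.safetensors"), pointer)` should return `True` if the download is intact.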