# ESPResso V2: Water Footprint Prediction Model
A Cross-Attention Geo-Aware Network for predicting product-level water footprints in textiles: 469K parameters, 3 output heads (raw materials, processing, packaging). Achieves R2 = 0.969 on the test set. Geography is the dominant signal: AWARE water stress factors create 40-100x variance across manufacturing locations. Developed at the University of Amsterdam.
## Architecture

```text
Input Features
  |
  +-- MaterialEncoder ---------> mat_emb [B, 5, 64]
  |     emb(32) + log_weight + pct -> MLP(64) + 2-head self-attn + LN
  |
  +-- StepEncoder -------------> step_emb [B, 27, 64]
  |     emb(24) + sinusoidal position(4) -> MLP(64)
  |
  +-- LocationEncoder ---------> loc_emb [B, 8, 64]
  |     emb(32) + sincos coords(4) -> MLP(64)
  |
  +-- ProductEncoder ----------> product_emb [B, 48]
  |     cat_emb(16) + subcat_emb(16) + log_weight + 5 mask flags
  |
  +-- PackagingEncoder --------> pkg_emb [B, 32]
  |     3 log_masses + 3 category embeddings(16 each) -> MLP(32)
  |
  v
GeoAttentionBlock (materials x locations)
  2 layers, 4 heads, d_model=64
  + ConfidenceGate (MLP, bias_init=-2.0)
  |
GeoAttentionBlock (steps x locations)
  2 layers, 4 heads, d_model=64
  + ConfidenceGate (MLP, bias_init=-2.0)
  |
  v
Mean Pool -> Concatenate [mat(64) + step(64) + product(48) + pkg(32) = 208]
  |
Residual Trunk (128-dim, 2 blocks, LN + Linear + GELU + Dropout)
  |
  +-- head_raw_materials -----> pred [B, 1]  (MLP 128->64->1)
  +-- head_processing --------> pred [B, 1]  (MLP 128->64->1)
  +-- head_packaging ---------> pred [B, 1]  (MLP 128->64->1)
```
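The cross-attention stage above can be sketched in PyTorch. This is a minimal illustration of the described shapes (2 layers, 4 heads, d_model=64), not the released implementation; all class and argument names are assumptions:

```python
import torch
import torch.nn as nn

class GeoAttentionBlock(nn.Module):
    """Cross-attention: material/step tokens (queries) attend to
    location tokens, which serve as both keys and values -- the
    shared geographic representation described above."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x, loc, loc_mask=None):
        for attn, norm in zip(self.layers, self.norms):
            out, _ = attn(x, loc, loc, key_padding_mask=loc_mask)
            x = norm(x + out)  # residual connection + LayerNorm per layer
        return x

mat = torch.randn(2, 5, 64)   # [B, materials, d_model]
loc = torch.randn(2, 8, 64)   # [B, locations, d_model]
geo = GeoAttentionBlock()(mat, loc)   # -> [2, 5, 64]
```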
## Key Design Decisions
- GeoAttentionBlock: Two stacked cross-attention layers (4 heads each) where materials/steps attend to location keys. The same location encoder output serves as both keys and values, creating a shared geographic representation.
- Confidence Gate: MLP gate (d_model+1 -> hidden(16) -> sigmoid) blends cross-attention output with a learned prior embedding. Gate bias initialized to -2.0 (starting at ~0.12, prior-heavy). When geographic info is available and attention quality is high, the gate opens; when locations are masked, the model falls back to the prior. Critical because geography drives 40-100x variance in water footprint via AWARE factors.
- Linked Journey Mode: During training, 50% of batches use linked location representations (origin-to-processing journey order), 50% use unlinked sets. Teaches the model to handle both sequential supply chains and unordered location inventories.
- Auxiliary Weight Prediction: Dedicated head predicts product weight from category and material features only (alpha=0.3). Regularizes representations, improves robustness when weight is unavailable at lower tiers.
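The confidence gate can be sketched as follows. A minimal illustration under the stated dimensions (d_model+1 -> 16 -> sigmoid, final bias initialized to -2.0); the extra scalar input is assumed to be a location-coverage feature, and all names are assumptions:

```python
import torch
import torch.nn as nn

class ConfidenceGate(nn.Module):
    """Blends cross-attention output with a learned prior embedding.
    The final-layer bias starts at -2.0, so the gate opens at
    sigmoid(-2.0) ~= 0.12, i.e. prior-heavy at initialization."""
    def __init__(self, d_model=64, hidden=16):
        super().__init__()
        self.prior = nn.Parameter(torch.zeros(d_model))  # learned fallback
        self.gate = nn.Sequential(
            nn.Linear(d_model + 1, hidden), nn.GELU(), nn.Linear(hidden, 1)
        )
        nn.init.constant_(self.gate[-1].bias, -2.0)

    def forward(self, attn_out, coverage):
        # attn_out: [B, T, d_model]; coverage: [B, 1] scalar feature
        # (assumed: fraction of unmasked locations in the batch item)
        B, T, _ = attn_out.shape
        g_in = torch.cat([attn_out, coverage[:, None, :].expand(B, T, 1)], dim=-1)
        g = torch.sigmoid(self.gate(g_in))            # [B, T, 1]
        return g * attn_out + (1 - g) * self.prior    # blend with prior

gate = ConfidenceGate()
out = gate(torch.randn(2, 5, 64), torch.zeros(2, 1))  # -> [2, 5, 64]
```

When locations are masked, a closed gate (g near 0) makes the output collapse toward the learned prior, which is the graceful-degradation behavior described under Tier Degradation below.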
## Results
Overall test set:
| Metric | Value |
|---|---|
| R2 | 0.969 |
| MAE | 0.587 m3 world-eq |
| MAPE | 15.8% |
Per-component:
| Component | MAE | R2 | MAPE |
|---|---|---|---|
| Raw materials | 0.470 m3 | 0.970 | 39.3% |
| Processing | 0.174 m3 | 0.946 | 21.6% |
| Packaging | 0.0001 m3 | 0.962 | -- |
Raw materials MAPE (39.3%) is higher than the carbon model equivalent (6.4%) due to multiplicative AWARE factors: small geographic attribution errors are amplified by 40-100x characterization factor ranges. R2 remains high (0.970) because dominant patterns are captured.
## Tier Degradation
Six data availability tiers (A through F). Overall degradation factor: 2.4x. The confidence gates are critical -- when location data is unavailable (Tiers A-B), gates close (~0.12) and the model relies on learned prior embeddings rather than noisy cross-attention output.
Tier distribution during training: A:35%, B:25%, C:15%, D:10%, E:10%, F:5%. This is heavily skewed toward degraded tiers (A+B = 60%, versus 25% for the carbon model), reflecting that geography is so dominant that the model must excel at graceful degradation.
## Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 5e-4, cosine schedule |
| Batch size | 1024 |
| Weight decay | 0.01 |
| Curriculum warmup | 20 epochs |
| Subcategory mask | 15% independent dropout |
| Target transform | log1p + z-score normalization |
| Epochs | 105 |
| Parameters | ~469,000 |
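The target transform (log1p followed by z-score normalization) and its inverse can be sketched as below. A minimal illustration; the only assumption is that normalization statistics are fitted on training targets only:

```python
import numpy as np

def fit_target_transform(y_train):
    """log1p then z-score, with stats from training targets only."""
    z = np.log1p(y_train)
    return z.mean(), z.std()

def transform(y, mu, sigma):
    return (np.log1p(y) - mu) / sigma

def inverse_transform(t, mu, sigma):
    # expm1 undoes log1p; operations mirror transform() in reverse order
    return np.expm1(t * sigma + mu)

y = np.array([0.1, 0.5, 2.0, 10.0])       # water footprints in m3
mu, sigma = fit_target_transform(y)
t = transform(y, mu, sigma)               # normalized training target
roundtrip = inverse_transform(t, mu, sigma)  # recovers y
```

The same pair of statistics must be reused at inference time, which is why the Limitations section notes that predictions require the training preprocessing pipeline.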
## Loss Function

UW-SO (Uncertainty-Weighted Softmax) with learnable log-variance scalars clamped to [-4, 4]:
- Raw materials: MSE
- Processing: MSE
- Packaging: Huber (delta=1.5) -- near-constant values, Huber prevents outlier instability
- Auxiliary weight: MSE (alpha=0.3, weight-available samples only)
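The multi-task weighting can be sketched as follows. This is one plausible reading of UW-SO (softmax over negated log-variances), not necessarily the exact released formulation; the per-task losses match the list above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UWSOLoss(nn.Module):
    """Uncertainty-weighted multi-task loss with softmax-normalized
    weights (assumed UW-SO form). One learnable log-variance scalar
    per task, clamped to [-4, 4]; lower variance -> higher weight."""
    def __init__(self, n_tasks=4):
        super().__init__()
        self.log_var = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, task_losses):
        s = self.log_var.clamp(-4.0, 4.0)
        w = F.softmax(-s, dim=0)           # weights sum to 1
        return (w * torch.stack(task_losses)).sum()

crit = UWSOLoss(n_tasks=4)
losses = [
    F.mse_loss(torch.randn(8, 1), torch.randn(8, 1)),        # raw materials
    F.mse_loss(torch.randn(8, 1), torch.randn(8, 1)),        # processing
    F.huber_loss(torch.randn(8, 1), torch.randn(8, 1), delta=1.5),  # packaging
    0.3 * F.mse_loss(torch.randn(8, 1), torch.randn(8, 1)),  # aux weight, alpha=0.3
]
total = crit(losses)   # scalar training loss
```

In the actual setup, the auxiliary weight term would additionally be masked to weight-available samples; that masking is omitted here for brevity.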
## Training Data
Dataset: Tr4m0ryp/espresso-v2-carbon-water-data
49,732 records, 70/15/15 train/val/test split stratified by category.
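A 70/15/15 category-stratified split like the one described can be sketched with plain NumPy (an illustration of the splitting logic, not the pipeline's actual code):

```python
import numpy as np

def stratified_split(labels, frac=(0.70, 0.15, 0.15), seed=0):
    """Split indices into train/val/test while preserving the
    per-category proportions given by `labels`."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cat in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cat))
        n = len(idx)
        a, b = int(frac[0] * n), int((frac[0] + frac[1]) * n)
        train += idx[:a].tolist()
        val += idx[a:b].tolist()
        test += idx[b:].tolist()
    return train, val, test

labels = np.repeat(np.arange(5), 100)   # toy data: 5 categories x 100 records
tr, va, te = stratified_split(labels)   # 350 / 75 / 75 indices
```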
## Limitations
- Trained on synthetic data; not a substitute for formal LCA
- AWARE factor precision limited by country-level granularity (sub-national water stress varies)
- Raw materials MAPE is high (39.3%) due to AWARE amplification of geographic errors
- Covers 47 textile product categories; not designed for non-textile products
- Requires the same preprocessing pipeline (log1p + z-score) used during training
## Citation

```bibtex
@misc{espresso-v2-2026,
  title={ESPResso V2: LLM-Orchestrated Synthetic Data Pipeline and Neural Estimation of Product-Level Carbon and Water Footprints in Textiles},
  author={Ouallaf, Moussa},
  year={2026},
  institution={University of Amsterdam},
  url={https://github.com/tr4m0ryp/ESPResso-V2}
}
```
## License
CC BY-SA 4.0.