ESPResso V2: Water Footprint Prediction Model

Cross-Attention Geo-Aware Network for predicting product-level water footprints in textiles. 469K parameters, three output heads (raw materials, processing, packaging). Achieves R2 = 0.969 on the test set. Geography is the dominant signal -- AWARE water stress factors create 40-100x variance across manufacturing locations. Developed at the University of Amsterdam.

Architecture

Input Features
    |
    +-- MaterialEncoder ---------> mat_emb [B, 5, 64]
    |     emb(32) + log_weight + pct -> MLP(64) + 2-head self-attn + LN
    |
    +-- StepEncoder -------------> step_emb [B, 27, 64]
    |     emb(24) + sinusoidal position(4) -> MLP(64)
    |
    +-- LocationEncoder ---------> loc_emb [B, 8, 64]
    |     emb(32) + sincos coords(4) -> MLP(64)
    |
    +-- ProductEncoder ----------> product_emb [B, 48]
    |     cat_emb(16) + subcat_emb(16) + log_weight + 5 mask flags
    |
    +-- PackagingEncoder --------> pkg_emb [B, 32]
          3 log_masses + 3 category embeddings(16 each) -> MLP(32)
    |
    v
GeoAttentionBlock (materials x locations)
    2 layers, 4 heads, d_model=64
    + ConfidenceGate (MLP, bias_init=-2.0)
    |
GeoAttentionBlock (steps x locations)
    2 layers, 4 heads, d_model=64
    + ConfidenceGate (MLP, bias_init=-2.0)
    |
    v
Mean Pool -> Concatenate [mat(64) + step(64) + product(48) + pkg(32) = 208]
    |
Residual Trunk (128-dim, 2 blocks, LN + Linear + GELU + Dropout)
    |
    +-- head_raw_materials -----> pred [B, 1]  (MLP 128->64->1)
    +-- head_processing --------> pred [B, 1]  (MLP 128->64->1)
    +-- head_packaging ---------> pred [B, 1]  (MLP 128->64->1)
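
The geo-attention stage above can be sketched as plain cross-attention over location embeddings, using the diagram's shapes (5 materials, 8 locations, d_model=64, 4 heads). This is a minimal NumPy illustration only: the real block adds learned Q/K/V projections, two stacked layers, and layer norm.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, n_heads=4):
    """Materials (or steps) attend to location embeddings; the same
    location tensor serves as both keys and values, as in the diagram."""
    B, Q, d = queries.shape
    dh = d // n_heads
    out = np.empty_like(queries)
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        q, kv = queries[:, :, s], keys_values[:, :, s]
        attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(dh))  # [B, Q, K]
        out[:, :, s] = attn @ kv
    return out

rng = np.random.default_rng(0)
mat = rng.normal(size=(2, 5, 64))   # mat_emb [B, 5, 64]
loc = rng.normal(size=(2, 8, 64))   # loc_emb [B, 8, 64]
print(cross_attention(mat, loc).shape)  # (2, 5, 64)
```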

Key Design Decisions

  • GeoAttentionBlock: Two stacked cross-attention layers (4 heads each) where materials/steps attend to location keys. The same location encoder output serves as both keys and values, creating a shared geographic representation.
  • Confidence Gate: MLP gate (d_model+1 -> hidden(16) -> sigmoid) blends cross-attention output with a learned prior embedding. Gate bias initialized to -2.0 (starting at ~0.12, prior-heavy). When geographic info is available and attention quality is high, the gate opens; when locations are masked, the model falls back to the prior. Critical because geography drives 40-100x variance in water footprint via AWARE factors.
  • Linked Journey Mode: During training, 50% of batches use linked location representations (origin-to-processing journey order), 50% use unlinked sets. Teaches the model to handle both sequential supply chains and unordered location inventories.
  • Auxiliary Weight Prediction: Dedicated head predicts product weight from category and material features only (alpha=0.3). Regularizes representations, improves robustness when weight is unavailable at lower tiers.
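
The gate blend can be shown in a few lines. `confidence_gate` here is a hypothetical helper: in the real model the gate logit comes from a small MLP over the attention output, while this sketch takes the logit directly.

```python
import math

def confidence_gate(attn_out, prior, gate_logit):
    """Blend cross-attention output with a learned prior embedding:
    output = g * attn + (1 - g) * prior, with g = sigmoid(gate_logit)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * a + (1.0 - g) * p for a, p in zip(attn_out, prior)]

# Bias initialized to -2.0 means the gate starts near sigmoid(-2.0),
# i.e. prior-heavy:
print(round(1.0 / (1.0 + math.exp(2.0)), 3))  # 0.119
```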

Results

Overall test set:

Metric   Value
------   -----------------
R2       0.969
MAE      0.587 m3 world-eq
MAPE     15.8%

Per-component:

Component       MAE         R2     MAPE
-------------   ---------   -----  -----
Raw materials   0.470 m3    0.970  39.3%
Processing      0.174 m3    0.946  21.6%
Packaging       0.0001 m3   0.962  --

Raw materials MAPE (39.3%) is higher than the carbon model equivalent (6.4%) due to multiplicative AWARE factors: small geographic attribution errors are amplified by 40-100x characterization factor ranges. R2 remains high (0.970) because dominant patterns are captured.
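
A worked example of this amplification, with hypothetical numbers (a CF ratio of 80x, within the 40-100x range stated above; not values from the dataset):

```python
# 1 m3 of consumptive water use; AWARE characterization factors of
# 1 (low-stress country) vs 80 (high-stress country) -- hypothetical.
volume = 1.0
cf_low, cf_high = 1.0, 80.0

true_fp = volume * cf_low  # all water actually drawn in the low-stress country
# Misattribute just 10% of the volume to the high-stress country:
pred_fp = 0.9 * volume * cf_low + 0.1 * volume * cf_high

mape = abs(pred_fp - true_fp) / true_fp * 100
print(f"{mape:.0f}%")  # 790% -- a 10% attribution error, hugely amplified
```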

Tier Degradation

Six data availability tiers (A through F). Overall degradation factor: 2.4x. The confidence gates are critical -- when location data is unavailable (Tiers A-B), gates close (~0.12) and the model relies on learned prior embeddings rather than noisy cross-attention output.

Tier distribution during training: A:35%, B:25%, C:15%, D:10%, E:10%, F:5%. Heavily skewed toward degraded tiers (A+B = 60%) compared to the carbon model (25%), reflecting that geography is so dominant that the model must excel at graceful degradation.
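
A sketch of how such a tier schedule might be drawn per training batch. Only the percentages come from the text; the sampling mechanism itself is an assumption.

```python
import random

# Tier distribution from the model card (A = most degraded inputs).
TIER_WEIGHTS = {"A": 0.35, "B": 0.25, "C": 0.15, "D": 0.10, "E": 0.10, "F": 0.05}

def sample_tier(rng):
    """Draw the data-availability tier applied to one training batch."""
    return rng.choices(list(TIER_WEIGHTS), weights=list(TIER_WEIGHTS.values()))[0]

rng = random.Random(42)
draws = [sample_tier(rng) for _ in range(10_000)]
# Empirical frequency of tier A should be close to 0.35.
print(draws.count("A") / len(draws))
```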

Training Configuration

Parameter           Value
-----------------   -----------------------------
Optimizer           AdamW
Learning rate       5e-4, cosine schedule
Batch size          1024
Weight decay        0.01
Curriculum warmup   20 epochs
Subcategory mask    15% independent dropout
Target transform    log1p + z-score normalization
Epochs              105
Parameters          ~469,000
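
The target transform row can be made concrete. A minimal sketch of the log1p + z-score pipeline and its inverse; the actual preprocessing code may differ in detail.

```python
import math

def fit_transform(targets):
    """log1p then z-score normalization of regression targets."""
    logs = [math.log1p(t) for t in targets]
    mu = sum(logs) / len(logs)
    sd = (sum((v - mu) ** 2 for v in logs) / len(logs)) ** 0.5
    return [(v - mu) / sd for v in logs], (mu, sd)

def inverse_transform(z, stats):
    """Undo z-score, then expm1, to recover m3 world-eq predictions."""
    mu, sd = stats
    return [math.expm1(v * sd + mu) for v in z]

y = [0.1, 0.5, 2.0, 10.0, 50.0]          # illustrative footprints
z, stats = fit_transform(y)
print([round(v, 3) for v in inverse_transform(z, stats)])
# [0.1, 0.5, 2.0, 10.0, 50.0]  -- round-trips exactly
```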

Loss Function

UW-SO (Uncertainty-Weighted Softmax) with learnable log-variance scalars clamped to [-4, 4]:

  • Raw materials: MSE
  • Processing: MSE
  • Packaging: Huber (delta=1.5) -- packaging targets are near-constant, so Huber prevents outlier instability
  • Auxiliary weight: MSE (alpha=0.3, weight-available samples only)
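
One plausible reading of the weighting scheme, sketched below: per-task weights come from a softmax over the negative clamped log-variances, so tasks with lower estimated uncertainty get more weight. Only the clamp range [-4, 4] is from the text; the exact UW-SO formulation may differ.

```python
import math

def uw_so_loss(task_losses, log_vars, clamp=(-4.0, 4.0)):
    """Uncertainty-weighted combination of per-task losses.
    log_vars are learnable scalars, clamped to [-4, 4] as in the card."""
    s = [min(max(v, clamp[0]), clamp[1]) for v in log_vars]
    exps = [math.exp(-v) for v in s]            # lower variance -> larger weight
    weights = [e / sum(exps) for e in exps]     # softmax over -log_vars
    return sum(w * l for w, l in zip(weights, task_losses))

# Equal log-variances reduce to a plain mean of the task losses:
print(round(uw_so_loss([0.9, 0.4, 0.1], [0.0, 0.0, 0.0]), 4))  # 0.4667
```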

Training Data

Dataset: Tr4m0ryp/espresso-v2-carbon-water-data

49,732 records, 70/15/15 train/val/test split stratified by category.
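
A minimal sketch of a category-stratified 70/15/15 split; field names are illustrative, not the dataset's schema.

```python
import random
from collections import defaultdict

def stratified_split(records, key, fracs=(0.70, 0.15, 0.15), seed=0):
    """Split records into train/val/test, preserving the ratio within
    each category (the stratification key)."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for r in records:
        by_cat[r[key]].append(r)
    train, val, test = [], [], []
    for group in by_cat.values():
        rng.shuffle(group)
        n = len(group)
        a, b = int(n * fracs[0]), int(n * (fracs[0] + fracs[1]))
        train += group[:a]; val += group[a:b]; test += group[b:]
    return train, val, test

records = [{"cat": c, "i": i} for c in ("tshirt", "jeans") for i in range(100)]
tr, va, te = stratified_split(records, "cat")
print(len(tr), len(va), len(te))  # 140 30 30
```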

Limitations

  • Trained on synthetic data; not a substitute for formal LCA
  • AWARE factor precision limited by country-level granularity (sub-national water stress varies)
  • Raw materials MAPE is high (39.3%) due to AWARE amplification of geographic errors
  • Covers 47 textile product categories; not designed for non-textile products
  • Requires the same preprocessing pipeline (log1p + z-score) used during training

Citation

@misc{espresso-v2-2026,
  title={ESPResso V2: LLM-Orchestrated Synthetic Data Pipeline and Neural Estimation of Product-Level Carbon and Water Footprints in Textiles},
  author={Ouallaf, Moussa},
  year={2026},
  institution={University of Amsterdam},
  url={https://github.com/tr4m0ryp/ESPResso-V2}
}

License

CC BY-SA 4.0.
