\section{\ourfm Reference Backbone}
\label{Reference_backbone}

\ourfm is a wildfire-specialized regional backbone trained directly for wildfire prediction on fire-relevant multimodal data. Existing general-purpose Earth FMs are pretrained for atmospheric and geophysical objectives~\cite{lam2023graphcast} or for remote-sensing objectives~\cite{reed2023scalemae}, so wildfire-relevant information enters their representations only indirectly. In contrast, \ourfm is trained on weather, active-fire observations, topography, vegetation, and static environmental context, so its features are shaped by signals tied directly to fire occurrence and spread.
It also serves as an empirical anchor for interpreting how transferred Earth FMs behave under matched evaluation contracts. This section describes the data resources and training strategy used to build \ourfm as an in-domain reference backbone. The fixed-contract protocol used to compare it with transferred Earth FMs is defined separately in Section~\ref{sec:eval}.

\subsection{Data Resources}
We group the resources by their role in the study: dynamic weather inputs, occupancy supervision, static context, and event-level resources for supporting tasks. Source and terms-of-use notes for the external data and model assets used in this study are summarized in Appendix Table~\ref{tab:external_assets_licenses}.


\noindent\textbf{Dynamic weather inputs.}
The weather inputs come from a California regional dataset built from NOAA High-Resolution Rapid Refresh (HRRR) fields~\cite{noaa_hrrr_ncei,noaa_hrrr_emc}. The data are placed on a projected 5~km grid in EPSG:5070. Time maps are sampled every 6 hours, and each pairs the weather fields at that time with a wildfire occupancy target at a 12-hour lead. The variables include near-surface temperature and dew point, wind, CAPE, surface pressure, boundary-layer height, visibility, precipitation rate, and accumulated precipitation.

\noindent\textbf{Occupancy supervision.}
Wildfire supervision comes from NASA FIRMS active-fire detections~\cite{nasa_firms}. The detections are mapped to the same grid as the weather fields. \ourfm is trained on gridded occupancy labels derived from these detections. This defines the occupancy target used by the reference backbone throughout the primary experiments.
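
\noindent The mapping from point detections to gridded occupancy labels can be sketched as follows. This is a minimal illustration, not the pipeline used for \ourfm: it assumes detections are already projected to the grid CRS, that the grid origin is the outer corner of the top-left cell, and that a cell is labeled positive if it contains at least one detection. The function name, its parameters, and the omission of temporal binning are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

CELL_SIZE_M = 5_000.0  # 5 km cells, as stated for the projected grid


def rasterize_detections(
    points_xy: Iterable[Tuple[float, float]],
    x_min: float,
    y_max: float,
    n_rows: int,
    n_cols: int,
    cell_size: float = CELL_SIZE_M,
) -> List[List[int]]:
    """Return an n_rows x n_cols binary grid; 1 marks a cell with >= 1 detection.

    Assumes (hypothetically) that points are in the grid CRS and that
    (x_min, y_max) is the outer corner of the top-left cell, so row 0 is
    the northernmost row.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points_xy:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        # Detections outside the grid extent are dropped.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = 1
    return grid
```

Multiple detections that fall in the same cell collapse to a single positive label, which is what makes the target an occupancy map and not a detection count.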

\noindent\textbf{Static context.}
Static context describes landscape and exposure factors that do not change at the weather time step. The four static layers are the LANDFIRE fire-behavior fuel model~\cite{landfire_fbfm40}, LANDFIRE canopy cover~\cite{landfire_canopy_cover}, Wildfire Risk to Communities housing-unit density~\cite{usfs_wrc_housing_density}, and LandScan population~\cite{ornl_landscan_2024}. Together with validity masks for the weather and static fields, these give the occupancy input 16 channels: 10 weather fields, two validity masks, and four static layers.
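
\noindent The 16-channel layout can be written out explicitly as a sanity check on the channel count. The channel names and their ordering below are hypothetical, and treating wind as two components (so that the listed weather variables total 10) is an assumption; only the group sizes come from the description above.

```python
# Hypothetical channel names; only the group sizes (10 + 2 + 4) are from the text.
WEATHER = [
    "temperature", "dew_point", "wind_u", "wind_v", "cape",
    "surface_pressure", "boundary_layer_height", "visibility",
    "precip_rate", "precip_accumulated",
]
MASKS = ["weather_valid", "static_valid"]
STATIC = ["fuel_model", "canopy_cover", "housing_unit_density", "population"]

# Assumed ordering: weather fields, then validity masks, then static layers.
CHANNELS = WEATHER + MASKS + STATIC
CHANNEL_INDEX = {name: i for i, name in enumerate(CHANNELS)}

assert len(CHANNELS) == 16
```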

\noindent\textbf{Event-level resources.}
Event-level resources are used for supporting burned-area and analog tasks, not as occupancy labels for \ourfm. These resources include WFIGS incident and perimeter attributes~\cite{nifc_wfigs_perimeters} and MTBS burned-area and burn-severity records~\cite{mtbs_usgs_2025}. They provide event-scale outcomes and incident metadata for supporting tasks in the experiments and appendix analyses.


\subsection{Training Strategy}

\noindent\textbf{Model and data split.}
\ourfm uses a compact U-Net~\cite{ronneberger2015unet} that maps gridded weather and static inputs to wildfire predictions.
Its primary output is fire occupancy on the common spatial grid.
Data are split by time: June--August 2024 for training, September 2024 for validation, and October 2024 for testing.
This yields 368 training time maps, 120 validation time maps, and 120 test time maps.
Because the validation and test periods strictly follow the training period, no later fire outcomes leak into training.
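
\noindent The month-based split can be sketched as below. The helper names \texttt{split\_for} and \texttt{six\_hourly} are hypothetical, and the sketch assumes one time map per 6-hourly stamp with no missing data.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional


def split_for(ts: datetime) -> Optional[str]:
    """Assign a 2024 timestamp to train/val/test by calendar month."""
    if ts.year != 2024:
        return None
    if 6 <= ts.month <= 8:
        return "train"
    if ts.month == 9:
        return "val"
    if ts.month == 10:
        return "test"
    return None


def six_hourly(start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield timestamps from start (inclusive) to end (exclusive) every 6 hours."""
    ts = start
    while ts < end:
        yield ts
        ts += timedelta(hours=6)


stamps = list(six_hourly(datetime(2024, 6, 1), datetime(2024, 11, 1)))
n_train = sum(split_for(t) == "train" for t in stamps)
n_val = sum(split_for(t) == "val" for t in stamps)
```

Under these assumptions, \texttt{n\_train} is $92 \times 4 = 368$ and \texttt{n\_val} is $30 \times 4 = 120$, matching the counts above. A full 6-hourly October would contain 124 stamps, so the 120 test maps reported above presumably exclude a few stamps under a rule this sketch does not model.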

\noindent\textbf{Fire-aware tile training.}
Training is performed on 32$\times$32 tiles sampled from the time maps. The tiles include fire-centered regions and non-fire context, so the model sees both sparse fire labels and surrounding background conditions. This sampling reduces the dominance of empty cells without removing non-fire examples from the training distribution. Class-weighted binary cross-entropy is used for the primary occupancy target to further balance sparse positives.
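
\noindent For concreteness, one common form of class-weighted binary cross-entropy is shown below. The positive-class weight $w_{+}$, the set of tile cells $\Omega$, and the normalization by $|\Omega|$ are assumptions made for illustration; the exact weighting is not specified here.
\begin{equation}
\mathcal{L}_{\mathrm{occ}} = -\frac{1}{|\Omega|} \sum_{i \in \Omega} \Big[ w_{+}\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big],
\end{equation}
where $y_i \in \{0, 1\}$ is the occupancy label of cell $i$, $p_i \in (0, 1)$ is the predicted occupancy probability, and $w_{+} > 1$ increases the contribution of the sparse fire cells relative to the background.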

\noindent\textbf{Spatial-support training objective.}
Wildfire labels can be displaced by a few grid cells because detections, weather fields, and static layers come from different sources and are resampled onto a common grid. To reduce sensitivity to these small displacements during training, the occupancy target is dilated by two grid cells. An auxiliary spatial-support output is trained on the same dilated neighborhood alongside the primary occupancy output. At test time, \ourfm is scored under the same task-specific evaluation contracts as the transferred Earth-FM backbones in Section~\ref{sec:eval}, so the comparison conditions are matched.
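
\noindent The two-cell target dilation described above can be sketched as follows. The shape of the neighborhood is not specified here, so this sketch assumes a square (Chebyshev-distance) neighborhood; a diamond or disk would be an equally plausible reading. The function name \texttt{dilate} is hypothetical.

```python
from typing import List


def dilate(mask: List[List[int]], radius: int = 2) -> List[List[int]]:
    """Binary dilation with an assumed (2*radius+1) x (2*radius+1) square window.

    Every cell within `radius` rows and columns of a positive cell becomes
    positive; the window is clipped at the grid edges.
    """
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    out = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(n_cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out
```

With this window, a single isolated fire cell away from the grid edge expands to a $5 \times 5$ block of positive support cells.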