Machine learning

Surrogate models, emulators, Gaussian processes, and data-driven turbulence and urban-process modelling.

3 people 0 projects 4 papers

Machine learning is a fast-growing strand that cuts across every other area in the lab. Direct simulation tells us what the physics actually does; data-driven methods turn that knowledge into something fast, deployable, and capable of running where the simulations cannot: in operational forecasts, on real-time dashboards, and against streams of sensor data.

Emulators are the most direct payoff. Urban land-surface processes are far too expensive to resolve directly, so machine-learning surrogates compress them into something an operational weather model can run at every grid cell (Meyer et al., J. Adv. Mod. Earth Syst. 2022). The active EU UrbanAIR project pushes the same idea further, pairing uDALES with data-driven surrogates to deliver near-real-time digital twins for urban air quality and microclimate.
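The emulator idea can be sketched in a few lines: sample an expensive model, fit a cheap approximation, and check it on inputs it has not seen. This is an illustration only. `expensive_model` is a hypothetical stand-in function, not a land-surface scheme or uDALES, and the polynomial fit takes the place of the machine-learning surrogates used in the cited work.

```python
import numpy as np

def expensive_model(x):
    """Hypothetical stand-in for a costly simulation output (assumption)."""
    return np.sin(x) + 0.5 * x

rng = np.random.default_rng(0)

# "Run the simulation" at a modest number of sampled inputs.
x_train = rng.uniform(0.0, 3.0, size=40)
y_train = expensive_model(x_train)

# Fit a cheap surrogate: a degree-5 polynomial by least squares.
coeffs = np.polyfit(x_train, y_train, deg=5)

def surrogate(x):
    """Evaluate the fitted polynomial; far cheaper than the original model."""
    return np.polyval(coeffs, x)

# Validate against the expensive model on a dense grid of held-out inputs.
x_test = np.linspace(0.0, 3.0, 200)
max_error = np.max(np.abs(surrogate(x_test) - expensive_model(x_test)))
print(f"max held-out error: {max_error:.2e}")
```

The validation step matters as much as the fit: a surrogate is only trusted within the range of inputs it was trained and tested on.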

When the data is sparse and expensive — for example, low-cost sensor networks scattered across a city — Gaussian processes do most of the heavy lifting. They turn a handful of measurements into a coherent street-level pollutant field with calibrated uncertainty (Schoucair & van Reeuwijk, Sci. Tot. Env. 2025) and characterise vehicle emission factors directly from on-road measurements (Le Cornec et al., Sci. Tot. Env. 2020).
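As a rough illustration of how a Gaussian process turns a handful of readings into a field with uncertainty, the sketch below fits a one-dimensional GP with a squared-exponential kernel using NumPy only. The sensor positions, concentrations, and kernel hyperparameters are invented for the example and are not taken from the cited papers, which may use different kernels and inference methods.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between two sets of 1-D locations."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_obs, y_obs, x_new, length_scale=1.0, variance=50.0, noise_std=2.0):
    """Posterior mean and standard deviation of the latent field at x_new."""
    y_mean = y_obs.mean()  # constant mean, removed before fitting
    K = rbf_kernel(x_obs, x_obs, length_scale, variance)
    K += noise_std**2 * np.eye(len(x_obs))  # sensor noise on the diagonal
    K_s = rbf_kernel(x_obs, x_new, length_scale, variance)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - y_mean))
    mean = y_mean + K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # prior variance minus explained part
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Hypothetical sensors along a street: position (km) and concentration (ug/m3).
x_obs = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
y_obs = np.array([42.0, 38.0, 55.0, 47.0, 40.0])

x_new = np.linspace(0.0, 5.0, 51)
mean, std = gp_predict(x_obs, y_obs, x_new)
print(f"largest predictive std: {std.max():.2f} at x = {x_new[np.argmax(std)]:.1f} km")
```

The predictive standard deviation is smallest at the sensors and grows in the gaps between them, which is what makes the interpolated field usable for deciding where measurements are least informative.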

The same Bayesian machinery feeds back into the physics: an Ensemble Kalman Filter assimilates in-situ, UAV and satellite temperature observations into hydrodynamic models of thermal effluents in shallow, tidal bays (Alsulaiman et al., Earth & Space Science 2025). It is a textbook case of measurements and models becoming more useful when they are built to answer to each other.
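The core of an Ensemble Kalman Filter is the analysis step, which nudges each ensemble member towards the observations in proportion to the ensemble's own spread. Below is a minimal sketch of the stochastic (perturbed-observation) variant on a toy three-cell temperature state. The state, observation operator `H`, and error levels are assumptions made for illustration, and the cited study may use a different EnKF formulation.

```python
import numpy as np

def enkf_analysis(ensemble, obs, H, obs_std, rng):
    """One stochastic EnKF update.

    ensemble: (n_state, n_members) forecast states
    obs:      (n_obs,) observed values
    H:        (n_obs, n_state) linear observation operator
    """
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_members - 1)  # sample covariance
    R = obs_std**2 * np.eye(len(obs))              # observation error covariance

    # Kalman gain K = P H^T (H P H^T + R)^-1, computed via a linear solve.
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T

    # Each member assimilates its own noisy copy of the observations.
    noise = obs_std * rng.standard_normal((len(obs), n_members))
    perturbed_obs = obs[:, None] + noise
    return ensemble + K @ (perturbed_obs - H @ ensemble)

rng = np.random.default_rng(0)

# Toy forecast: water temperature (deg C) in three cells, 50 members.
prior_mean = np.array([20.0, 22.0, 24.0])
ensemble = prior_mean[:, None] + 1.5 * rng.standard_normal((3, 50))

# A single, fairly accurate observation of the middle cell.
H = np.array([[0.0, 1.0, 0.0]])
obs = np.array([25.0])

posterior = enkf_analysis(ensemble, obs, H, obs_std=0.2, rng=rng)
print("prior mean:    ", ensemble.mean(axis=1).round(2))
print("posterior mean:", posterior.mean(axis=1).round(2))
```

Because the observation is much more accurate than the forecast spread, the observed cell moves most of the way to the measured value and its ensemble spread collapses; unobserved cells change only through their sampled correlation with it.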

Current PhD work is taking ML inside LES itself, training data-driven correctors for sub-grid turbulence closures against the high-fidelity DNS and LES output the turbulence & mixing and urban fluid mechanics groups produce. This area sits next to the simulation-heavy ones rather than replacing them: every surrogate or emulator is trained against — and validated against — uDALES or DNS data the group itself generates. The simulation tools and the data-driven tools feed each other.
