TIES2023: OBSERVATION MEETS THEORY: BREAKING DOWN BARRIERS BETWEEN STATISTICS AND ENVIRONMENTAL SCIENCE
PROGRAM FOR TUESDAY, JULY 25TH

09:00-10:00 Session 6: Machine Learning in Model-Based Geostatistics (Plenary #1)

09:00
Machine Learning in Model-Based Geostatistics

ABSTRACT. Spatial generalized linear mixed models, consisting of a linear covariate effect and a Gaussian Process (GP) distributed spatial random effect, are widely used for analyses of geospatial data. We consider the setting where the covariate effect is non-linear and propose modeling it using a flexible machine learning algorithm such as random forests or deep neural networks. We propose well-principled extensions of these methods for estimating non-linear covariate effects in spatial mixed models where the spatial correlation is still modeled using a GP. The basic principle is guided by how ordinary least squares extends to generalized least squares for linear models to account for dependence. We demonstrate how the same extension can be made for machine learning approaches such as random forests and neural networks. We provide extensive theoretical and empirical support for the methods and show how they fare better than naïve or brute-force applications of machine learning algorithms to spatially correlated data.
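
As an illustration of the principle the abstract refers to, the sketch below shows the linear case only: generalized least squares obtained by whitening the data with a Cholesky factor of the spatial covariance and then running ordinary least squares. It is not the speaker's random forest or neural network extension. The exponential covariance, the one-dimensional coordinates and all function and parameter names (exp_cov, gls_line, phi, nugget) are assumptions made for the example.

```python
import math

def exp_cov(coords, sigma2=1.0, phi=1.0, nugget=1e-6):
    """Exponential covariance matrix for 1-D locations (an assumed GP covariance)."""
    n = len(coords)
    return [[sigma2 * math.exp(-abs(coords[i] - coords[j]) / phi)
             + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular factor low with low * low^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def forward_solve(low, b):
    """Solve low * z = b, i.e. whiten the vector b."""
    out = []
    for i in range(len(b)):
        out.append((b[i] - sum(low[i][k] * out[k] for k in range(i))) / low[i][i])
    return out

def ols_two_columns(x0, x1, y):
    """Ordinary least squares for y ~ b0 * x0 + b1 * x1 via the 2x2 normal equations."""
    a = sum(u * u for u in x0)
    b = sum(u * v for u, v in zip(x0, x1))
    d = sum(v * v for v in x1)
    p = sum(u * w for u, w in zip(x0, y))
    q = sum(v * w for v, w in zip(x1, y))
    det = a * d - b * b
    return (d * p - b * q) / det, (a * q - b * p) / det

def gls_line(coords, x, y, phi=1.0):
    """GLS intercept and slope: whiten the intercept column, x and y, then apply OLS."""
    low = cholesky(exp_cov(coords, phi=phi))
    ones = [1.0] * len(x)
    return ols_two_columns(forward_solve(low, ones),
                           forward_solve(low, x),
                           forward_solve(low, y))
```

The whitening step is the only difference from ordinary least squares here, which is why the same idea can in principle be carried over to other loss-based learners.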

10:20-11:20 Session 7: Impacts of Climate Change, a UK statistical and data science perspective

(Invited session) This session is organised by the Environmental Statistics Section (ESS) of the Royal Statistical Society (RSS), UK, with the aim of illustrating the work of members of the section, individuals, institutions, and organisations related to the ESS and creating stronger links between the ESS/RSS and TIES communities. The session will particularly focus on statistical and data science perspectives from UK academics, research institutes and public sector on applications related to climate change. Speakers are from the UK Met Office, UK British Geological Survey, the Data Science for the Natural Environment Team at Lancaster and the Alan Turing Institute.

10:20
A changepoint-based approach to modelling soil moisture dynamics and identifying soil signatures
PRESENTER: Mengyi Gong

ABSTRACT. Soil moisture is an important measure of soil health. Soil scientists apply soil drydown models to soil moisture time series to investigate soil infiltration properties. The typical modelling process requires manually separating a time series into segments representing the drydown process and fitting exponential decay models to these segments to obtain an estimation of the key parameters. With the advancement of sensor technology, scientists can now obtain higher frequency measurements over longer periods in a larger number of locations. To enable automatic data processing, a changepoint-based approach is developed to identify structural changes in the time series and to obtain a dynamic view of the infiltration parameters.

Specifically, timings of the sudden rises in soil moisture over a long time series are captured and the parameters characterising the drydown processes following the sudden rises are estimated simultaneously. An algorithm based on the penalised exact linear time (PELT) method was developed to identify the changepoints. This method can be considered as a complement to the conventional soil drydown modelling. It requires little data pre-processing and can be applied to a soil moisture time series directly. Since each segment has its unique parameters, the method also has the potential of capturing any temporal variations in the drying process, thus providing a more comprehensive summary of the data.

The method was applied to the hourly soil moisture time series of nine field sites from the NEON data portal. Distributions and summary statistics of key parameters, such as the exponential decay rate and the asymptotic soil moisture level, are produced for each field site. Comparing these quantities from different field sites enables the identification of soil signatures which can reflect the infiltration properties of soils.
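
The drydown model mentioned in the abstract can be sketched for a single segment as follows. This is a minimal illustration under assumptions, not the PELT-based algorithm the presenters developed: it assumes the form theta(t) = a + b * exp(-k * t), with a the asymptotic soil moisture level and k the exponential decay rate, and fits it by a simple grid search over a. The function name fit_drydown and the grid of candidate asymptotes are hypothetical.

```python
import math

def fit_drydown(times, theta, asymptotes):
    """Fit theta(t) = a + b * exp(-k * t) to one drydown segment.

    For each candidate asymptote a, log(theta - a) is linear in t, so the
    rate k and amplitude b come from a straight-line fit; the candidate with
    the smallest squared error on the original scale is returned.
    """
    best = None
    n = len(times)
    t_bar = sum(times) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    for a in asymptotes:
        if a >= min(theta):
            continue  # log(theta - a) would be undefined
        z = [math.log(v - a) for v in theta]
        z_bar = sum(z) / n
        slope = sum((t - t_bar) * (w - z_bar) for t, w in zip(times, z)) / sxx
        intercept = z_bar - slope * t_bar
        k, b = -slope, math.exp(intercept)
        sse = sum((v - (a + b * math.exp(-k * t))) ** 2
                  for t, v in zip(times, theta))
        if best is None or sse < best["sse"]:
            best = {"asymptote": a, "amplitude": b, "rate": k, "sse": sse}
    return best
```

In a changepoint setting, a segment cost of this kind (for example the returned squared error) would be evaluated between candidate changepoints; how the presenters define their cost is not stated in the abstract.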

10:35
Exploring changes in seasonal patterns of environmental time series
PRESENTER: Kathryn Leeming

ABSTRACT. Many environmental time series exhibit repeating seasonal patterns on daily or annual time scales. In this work we investigate evidence for changes in the seasonal pattern of baseflow using the CAMELS-GB dataset. By treating the annual baseflow patterns as observations of curves, we apply functional data analysis to compare the seasonal shapes over space and time. Rather than asking questions about level changes in the baseflow, this analysis allows us to assess whether the seasonal distribution of baseflow over the year has changed through time.

For each catchment, the average seasonal patterns of baseflow are calculated for two twenty-year time blocks. To characterise these seasonal patterns we use a functional data clustering method (funFEM) to group the seasonal patterns into three clusters. These clusters align with geological and climatic differences between catchments, and we explore potential causes of the baseflow changes over time. Whilst this application focuses on baseflow, this approach could be applied to seasonal patterns within other environmental time series.

10:50
Environmental Data Science Book: A Computational Notebook Community for Open Environmental Science

ABSTRACT. Environmental Data Science Book (or EDS Book), www.edsbook.org, is a pan-European community-driven resource hosted on GitHub and powered by Jupyter Book. The resource leverages executable notebooks, cloud computing resources and technical implementations of the FAIR (Findable, Accessible, Interoperable and Reusable) principles to support the publication of datasets, innovative research and open-source tools in environmental science.

EDS Book provides practical guidelines and templates that maximise open infrastructure services to translate research outputs into curated, interactive, shareable and reproducible executable notebooks which benefit from a collaborative and transparent reviewing process. Each notebook and its dependencies (input/output data, documentation, computational environments, etc.) are bundled into a Research Object (RO) and deposited to RoHub (an RO management platform) that provides the technical basis for implementing FAIR executable notebooks.

To date, the community has successfully published multiple Python-based notebooks covering a wide range of topics in environmental data science. The notebooks use open-source Python libraries, e.g., the Pangeo stack (intake, iris, xarray) and HoloViz (hvplot, panel), for fetching, processing and interactively visualizing environmental research data.

In future work, we expect to increase contributions showcasing scalable and interoperable open-source developments in other programming languages, e.g., Julia and R, and to engage with computational notebook communities and research networks interested in improving scientific software practices in environmental science.

11:05
Towards reliable projections of global mean surface temperature
PRESENTER: Philip Sansom

ABSTRACT. Quantifying the risk of global warming exceeding critical targets such as 2.0 K requires reliable projections of uncertainty as well as best estimates of Global Mean Surface Temperature (GMST). However, uncertainty bands on GMST projections are often calculated heuristically and have several potential shortcomings. In particular, the uncertainty bands shown in IPCC plume projections of GMST are based on the distribution of GMST anomalies from climate model runs and so are strongly determined by model characteristics, with little influence from observations of the real world. Physically motivated time-series approaches are proposed, based on fitting energy balance models (EBMs) to climate model outputs and observations in order to constrain future projections. It is shown that EBMs fitted to one forcing scenario will not produce reliable projections when different forcing scenarios are applied. The errors in the EBM projections can be interpreted as arising from a discrepancy in the effective forcing felt by the model. A simple time-series approach to correcting the projections is proposed, based on learning the evolution of the forcing discrepancy so that it can be projected into the future. These approaches give reliable projections of GMST when tested in a perfect model setting, and when applied to observations lead to well constrained projections with lower mean warming and narrower projection bands than previous estimates. Despite the reduced uncertainty, the lower warming leads to a greatly reduced probability of exceeding the 2.0 K warming target.

11:30-12:15 Session 8A: Statistical Modeling and Environmental Monitoring (Online)

(Organized session) This session will discuss models that involve model averaging for monitoring suspended solids in water in Nigeria, the most populous African country. We shall also discuss change-point models for evaluating carbon dioxide pollution using a Bayesian approach, and the use of the log-normal distribution to assess the concentration of a polluted water sample as the concentration reduces along successive dilution stages. The session attempts to provide solutions to various environmental pollution problems using data that originate from local communities.

11:30
On the Bayesian Modeling of Suspended Solids in Oyo State Reservoirs
PRESENTER: Oladapo Oladoja

ABSTRACT. Aquatic life and water quality can be negatively impacted by suspended particles. They may lessen water clarity and obstruct sunlight, which may prevent aquatic plants from photosynthesising. They can also transport nutrients and toxins, such as germs and heavy metals, that affect aquatic life and pose a risk to human health if consumed. Minimizing suspended particles is therefore a crucial part of managing water quality. This study aims to model suspended solids in the two major reservoirs in Oyo State (Asejire and Eleyele reservoirs). Bayesian inference was used for this study because of its incorporation of prior information, flexibility, probability estimation and prediction. Suspended solids in both reservoirs, observed over a period of 200 months from 2003 to 2019, were assumed to follow a normal distribution. A conjugate prior for the normal density, with unknown mean and known precision, was used to update knowledge about suspended solids in the form of the posterior distribution. The posterior mean and precision were obtained for Asejire Reservoir and for Eleyele Reservoir. In addition, the 95% credible interval was obtained for the two reservoirs. It is likely that the true (unknown) Bayesian estimate of suspended solids in Asejire Reservoir and Eleyele Reservoir would lie within a particular interval. Using the knowledge of the posterior distribution, the posterior predictive distribution of a future observation was determined. There is an update of belief about the posterior mean of suspended solids in both reservoirs in Oyo State. The posterior median and standard deviation were also evaluated, ignoring rounding. Concerned authorities should be informed that Eleyele reservoir is more polluted with suspended solids than Asejire reservoir.
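
The conjugate update described in the abstract (normal data with unknown mean and known precision, normal prior on the mean) can be sketched as below. This is a generic textbook illustration rather than the presenters' analysis: the function names, the prior settings and any data passed in are hypothetical, and no reservoir measurements are reproduced here.

```python
from statistics import NormalDist

def normal_known_precision_update(data, prior_mean, prior_precision, data_precision):
    """Posterior mean and precision of the unknown mean under a normal prior."""
    n = len(data)
    post_precision = prior_precision + n * data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * sum(data)) / post_precision
    return post_mean, post_precision

def credible_interval(post_mean, post_precision, level=0.95):
    """Equal-tailed credible interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z / post_precision ** 0.5
    return post_mean - half_width, post_mean + half_width

def predictive_distribution(post_mean, post_precision, data_precision):
    """Posterior predictive distribution of one future observation."""
    variance = 1.0 / post_precision + 1.0 / data_precision
    return NormalDist(post_mean, variance ** 0.5)
```

Because the posterior is normal, its median coincides with its mean, and the predictive variance is the sum of the posterior variance of the mean and the observation variance.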

11:45
Use of Lognormal Distribution in Assessing the Concentration of a Polluted Water Sample

ABSTRACT. In this work, a four-stage successive random dilution of a magnesium-polluted water sample was carried out in a research laboratory of the University of Ilorin. The Atomic Absorption Spectrometer (A.A.S) was used to measure the concentration of the polluted water sample at each stage. Based on the data derived from the laboratory, more data were simulated for various sample sizes: 500, 1000, 2000 and 5000. The simulated data of the final concentration of the polluted water sample were fitted to a 2-parameter lognormal distribution and a 3-parameter lognormal distribution using Maximum Likelihood Estimation (M.L.E) and the Method of Moments (MoM). The Mean-Square Error (M.S.E) was used for model selection, and the M.L.E was found to be better than MoM. The Anderson-Darling statistic was used in assessing the simulated data, and the results showed that the 3-parameter lognormal distribution was a better fit (than the 2-parameter lognormal distribution) to the final concentration of the polluted water sample for all the sample sizes considered.
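
For readers unfamiliar with the two estimation methods named in the abstract, the sketch below shows the standard closed-form estimators for the 2-parameter lognormal distribution only; the 3-parameter case, which adds a location shift, has no closed form and is not covered. The function names are hypothetical and no laboratory or simulated data from the study are reproduced.

```python
import math

def lognormal_mle(x):
    """Maximum likelihood estimates (mu, sigma^2) of a 2-parameter lognormal."""
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma2 = sum((l - mu) ** 2 for l in logs) / len(logs)
    return mu, sigma2

def lognormal_mom(x):
    """Method-of-moments estimates (mu, sigma^2), matching the sample mean and variance."""
    n = len(x)
    m = sum(x) / n
    v = sum((u - m) ** 2 for u in x) / n
    sigma2 = math.log(1.0 + v / m ** 2)
    mu = math.log(m) - sigma2 / 2.0
    return mu, sigma2
```

The maximum likelihood estimator works on the log scale, while the method of moments matches the mean and variance on the original scale, so the two can differ noticeably for skewed samples.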

12:00
Change-point detection in Carbon Dioxide Emission in Nigeria Using A Bayesian Hypothesis Testing Approach
PRESENTER: Taiwo Adegoke

ABSTRACT. Drawing conclusions about the presence and characteristics of shifts in the mean level of carbon dioxide emission that could arise from man-made activities such as liquid fuel consumption, solid fuel consumption and gaseous fuel consumption is important in order to better understand the impact of our actions on the environment we live in. In this work, a Bayesian methodology is introduced that employs a single shifting model in the mean level of a linear regression system. Two distinct issues were addressed: the first involves identifying the occurrence of a change, while the second involves estimating the magnitude and timing of the change-point. The data used in this study were obtained from the World Bank and comprise Nigerian data on carbon dioxide emission, liquid fuel consumption, solid fuel consumption and gaseous fuel consumption between 1990 and 2016. Bayesian hypothesis testing was used to detect the point of change, and Gibbs sampling Monte Carlo simulation was adopted to estimate the magnitude of the change that occurred. From the results obtained, a change-point was detected in the year 2008. Relevant authorities in the country will be able to make proper policies based on the findings of this study.

11:30-12:30 Session 8B: Environmental pollution and climate change: assessment and impacts

(Invited Session) Data sets, quantitative methods and implementation of these methods now exist which allow us to investigate the importance of factors and of changing regimes in explaining outcomes. In this session, two of the papers consider the impact of climate change, one on the vegetation dynamics of four East African climatic and agricultural zones, and the other on the malnutrition status of children in Egypt. To analyze the complex data sets involved, the methods of wavelet and causal discovery analysis are used in the first case, and a multi-level geostatistical model fitted under the Bayesian framework using the integrated Laplace approximation in the second. Another common characteristic of environmental data sets is that of regime changes over time. Various change-point methods are considered which are suitable for the assessment of environmental pollution and climate data in the third paper. The session will be of interest to individuals who are working with one of the types of methods or who would like to learn more about such methods, as well as individuals interested in the issues related to the impact of climate change or pollution considered in the papers.

11:30
A Bayesian geo-statistical model for the impacts of climate on children malnutrition in Egypt
PRESENTER: Amira Elayouty

ABSTRACT. Climate change poses a critical threat to human societies. Children are more susceptible to the effects of climate change than adults, with short- and long-term impacts on their physical health. Children in developing countries face the greatest risks of all given their relative vulnerability to food insecurity shocks, which is exacerbated by poverty. In Egypt, two-thirds of deaths among children under five are attributed to malnutrition. It is, therefore, critical to estimate the impacts of climate change on malnutrition in Egypt. This study aims to estimate the spatial distribution of malnutrition among children under the age of five across Egypt, and to investigate whether and how climate change and climate anomalies impact the malnutrition status of children while accounting for the socio-economic factors of the child and the spatial dependence and heterogeneity between children across different areas in Egypt. To better understand the cumulative effects of climate change over time on individual-level nutrition outcomes, data on Egyptian children and their families are obtained from the latest available Demographic and Health Survey and are then spatially merged with gridded climate time series available at 0.5x0.5 resolution. The long- and short-term effects of the environmental conditions and anomalies that prevailed during different time periods of the child’s history on the risk of child malnutrition are modelled using a multi-level geostatistical model fitted under the Bayesian framework using the integrated Laplace approximation approach. The results of this model are then used to detect the clustering structure of child malnutrition across the second administrative level areas in Egypt.
The model results highlighted the significant spatial variability in malnutrition across the country; hence, social policies and public health interventions targeted at reducing the burden of childhood stunting should consider geographical heterogeneity and adaptable risk factors. The results also indicated a significant association between high temperature anomalies and malnutrition, highlighting the importance of adopting climate change related regulations.

11:50
On multiple change-point analysis and its use in assessing environmental pollution

ABSTRACT. Change-point analysis is an important subject of interest in many scientific disciplines, particularly in climatology and environmental science. It is an essential part of decision-making and an integral part of evaluating the effectiveness of adopted environmental policies. This talk will briefly provide a general overview of multiple change-point analysis alongside some historical notes. Then, it will introduce the change-point methodologies we have developed over the past several years to analyze various complex data structures in univariate, multivariate and functional data. Finally, we will discuss how these methods can be used in assessing environmental pollution.

13:30-14:30 Session 9: Spatio-Temporal Modeling

Contributed Talks on the topic of Spatio-Temporal Modeling

13:30
Enhancing machine learning models for spatiotemporal predictions of environmental factors

ABSTRACT. As the availability, size, and complexity of environmental data have increased, there has been rapid advancement in the development of statistical and machine learning techniques for environmental modeling. Environmental data are characterized by spatial, temporal, and spatiotemporal correlations, which are important features to capture in generating accurate and physically plausible model predictions. Furthermore, uncertainty quantification, an often overlooked but critical component of environmental modeling, plays a significant role in how model predictions are used and interpreted.

We propose enhancements to popular machine learning algorithms used for environmental predictions that incorporate a) geostatistical functions to enhance predictions, and b) a modified quantile regression term in the objective function to estimate prediction intervals, which quantify uncertainty. Specifically, we included the Huber norm in the quantile regression model to construct a differentiable approximation to the quantile regression error function. This key step allows the gradient-based optimization algorithm to make probabilistic predictions efficiently.
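
One common way to build such a differentiable approximation is sketched below: the absolute value inside the quantile (pinball) loss is replaced by the Huber function, so the loss is quadratic near zero and linear elsewhere. The exact form used by the authors is not given in the abstract, so this particular construction, the function names and the smoothing width delta are assumptions.

```python
def huber(u, delta):
    """Huber function: quadratic for |u| <= delta, linear beyond."""
    a = abs(u)
    return u * u / (2.0 * delta) if a <= delta else a - delta / 2.0

def smooth_pinball(residual, tau, delta=0.1):
    """Huber-smoothed quantile loss for residual = observed - predicted."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * huber(residual, delta)

def smooth_pinball_grad(residual, tau, delta=0.1):
    """Derivative of the smoothed loss with respect to the residual."""
    weight = tau if residual >= 0 else 1.0 - tau
    if abs(residual) <= delta:
        slope = residual / delta
    else:
        slope = 1.0 if residual > 0 else -1.0
    return weight * slope
```

Unlike the standard pinball loss, the gradient is continuous at zero, which is what makes the loss usable with gradient-based optimizers; fitting one model at tau = 0.05 and another at tau = 0.95 would give an approximate 90% prediction interval.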

Two applications are presented: predicting air quality surfaces from satellite observations, and traffic-related noise from mobile monitoring.

13:45
Multi-scale Geographically Weighted Quantile Regression
PRESENTER: Allaa H. Elkady

ABSTRACT. Spatial non-stationarity and spatial autocorrelation are two typical properties of spatial data. Geographically Weighted Regression (GWR) is one class of models which explores and accounts for the potential non-stationarity of relationships between a response and some explanatory variables across space. In some situations, heterogeneity does not only arise from the non-stationarity of data relationships over space but also from the response heterogeneity across different locations of the outcome distribution. This has led to Geographically Weighted Quantile Regression (GWQR), which accounts for both sources of heterogeneity and provides an entire description of the response distribution across space. However, GWQR assumes that the modelled processes operate at the same spatial scale. This is an unrealistic assumption for spatially varying relationships that may operate at different spatial scales. Therefore, Multi-scale Geographically Weighted Quantile Regression (MGWQR), which relaxes the assumption that all relationships operate at the same scale, is proposed here. The proposed methodology relies on estimating a vector of optimal bandwidths measuring the spatial scales at which the different processes (relationships) operate at each quantile. The estimation of the model and the selection of the vector of bandwidths in MGWQR are implemented using a back-fitting algorithm. The performance of the proposed model is evaluated against the existing GWQR by means of a simulation study and an empirical illustration. The application considers the impacts of a set of climate variables on children’s health and growth data.

14:00
Efficient Large-scale Nonstationary Spatial Covariance Function Estimation using Convolutional Neural Networks
PRESENTER: Pratik Nag

ABSTRACT. Spatial processes observed in many applications, such as climate and environmental science, are often large-scale and exhibit spatial nonstationarity. Gaussian processes are widely used in spatial statistics to model such nonstationarity by specifying a nonstationary covariance function, such as the nonstationary Matérn covariance. However, the estimation remains a challenge. In the literature, existing work relies on spatial region partitions to estimate the spatially varying parameters in the covariance function. Although the choice of partitions is a key factor, it is typically subjective and not data-driven. In this work, we exploit the capabilities of Convolutional Neural Networks (CNNs) to obtain subregions from the nonstationary data. We use a clustering mechanism to obtain the subregions that behave close to stationary. To classify stationary and nonstationary random fields, we train the CNN using a set of simulated data from general classes of covariance models with various parameter settings to ensure wide coverage of both stationary and nonstationary spatial data. We also provide a parallel high-performance implementation of the nonstationary modeling and predictions on the most recent hardware architectures, including shared memory, GPUs, and distributed memory systems. Finally, we assess the proposed implementation using both synthetic and real datasets on a large scale. The results show better accuracy and performance than the traditional method.

14:15
Sparse Estimation of Multi Way Dependence In High Dimensional Spatio-Temporal Climate Data
PRESENTER: Jaidev Goel

ABSTRACT. Due to the proliferation of high-dimensional climate data over multiple modalities and instrumentation over the past several years, there has been an ever growing need to account for complex dependencies between climatological variables. Tensor representations offer us a systematic manner to describe such multi-way structure of data. For example, weather data can be represented as a 3-mode tensor, preserving spatial information, weather variables, and different instrumentation used, enabling us to model variability across all three modes [1]. This advantage, however, comes at the cost of introducing the "Curse of Dimensionality" by imposing a high dimensional structure on the data. One way to enjoy the benefits of the higher order tensor structure is to introduce sparsity in the modelling process. This is achieved by the L1-norm penalisation method called Lasso, which aims to reduce insignificant dependencies to zero, thereby encouraging sparsity. We introduce WedLasso, which adapts the Lasso method to tensor variate data to estimate the covariance matrix across a k-way data structure. Furthermore, our method models variability in the data by considering the tensors as separate temporal units, that is, a time series of tensors, allowing for relaxation of the independent and identically distributed assumption of data over time. In turn, this enables us to estimate data variability for weakly dependent data units under mixing conditions. We illustrate the utility of WedLasso in application to two climate datasets, namely the Sea Surface Temperature dataset and the USHCN dataset [2], which contain climatological data from over 120 stations, with the goal of efficiently estimating the multi-way covariance structure of the data.

14:30-15:15 Session 10: Predictive Analytics in Agriculture

(Invited Session) Predictive Analytics in Agriculture

14:30
Forecasting corn yield for nitrogen management in southern Ontario: evaluating machine learning and mechanistic models.
PRESENTER: John Sulik

ABSTRACT. The Canadian government is calling for a 30% reduction in nitrogen fertilizer use to address concerns about climate change. However, nitrogen management for corn is considered a wicked problem due to environmental losses and uncertainty about soil nitrogen supply. To address this, farmers typically apply nitrogen in two stages to prevent losses: half may be applied at planting and the rest during a “split application” between the 4- and 10-leaf development stages. However, choosing even a constant nitrogen rate for an entire field is challenging, let alone determining how to use variable rate technology to apply nitrogen within a field.

To estimate nitrogen requirements, the difference in yield between a sub-field plot with no nitrogen and one with excessive nitrogen is calculated. This yield difference, or delta, represents the yield response to applied nitrogen, also known as delta-yield. However, relying on yield data is not ideal as it is a rearview mirror approach.

To address this issue, the authors propose a forecasting approach to explore whether remote sensing can predict delta-yield during nitrogen sidedress timing. Vegetation indices have been used to calibrate statistical models, but mechanistic crop and soil models show greater promise. The authors evaluate approaches to estimate leaf area index from remote sensing data and assimilate it into crop models such as the simple algorithm for yield estimates (SAFY) and the Agricultural Production Systems sIMulator (APSIM). A machine learning model such as XGBoost will be used as a benchmark.

The authors' key innovation is to reduce corn nitrogen management to a yield prediction problem. If the difference in yield between low and high nitrogen plots can be predicted, then forecasting optimal nitrogen rates should be feasible. This will reduce nitrous oxide emissions, prevent excess nitrogen application, and increase grower profit.

14:45
Predicting multi-crop land suitability in Canada under climate change
PRESENTER: R. Ayesha Ali

ABSTRACT. Crop land suitability analysis typically produces binary maps that predict which geographic regions are suitable for growing a single crop based on expert opinion. However, these maps do not provide continuous scores of crop production potential or limitations. We present a multi-crop, semi-supervised machine learning method that simultaneously predicts the land suitability of several crops in Canada. The model uses Statistics Canada crop yield census data at the district level, downscales it to the farm level where the crops are cultivated, and then leverages soil-climate-landscape variables obtained from Google Earth Engine. This novel approach can accommodate data from different spatial resolutions, enables training with unlabelled data, and allows for the training of a multi-crop model that can capture the interdependences and correlations between various crops, thereby leading to more accurate predictions. Our multi-crop model supports a future northern shift in Canada's agricultural frontiers, which may have important implications for crop management.

15:00
Integration of multi-source within-season datasets for improving crop yield prediction using machine learning and deep learning approaches
PRESENTER: Jumi Gogoi

ABSTRACT. Accurate predictions of crop yields are key to making strategic decisions related to food security planning from local to global scales, especially with a changing climate. A satellite-based modeling approach offers promise for yield monitoring and decision making by different agricultural stakeholders, especially in data-sparse regions. In terms of modeling approaches, statistical regression models have remained a popular tool owing to their ease of use with the growing availability of observational data. With a big-data revolution and advancements in computational power, a shift from traditional statistical models to state-of-the-art machine learning techniques is just getting underway in Canadian agricultural research. This data-driven research work will present a multi-source integration of spatio-temporal datasets within different machine learning and deep learning algorithms for assessing the performance of different models and the behavior of different environmental variables for yield predictions in the Canadian Prairies. We aim to develop a new methodological analysis by integrating high spatial resolution yield monitor data over multiple fields with various satellite-derived observations, weather data and environmental variables. The novelty of the modeling approach extends beyond obtaining a satisfactory performance, as models will be adapted to better exploit the wealth of information in different datasets while balancing the needs of model interpretability and reproducibility to identify the best combination of modeling algorithm and data inputs that improves prediction accuracy. We expect that introducing information representing the variability of environmental conditions in addition to satellite-derived vegetation indices would lead to better prediction performance of models.
Further, an application of machine and deep learning algorithms could help in detecting spatiotemporal variation in yields over traditional regression models using high spatial resolution yield datasets. Such efforts will support exploration of predictors that improve representation of atypical impacts on final crop yield which could be useful for stakeholders such as crop insurers, commodity traders and policy analysts.

15:30-16:15 Session 11: Precipitation & Water

Contributed Talks on the topic of Water

15:30
Comparing approaches for water quality prediction in rivers from satellite data, with an application to India’s Ramganga river
PRESENTER: Craig Wilkie

ABSTRACT. Satellite datasets are increasingly used to understand marine and lake water quality, and new high spatial resolution data now allow their use for river water quality applications. However, there has been limited study of approaches for river water quality prediction. Challenges include the long, narrow, winding shape of rivers, which results in land being included in satellite data grid cells that are then excluded, causing missing values; computational issues for longer rivers resulting from the high number of grid cells, compounded by the missing data; and the need to account for river network distance.

We present a comparison of approaches to predicting river water quality using the functional data approach of st-PDE that treats data as observations of underlying spatio-temporally smooth curves (Bernardi et al., 2017; Arnone et al., 2021) and fixed rank kriging where the spatial dependence is captured using basis functions (Cressie and Johannesson, 2008). These approaches are computationally efficient, while st-PDE explicitly accounts for the shape of the river boundary.

Our work is part of the Ramganga Water Data Fusion Project (https://ramganga.org/), an interdisciplinary project focusing on the highly polluted Ramganga river in India. We have derived Sentinel-2 satellite data for chlorophyll and total suspended matter at a 10 m resolution across 6 years of approximately fortnightly satellite passes, but these contain gaps in space and time due to cloud cover, motivating our comparison of prediction approaches. We present results for spatio-temporal predictions, with validation using limited in-river data. We explain how our work can be combined with catchment information and presented to end-users such as NGOs or government bodies as an R Shiny app.

Data were provided by Rajiv Sinha, Manudeo Singh, Umar Farooq and Bharat Choudhary (Indian Institute of Technology Kanpur), and Andrew Tyler, Peter Hunter and Veloisa Mascarenhas (University of Stirling). Funding: EPSRC reference EP/T003669/1.

References: Arnone E., Sangalli L.M., Vicini A., 2021, Smoothing spatio-temporal data with complex missing data patterns, Statistical Modelling, DOI:10.1177/1471082X211057959. Bernardi S., Sangalli L.M., Mazza G., Ramsay, J.O., 2017, A penalized regression model for spatial functional data with application to the analysis of the production of waste in Venice province, Stochastic Environmental Research and Risk Assessment, 31, 23-38, DOI:10.1007/s00477-016-1237-3. Cressie, N. and Johannesson, G., 2008, Fixed rank kriging for very large spatial data sets, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 209-226, DOI:10.1111/j.1467-9868.2007.00633.x.

15:45
State Space Models for Semi-Continuous Precipitation Data
PRESENTER: Jiaye Xu

ABSTRACT. Provinces in the central and northeastern regions of South Africa are the main corn production regions in this country. However, this area frequently experiences severe droughts which pose a serious threat to South Africa’s agricultural economy and food security. In order to address this challenge, researchers are motivated to study precipitation in the ‘corn belt’. At several time scales, the recorded precipitation data contain many true zeros, as well as positive precipitation measurements, leading to semi-continuous observations. We therefore present state-space models for time series of semi-continuous data, motivated by South African precipitation.

State space models are powerful tools for dynamic time series (and space-time) modeling, frequently used in many fields such as environmetrics, engineering, and economics. They are particularly useful for modeling time series data with missing observations or irregularly spaced sampling intervals, and for non-linear and non-Gaussian time series.

Here we present state space models as the basic framework for dynamic models of a semi-continuous time series, using the Tweedie family as the response distribution within the observation equation. Thereby we construct dynamic Hierarchical Tweedie state space models. The Tweedie family includes many frequently-encountered probability distributions. Here we use the compound Poisson-Gamma subclass of distributions with probability mass at zero, that is well-suited for modeling semi-continuous precipitation data.
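As an illustration of why the compound Poisson-Gamma subclass suits semi-continuous data, the following minimal Python sketch simulates such a variable: a Poisson number of events, each contributing a Gamma-distributed amount, so that a zero count gives an exact zero. The function name and the parameters `lam`, `shape` and `scale` are hypothetical choices made for this illustration; they are not taken from the talk and do not represent the authors' model or its state-space structure.

```python
import numpy as np


def simulate_compound_poisson_gamma(n, lam, shape, scale, seed=0):
    """Draw n values from a compound Poisson-Gamma distribution.

    Each value is the sum of N independent Gamma(shape, scale) amounts,
    where N ~ Poisson(lam). When N == 0 the value is exactly zero, which
    produces the point mass at zero seen in semi-continuous data.
    All parameter names are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam, size=n)
    totals = np.zeros(n)
    positive = counts > 0
    # The sum of k independent Gamma(shape, scale) draws is Gamma(k * shape, scale).
    totals[positive] = rng.gamma(shape * counts[positive], scale)
    return totals


if __name__ == "__main__":
    y = simulate_compound_poisson_gamma(n=100_000, lam=1.0, shape=2.0, scale=1.5)
    print("fraction of exact zeros:", np.mean(y == 0.0))
    print("theoretical P(Y = 0):   ", np.exp(-1.0))
    print("sample mean:            ", y.mean())
    print("theoretical mean:       ", 1.0 * 2.0 * 1.5)
```

In this sketch the probability of an exact zero is exp(-lam) and the mean is lam * shape * scale, so the simulated zero fraction and sample mean can be checked against those values.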

We perform Bayesian inference by combining forward filtering backward sampling (FFBS) with MCMC algorithms to infer the state vector and the unknown parameters in the model.

Given the relationship between severe summer droughts in South Africa’s corn production area and El Niño events, we also explore associations with the ENSO indices.

16:00
A study of snow water equivalent in the Sierra Nevada of California, using snow pillow data
PRESENTER: Wendy Meiring

ABSTRACT. The Sierra Nevada snowpack is one of the primary water resources for California. Precipitation occurs predominantly in the winter and spring months in this region. In recent decades, snow pillow records have provided a set of spatially-located functional data describing the snowpack accumulation and melt patterns in each year at each snow pillow location. We present a functional data analysis study of space-time variation in the annual snowpack accumulation and melt patterns, associated with spatial location attributes and large-scale climate indices.

16:15-17:30 Session 12: The International Environmetrics Society Annual General Meeting, Hybrid Mode

Annual General Meeting for The International Environmetrics Society, to be hosted in-person with attendance online. For members of the Society; all others may take an earlier afternoon conclusion. 

19:00-21:00 Reservations at One Fine Food (10 people, under 'TIES Meeting')

We've made reservations for the patio at One Fine Food, which is an Italian restaurant with the best wood-fired pizza in town. For 10 people, on the patio, at 7pm. You can sign up to come along at the registration desk if this sounds interesting - entirely optional, everyone can pay their own way. Just a nice way to end the first day of science for those who are interested in this kind of food.