
Algorithms & Computationally Intensive Inference seminars

The seminars will take place on Fridays at 1 pm (UK time) in room MB0.08.

2023-2024 Organisers: Shenggang Hu and Andi Wang.

If you would like to speak, or you want to be included in any emails, please contact one of the organisers.

Website URL: www.warwick.ac.uk/compstat

Mailing List Sign-Up: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar

Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB - only approved members can post)

Term 2:
Date Speaker Title
12/01 Shenggang Hu Statistical Disaggregation — a Monte Carlo Approach for Imputation under Constraints

Note:

Hybrid

Slides

Statistical disaggregation has become increasingly important for smart energy systems. A typical example of such a disaggregation problem is to learn energy consumption at a higher resolution from data recorded at a lower resolution, e.g., to decompose a daily reading into readings at a finer level. Constrained models are often used in such problems and, compared with their unconstrained counterparts, they reduce uncertainty and improve overall performance. However, these constrained models usually cannot be expressed as ordinary distributions because their density functions are intractable, which makes further analysis difficult. We introduce a novel Monte Carlo sampling algorithm based on Langevin diffusions and rejection sampling to solve the problem of sampling from equality-constrained models. We test our method on statistical disaggregation problems for electricity consumption datasets, and our approach provides better uncertainty estimation and accuracy in data imputation than naive/unconstrained methods.
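As a much simpler illustration of what sampling under an equality constraint means, the sketch below runs a random-walk Metropolis chain that never leaves the hyperplane {x : sum(x) = s}: Gaussian proposals are projected onto the zero-sum directions before the accept/reject step. This is only a toy sketch of the constrained-sampling setting, not the Langevin-plus-rejection algorithm from the talk; the Gaussian target, dimension, and step size are arbitrary illustrative choices.

```python
# Toy illustration (not the Langevin + rejection algorithm from the talk):
# random-walk Metropolis restricted to the hyperplane {x : sum(x) = s}.
import numpy as np

rng = np.random.default_rng(0)

def log_density(x):
    # Hypothetical unconstrained log-density (standard Gaussian here).
    return -0.5 * np.sum(x**2)

def constrained_rwm(x0, n_iter=5000, step=0.3):
    x = x0.copy()
    samples = []
    for _ in range(n_iter):
        z = rng.standard_normal(len(x))
        z -= z.mean()                    # project onto {z : sum(z) = 0}
        prop = x + step * z              # proposal keeps sum(x) unchanged
        if np.log(rng.uniform()) < log_density(prop) - log_density(x):
            x = prop
        samples.append(x.copy())
    return np.array(samples)

# Start from a point satisfying the constraint sum(x) = 3, e.g. a daily total
# split evenly across d = 6 sub-periods.
out = constrained_rwm(np.full(6, 0.5))
print(out.mean(axis=0), out.sum(axis=1)[:3])   # every sample still sums to 3
```

Because the projected proposal is symmetric on the hyperplane, the usual Metropolis ratio suffices to target the restriction of the density to the constraint set.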
19/01 Andrew Duncan (Imperial) Energy Discrepancies: Fitting Unnormalised Probability Models without Scores

Note:

Hybrid

Slides

The problem of fitting unnormalised densities to data is a common challenge across statistics (e.g. likelihood-free inference) and machine learning (e.g. energy-based models). Existing solutions either involve score matching or its variants, which require computation of score functions and their derivatives, or require expensive Markov chain Monte Carlo simulations. I present a very simple alternative approach called Energy Discrepancy (ED), which requires neither. I show that, as a divergence, ED interpolates between score matching and KL divergence. As a result, minimum ED estimation overcomes the problem of “near-sightedness” encountered in score-based estimation methods, while also enjoying theoretical guarantees. This score-free approach also opens the door to fitting unnormalised models on discrete spaces. I will demonstrate how to implement the method in practice, and its application to a few problems.
26/01 Matthew Thorpe (Warwick) Discrete-To-Continuum Limits in Graph-Based Semi-Supervised Learning

Note:

In-person

Slides

Semi-supervised learning (SSL) is the problem of finding missing labels from a partially labelled data set. The heuristic one uses is that “similar feature vectors should have similar labels”. The notion of similarity between feature vectors explored in this talk comes from a graph-based geometry, where an edge is placed between feature vectors that are closer than some connectivity radius. A natural variational solution to the SSL problem is to minimise a Dirichlet energy built from the graph topology, and a natural question is what happens as the number of feature vectors goes to infinity. In this talk I will give results on the asymptotics of graph-based SSL using an optimal transport topology. The results will include a lower bound on the number of labels needed for consistency.
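As a concrete instance of the variational approach described above, the following sketch builds an epsilon-neighbourhood graph on toy two-cluster data, forms the graph Laplacian, and extends two known labels to the unlabelled points by minimising the Dirichlet energy subject to the label constraints (the classical harmonic-extension solution). The data, connectivity radius, and labels are invented for illustration and are not from the talk.

```python
# Minimal sketch of graph-based SSL via Dirichlet-energy minimisation
# (harmonic extension of the labels); toy data, not from the talk.
import numpy as np

rng = np.random.default_rng(1)

# Feature vectors: two noisy clusters in the plane.
X = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(2, 0.2, (30, 2))])
labelled = np.array([0, 30])            # one labelled point per cluster
y_lab = np.array([0.0, 1.0])            # their labels

# epsilon-graph: edge iff points are closer than the connectivity radius.
eps = 0.8
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = (D2 < eps**2).astype(float)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(1)) - W               # graph Laplacian

# Minimise u^T L u subject to u = y on the labelled set:
# solve L_uu u_u = -L_ul y_lab for the unlabelled nodes.
unlab = np.setdiff1d(np.arange(len(X)), labelled)
u = np.empty(len(X))
u[labelled] = y_lab
u[unlab] = np.linalg.solve(L[np.ix_(unlab, unlab)],
                           -L[np.ix_(unlab, labelled)] @ y_lab)

print((u > 0.5).astype(int))            # recovered labels, cluster by cluster
```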
30/01

Rocco Caprio (Warwick)

Andi Wang (Warwick)

1. A calculus for Markov chain Monte Carlo: studying approximations in algorithms

2. Comparison theorems for Hybrid Slice Sampling

Note:

In-person

Slides 1

Slides 2

1.

Markov chain Monte Carlo (MCMC) algorithms are based on the construction of a Markov chain with transition probabilities $P_\mu(x,\cdot)$, where $\mu$ denotes an invariant distribution of interest. In this work, we view these transition probabilities as functions of their invariant distributions, and we develop a notion of derivative of an MCMC kernel with respect to its invariant distribution. We build around this concept a set of tools that we refer to as Markov Chain Monte Carlo Calculus. This allows us to compare Markov chains with different invariant distributions within a suitable class via what we refer to as mean value inequalities. We explain how MCMC Calculus provides a natural framework to study algorithms that use an approximation of an invariant distribution, and illustrate how it suggests practical guidelines for the efficiency of MCMC algorithms. We conclude by showing how the tools developed can be applied to prove convergence of interacting and sequential MCMC algorithms, which arise in the context of particle filtering.

2.

Hybrid Slice Sampling is a variant of Simple Slice Sampling, whereby exact sampling from the uniform distribution on a level set is replaced with sampling from a Markov kernel which is merely invariant with respect to this distribution. It is known that the spectral gap of the Hybrid Slice Sampling algorithm, if it exists, is smaller than that of the Simple Slice Sampling algorithm. Following the framework first presented in Andrieu, Lee, Power, Wang (2022), we shall derive new comparison results, giving convergence guarantees for Hybrid Slice Sampling, relative to convergence rates of Simple Slice Sampling. These comparisons are valid even in subgeometric scenarios when the spectral gap is 0, and significantly extend and simplify previous results in the literature.
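To make the distinction concrete, here is a one-dimensional toy sketch for a standard Gaussian target: the simple slice sampler draws the next point exactly uniformly on the level set, while the hybrid variant replaces that exact draw with a few random-walk Metropolis moves that merely leave the uniform distribution on the slice invariant. The target, inner step size, and iteration counts are arbitrary illustrative choices, not from the talk.

```python
# Toy 1D comparison of simple vs. hybrid slice sampling for a N(0,1) target.
import numpy as np

rng = np.random.default_rng(2)
log_pi = lambda x: -0.5 * x**2           # unnormalised log-density

def simple_slice_step(x):
    # Draw the auxiliary height, then sample exactly uniformly on the slice
    # {x' : log_pi(x') > log_u}, which here is the interval (-a, a).
    log_u = log_pi(x) + np.log(rng.uniform())
    a = np.sqrt(-2.0 * log_u)
    return rng.uniform(-a, a)

def hybrid_slice_step(x, inner=5, step=1.0):
    # Same auxiliary height, but the exact uniform draw is replaced by a few
    # random-walk Metropolis moves that leave Uniform(slice) invariant.
    log_u = log_pi(x) + np.log(rng.uniform())
    for _ in range(inner):
        prop = x + step * rng.standard_normal()
        if log_pi(prop) > log_u:         # accept iff the proposal stays on the slice
            x = prop
    return x

x_s, x_h = 0.0, 0.0
simple, hybrid = [], []
for _ in range(20000):
    x_s = simple_slice_step(x_s); simple.append(x_s)
    x_h = hybrid_slice_step(x_h); hybrid.append(x_h)

print(np.var(simple), np.var(hybrid))    # both should be close to 1
```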

02/02 Connor Duffin (Cambridge) Time-Dependent Statistical Finite Element Problems: Unstructured Data, Unknown Observation Operators, and Inversion

Note:

In-person

Slides

The so-called "statistical finite element method" (statFEM) is a novel approach to conducting statistically coherent data assimilation for PDEs. A spatially smooth additive noise is included in the PDE, yielding a prior distribution which can be used to assimilate data in an online fashion, the well-known filtering problem. Results show that the method performs well and deals with model misspecification across many contexts (such as internal waves, reaction-diffusion systems, and shallow water flows). However, an obvious problem with the method is its reliance on the observation operator being known. The observation operator is the mapping from the model to the data and, in many scenarios, can be an unknown and highly nontrivial object. In this talk, I will present a statFEM-based approach to conducting data assimilation with unknown observation operators, motivated through solving the inverse problem for a collection of nonlinear differential equations. We will motivate our work through the Korteweg-de Vries equation, and discuss further case studies with the Lorenz-63 system of ODEs and the wave equation.
 
This is joint work with Alex Glyn-Davies (University of Cambridge), Dr. Deniz Akyildiz (Imperial College London), and Prof. Mark Girolami (University of Cambridge & Alan Turing Institute).
09/02 Guglielmo Gattiglio (Warwick)
Nearest Neighbor GParareal: Improving Scalability of Gaussian Processes for Parallel-in-Time Solvers

Note:

Hybrid

Slides

With the advent of supercomputers, multi-processor environments and parallel-in-time (PiT) algorithms provide ways to integrate ordinary differential equations (ODEs) over long time intervals, a task often infeasible with sequential time-stepping solvers within realistic timeframes. A recent approach, GParareal, combines machine learning (Gaussian processes) with traditional PiT methodology (Parareal) to achieve faster parallel speed-ups. Unfortunately, the applicability of the model is limited to a small number of computer cores and ODE dimensions. We present Nearest Neighbor GParareal (NN-GParareal), a data-enriched parallel-in-time integration algorithm that builds upon GParareal by improving its scalability properties for higher-dimensional systems and increased processor counts. Through data reduction, the model complexity is reduced from cubic in the sample size to log-linear, yielding a fast, automated procedure to integrate initial value problems over long intervals. The practical utility of NN-GParareal is demonstrated theoretically and empirically through its evaluation on nine different systems. Our analysis offers tangible evidence of NN-GParareal's behavior, advantages, and validity.
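The data-reduction idea behind the scalability gain can be illustrated independently of Parareal: predict with a Gaussian process trained only on the k nearest neighbours of the query point, so each prediction costs O(k^3) rather than O(n^3). The kernel, data, and value of k below are illustrative choices, not those used in NN-GParareal.

```python
# Sketch of the data-reduction idea behind NN-GParareal: predict with a GP
# trained only on the k nearest neighbours of the query (toy 1D regression).
import numpy as np

rng = np.random.default_rng(3)
kernel = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

# Training data: noisy observations of sin(x).
X = np.sort(rng.uniform(0, 10, 200))
y = np.sin(X) + 0.1 * rng.standard_normal(200)
noise = 0.1**2

def gp_mean(Xtr, ytr, xstar):
    # Standard GP posterior mean with a squared-exponential kernel.
    K = kernel(Xtr, Xtr) + noise * np.eye(len(Xtr))
    return kernel(np.atleast_1d(xstar), Xtr) @ np.linalg.solve(K, ytr)

def nn_gp_mean(xstar, k=15):
    # Keep only the k nearest training points: O(k^3) instead of O(n^3).
    idx = np.argsort(np.abs(X - xstar))[:k]
    return gp_mean(X[idx], y[idx], xstar)

xq = 4.2
print(gp_mean(X, y, xq), nn_gp_mean(xq), np.sin(xq))   # all close to sin(4.2)
```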
16/02 Giorgos Vasdekis (UCL) Skew-symmetric sampling schemes for SDEs and where to find them

Note:

Hybrid

Slides

Locally balanced algorithms are a new class of MCMC algorithms, recently introduced by Livingstone and Zanella (2022). One of these algorithms, the Barker algorithm, has been shown to be robust both to heteroskedasticity of the posterior target and to the choice of step size. At the same time, the algorithm seems to preserve the high-dimensional properties of state-of-the-art MCMC, making it an interesting alternative to the existing literature. It turns out that in order to sample from the Barker algorithm, one can use ideas from sampling from skew-symmetric distributions. We will transfer these ideas to the context of simulating from diffusion processes and suggest a new class of unadjusted MCMC algorithms which are robust with respect to the step size.
 
This is joint work with S. Livingstone, N. Nusken and R. Zhang.
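For concreteness, the sketch below implements one step of the Barker proposal of Livingstone and Zanella (2022) for a one-dimensional target: a Gaussian increment whose sign is skewed towards the gradient, followed by a Metropolis-Hastings correction. The unadjusted schemes for diffusions discussed in the talk are not reproduced here; the target and step size are placeholders.

```python
# Sketch of the Barker proposal (Livingstone & Zanella, 2022) in 1D.
import numpy as np

rng = np.random.default_rng(4)
log_pi = lambda x: -0.5 * x**2           # toy target: N(0,1)
grad_log_pi = lambda x: -x

def barker_step(x, sigma=1.0):
    z = sigma * rng.standard_normal()
    # Skew the sign of the increment towards the gradient direction.
    p_plus = 1.0 / (1.0 + np.exp(-z * grad_log_pi(x)))
    b = 1.0 if rng.uniform() < p_plus else -1.0
    y = x + b * z
    xi = y - x
    # Metropolis-Hastings correction; the proposal ratio simplifies to
    # (1 + exp(-xi * grad(x))) / (1 + exp(xi * grad(y))).
    log_alpha = (log_pi(y) - log_pi(x)
                 + np.log1p(np.exp(-xi * grad_log_pi(x)))
                 - np.log1p(np.exp(xi * grad_log_pi(y))))
    return y if np.log(rng.uniform()) < log_alpha else x

x, chain = 0.0, []
for _ in range(20000):
    x = barker_step(x)
    chain.append(x)
print(np.mean(chain), np.var(chain))     # should be close to 0 and 1
```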
23/02 Pierre Nyquist (Chalmers University of Technology, Gothenburg University) Large deviations for Markov chain Monte Carlo methods: the surprisingly curious case of Metropolis-Hastings

Note:

Hybrid

Slides

Markov chain Monte Carlo (MCMC) methods have become the workhorse for numerical computations in a range of scientific disciplines, e.g., computational chemistry and physics, statistics, and machine learning. The performance of MCMC methods has therefore become an important topic at the intersection of probability theory and (computational) statistics: e.g., when the underlying distribution one is trying to sample from becomes sufficiently complex, convergence speed and/or the cost per iteration becomes an issue for most MCMC methods.
The analysis, and subsequent design, of MCMC methods has to a large degree relied on classical tools used to determine the speed of convergence of Markov chains, e.g., mixing times, spectral gaps and functional inequalities (Poincaré, log-Sobolev). An alternative avenue is to use the theory of large deviations for empirical measures. In this talk I will first give a general outline of this approach to analysing MCMC methods, along with some recent examples. I will then consider the specific case of the Metropolis-Hastings algorithm, the most classical amongst all MCMC methods and a foundational building block for many more advanced methods. Despite the simplicity of this method, its theoretical analysis remains a rich area, and from the large deviation perspective it is surprisingly difficult to treat. As a first step we show a large deviation principle for the underlying Markov chain, extending the celebrated Donsker-Varadhan theory. Time permitting, I will also discuss ongoing and future work on using this result to better understand both the Metropolis-Hastings method and more advanced methods, such as approximate Bayesian computation (ABC-MCMC) and the Metropolis-adjusted Langevin algorithm (MALA).
 
The talk will be self-contained and no prior knowledge of either MCMC methods or large deviations is required.
 
The talk is primarily based on joint work with Federica Milinanni (KTH).
01/03 Evandro Konzen (Warwick) Efficient Bayesian modelling of infectious diseases in wildlife: an application to bovine tuberculosis in badgers

Note:

In-person

Slides

To better understand the dynamics of infectious diseases of wildlife, it is crucial to be able to fit dynamic transmission models to observed data in a robust and efficient way, in order to estimate key epidemiological parameters and generate well-calibrated predictive information. In practice, epidemiological events are at best only partially observed, and so it is necessary to infer missing information alongside the model parameters as part of the inference routine, requiring computationally intensive inference algorithms whose computational load increases non-linearly with population size and with the dimensionality of the hidden states.
With this in mind, we implement a recently proposed individual forward filtering backward sampling algorithm to fit a complex individual-based epidemic model to data from a large-scale longitudinal study of bovine tuberculosis in badgers. This data set, from Woodchester Park in south-west England, comprises >2,000 badgers across 34 social groups over a 40-year period. We deal with many complexities typical to endemic wildlife disease systems: incomplete sampling of individuals over time (through capture-mark-recapture events), the use of multiple diagnostic tests, spatial meta-population structures, and non-Markovian demographic aspects such as age-dependent mortality rates (with censoring), all alongside a hidden stochastic compartmental model of disease spread. The method produces full posterior distributions for the parameters, and predictive distributions for the hidden states over time for each individual, and fits in just a few hours on a desktop machine.
We also propose a novel individual-level reproduction number which accounts for major sources of uncertainty of the disease system, and from it provide quantitative evidence for the presence of superspreader badgers in Woodchester Park. The inference framework is very flexible, and could be applied to other individual-level disease systems, and we will discuss future extensions to explore further important epidemiological questions.
08/03 Emma Horton (Warwick) A theoretical analysis of one-dimensional discrete generation ensemble Kalman particle filters

Note:

Hybrid

Slides

Despite the widespread use of discrete generation ensemble Kalman particle filtering methodology to solve nonlinear and high-dimensional filtering and inverse problems, little is known about their mathematical foundations. In this talk, we give a gentle introduction to these models and build on the mathematical analysis of these sophisticated particle filters developed by Le Gland–Monbet–Tran and by Mandel–Cobb–Beezley. We discuss a variety of results including stochastic perturbation analysis of the fluctuations, the stability, and the long-time performance of these discrete generation ensemble Kalman particle filters, including time-uniform and nonasymptotic mean-error estimates that apply to possibly unstable signals.
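For readers unfamiliar with the methodology, the sketch below shows a single stochastic (perturbed-observation) ensemble Kalman analysis step, the basic update underlying ensemble Kalman filtering. It is a generic, textbook-style update on a toy linear-Gaussian example, not the specific discrete generation models analysed in the talk; the state dimension, observation operator, and noise levels are illustrative.

```python
# Sketch of one stochastic (perturbed-observation) ensemble Kalman analysis
# step on a toy linear-Gaussian example.
import numpy as np

rng = np.random.default_rng(5)

def enkf_analysis(ensemble, y, H, R):
    """ensemble: (N, d) forecast members; y: observation; H: obs operator."""
    N = ensemble.shape[0]
    Hx = ensemble @ H.T                               # (N, m) predicted observations
    x_mean, Hx_mean = ensemble.mean(0), Hx.mean(0)
    A, B = ensemble - x_mean, Hx - Hx_mean
    C_xy = A.T @ B / (N - 1)                          # state-observation covariance
    C_yy = B.T @ B / (N - 1)                          # observation covariance
    K = C_xy @ np.linalg.inv(C_yy + R)                # Kalman gain estimate
    # Each member assimilates an independently perturbed observation.
    perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    return ensemble + (perturbed - Hx) @ K.T

# Toy example: 2D state, observe the first coordinate with noise.
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
prior = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=500)
posterior = enkf_analysis(prior, np.array([1.0]), H, R)
print(posterior.mean(0))   # first coordinate pulled towards the observation
```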
15/03

Harita Dellaporta (Warwick)

Robust Bayesian inference for Berkson and Classical Measurement Error models

Note:

Hybrid

Slides

Measurement error occurs when a set of covariates influencing a response variable is corrupted by noise. This can lead to misleading inference outcomes, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions, such as knowledge of the error distribution or its variance and the availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning method which is robust to mismeasured covariates, does not require the preceding assumptions, and is able to incorporate prior beliefs about the true error distribution. This approach gives rise to a general framework which is suitable for both Classical and Berkson error models via the specification of the prior centering measure of a Dirichlet Process (DP). Moreover, it allows for some flexibility in the choice of loss function depending on the type of regression model and the desired robustness properties. In this talk I will introduce the framework, provide some theoretical results on bounding the generalisation error based on the Maximum Mean Discrepancy (MMD) loss, and empirically show the effectiveness of the proposed method on synthetic and real-world examples.
08/04

Mareike Hasenpflug (Passau)

 

Note:

 

Term 1:

Date Speaker Title
06/10 Artavazd Maranjyan (KAUST) GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

Note:

Hybrid

Slides

Abstract:

We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication. While methods of this type have been studied for about a decade, the empirically observed acceleration properties of local training eluded all attempts at theoretical understanding. In a recent breakthrough, Mishchenko et al. (ICML 2022) proved that local training, when properly executed, leads to provable communication acceleration, and this holds in the strongly convex regime without relying on any data similarity assumptions. However, their method ProxSkip requires all clients to take the same number of local training steps in each communication round. Inspired by a common-sense intuition, we start our investigation by conjecturing that clients with "less important" data should be able to get away with fewer local training steps without this impacting the overall communication complexity of the method. It turns out that this intuition is correct: we managed to redesign the original ProxSkip method to achieve this. In particular, we prove that our modified method, for which we coin the name GradSkip, converges linearly under the same assumptions and has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number. We further generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and considering a generic proximable regularizer. This generalization, which we call GradSkip+, recovers several related methods in the literature as special cases. Finally, we present an empirical study on carefully designed toy problems that confirms our theoretical claims.
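As a rough illustration of the "local steps before communication" idea, the toy sketch below runs local gradient descent on heterogeneous quadratic objectives and synchronises (averages) the client models only with a small probability at each step. This is neither ProxSkip nor GradSkip, only the communication pattern they build on; the objectives, learning rate, and communication probability are all invented for illustration.

```python
# Toy sketch of local training with infrequent communication: each client runs
# local gradient steps and the models are averaged (a "communication") only
# with probability p per step. Not ProxSkip/GradSkip, just the basic idea.
import numpy as np

rng = np.random.default_rng(6)

# Each client holds a simple quadratic objective f_i(x) = 0.5 * ||x - b_i||^2.
n_clients, dim = 10, 5
b = rng.standard_normal((n_clients, dim))
optimum = b.mean(0)                        # minimiser of the average objective

def local_gd(p_comm=0.2, lr=0.1, n_steps=500):
    x = np.zeros((n_clients, dim))         # one model copy per client
    comms = 0
    for _ in range(n_steps):
        grads = x - b                      # gradient of each local objective
        x = x - lr * grads                 # local gradient step on every client
        if rng.uniform() < p_comm:         # occasional synchronisation round
            x[:] = x.mean(0)
            comms += 1
    return x.mean(0), comms

x_hat, comms = local_gd()
print(np.linalg.norm(x_hat - optimum), comms)   # small error, few communications
```

In this quadratic toy the averaged iterate still reaches the global minimiser; in general, heterogeneous local objectives cause client drift, which is what the corrections developed in ProxSkip and GradSkip are designed to handle.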

13/10 Robin Evans (Oxford) Parameterizing and Simulating from Causal Models

Note:

Hybrid

Slides

Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified. In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way.

We introduce the frugal parameterization, which places the causal effect of interest at its centre, and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and effect of treatment on the treated, as well as other common quantities of interest.

I will also discuss some other applications of the frugal parameterization, including to survival analysis, parameterizing nested Markov models, and ‘Many Data’: combining randomized and observational datasets in a single parametric model. This is joint work with Vanessa Didelez.

 

Reference

Evans, R.J. and Didelez, V. Parameterizing and Simulating from Causal Models (with discussion), J. Roy. Statist. Soc. Ser. B, 2023.

20/10 Julian Hofstadler (Universität Passau) Error Bounds for Adaptive MCMC within Doubly Intractable Distributions

Note: 

Hybrid

Slides

We study the problem of sampling from doubly intractable distributions, where, in addition to an unknown normalisation constant, we have to deal with an intractable factor in the non-normalised density. Distributions (or densities) of this form occur, e.g., in biophysics or statistical mechanics. Not being able to evaluate the non-normalised density is problematic in classical MCMC algorithms, and different approaches have been suggested to overcome this issue. We consider an adaptive MCMC approach and show error bounds for the corresponding Monte Carlo estimator for numerical integration.


The talk is based on ongoing work with B. Eltzner, B. de Groot, M. Habeck and D. Rudolf.

27/10 Michael Faulkner (Bristol) Phase transitions, metastability and critical slowing down in statistical physics

Note:

In-person

Slides

Sampling algorithms are commonplace in statistics and machine learning – in particular, in Bayesian computation – and have been used for decades to enable inference, prediction and model comparison in many different settings. They are also widely used in statistical physics, where many popular sampling algorithms first originated [1, 2]. At a high level, the goals within each discipline are the same – to sample from and approximate statistical expectations with respect to some probability distribution – but the motivations, nomenclature and methods of explanation differ significantly. This has led to challenges in communicating between the fields, and indeed the fundamental goals of one field are often misunderstood in the other. In this talk, we elucidate statistical physics for the statistician, emphasising that probability models are studied as functions of thermodynamic hyperparameters such as the temperature. This is particularly useful for characterising phase transitions, i.e., boundaries in thermodynamic hyperparameter space between distinct thermodynamic phases.
We then move on to sampling algorithms, with a particular focus on the behaviour of the Metropolis algorithm [1] when simulating the 2D Ising and 2D XY models of magnetism. Metropolis dynamics are metastable in the low-temperature phase of each model, mixing between states of equal probability density on a timescale that diverges with system size (proportional to the dimensionality of parameter space). Moreover, the Metropolis algorithm also suffers from the closely related phenomenon of critical slowing down at phase transitions. These strongly correlated dynamics are characterised by asymptotically long integrated autocorrelation times, due to a flattening of the target density that essentially results from the system trying to exist simultaneously in both thermodynamic phases. Indeed, these key aspects of statistical physics have led to innovations in sampling algorithms that inform the Bayesian world. In particular, we present the Swendsen–Wang [3], Wolff [4] and event-chain Monte Carlo [5-7] algorithms. The first two simulate the 2D Ising model and were developed in response to the metastability and critical slowing down of the Metropolis algorithm. They circumvent both phenomena to mix with low autocorrelation, independently of system size. We then show that event-chain Monte Carlo similarly circumvents the low-temperature Metropolis metastability of the 2D XY model [7] and discuss its potential utility in bypassing a hypothesised critical slowing down at the phase transition. This talk is based on a recent review paper on the subject [8]. (A minimal Metropolis sketch for the 2D Ising model follows the reference list below.)
[1] Metropolis et al., J. Chem. Phys. 21 1087 (1953)
[2] Alder & Wainwright, J. Chem. Phys. 27 1208 (1957)
[3] Swendsen & Wang, Phys. Rev. Lett. 58 86 (1987)
[4] Wolff, Phys. Rev. Lett. 62 361 (1989)
[5] Bernard, Krauth & Wilson, Phys. Rev. E 80 056704 (2009)
[6] Michel, Mayer & Krauth, EPL (Europhys. Lett.) 112 20003 (2015)
[7] Faulkner, arXiv:2209.03699 (2022)
[8] Faulkner & Livingstone, arXiv:2209.03699 (2022)
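To make the sampling setting above concrete, here is a minimal single-spin-flip Metropolis simulation of the 2D Ising model in the spirit of [1]; the lattice size, inverse temperature, and number of sweeps are arbitrary toy choices.

```python
# Minimal single-spin-flip Metropolis sampler for the 2D Ising model
# (toy parameters, periodic boundary conditions).
import numpy as np

rng = np.random.default_rng(7)

def metropolis_sweep(spins, beta):
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum of the four nearest neighbours (periodic boundaries).
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nb        # energy change of flipping spin (i, j)
        if dE <= 0 or rng.uniform() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins

L, beta = 32, 0.6                          # beta above the critical value ~0.4407
spins = rng.choice([-1, 1], size=(L, L))
for sweep in range(200):
    spins = metropolis_sweep(spins, beta)

# Magnetisation per spin; from a random start the chain can linger in
# multi-domain (metastable) configurations, illustrating the slow mixing
# discussed in the abstract.
print(abs(spins.mean()))
```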
03/11 Marie-Therese Wolfram (Warwick) Ensemble Inference Methods for Models with Noisy and Expensive Likelihoods (joint work with O.R. Dunbar, A. Duncan, A.M. Stuart)

Note: 

Hybrid

Slides

The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical, and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations.
 
In this talk we focus on interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large-scale parametric structure of interest. Multiscale analysis is then used to analyze the behavior of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large-scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favorably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, while ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favorably.
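A basic ensemble Kalman inversion loop is sketched below to fix notation: an ensemble of parameter candidates is repeatedly updated using empirical covariances between parameters and forward-model outputs. The linear forward map, noise level, and iteration count are illustrative choices only and do not reflect the noisy, multiscale setting analysed in the talk.

```python
# Sketch of basic ensemble Kalman inversion (EKI) for a toy inverse problem:
# recover parameters theta from y = G(theta) + noise using ensemble updates.
import numpy as np

rng = np.random.default_rng(8)

A_op = np.array([[1.0, 2.0], [0.5, -1.0]])
G = lambda theta: A_op @ theta                    # toy linear forward map
theta_true = np.array([1.0, 2.0])
Gamma = 0.01 * np.eye(2)                          # observation noise covariance
y = G(theta_true) + rng.multivariate_normal(np.zeros(2), Gamma)

ensemble = rng.normal(0.0, 2.0, size=(200, 2))    # prior ensemble of parameters
for _ in range(20):                               # EKI iterations
    Gt = np.array([G(t) for t in ensemble])       # forward map on each member
    t_mean, g_mean = ensemble.mean(0), Gt.mean(0)
    A, B = ensemble - t_mean, Gt - g_mean
    C_tg = A.T @ B / (len(ensemble) - 1)          # parameter-output covariance
    C_gg = B.T @ B / (len(ensemble) - 1)          # output covariance
    K = C_tg @ np.linalg.inv(C_gg + Gamma)
    perturbed = y + rng.multivariate_normal(np.zeros(2), Gamma, size=len(ensemble))
    ensemble = ensemble + (perturbed - Gt) @ K.T

print(ensemble.mean(0))                           # approximately recovers theta_true
```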
10/11 Joe Benton (Oxford) From Denoising Diffusions to Denoising Markov Models

Note:

In-person

Slides

Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. In this talk, we will introduce denoising Markov models (DMMs), a unifying framework generalising diffusion models to a wide class of spaces. We will show how to derive suitable training objectives for DMMs and discover that DMMs lead to a novel and principled generalization of score matching to arbitrary state spaces.
17/11 Woojung Kim (Warwick) Latent variable approaches for multimorbidity analysis

Note:

In-person

at MB0.02

12.45-13.30

Slides

Multimorbidity refers to the acquisition of multiple long-term chronic health conditions in a single person. It is becoming an increasing public health issue with ageing populations, and gaining insight into patterns of multimorbidity is essential for managing the increased burden on health systems.

In this talk, we focus on latent variable approaches for the analysis of multimorbidity. We introduce a novel discrete latent variable approach for multimorbidity acquisition within a Bayesian framework of latent feature allocation, which allows an individual's morbidity profile to be driven by multiple latent factors and allows the modelling of age-dependent multimorbidity trajectories. We demonstrate the utility of our model in applications to both simulated data and disease event data from patient electronic health records. In each setting, we show our model can reconstruct clinically meaningful latent multimorbidity patterns and their age-dependent prevalence trajectories.

24/11

Filippo Ascolani (Università Bocconi)

Dimension-free mixing times of coordinate-wise samplers for Bayesian hierarchical models

Note:

In-person

Slides

Coordinate-wise MCMC schemes (e.g. Gibbs and Metropolis-within-Gibbs) are popular algorithms to sample from posterior distributions arising from Bayesian hierarchical models. We introduce a novel technique to analyse the asymptotic behaviour of their mixing times, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed. We then extend the results to Metropolis-within-Gibbs schemes combining the Bayesian asymptotics approach with a novel notion of conditional conductance.
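To fix ideas about the coordinate-wise schemes being analysed, the sketch below gives a minimal Gibbs sampler for a two-level Gaussian hierarchical model with known variances; the specific model, priors, and sizes are illustrative and far simpler than the general likelihoods covered by the results above.

```python
# Minimal Gibbs sampler for a two-level Gaussian hierarchical model:
#   y_ij ~ N(theta_i, sigma2),  theta_i ~ N(mu, tau2),  mu ~ N(0, s2),
# with sigma2, tau2, s2 treated as known for simplicity.
import numpy as np

rng = np.random.default_rng(9)

I, n = 20, 30                              # groups and observations per group
sigma2, tau2, s2 = 1.0, 0.5, 100.0
theta_true = rng.normal(2.0, np.sqrt(tau2), I)
y = theta_true[:, None] + np.sqrt(sigma2) * rng.standard_normal((I, n))
ybar = y.mean(1)

mu, theta = 0.0, np.zeros(I)
mu_draws = []
for _ in range(2000):
    # theta_i | mu, y: Gaussian conditional (conjugate update of each group mean).
    prec = 1.0 / tau2 + n / sigma2
    mean = (mu / tau2 + n * ybar / sigma2) / prec
    theta = mean + rng.standard_normal(I) / np.sqrt(prec)
    # mu | theta: Gaussian conditional given the group-level means.
    prec_mu = I / tau2 + 1.0 / s2
    mean_mu = (theta.sum() / tau2) / prec_mu
    mu = mean_mu + rng.standard_normal() / np.sqrt(prec_mu)
    mu_draws.append(mu)

print(np.mean(mu_draws[500:]))             # posterior mean of mu, near 2.0
```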
01/12 Tengyao Wang (LSE)
Sparse change detection in high-dimensional linear regression

Note:

In-person

Slides

We introduce a new methodology, 'charcoal', for estimating the location of sparse changes in high-dimensional linear regression coefficients, without assuming that those coefficients are individually sparse. The procedure works by constructing different sketches (projections) of the design matrix at each time point, where consecutive projection matrices differ in sign in exactly one column. The sequence of sketched design matrices is then compared against a single sketched response vector to form a sequence of test statistics whose behaviour shows a surprising link to the well-known CUSUM statistics of univariate changepoint analysis. The procedure is computationally attractive, and strong theoretical guarantees are derived for its estimation accuracy. Simulations confirm that our method performs well across a wide range of settings, and a real-world application to a large single-cell RNA sequencing dataset showcases its practical relevance.
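For readers unfamiliar with the univariate ingredient referred to above, the sketch below computes the classical CUSUM statistic for a change in mean in a single sequence; it is not the 'charcoal' procedure itself, and the simulated data are purely illustrative.

```python
# Classical univariate CUSUM scan for a change in mean: compare the means on
# either side of every candidate split point (toy data, not the 'charcoal'
# method from the talk).
import numpy as np

rng = np.random.default_rng(10)

n, true_cp = 400, 250
x = np.concatenate([rng.normal(0.0, 1.0, true_cp),
                    rng.normal(1.0, 1.0, n - true_cp)])

def cusum(x):
    n = len(x)
    k = np.arange(1, n)                       # candidate changepoints
    csum, total = np.cumsum(x)[:-1], x.sum()
    left_mean = csum / k
    right_mean = (total - csum) / (n - k)
    # Standardised difference of means on either side of each split point.
    return np.sqrt(k * (n - k) / n) * np.abs(left_mean - right_mean)

T = cusum(x)
print(1 + np.argmax(T), T.max())              # estimated changepoint, near 250
```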
08/12

Richard Everitt (Warwick)

Massimiliano Tamborrino (Warwick)

1. Ensemble Kalman inversion approximate Bayesian computation
2. Network inference in a stochastic multi-population neural mass model via approximate Bayesian computation

Note:

In-person

Slides 1

Fig 1, Fig 2

Slides 2

1.
Approximate Bayesian computation (ABC) is the most popular approach to inferring parameters in the case where the data model is specified in the form of a simulator. It is not possible to directly implement standard Monte Carlo methods for inference in such a model, because the likelihood is not available to evaluate pointwise. The main idea of ABC is to perform inference on an alternative model with an approximate likelihood, sometimes known as the ABC likelihood. The ABC likelihood is chosen such that an unbiased estimator of it is easy to construct from simulations from the data model, allowing the use of pseudo-marginal Monte Carlo algorithms for inference under the approximate model. The central challenge of ABC is then to trade off the bias introduced by approximating the model against the variance introduced by estimating the ABC likelihood. Stabilising the variance of the ABC likelihood requires a computational cost that is exponential in the dimension of the data; thus the most common approach to reducing variance is to perform inference conditional on summary statistics.
In this talk we introduce a new approach to estimating the ABC likelihood: using ensemble Kalman inversion (EnKI). Ensemble Kalman algorithms are Monte Carlo approximations of Bayesian inference for linear/Gaussian models. These methods are often applied outside of the linear/Gaussian setting, being used, for example, as an alternative to particle filtering for inference in non-linear state space models. Loosely speaking, EnKI can be used as an alternative to an SMC sampler on a sequence of annealed likelihoods. We see that EnKI has some appealing properties when used to estimate the ABC likelihood. It circumvents the exponential scaling with dimension of standard ABC, and does not require the reparameterisation imposed by the rare-event SMC approach of Prangle et al. (2018). It is able to achieve this with no additional simulations from the data model, thus it is likely to bring the most benefit in cases where this model is very expensive to simulate.

2.

The aim of this work is to infer the connectivity structures of brain regions before and during an epileptic seizure. Our contributions are fourfold. First, we propose a 6N-dimensional stochastic differential equation for modelling the activity of N coupled populations of neurons in the brain. This model further develops the (single-population) stochastic Jansen and Rit neural mass model, which describes human electroencephalography (EEG) rhythms, in particular signals with epileptic activity. Second, we construct a reliable and efficient numerical scheme for simulating the model, extending a splitting procedure proposed for one neural population. Third, we propose an adapted Sequential Monte Carlo Approximate Bayesian Computation algorithm for simulation-based inference of both the relevant real-valued model parameters and the {0,1}-valued network parameters, the latter describing the coupling directions among the N modelled neural populations. Fourth, after illustrating and validating the proposed statistical approach on different types of simulated data, we apply it to a set of multi-channel EEG data recorded before and during an epileptic seizure. The real-data experiments suggest, for example, a larger activation in each neural population and a stronger connectivity on the left brain hemisphere during seizure.
Reference: S. Ditlevsen, M. Tamborrino, I. Tubikanec, Preprint at arXiv:2306.15787
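As background for the simulation-based inference used above, the following sketch shows plain ABC rejection sampling, the basic ingredient that SMC-ABC refines, on a deliberately simple, hypothetical simulator; the prior, tolerance, and summary statistics are arbitrary illustrative choices.

```python
# Basic ABC rejection sampling on a deliberately simple, hypothetical simulator.
import numpy as np

rng = np.random.default_rng(11)

def simulator(theta, n=100):
    # Hypothetical simulator: data generated as N(theta, 1).
    return theta + rng.standard_normal(n)

summary = lambda data: np.array([data.mean(), data.std()])

theta_true = 1.5
y_obs = simulator(theta_true)
s_obs = summary(y_obs)

accepted = []
tol = 0.2
for _ in range(20000):
    theta = rng.uniform(-5, 5)                      # draw from the prior
    s_sim = summary(simulator(theta))               # simulate and summarise
    if np.linalg.norm(s_sim - s_obs) < tol:         # accept if close enough
        accepted.append(theta)

print(len(accepted), np.mean(accepted))             # posterior mean near 1.5
```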