Statistics and Data Science seminars - School of Mathematics

The Statistics and Data Science seminar is held regularly during term time.

Semester 1, 2025-26

Pattern Causality: Tapping into Hidden Time Series Dynamics

Stavros Stavroglou, University of Edinburgh

Monday 6 October 2025, 14:00-15:00
Watson Building, Lecture Theatre C

This paper introduces the R package patterncausality, which implements a causal inference method grounded in non-linear dynamics and chaos theory. The algorithm serves as a tool for quantifying causality by utilising reconstructed attractors and spatial mappings. The paper provides a concise overview of the algorithm, detailed descriptions of each function within the package, and several illustrative examples. As a modular algorithm, it offers a framework for causal analysis that is both extensible and adaptable.

A classification of protein backbones by complete and bi-continuous invariants in linear time

Olga Anosova, University of Liverpool

Monday 13 October 2025, 14:00-15:00
Watson Building, Lecture Theatre C

Proteins are large biomolecules that regulate all living organisms and consist of one or several chains. The primary structure of a protein chain is a sequence of amino acid residues whose three main atoms (alpha-carbon, nitrogen, and carbonyl carbon) form a protein backbone. The tertiary structure of a protein chain is a geometric graph represented by atomic positions in 3-dimensional space. Because different geometric graphs often have distinct functional properties, it is important to continuously quantify differences in rigid graphs of protein backbones. Unfortunately, many widely used similarities of proteins fail the axioms of a metric and discontinuously change under tiny perturbations of atoms. This work develops a complete invariant that identifies any protein backbone in 3-dimensional space, uniquely under rigid motion. This invariant is Lipschitz bi-continuous in the sense that it changes by at most a constant multiple of the maximum perturbation of atoms, and vice versa. The new invariant detected thousands of (near-)duplicates in the Protein Data Bank, whose presence skews machine learning predictions. The talk is based on the paper in MATCH 2025.

Doubly Robust Alignment for Large Language Models

Chengchun Shi, London School of Economics

Monday 20 October 2025, 15:15-16:15
Arts, Lecture Room 2

This paper studies reinforcement learning from human feedback (RLHF) for aligning large language models with human preferences. While RLHF has demonstrated promising results, many algorithms are highly sensitive to misspecifications in the underlying preference model (e.g., the Bradley-Terry model), the reference policy, or the reward function, resulting in undesirable fine-tuning. To address model misspecification, we propose a doubly robust preference optimisation algorithm that remains consistent when either the preference model or the reference policy is correctly specified (without requiring both). Our proposal demonstrates superior and more robust performance than state-of-the-art algorithms, both in theory and in practice. The code is available at the GitHub repository.

Developments in comparative judgement: statistics, education and other applications

Ian Jones, Loughborough University

Wednesday 5 November 2025, 13:00-14:00
Watson Building, Lecture Theatre C

I will present a straightforward and adaptable approach to measurement called 'comparative judgement (CJ)'. CJ involves participants making pairwise comparisons, and we then statistically model the resulting binary dataset to create a measurement scale. I'll summarise how I have used CJ in my own research to measure conceptual understanding, proof comprehension, and other important but elusive mathematics learning outcomes. We will then look at how CJ is used for statistical analysis across all kinds of disciplines—from philosophy to human geography—leading to the recently founded Comparative Judgement Research Consortium. Finally, I'll swerve back into education to look at how CJ is being used as a pedagogic tool for assessment and feedback across university subjects.
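
As an illustration of how pairwise judgements can be turned into a measurement scale, the sketch below fits a Bradley-Terry model, a common choice for CJ data. The abstract does not say which model or software the speaker uses, so the model choice, the function names (`fit_bradley_terry`, `win_probability`) and the toy judgements are all assumptions made for this example.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classical minorise-maximise update
    p_i <- wins_i / sum_j n_ij / (p_i + p_j),
    then rescales so the strengths average to 1.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)    # number of wins per item
    n_pair = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                n_ij = n_pair.get(frozenset((i, j)), 0)
                if j != i and n_ij > 0:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Modelled probability that item a is judged better than item b."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical judgements: each tuple is (preferred script, other script).
judgements = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
scale = fit_bradley_terry(judgements)
ranking = sorted(scale, key=scale.get, reverse=True)
print(ranking, round(win_probability(scale, "A", "C"), 3))
```

The fitted strengths place the items on a single scale, and the logarithm of each strength is often reported as the item's score. This simple update assumes every item wins at least once; real CJ studies typically use dedicated software with reliability estimates.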

Structured Linear Controlled Differential Equations: Expressive and Parallel Sequence Models

Lingyi Yang, University of Oxford

Monday 10 November 2025, 14:00-15:00
Watson Building, Lecture Theatre B

When designing the architecture of deep sequence models, we want state-transition matrices that are expressive enough to capture complex patterns while maintaining the ability to be trained at scale. In this talk, I will introduce Structured Linear Controlled Differential Equations (SLiCEs), a unifying neural differential framework that brings together existing structured approaches and introduces new structures motivated by this trade-off. SLiCEs with block-diagonal, sparse, and Walsh-Hadamard transition structures can retain the expressivity of dense models while being cheaper to compute. On benchmarks, SLiCEs solve the A5 state-tracking task with a single layer, achieve best-in-class generalisation on regular language tasks, and match state-of-the-art performance on time-series classification while cutting per-step training time by a factor of twenty.

Machine learning for hydrodynamic stability

David Silvester, University of Manchester

Monday 17 November 2025, 14:00-15:00
Watson Building, Lecture Theatre B

This talk builds on recent developments in the design of computational solution methods for the Navier-Stokes equations modelling incompressible fluid flow. A data-driven strategy for investigating the stability of flow problems is proposed. Our computational procedure is demonstrably robust and does not require extensive parameter tuning. The essential feature of our strategy is that the computational solution of the Navier-Stokes equations is a reliable proxy for laboratory experiments investigating sensitivity to flow parameters.

Hydrodynamic stability has been extensively studied over the last century. Where is the novelty? Machine learning will be shown to be useful in two ways: (i) classification of the hydrodynamic stability boundary and (ii) generation of sampling points using a generative flow-based learning strategy for density estimation. An adaptive strategy that incorporates these two features will be shown to provide new insight into the stability of a classical buoyancy-driven flow problem.

This is joint work with Anshima Singh.

The geo-mapping problem in Geometric Data Science

Vitaliy Kurlin, University of Liverpool

Monday 8 December 2025, 14:00-15:00
Watson Building, Lecture Theatre B

The talk introduces the central problem in the emerging area of Geometric Data Science [1], which aims to continuously parametrise moduli spaces of all real objects under practical equivalences. The key example is a cloud $A$ of unordered points under an isometry in $\mathbb{R}^n$. The following geo-mapping problem formalises the fundamental questions 'Same or different? If different, by how much?' by requiring a geocode $I$ (geographic-style invariant) of clouds of $m$ unordered points satisfying the conditions below.
(a) Completeness: any clouds $A$, $B$ in $\mathbb{R}^n$ are related by rigid motion if and only if $I(A)=I(B)$;
(b) Realisability: the invariant space $\{I(A)\text{ for all clouds $A$ in }\mathbb{R}^n\}$ is explicitly parametrised so that any sampled value $I(A)$ can be realised by a cloud $A$, uniquely under motion in $\mathbb{R}^n$;
(c) Bi-continuity: the bijection from the space of clouds to the space of complete invariants is Lipschitz continuous in both directions in a suitable metric $d$ on the invariant space;
(d) Computability: the invariant $I$, the metric $d$, and a reconstruction of $A$ in $\mathbb{R}^n$ from $I(A)$ can be obtained in polynomial time in the size of $A$, for a fixed dimension $n$.
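
To make the conditions above concrete, here is a minimal sketch using a much simpler invariant than the geocode from the talk: the sorted list of all pairwise distances of a cloud. This is not the speaker's construction. It is unchanged by isometries and by reordering the points, moves only slightly under small perturbations, and takes polynomial time. It is known not to be complete in general, so it fails condition (a). The helper names and the example cloud are assumptions made for illustration.

```python
import math
from itertools import combinations


def sorted_pairwise_distances(cloud):
    """Sorted list of Euclidean distances between all pairs of points."""
    return sorted(math.dist(p, q) for p, q in combinations(cloud, 2))


def rotate_and_shift(cloud, angle, shift):
    """Apply a rigid motion of the plane: rotation by `angle`, then translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in cloud]


def max_difference(a, b):
    """Largest coordinate-wise gap between two equally long sorted lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical cloud of m = 3 unordered points in the plane.
cloud_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

# The same cloud with the points relabelled and moved rigidly.
cloud_b = rotate_and_shift(list(reversed(cloud_a)), angle=0.7, shift=(3.0, -1.0))

# A genuinely different cloud: one point displaced by 0.001.
cloud_c = [(0.001, 0.0), (1.0, 0.0), (0.0, 2.0)]

inv_a = sorted_pairwise_distances(cloud_a)
gap_same = max_difference(inv_a, sorted_pairwise_distances(cloud_b))
gap_perturbed = max_difference(inv_a, sorted_pairwise_distances(cloud_c))
print(gap_same, gap_perturbed)
```

The first gap is zero up to floating-point error, which answers "same or different?", and the second is positive but no larger than the displacement of the moved point, which answers "by how much?". Completeness, realisability, and Lipschitz continuity of the inverse are exactly what this simple invariant lacks.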

The talk will outline a full solution to this problem for point clouds [3]. The problem remains open for other data (graphs, complexes) and relations (affine or projective maps). See the latest results at https://kurlin.org/research-papers.php#Geometric-Data-Science.

[1] V.Kurlin. Complete and continuous invariants of 1-periodic sequences in polynomial time. SIAM J Mathematics of Data Science, v.7, p.1643-1663 (2025).

[2] P.Smith, V.Kurlin. Generic families of finite metric spaces with identical or trivial 1-dimensional persistence. J Applied and Computational Topology, v.8, p.839-855 (2024).

[3] D.Widdowson, V.Kurlin. Recognizing rigid patterns of unlabeled point clouds by complete and continuous isometry invariants with no false negatives and no false positives. CVPR 2023, p.1275-1284.