Our work: from theory to practice.

We believe that your company deserves products built on the latest scientific advances. Our suite of Interpolation Machines™ delivers security without compromising user experience, allowing you to prioritise safety and maximise efficiency in your AI solutions.

We have presented our research at these world-class conferences:

  • Phobia
  • Family Fund
  • Mail Smirk
  • Home Work
  • Unseal

Publications previously authored by our team

International Conference on Machine Learning (ICML)

Random Matrix Theory, Critical and Robust Layers, Data-Free Methods

Data Free Metrics Are Not Reparameterisation Invariant Under the Critical and Robust Layer Phenomena

Data-free methods for analysing and understanding the layers of neural networks offer many metrics for quantifying notions of strong versus weak layers, with the promise of increased interpretability. We examine how robust data-free metrics are under random control conditions of critical and robust layers. Contrary to the literature, we find counter-examples that challenge the efficacy of data-free methods. We show that data-free metrics are not reparameterisation invariant under these conditions and lose predictive capacity across correlation measures. Thus, we argue that to understand neural networks fundamentally, we must rigorously analyse the interactions between the data, the weights, and the resulting functions that contribute to their outputs, contrary to traditional Random Matrix Theory perspectives.
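
As an illustrative sketch of the reparameterisation issue (our own minimal PyTorch example, not the paper's code): for ReLU networks, scaling one layer by a constant and the next by its inverse leaves the network's function unchanged, yet weight-only statistics such as Frobenius norms change arbitrarily.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(32, 8)

with torch.no_grad():
    y_before = net(x)
    norms_before = [net[0].weight.norm().item(), net[2].weight.norm().item()]

    # ReLU is positively homogeneous: relu(a * z) == a * relu(z) for a > 0,
    # so this rescaling is function-preserving.
    a = 10.0
    net[0].weight *= a
    net[0].bias *= a
    net[2].weight /= a

    y_after = net(x)
    norms_after = [net[0].weight.norm().item(), net[2].weight.norm().item()]

print("max |output change|:", (y_before - y_after).abs().max().item())  # ~0: same function
print("Frobenius norms before:", norms_before)
print("Frobenius norms after: ", norms_after)  # changed: a weight-only metric is not invariant
```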

International Conference on Machine Learning (ICML)

Sample Efficiency, Grokking, SVD and Compression

Decomposed Learning: An Avenue for Mitigating Grokking

Grokking is a delayed transition from memorisation to generalisation in neural networks. It challenges perspectives on efficient learning, particularly in structured tasks and small-data regimes. We explore grokking in modular arithmetic from the perspective of a training pathology. Using Singular Value Decomposition (SVD), we modify the weight matrices of neural networks by representing each weight matrix as the product of three matrices. Through empirical evaluations on the modular addition task, we show that this representation significantly reduces the effect of grokking and, in some cases, eliminates it.
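
A minimal sketch of the decomposed representation, assuming a PyTorch setup; the class name and details are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    """Linear layer trained in the decomposed form W = U diag(S) V^T."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, S, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        # The three factors are trained directly instead of the dense weight.
        self.U = nn.Parameter(U)
        self.S = nn.Parameter(S)
        self.Vh = nn.Parameter(Vh)
        self.bias = nn.Parameter(linear.bias.data.clone()) if linear.bias is not None else None

    def forward(self, x):
        W = self.U @ torch.diag(self.S) @ self.Vh  # reconstruct the dense weight
        return nn.functional.linear(x, W, self.bias)

layer = DecomposedLinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```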

International Conference on Machine Learning (ICML)

Loss Landscape Geometry, Robustness, Calibration, Functional Similarity, Safety

Generalisation and Safety Critical Evaluations at Sharp Minima: A Geometric Reappraisal

The geometric flatness of neural network minima has long been associated with desirable generalisation properties. In this paper, we extensively explore the hypothesis that robust, calibrated and functionally similar models sit at flatter minima, in line with prevailing understandings of the relationship between flatness and generalisation. Contrary to common assertions in the literature, we find a relationship between increased sharpness and generalisation, calibration, robustness and functional representation in neural networks across architectures when using Sharpness-Aware Minimisation, augmentation and weight decay as regulariser controls. Our findings suggest that the role of increased sharpness should be considered independently for individual models when reasoning about the geometric properties of neural networks. We show that sharpness can be related to generalisation and safety-relevant properties relative to the flatter minima found without the use of our regularisation controls. Understanding these properties calls for a re-evaluation of the role of sharpness in geometric landscapes.
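
For context, a minimal sketch of a single Sharpness-Aware Minimisation step, one of the regulariser controls named above; the hyperparameters and the plain-SGD update are illustrative assumptions, not the paper's configuration:

```python
import torch

def sam_step(model, loss_fn, x, y, rho=0.05, lr=0.1):
    model.zero_grad()
    # 1) Gradient at the current weights.
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))

    # 2) Ascend to the approximate worst-case weights within an L2 ball of radius rho.
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # 3) The gradient at the perturbed weights drives the actual descent step.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)            # undo the ascent perturbation
            p.sub_(lr * p.grad)  # plain SGD step with the sharpness-aware gradient
    model.zero_grad()
```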

International Conference on Machine Learning (ICML)

Reproducibility, Scientific Standards, Open Science

Reproducibility: The New Frontier in AI Governance

AI policymakers are responsible for delivering effective governance mechanisms that can provide safe, aligned and trustworthy AI development. However, the information environment offered to policymakers is characterised by an unnecessarily low signal-to-noise ratio, favouring regulatory capture and creating deep uncertainty and divides over which risks should be prioritised from a governance perspective. We posit that current publication speeds in AI, combined with the lack of strong scientific standards via weak reproducibility protocols, effectively erode the power of policymakers to enact meaningful policy and governance protocols. Our paper outlines how AI research could adopt stricter reproducibility guidelines to assist governance endeavours and improve consensus on the AI risk landscape. We evaluate the forthcoming reproducibility crisis within AI research through the lens of crises in other scientific domains, providing a commentary on how adopting preregistration, increased statistical power and the publication of negative results as reproducibility protocols can enable effective AI governance.

Neural Information Processing Systems (NeurIPS)

Knowledge Distillation, Compression, Functional Analysis

Knowledge Distillation: The Functional Perspective

Empirical findings of accuracy correlations between students and teachers in the knowledge distillation framework have served as supporting evidence for knowledge transfer. In this paper, we sought to explain and understand the knowledge transfer derived from knowledge distillation via functional similarity, hypothesising that knowledge distillation provides a functionally similar student to its teacher model. While we accept this hypothesis for two out of three architectures across a range of metrics for functional analysis against four controls, the results show that knowledge transfer is significant but less pronounced than expected for conditions that maximise opportunities for functional similarity. Furthermore, results from the use of Uniform and Gaussian Noise as teachers suggest that the knowledge-sharing aspects of knowledge distillation inadequately describe the accuracy benefits witnessed when using the knowledge distillation training setup itself. Moreover, in the first instance, we show that knowledge distillation is not a compression mechanism but primarily a data-dependent training regulariser with a small capacity to transfer knowledge in the best case.
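
For reference, a minimal sketch of the standard knowledge-distillation objective analysed in this line of work (softened-logit KL plus cross-entropy); the temperature and mixing weight are illustrative, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft targets: KL(teacher || student) at temperature T, scaled by T^2
    # to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```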

Neural Information Processing Systems (NeurIPS)

Calibration, Generalisation, Flat and Sharp Minima

Explicit Regularisation, Sharpness and Calibration

We probe the relation between flatness, generalisation and calibration in neural networks, using explicit regularisation as a control variable. Our findings indicate that the range of flatness metrics surveyed fail to positively correlate with variation in generalisation or calibration. In fact, the correlation is often opposite to what has been hypothesised or claimed in prior work, with calibrated models typically existing at sharper minima compared to relative baselines; this relation holds across model classes and dataset complexities.
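
As a concrete example of the calibration side of this analysis, a minimal sketch of Expected Calibration Error, a standard calibration measure; the 15-bin equal-width scheme is a common convention, assumed here rather than taken from the paper:

```python
import torch

def expected_calibration_error(probs, labels, n_bins=15):
    conf, preds = probs.max(dim=-1)          # confidence and predicted class
    correct = preds.eq(labels).float()
    bins = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # |accuracy - confidence| within the bin, weighted by bin frequency.
            ece += mask.float().mean() * (correct[mask].mean() - conf[mask].mean()).abs()
    return ece
```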

International Conference on Learning Representations (ICLR)

Model Compression, Pruning, Quantization, Knowledge Distillation, Functional Analysis

Neural Network Compression: The Functional Perspective

Compression techniques such as Knowledge distillation, Pruning and Quantization reduce the computational costs of model inference and enable on-edge machine learning. The efficacy of compression methods is often evaluated through the proxies of accuracy and loss to understand the similarity of the compressed model to the original. This study explores the functional divergence between compressed and uncompressed models. The results indicate that Quantization and Pruning create models that are functionally similar to the original model. In contrast, Knowledge distillation creates models that do not functionally approximate their teacher models; the dissimilarity of function resembles that observed between independently trained models. We therefore verify, via a functional understanding, that Knowledge distillation is not a compression method, leading us to define Knowledge distillation as a training regulariser, given that no knowledge is distilled from teacher to student.
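
A minimal sketch of what a functional comparison can look like in practice (our assumption of the setup, not the study's code): compare the two models' outputs directly, via prediction agreement and mean KL divergence, rather than via accuracy alone.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def functional_divergence(model_a, model_b, loader):
    agree, total, kl_sum = 0, 0, 0.0
    for x, _ in loader:                       # labels are irrelevant here
        log_pa = F.log_softmax(model_a(x), dim=-1)
        log_pb = F.log_softmax(model_b(x), dim=-1)
        # Do the models predict the same class?
        agree += (log_pa.argmax(-1) == log_pb.argmax(-1)).sum().item()
        # How far apart are the full output distributions? KL(a || b).
        kl_sum += F.kl_div(log_pb, log_pa, log_target=True, reduction="sum").item()
        total += x.size(0)
    return agree / total, kl_sum / total      # agreement rate, mean KL divergence
```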

International Conference on Learning Representations (ICLR)

Pruning, Deep Learning, Neural Networks, Interpretability, Loss landscapes, Optimization, Kurtosis

What Makes a Good Prune? Maximal Unstructured Pruning for Maximal Cosine Similarity

Pruning is an effective method to reduce the size of deep neural network models, maintain accuracy, and, in some cases, improve the network’s overall performance. However, the mechanisms underpinning pruning remain unclear. Why can different methods prune by different percentages yet achieve similar performance? Why can we not prune at the start of training? Why are some models more amenable to being pruned than others? Given a model, what is the maximum amount it can be pruned before significantly affecting the performance? This paper explores and answers these questions from the global unstructured magnitude pruning perspective with one epoch of fine-tuning. We develop the idea that cosine similarity is an effective proxy measure for functional similarity between the parent and the pruned network.
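
A minimal sketch of the two ingredients named in the title, assuming PyTorch; the sparsity level and toy model are illustrative:

```python
import torch

def global_magnitude_prune(model, sparsity=0.9):
    # Global unstructured magnitude pruning: one threshold across all weights.
    flat = torch.cat([p.data.abs().flatten() for p in model.parameters()])
    threshold = torch.quantile(flat, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).float())  # zero out small-magnitude weights

def cosine_similarity_to(parent_weights, model):
    # Cosine similarity between parent and pruned weights, flattened into one vector.
    flat = torch.cat([p.data.flatten() for p in model.parameters()])
    return torch.nn.functional.cosine_similarity(parent_weights, flat, dim=0)

# Usage: snapshot the parent's weights, prune, then compare.
model = torch.nn.Sequential(torch.nn.Linear(100, 100), torch.nn.ReLU(), torch.nn.Linear(100, 10))
parent = torch.cat([p.data.flatten().clone() for p in model.parameters()])
global_magnitude_prune(model, sparsity=0.9)
print(cosine_similarity_to(parent, model).item())
```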

Want to learn more?

Our offices

  • London
    Shoreditch, Tower Hamlets,
    United Kingdom.