Machine learning models grounded in information geometry can be challenging due to the intractability of their geometric properties.
This article, the first in a series dedicated to information geometry, explores simpler, closed-form statistical manifolds that offer practical insights for developing geometry-aware deep learning models.
🎯 Why this matters
Purpose: Statistical manifolds generalize the concept of Riemannian manifolds to the space of probability distributions. While their geometric properties—such as exponential and logarithm maps or distance—are often intractable, univariate distributions typically allow for closed-form or analytical expressions of these properties.
Audience: Data scientists and engineers working on models that rely on information geometry.
Value: Discover how to easily compute the geometric properties of closed-form statistical manifolds, including the Exponential, Geometric, Poisson, and Binomial distributions.
🎨 Modeling & Design Principles
📌 This article explores key components of Fisher-Riemannian manifolds, including geodesics, as well as the exponential and logarithmic maps. A subsequent article will delve into applying the Fisher-Rao information metric to compute geodesic distances and inner products.
Overview
Information geometry is a branch of mathematics that leverages differential geometry to analyze the structure of statistical models. Its central idea is to treat families of probability distributions as geometric objects called statistical manifolds.
A statistical manifold is a smooth manifold in which each point represents a probability distribution, parameterized by one or more variables—for example, the rate parameter in an exponential distribution or the mean and standard deviation in a normal distribution.
By extending Riemannian geometry to the space of probability density functions, statistical manifolds provide a geometric framework for statistical inference. As such, a foundational understanding of differential geometry and Riemannian manifolds is recommended for readers.
Information geometry has various applications to machine learning:
Bayesian inference using divergences (Kullback-Leibler, Bregman, ...)
Geometrically-informed priors
Regularization of highly curved regions on statistical manifolds
Embeddings on hyperspheres
Stiefel manifolds
💡 A manifold is a topological space that, around any given point, closely resembles Euclidean space. Specifically, an n-dimensional manifold is a topological space where each point is part of a neighborhood that is homeomorphic to an open subset of n-dimensional Euclidean space.
Smooth or differentiable manifolds are manifolds equipped with a local differential structure, which allows vector fields and tensors to be defined and gives rise to a well-defined tangent space at every point.
A Riemannian manifold is a differential manifold that comes with a metric tensor, providing a way to measure distances and angles.
For readers unfamiliar with differential geometry—including concepts like the Riemannian metric, geodesics, tangent spaces, and manifolds—I recommend reviewing my introductory articles [ref 1, 2] and tutorials [ref 3, 4, and 5].
Statistical Manifolds
Metric
A Fisher-Riemann manifold is a differentiable manifold M whose points correspond to probability distributions p(x∣θ) from a statistical model parameterized by θ ∈ Θ ⊂ ℝⁿ, and which is equipped with the Fisher information metric [ref 6].
The Fisher information metric or Fisher-Rao metric is a Riemannian metric defined on the parameter space of a family of probability distributions, constructed using the Fisher information matrix. It remains invariant under reparameterization and is the only Riemannian metric that aligns with the model’s notion of information about its parameters [ref 7].
From the Riemannian metric
$$ds^{2} = \sum_{i,j} g_{ij}(\theta)\, d\theta_{i}\, d\theta_{j}$$
the Fisher information for a continuous probability density function p is defined as
$$g_{ij}(\theta) = \int \frac{\partial \log p(x \mid \theta)}{\partial \theta_{i}}\, \frac{\partial \log p(x \mid \theta)}{\partial \theta_{j}}\; p(x \mid \theta)\, dx \qquad (A)$$
and for discrete distributions,
$$g_{ij}(\theta) = \sum_{x} \frac{\partial \log p(x \mid \theta)}{\partial \theta_{i}}\, \frac{\partial \log p(x \mid \theta)}{\partial \theta_{j}}\; p(x \mid \theta) \qquad (B)$$
Under mild regularity conditions, the Fisher information metric can be rewritten as
$$g_{ij}(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^{2} \log p(x \mid \theta)}{\partial \theta_{i}\, \partial \theta_{j}}\right]$$
The Fisher information induces a positive semi-definite metric on the parameter manifold, allowing the use of tools from differential/Riemannian geometry.
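As a quick sanity check of formula (A), the short sketch below estimates the Fisher information of the exponential distribution by Monte Carlo, using the score ∂/∂θ log p(x|θ) = 1/θ − x, and compares it with the analytical value 1/θ². It is a standalone NumPy illustration; the function name and sample size are arbitrary and not part of the article's source code.

```python
import numpy as np

def exponential_fisher_info_mc(theta: float, n_samples: int = 1_000_000) -> float:
    """Monte Carlo estimate of the Fisher information of the exponential distribution."""
    rng = np.random.default_rng(42)
    x = rng.exponential(scale=1.0 / theta, size=n_samples)
    # Score function: d/d(theta) log p(x | theta) = 1/theta - x
    score = 1.0 / theta - x
    return float(np.mean(score ** 2))

theta = 0.75
print(exponential_fisher_info_mc(theta))   # Monte Carlo estimate, close to 1.7778
print(1.0 / theta ** 2)                    # analytical Fisher information 1/theta^2
```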
Fig. 1 Visualization of the elements of a statistical manifold
The Fisher metric is invariant under sufficient statistics, making it intrinsic to the statistical model.
⚠️ The terminology found in the literature can be confusing. Strictly speaking the Fisher-Riemann manifold is the manifold of distributions equipped with the Fisher-Rao metric. It is also called the Fisher information manifold or the statistical manifold.
Non-tractable Manifolds
For most statistical manifolds associated with multivariate distributions, geometric properties such as the exponential and logarithm maps, inner product, and distance generally lack closed-form (analytical) expressions.
Both formulas A and B involve an integral (or a sum) of the score over the distribution's domain. As a result, for non-tractable statistical manifolds, the lower and upper bounds of integration, also known as the support, must be specified.
Exponential & Log Maps
In information geometry, an exponential manifold is defined over a statistical model that forms an exponential family. The exponential map at a point θ maps a tangent vector v at θ to a point on the manifold along the geodesic defined by v.
📌 Multi-dimensional statistical manifolds are typically intractable. However, by reducing the number of parameters to one—such as fixing the standard deviation—it becomes possible to derive closed-form expressions for the exponential and logarithm maps.
Exponential Distributions
Probability Density Function
Given a rate parameter θ > 0, the probability density function of the exponential distribution is
$$p(x \mid \theta) = \theta\, e^{-\theta x}, \qquad x \ge 0$$
Fisher Information Metric
The Fisher-Rao metric associated with the exponential distribution with rate parameter θ is
$$g(\theta) = \frac{1}{\theta^{2}}$$
Exponential Map
Under this metric, Fisher-Rao geodesics correspond to exponential curves in the rate parameter θ. Given a base point θ1 and a tangent vector v, the exponential map is
$$\exp_{\theta_{1}}(v) = \theta_{1}\, e^{\,v/\theta_{1}}$$
Logarithm Map
In general, the logarithm map is the inverse of the exponential map. Given a base point θ1 on the exponential-distribution manifold and a target point θ2, it returns the tangent vector at θ1 pointing toward θ2:
$$\log_{\theta_{1}}(\theta_{2}) = \theta_{1} \log\!\left(\frac{\theta_{2}}{\theta_{1}}\right)$$
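These two closed-form maps translate directly into a few lines of PyTorch. The snippet below is a standalone sketch, independent of the implementation discussed in the hands-on section; it simply checks that the logarithm map inverts the exponential map for an arbitrary base point and tangent vector.

```python
import torch

def exp_map_exponential(v: torch.Tensor, theta_1: torch.Tensor) -> torch.Tensor:
    """Exponential map on the exponential-distribution manifold: theta_1 * exp(v / theta_1)."""
    return theta_1 * torch.exp(v / theta_1)

def log_map_exponential(theta_1: torch.Tensor, theta_2: torch.Tensor) -> torch.Tensor:
    """Logarithm map on the exponential-distribution manifold: theta_1 * log(theta_2 / theta_1)."""
    return theta_1 * torch.log(theta_2 / theta_1)

theta_1 = torch.tensor([0.5])
v = torch.tensor([0.25])
theta_2 = exp_map_exponential(v, theta_1)        # end point of the geodesic from theta_1
print(theta_2)                                   # tensor([0.8244])
print(log_map_exponential(theta_1, theta_2))     # recovers v: tensor([0.2500])
```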
Geometric Distributions
Probability Density Function
Given the probability of success p at each trial, the probability that the first success occurs at trial k is
$$P(X = k) = (1-p)^{k-1}\, p, \qquad k = 1, 2, \dots$$
Fisher Information Metric
The Fisher-Rao metric for the geometric distribution is
$$g(p) = \frac{1}{p^{2}(1-p)}$$
Exponential Map
Given a tangent vector v and a base point p1 on the geometric-distribution manifold, the exponential map returns the end point of the geodesic starting at p1 with initial velocity v.
Logarithm Map
Given a base point p1 and a point p2 on the geometric-distribution manifold, the logarithm map returns the tangent vector at p1 pointing toward p2.
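While the closed-form maps for the geometric family are not reproduced here, the Fisher information itself is easy to verify numerically by summing the squared score over a truncated support, as in the illustrative sketch below (not the article's source code).

```python
import numpy as np

def geometric_fisher_info(p: float, k_max: int = 10_000) -> float:
    """Fisher information of the geometric distribution, summed over a truncated support."""
    k = np.arange(1, k_max + 1)
    pmf = (1.0 - p) ** (k - 1) * p
    # Score: d/dp log P(X = k) = 1/p - (k - 1)/(1 - p)
    score = 1.0 / p - (k - 1) / (1.0 - p)
    return float(np.sum(score ** 2 * pmf))

p = 0.3
print(geometric_fisher_info(p))          # numerical sum over the support
print(1.0 / (p ** 2 * (1.0 - p)))        # analytical value 1/(p^2 (1 - p)) ~ 15.873
```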
Poisson Distributions
Probability Density Function
Given a Poisson distribution with expected number of events λ (the rate parameter) over a given interval, the probability of observing k events in that interval is
$$P(X = k) = \frac{\lambda^{k}\, e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots$$
Fisher Information Metric
The Fisher-Rao metric for the Poisson distribution is
$$g(\lambda) = \frac{1}{\lambda}$$
Exponential Map
Given a base point λ1 on the Poisson manifold and a tangent vector v, the exponential map returns the end point of the geodesic starting at λ1 with initial velocity v.
Logarithm Map
Given a base point λ1 and a point λ2 on the Poisson manifold, the logarithm map returns the tangent vector at λ1 pointing toward λ2.
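For reference, the standard Fisher-Rao derivation for the Poisson family, re-parameterizing with u = 2√λ so that geodesics become straight lines, yields the maps sketched below. This is a textbook derivation under the metric g(λ) = 1/λ; the conventions may differ slightly from the formulas used in the hands-on section.

```python
import torch

def exp_map_poisson(v: torch.Tensor, lambda_1: torch.Tensor) -> torch.Tensor:
    """Exponential map under g(lambda) = 1/lambda: geodesic end point from lambda_1 with velocity v."""
    return (torch.sqrt(lambda_1) + v / (2.0 * torch.sqrt(lambda_1))) ** 2

def log_map_poisson(lambda_1: torch.Tensor, lambda_2: torch.Tensor) -> torch.Tensor:
    """Logarithm map under g(lambda) = 1/lambda: tangent vector at lambda_1 pointing toward lambda_2."""
    return 2.0 * torch.sqrt(lambda_1) * (torch.sqrt(lambda_2) - torch.sqrt(lambda_1))

lambda_1 = torch.tensor([2.0])
lambda_2 = torch.tensor([5.0])
v = log_map_poisson(lambda_1, lambda_2)
print(v)                               # tangent vector at lambda_1
print(exp_map_poisson(v, lambda_1))    # recovers lambda_2: tensor([5.0000])
```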
Binomial Distributions with Fixed Draws
The binomial-distribution manifold admits closed-form maps when the number of draws is fixed.
Probability Density Function
Given a probability of success p in a sequence of n independent experiments, the probability of observing k successes is
$$P(X = k) = \binom{n}{k}\, p^{k} (1-p)^{\,n-k}, \qquad k = 0, 1, \dots, n$$
Fisher Information Metric
For a fixed number of draws n, the Fisher-Rao metric g is defined as
$$g(p) = \frac{n}{p(1-p)}$$
Exponential Map
Given a tangent vector v and a base point p1 on the binomial manifold, the exponential map returns the end point of the geodesic starting at p1 with initial velocity v.
Logarithm Map
Given a base point p1 and a point p2 on the binomial manifold, the logarithm map returns the tangent vector at p1 pointing toward p2.
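As with the geometric family, the Fisher information of the binomial distribution with a fixed number of draws n can be verified numerically by summing the squared score over k = 0, ..., n, as in this short illustrative sketch.

```python
import numpy as np
from math import comb

def binomial_fisher_info(n: int, p: float) -> float:
    """Fisher information of the binomial distribution with n fixed draws."""
    k = np.arange(0, n + 1)
    pmf = np.array([comb(n, int(ki)) for ki in k]) * p ** k * (1.0 - p) ** (n - k)
    # Score: d/dp log P(X = k) = k/p - (n - k)/(1 - p)
    score = k / p - (n - k) / (1.0 - p)
    return float(np.sum(score ** 2 * pmf))

n, p = 10, 0.4
print(binomial_fisher_info(n, p))      # numerical sum over k = 0..n
print(n / (p * (1.0 - p)))             # analytical value n/(p (1 - p)) ~ 41.67
```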
📌 The computation of the exponential and logarithm maps for the Normal distribution is intractable. However, fixing either the mean or the standard deviation yields a closed form.
⚙️ Hands-on with Python
Environment
Libraries: Python 3.12.5, PyTorch 2.5.0, Numpy 2.2.0, Geomstats 2.8.0
Source code is available at Github.com/patnicolas/geometriclearning/informationgeometry/fisher_rao
To enhance the readability of the algorithm implementations, we have omitted non-essential code elements like error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and import statements.
Our implementation of the evaluation of various statistical manifolds relies on the Geomstats Python library [ref 8], introduced in a previous article [ref 9].
📈 Visualization
First, let's briefly review these families of one-parameter distributions by sampling their parameters and visualizing their probability density (or mass) functions.
Fig. 2 Family of Exponential Distributions with random samples of the natural parameter θ
Fig. 3 Family of Geometric Distributions with random samples of the probability of success p
Fig. 4 Family of Poisson Distributions with random samples of the rate parameter λ
Fig. 5 Family of Binomial Distributions with random samples of probability of success p
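A few lines of Matplotlib are enough to generate plots similar to Fig. 2. The sketch below is an illustration, not the article's plotting code: it overlays the densities of the exponential distribution for a handful of randomly sampled rate parameters; the other families follow the same pattern with their respective probability mass functions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(17)
x = np.linspace(0.0, 5.0, 200)

# Sample a few rate parameters and overlay the corresponding densities
for theta in rng.uniform(0.5, 2.5, size=5):
    plt.plot(x, theta * np.exp(-theta * x), label=f'theta={theta:.2f}')

plt.title('Family of Exponential Distributions')
plt.xlabel('x')
plt.ylabel('p(x | theta)')
plt.legend()
plt.show()
```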
Setup
Let's create a class, CFStatisticalManifold, to encapsulate the various operations on statistical manifolds with closed-form exponential and logarithm maps (Code snippet 1).
📌 I limit this evaluation to four families of distributions: Exponential, Poisson, Binomial, and Geometric. Several other distributions, like the Bernoulli, also admit closed-form expressions for the exponential and logarithm maps.
The constructor initializes the distribution family on the manifold, info_manifold, and sets up a reference to the Fisher-Rao metric provided by Geomstats, fisher_rao_metric, using predefined bounds.
⚠️ Computing the Fisher metric in Geomstats requires either the Autograd or PyTorch backend.
The method belongs (Code snippet 2) checks whether each point lies on the manifold by calling the corresponding belongs method from Geomstats.
As discussed in a previous article on Riemannian manifolds like the hypersphere, this evaluation depends on the framework’s ability to generate random data points on statistical manifolds, as illustrated in Code snippet 3.
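Code snippets 1 to 3 can be summarized by the minimal sketch below. It is a reconstruction rather than the article's source code: the FisherRaoMetric(space, support) signature and the information-geometry manifold classes are assumed from Geomstats 2.8, the default bounds are illustrative, and only the constructor, belongs, and random sampling are shown.

```python
import os
# Geomstats needs an automatic-differentiation backend (Autograd or PyTorch) for the Fisher metric
os.environ["GEOMSTATS_BACKEND"] = "pytorch"   # must be set before importing geomstats

from geomstats.information_geometry.fisher_rao_metric import FisherRaoMetric


class CFStatisticalManifold:
    """Statistical manifold with closed-form exponential and logarithm maps."""

    def __init__(self, info_manifold, bounds: list = None) -> None:
        # Family of distributions (Exponential, Geometric, Poisson or Binomial manifold)
        self.info_manifold = info_manifold
        # Fisher-Rao metric integrated over the predefined bounds (support)
        self.fisher_rao_metric = FisherRaoMetric(
            space=info_manifold,
            support=bounds if bounds is not None else [1e-3, 10.0]
        )

    def belongs(self, points: list) -> bool:
        """Check that every point lies on the underlying information manifold."""
        return all(self.info_manifold.belongs(point) for point in points)

    def samples(self, n_samples: int) -> list:
        """Generate random points on the information manifold."""
        return [self.info_manifold.random_point(1) for _ in range(n_samples)]
```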
Let’s now put the current implementation of our CFStatisticalManifold class to use, as shown in Code snippet 4.
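A possible driver for Code snippet 4, assuming the sketch above, instantiates two of the manifolds and prints a few random points; the values below naturally differ from run to run.

```python
from geomstats.information_geometry.exponential import ExponentialDistributions
from geomstats.information_geometry.geometric import GeometricDistributions

# Statistical manifold of exponential distributions
exponential_manifold = CFStatisticalManifold(ExponentialDistributions())
print('Exponential Distribution Manifold 8 random samples')
for sample in exponential_manifold.samples(8):
    print(sample)

# Statistical manifold of geometric distributions
geometric_manifold = CFStatisticalManifold(GeometricDistributions())
print('Geometric Distribution Manifold 4 random samples')
for sample in geometric_manifold.samples(4):
    print(sample)
```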
Output
Exponential Distribution Manifold 8 random samples
tensor([0.8494])
tensor([0.1072])
tensor([0.6849])
tensor([0.3375])
tensor([0.1472])
tensor([0.4026])
tensor([0.4321])
tensor([0.2589])
Geometric Distribution Manifold 4 random samples
tensor([0.4640])
tensor([0.6331])
tensor([0.5167])
tensor([0.7566])
Fisher Information Metric
The computation of the exponential and logarithm maps, distance, and inner product relies on the Fisher-Rao metric g. The metric_matrix method (Code Snippet 5) implements the computation of this metric at a specified point on the manifold, referred to as the base_point. If no base point is provided, the method selects one at random.
In all cases, the base point is first validated to ensure it lies on the manifold. The actual metric computation is then performed using a call to the Geomstats API.
Let's compute the Fisher metric for the Exponential and Poisson distributions. Since these are univariate distributions, both the points on the manifold and the resulting metric are represented as 1×1 matrices.
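Code snippets 5 and 6 essentially delegate to metric_matrix on the Geomstats Fisher-Rao metric. The sketch below shows the idea as a standalone helper built on the class sketched earlier (the article implements it as a method); the validation and random-base-point fallback are simplified, and the printed values depend on the randomly selected base points.

```python
from geomstats.information_geometry.exponential import ExponentialDistributions
from geomstats.information_geometry.poisson import PoissonDistributions

def metric_matrix(manifold: CFStatisticalManifold, base_point=None):
    """Fisher-Rao metric tensor evaluated at a (possibly random) base point on the manifold."""
    if base_point is None:
        base_point = manifold.info_manifold.random_point(1)
    assert manifold.belongs([base_point]), 'Base point is not on the manifold'
    return manifold.fisher_rao_metric.metric_matrix(base_point)

exponential_manifold = CFStatisticalManifold(ExponentialDistributions())
print(f'Exponential Distribution Fisher metric: {metric_matrix(exponential_manifold)}')

poisson_manifold = CFStatisticalManifold(PoissonDistributions())
print(f'Poisson Distribution Fisher metric: {metric_matrix(poisson_manifold)}')
```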
Output:
Exponential Distribution Fisher metric: tensor([[1.0288]])
Poisson Distribution Fisher metric: tensor([[3.1438]])
Exponential Maps
Let's implement the computation of end points on the manifold (method exp), given a base point and a tangent vector, for these four closed-form manifolds, using the exponential-map formulas defined in the previous sections (Code snippet 7).
Finally, we leverage the method exp to compute the end points on the Exponential and Geometric manifolds, exponential_end_point (resp. geometric_end_point).
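Code snippets 7 and 8 can be approximated by the dispatch sketched below, which reuses the class from the setup section. Only the exponential-distribution branch and a Poisson branch derived from g(λ) = 1/λ are shown; the geometric and binomial branches follow the same pattern, and the exact formulas may differ from the article's conventions.

```python
import torch
from geomstats.information_geometry.exponential import ExponentialDistributions
from geomstats.information_geometry.poisson import PoissonDistributions

def closed_form_exp(manifold: CFStatisticalManifold,
                    tangent_vector: torch.Tensor,
                    base_point: torch.Tensor) -> torch.Tensor:
    """Closed-form exponential map dispatched on the family of distributions."""
    info_manifold = manifold.info_manifold
    if isinstance(info_manifold, ExponentialDistributions):
        # End point of the geodesic from theta_1 with initial velocity v: theta_1 * exp(v / theta_1)
        return base_point * torch.exp(tangent_vector / base_point)
    if isinstance(info_manifold, PoissonDistributions):
        # Standard Fisher-Rao closed form for g(lambda) = 1/lambda
        return (torch.sqrt(base_point) + tangent_vector / (2.0 * torch.sqrt(base_point))) ** 2
    raise NotImplementedError('Geometric and binomial branches follow the same pattern')

# Driver: end point on the exponential-distribution manifold from a random base point
exponential_manifold = CFStatisticalManifold(ExponentialDistributions())
base_point = exponential_manifold.info_manifold.random_point(1)
exponential_end_point = closed_form_exp(exponential_manifold, torch.tensor([0.5]), base_point)
print(f'Exponential Distribution Manifold End point: {exponential_end_point}')
```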
Output
Exponential Distribution Manifold End point: tensor([0.8868])
Geometric Distribution Manifold End point: tensor([0.9326])
Logarithm Maps
We implement the formulas to compute the tangent vector, given a base point and another point on the manifold, for the same closed-form statistical manifolds (method log in Code snippet 9).
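A matching sketch for the log method is shown below; for the exponential-distribution branch it reproduces, up to rounding, the tangent vector listed in the output that follows.

```python
import torch
from geomstats.information_geometry.exponential import ExponentialDistributions
from geomstats.information_geometry.poisson import PoissonDistributions

def closed_form_log(manifold: CFStatisticalManifold,
                    base_point: torch.Tensor,
                    point: torch.Tensor) -> torch.Tensor:
    """Closed-form logarithm map dispatched on the family of distributions."""
    info_manifold = manifold.info_manifold
    if isinstance(info_manifold, ExponentialDistributions):
        # Tangent vector at theta_1 pointing toward theta_2: theta_1 * log(theta_2 / theta_1)
        return base_point * torch.log(point / base_point)
    if isinstance(info_manifold, PoissonDistributions):
        # Standard Fisher-Rao closed form for g(lambda) = 1/lambda
        return 2.0 * torch.sqrt(base_point) * (torch.sqrt(point) - torch.sqrt(base_point))
    raise NotImplementedError('Geometric and binomial branches follow the same pattern')

# Reproduces (up to rounding) the tangent vector shown in the output below
exponential_manifold = CFStatisticalManifold(ExponentialDistributions())
v = closed_form_log(exponential_manifold, torch.tensor([0.4471]), torch.tensor([0.7085]))
print(f'Base:tensor([0.4471]) to:tensor([0.7085]): {v}')   # ~ tensor([0.2059])
```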
Exponential Distribution Manifold Tangent Vector
Base:tensor([0.4471]) to:tensor([0.7085]): tensor([0.2059])
Geometric Distribution Manifold Tangent Vector
Base:tensor([0.9392]) to:tensor([0.7573]): tensor([-5.7546])
🧠 Key Takeaways
✅ Information geometry is a field of mathematics that applies differential geometry to study the structure of statistical models. It extends Riemannian geometry to the space of probability density functions, enabling a geometric approach to statistical inference.
✅ While multivariate statistical manifolds often lead to intractable exponential and logarithm maps, univariate distributions typically admit closed-form expressions.
✅ In some cases, fixing one or more parameters of a distribution can simplify an otherwise intractable manifold into a closed-form representation.
✅ The Geomstats library in Python provides a practical toolkit for exploring, analyzing, and applying core concepts of information geometry, though closed-form manifolds can often be computed manually with relative ease.
📘 References
Differential Geometric Structures - W. Poor - Dover Publications, New York, 1981
Introduction to Smooth Manifolds - J. Lee - Springer Science+Business Media, New York, 2013
What is Fisher Information? - Ian Collings - YouTube
An Elementary Introduction to Information Geometry - F. Nielsen - Sony Computer Science Laboratories
🛠️ Exercises
What is the geometric structure of a point on a statistical manifold defined by n parameters?
What is the Fisher information metric of the normal distribution when the standard deviation is fixed and not treated as a parameter?
Can you implement the Fisher information metric for a normal distribution with a constant, fixed standard deviation?
Can you compute the Fisher-Rao metric manually for the exponential distribution manifold?
👉 Answers
💬 News & Reviews
This section focuses on news and reviews of papers pertaining to geometric deep learning and its related disciplines.
Paper review: Manifold Matching via Deep Metric Learning for Generative Modeling - M. Dai, H. Hang
This study advances the recent progress in blending geometry with statistics to enhance generative models. It proposes a novel method for identifying manifolds within Euclidean spaces for generative models, such as variational autoencoders and GANs, through two neural networks:
Data generator sampling data on the manifold
Metric generator learning geodesic distances.
Metric learning: the metric generator produces a pullback of the Euclidean metric, while the data generator produces a pushforward of the prior distribution. The algorithm is described with easy-to-follow pseudo-code.
The method is tested on unconditional ResNet image creation and GAN-based image super-resolution, showing improved Frechet Inception Distance and perception scores.
This paper will be of particular interest to engineers already familiar with GANs and the Fréchet metric.
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning.
He has been director of data engineering at Aideo Technologies since 2017, and he is the author of "Scala for Machine Learning" (Packt Publishing, ISBN 978-1-78712-238-3) and of the Geometric Learning in Python newsletter on LinkedIn.
Appendix
Computation of the Fisher information matrix for the Normal distribution.