Exercises Answers 🛠️
Below are the answers to the exercises from the Hands-on Geometric Deep Learning posts.
Introduction to Geometric Deep Learning
Q1: What are the two most widely used Python libraries for Geometric Deep Learning?
A1: Data scientists often have preferences and strong opinions on the most suitable libraries for their environment and data. This newsletter primarily uses Geomstats (Github geomstats) for foundational concepts such as differential geometry and manifolds, while PyG (PyTorch Geometric) is leveraged for Graph Neural Networks and machine learning on data manifolds.
Q2: Can you name four different types of Graph Neural Networks?
A2: The list under the section Types of Graph Neural Networks is far from exhaustive, but as of January 2025, the popular models in research papers are: Graph Convolutional Networks, Graph Attention Networks, GraphSAGE, Spectral Graph Neural Networks, Graph Transformers.
Q3: What are the advantages of using Topological Data Analysis?
A3: Topological Data Analysis (TDA):
Uncovers hidden structures in complex data, such as clusters, loops, and voids, which may not be apparent through standard statistical analysis.
Supports structured, unstructured, and graph-based data.
Captures non-linear patterns and intricate relationships.
Is robust to small perturbations in input data, including noise and drift.
Q4: How does a differential (smooth) manifold differ from a Riemannian manifold?
A4: Riemannian manifolds are smooth manifolds equipped with a smoothly varying metric on their tangent spaces (from which the Riemann curvature tensor is derived).
Examples of metrics include the Euclidean metric, the spherical metric, the Poincaré disk metric in H2, the Minkowski metric (theory of relativity), the Fisher information metric (for statistical manifolds), and the Symmetric Positive Definite manifold metric.
Q5: Between mesh-based and grid-based learning models, which has the higher computational cost?
A5: The grid-based manifold model typically has a finer granularity, with more cloud points for computing exponential and logarithm maps compared to mesh-based models. As a result, it demands significantly more memory.
Insights into Logistic Regression on Riemannian Manifolds
Q1: What two conditions must a Symmetric Positive Definite (SPD) matrix satisfy?
A1: The two conditions are:
The matrix must be symmetric.
The matrix must be positive definite (xᵀMx > 0 for every nonzero vector x).
Q2: Which metric uses an SPD matrix as its covariance matrix?
A2: The Mahalanobis distance on a manifold M between a vector x and a mean vector μ, given a covariance matrix Σ, is defined as:
\[ d(x, \mu) = \sqrt{(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)} \]
x is the data point (or feature vector).
μ is the mean of the distribution.
Σ is the covariance matrix of the data.
Q3: In code snippet 9, what would be the mean test scores if a 32 × 32 SPD matrix (n_features = 32) is used?
A3:
Mean score for Log Euclidean metric: 0.496
Mean score for Affine Invariant metric: 0.500
Q4: What is the formula for generating a random SPD matrix?
A4: The random values have to be positive (for the positive-definite condition) and the matrix has to be symmetric. This can be achieved by adding its transpose to a randomly generated matrix, as in the sketch below.
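A minimal NumPy sketch of this recipe; the diagonal shift n·I is an extra safeguard, not part of the original answer, that guarantees strict positive definiteness:

```python
import numpy as np

def random_spd(n: int, seed: int = 42) -> np.ndarray:
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))      # positive random entries in [0, 1)
    m = a + a.T                 # adding the transpose makes the matrix symmetric
    m += n * np.eye(n)          # diagonal shift guarantees positive definiteness
    return m

assert np.all(np.linalg.eigvalsh(random_spd(32)) > 0.0)
```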
Q5: Which Python library is commonly used to implement binary logistic regression for SPD matrices?
A5: Geomstats (Github Geomstats) is widely used for studying data manifolds, including in this newsletter. However, for more specialized applications, libraries such as pyRiemann: Machine learning for multivariate data with Riemannian geometry and PyManOpt may be better suited for specific problem domains.
Dive into Functional Data Analysis
Q1: Observations can be classified based on their measurement process (e.g., repeated measurements, regular time intervals). Can you list the three main categories of observed data?
A1: The 3 main categories of observations are:
Panel Data: Data with a limited number of repeated measurements for each unit or subject, with varying time points across subjects.
Time Series: Single observations made at regular time intervals, such as those seen in financial markets.
Functional Data: Data recorded over consistent time intervals and frequencies, featuring a high number of measurements per observational unit/subject.
Q2: A function space is a manifold composed of which type of functions?
A2: Square-integrable functions: functions f for which \( \int |f(x)|^{2}\,dx < \infty \) (the space L²).
Q3: What is the dimensionality of a Hilbert Sphere?
A3: A manifold with infinite dimensions: the Hilbert sphere lives in a Hilbert space of functions and therefore has infinite dimension.
Q4: Can you modify test code snippet #5 to compute the inner product using only the first half of each vector? What would be the value of `num_Hilbert_samples`, and what is the resulting inner product, `inner_prod`?
A4: The value of num_Hilbert_samples should be len(vector1) / 2 = 4 and the inner product 0.235. The modified code for the test would look like the sketch below.
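Since the original snippet #5 is not reproduced here, this is a minimal NumPy sketch with hypothetical vector values (the printed result will therefore differ from 0.235):

```python
import numpy as np

# Hypothetical stand-ins for the two 8-sample vectors of code snippet #5
vector1 = np.array([0.5, 1.0, 0.0, 0.4, 0.7, 0.6, 0.2, 0.9])
vector2 = np.array([0.5, 0.5, 0.2, 0.3, 0.8, 0.2, 0.3, 0.5])

num_Hilbert_samples = len(vector1) // 2    # = 4: use only the first half
inner_prod = np.inner(vector1[:num_Hilbert_samples],
                      vector2[:num_Hilbert_samples])
print(inner_prod)
```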
Q5: How is the inner product of two tangent vectors/functions <f, g> defined?
A5: The formula is similar to the inner product of two vectors:
Vector space: \( \langle u, v \rangle = \sum_{i} u_i\, v_i \)
Function space: \( \langle f, g \rangle = \int f(x)\, g(x)\, dx \)
Uniform Manifold Approximation & Projection
Q1: What are the two primary categories of models used for feature reduction while preserving distances?
A1: The two categories are:
Models that preserve global distances, such as PCA
Models that preserve local distances, such as t-SNE
Q2: What formula does UMAP use to calculate the similarity between two points, x & y?
A2: UMAP models the similarity between a point x and a neighbor y as \( p(x, y) = \exp\!\big(-(d(x, y) - \rho)/\sigma\big) \), where d is the distance, ρ is the distance from x to its nearest neighbor, and σ is a local scale factor.
Q3: What is the role of the n_components configuration parameter in t-SNE?
A3: In t-SNE, clusters are visualized based on the n_components configuration parameter: n_components = 2 produces a 2D (x, y) plot, while n_components = 3 generates a 3D (x, y, z) volumetric plot.
Q4: What are the two key configuration parameters in UMAP?
A4:
n_neighbors: defines the number of neighbors in UMAP. It determines the balance between global and local distances in the data visualization; a smaller number of neighbors means the local neighborhood is defined by fewer data points.
min_dist: represents the compactness in low dimension. It plays a crucial role in determining the appearance of the low-dimensional representation; a small min_dist value allows UMAP to pack points closer together in the low-dimensional space.
Q5: How does decreasing min_dist affect the UMAP visualization?
A5: Decreasing the value of min_dist will increase the compactness (data points moving closer together in the low-dimensional space). It highlights the preservation of local structure and can make clusters more distinct.
Hands-on Principal Geodesic Analysis
Q1: How does Principal Geodesic Analysis overcome the limitations of Principal Component Analysis?
A1: By assuming that the data resides on a low-dimensional manifold, Principal Geodesic Analysis can effectively handle non-linear data and capture interdependencies among features.
Q2: What is the purpose of computing the eigenvectors of the covariance matrix?
A2: Eigenvectors represent the axes with the greatest variance, making them the most sensitive to variations in feature values.
Q3: Principal components are computed in the locally Euclidean tangent space using:
Exponential map
Projection along a geodesic
Logarithmic map
A3: Logarithmic map. Principal Geodesic Analysis maps data points to the tangent space along geodesics using the logarithmic map, where the principal components are then computed.
Q4: How can you extract the singular values and components of PCA using the Scikit-learn library?
A4: Sample code below.
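A minimal scikit-learn sketch with hypothetical data; `singular_values_` and `components_` are the fitted attributes to read:

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.random.default_rng(42).random((256, 8))   # hypothetical data set

pca = PCA(n_components=4)
pca.fit(x)

print(pca.singular_values_)   # singular values of the centered data
print(pca.components_)        # principal axes, one row per component
```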
Q5: What would be the Euclidean PCA matrix and the Tangent Space PCA matrix when the number of samples is set to 64? To 1024?
A5: The number of samples has a significant impact on the computation of both PCA and PGA.
Riemannian Manifolds: 1. Foundation
Q1: How would you intuitively define a directional derivative?
A1: The directional derivative measures how a function changes as you move in a specific direction.
If you move in the direction of the steepest ascent (the gradient direction), the directional derivative is at its maximum.
If you move perpendicular to the gradient, the directional derivative is zero (meaning the function isn't changing in that direction).
If you move in the direction of descent, the directional derivative is negative.
Q2: In a manifold, does the Euclidean mean of two data points (i.e., the midpoint between them) necessarily lie on the manifold?
A2: No. The Euclidean mean lies on the manifold only if the manifold is flat. In a 3-dimensional space, the mean of two points A and B falls on the straight line connecting them.
Q3: What is the difference between the exponential map and the logarithm map?
A3: The exponential map, commonly denoted exp|p(v), maps a tangent vector v at a point p to a point on the manifold by following a geodesic.
The logarithm map is the inverse of the exponential map. It retrieves the tangent vector that would generate a geodesic from p to another point q.
Q4: When analyzing a low-dimensional structure such as a manifold, Lie group, or embedding within a higher-dimensional Euclidean space, which coordinate system should be used?
Intrinsic coordinates
Extrinsic coordinates
A4: 2) Extrinsic coordinates
Q5: What are the four axioms that a Lie group must satisfy?
A5: The 4 axioms are:
Closure
Associativity
Identity
Invertibility
Q6: How would you intuitively define a Lie algebra?
A6: Intuitively, a Lie algebra is a tangent space of a Lie group. This actually makes sense as a Lie group is fundamentally a smooth manifold.
Riemannian Manifolds: 2. Hands-on with Hypersphere
Q1: What are the two main modules of the Geomstats library? Which module is specifically dedicated to differential geometry?
A1: Geometry and Learning
Geometry is more theoretical and implements the key components of differential geometry such as exponential and logarithm maps, geodesics, tangent spaces and Lie groups.
Learning extends the scikit-learn framework to smooth manifolds.
Q2: Given a point with coordinates (x, y), write a Python function to convert it to polar coordinates (rho, theta).
A2: Just two lines of code, as sketched below.
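A simple sketch using Python's math module (atan2 handles the quadrant correctly):

```python
import math

def to_polar(x: float, y: float) -> tuple[float, float]:
    rho = math.sqrt(x * x + y * y)    # radial distance
    theta = math.atan2(y, x)          # angle in radians, quadrant-aware
    return rho, theta
```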
Q3: When computing and visualizing data points on the hypersphere, should the coordinates be intrinsic?
A3: No, they should be extrinsic. The data points on the manifold (hypersphere) are always visualized in 3D Euclidean space. The last argument of the constructor for HypersphereSpace, intrinsic, defaults to False.
Q4: What are the two required arguments of the exponential map on the hypersphere, used to compute the endpoint on the manifold?
A4: The end point is computed as \( \gamma(1) = \exp_{p}(v) \).
Therefore the two arguments are:
The base data point p on the manifold
The tangent vector v on the tangent plane
Given a tangent vector tangent_v and a base point base_pt, the end point is computed as in the sketch below.
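A minimal Geomstats sketch, assuming the standard Hypersphere API rather than the post's HypersphereSpace wrapper:

```python
import numpy as np
from geomstats.geometry.hypersphere import Hypersphere

sphere = Hypersphere(dim=2)
base_pt = sphere.random_point()

vector = np.array([0.5, 1.0, 0.8])                         # arbitrary ambient vector
tangent_v = sphere.to_tangent(vector, base_point=base_pt)  # project onto tangent plane
end_pt = sphere.metric.exp(tangent_v, base_point=base_pt)  # exponential map
```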
Q5: What is the relationship between computing an endpoint using the exponential map and computing a geodesic?
A5: A geodesic consists of an infinite (or practically large) number of points along its path. These points are obtained by applying the exponential map to a sequence of base points on the manifold, given a tangent vector that defines the direction. Thus, a geodesic can be seen as a sequence of points computed through successive applications of the exponential map.
Insights into k-Means on Riemannian Manifolds
Q1: What is the recommended configuration for the implementation of k-Means in Euclidean space using Scikit-learn?
A1: The two recommended configuration parameters are:
Initialization: k-means++. In this approach, the first center is selected at random from the data points and the distance from this center to all points is computed. The subsequent centers are selected at random from the data points with probability proportional to the squared distance.
Algorithm: Elkan. This is a variation of the ubiquitous Lloyd's algorithm that leverages the triangle inequality to reduce the number of distance computations when assigning points to clusters. Elkan's algorithm requires storage proportional to the number of clusters and the number of data points, making it impractical for very large datasets.
Q2: What are the advantages of using the SO(3) Lie Group to generate random clusters of data points on a manifold?
A2: SO(3) matrices represent rotations, which preserve distances on the sphere. They can therefore be used to replicate a cluster at different locations along the hypersphere.
Q3: Which algorithm is most suitable for generating random data points and clusters on a hypersphere?
1- Uniform distribution
2- Von Mises-Fisher
3- Bounded Uniform distribution
A3: Von Mises-Fisher
A uniform random generator distributes data evenly across the hypersphere, making it unsuitable for evaluating k-Means. A bounded uniform random generator can create clusters of data points but fails to preserve the local distance between each cluster point and its centroid. In contrast, the Von Mises-Fisher distribution is specifically designed to account for the curvature of the manifold, making it more suitable for clustering on a hypersphere.
Q4: How does changing the number of samples in Code Snippet 5 affect the visualization? Reducing to 64 samples? Increasing to 1024 samples?
A4: See the visualization of k-Means clusters on the hypersphere with 1024 samples (figure in the original post).
Q5: Can you write Python code to compute the four centroids of data points generated on a hypersphere, using the k-Means class from Scikit-learn?
A5: Here is an example of implementation of k-Means using the Scikit-learn library:
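A minimal sketch with hypothetical data: random points are normalized onto the unit 2-sphere, then clustered with scikit-learn's KMeans. Note that the returned centroids are Euclidean means; they are re-normalized here to lie back on the sphere.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 256 random points projected onto the unit 2-sphere
rng = np.random.default_rng(42)
points = rng.normal(size=(256, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)

kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10, random_state=42)
kmeans.fit(points)

centroids = kmeans.cluster_centers_                            # Euclidean means
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)  # back onto the sphere
print(centroids)
```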
Overview of Geomstats for Geometric Learning
Q1: What are the advantages of using a Beta distribution on a manifold?
A1: There are many benefits of applying the Beta distribution on a manifold, among them:
Representation of probability densities over curved spaces
Handling of non-Euclidean geometries
Regularization to reduce overfitting
Smooth interpolation of probability density functions on a manifold
Q2: What is the role of the exponential map in differential geometry?
A2: The exponential map is used to project an end data point onto a manifold given a point of the manifold and a tangent vector. It is also used to compute geodesics.
Q3: Which Geomstats method computes the tangent vector on a manifold, given a base point and a directional vector?
A3: to_tangent, which takes a base point on the manifold and a vector in the ambient Euclidean space, as illustrated below.
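A short sketch on the hypersphere; `is_tangent` is used here only as a sanity check:

```python
import numpy as np
from geomstats.geometry.hypersphere import Hypersphere

sphere = Hypersphere(dim=2)
base_point = sphere.random_point()

vector = np.array([0.5, 1.0, 0.8])   # directional vector in Euclidean space
tangent_vec = sphere.to_tangent(vector, base_point=base_point)
assert sphere.is_tangent(tangent_vec, base_point=base_point)
```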
Q4: How does decreasing the number of samples (n_samples) from 6000 to 1000 affect the cross-validation of logistic regression on the SPD(3) manifold (code snippet 6)?
A4: Cross-validation scores [0.55 0.525 0.545 0.52 0.525] with mean: 0.533.
Q5: Can you implement geodesic computation using the HypersphereMetric geodesic method in Geomstats?
A5: Creating a geodesic simply consists of applying a tangent vector to a sequence, base_points, of contiguous points on the manifold.
The nested method, geodesic, invokes the Geomstats geodesic method on an instance of the metric class, HypersphereMetric, as sketched below.
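A minimal sketch assuming the standard Geomstats Hypersphere API (the post's wrapper class is not reproduced here):

```python
import numpy as np
from geomstats.geometry.hypersphere import Hypersphere

sphere = Hypersphere(dim=2)
metric = sphere.metric                      # HypersphereMetric instance

initial_point = sphere.random_point()
vector = np.array([0.3, 0.6, 0.2])
tangent_vec = sphere.to_tangent(vector, base_point=initial_point)

# geodesic returns a callable mapping t in [0, 1] to points on the manifold
geodesic = metric.geodesic(initial_point=initial_point,
                           initial_tangent_vec=tangent_vec)
points_on_geodesic = geodesic(np.linspace(0.0, 1.0, 20))
```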
Q6: What will be the resulting endpoint on the SO(3) Lie group if the tangent vector represents a 90-degree rotation around the Y-axis?
A6: Executing the code in snippet 7 for a 90-degree rotation around the Y-axis yields:
Base point:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
Tangent vector:
[[ 0. 0. 1.]
[ 0. 1. 0.]
[-1. 0. 0.]]
End point on SO3:
[[ 0.9196 0.0000 0.9972]
[ 0.0000 2.0000 0.0000]
[-0.9972 0.0000 0.9196]]
Reusable Neural Blocks in PyTorch & PyG
Q1: Can you implement a convolutional neural block for 3D objects?
A1: As expected, the class for a 3-dimensional convolutional block, `Conv3dBlock`, has a similar structure to its 2-dimensional counterpart, `Conv2dBlock`, as sketched below.
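A hypothetical sketch, since `Conv2dBlock` is not reproduced here; the layer sequence (convolution, batch normalization, activation, pooling, optional dropout) mirrors a typical convolutional block:

```python
import torch
import torch.nn as nn

class Conv3dBlock(nn.Module):
    """3D convolutional block mirroring the structure of Conv2dBlock."""
    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: int = 3, stride: int = 1, padding: int = 1,
                 dropout: float = 0.0) -> None:
        super().__init__()
        layers = [
            nn.Conv3d(in_channels, out_channels, kernel_size, stride, padding),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
        ]
        if dropout > 0.0:
            layers.append(nn.Dropout3d(dropout))
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```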
Q2: The `ConvBlock` class, which defines a generic convolutional neural block, includes an optional dictionary of configuration parameters, `attributes` of type Dict[str, nn.Module]. What is its purpose?
A2: There are two key reasons for incorporating a dictionary of PyTorch convolutional modules:
Ensuring module compatibility with the input data's dimensionality (e.g., an image-based convolutional classifier requires `Conv2d`, `BatchNorm2d`, `MaxPool2d`, and `Dropout2d` modules).
Enabling the automatic creation of corresponding deconvolutional blocks.
Q3: Can you write a validation method for a convolutional block using the `attributes` dictionary declared in ConvBlock class?
A3: Here is an implementation of the validation of modules for a 2-dimensional convolutional neural block:
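A hypothetical sketch: the exact `attributes` dictionary from `ConvBlock` is not reproduced here, so this version simply checks that each module type is 2D-compatible:

```python
import torch.nn as nn

def validate_conv2d_modules(modules: list[nn.Module]) -> None:
    """Raise if a module is not compatible with a 2D convolutional block."""
    allowed = (nn.Conv2d, nn.BatchNorm2d, nn.MaxPool2d, nn.Dropout2d,
               nn.ReLU, nn.LeakyReLU, nn.Sigmoid, nn.Tanh)
    for module in modules:
        if not isinstance(module, allowed):
            raise ValueError(
                f'{type(module).__name__} is not supported in a 2D block')
```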
Q4: What are the 3 modules of a variational neural block?
A4: The 3 components of the probabilistic latent space are (see the sketch after this list):
Linear module for the mean of the distribution
Linear module for the log of the variance of the distribution
Sampling module that draws from the Gaussian distribution defined by the mean and log variance
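A hypothetical PyTorch sketch of such a variational block, using the reparameterization trick for the sampling step:

```python
import torch
import torch.nn as nn

class VariationalBlock(nn.Module):
    """Probabilistic latent block: mean, log-variance and Gaussian sampling."""
    def __init__(self, hidden_dim: int, latent_dim: int) -> None:
        super().__init__()
        self.mu = nn.Linear(hidden_dim, latent_dim)       # mean of q(z|x)
        self.log_var = nn.Linear(hidden_dim, latent_dim)  # log variance of q(z|x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mu, log_var = self.mu(x), self.log_var(x)
        eps = torch.randn_like(mu)                  # sample from N(0, I)
        return mu + torch.exp(0.5 * log_var) * eps  # reparameterization trick
```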
Q5: Which attributes uniquely define a Graph Convolutional Network (GCN) implemented in the `GCNBlock` class?
A5: The minimum set of attributes for a valid graph neural block is (see the sketch after this list):
A generic message-passing module from PyTorch Geometric
An optional batch normalization module
A standard PyTorch activation function
A generic graph pooling mechanism
An optional dropout layer for regularization
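A hypothetical sketch of `GCNBlock` assembling those attributes; graph-level pooling is typically applied outside the block, so it is omitted here:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class GCNBlock(nn.Module):
    """Graph convolutional block: message passing, normalization, activation."""
    def __init__(self, in_channels: int, out_channels: int,
                 batch_norm: bool = True, dropout: float = 0.0) -> None:
        super().__init__()
        self.conv = GCNConv(in_channels, out_channels)  # message-passing module
        self.batch_norm = nn.BatchNorm1d(out_channels) if batch_norm else None
        self.activation = nn.ReLU()
        self.dropout = nn.Dropout(dropout) if dropout > 0.0 else None

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        x = self.conv(x, edge_index)
        if self.batch_norm is not None:
            x = self.batch_norm(x)
        x = self.activation(x)
        if self.dropout is not None:
            x = self.dropout(x)
        return x
```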
Modular Deep Learning Models with Neural Blocks
Q1: Can you explain the Builder Pattern?
A1: Well-known to software developers, the Builder Pattern is a creational design pattern that facilitates the step-by-step construction of complex objects. It enables the creation of different representations of an object while maintaining a consistent construction process [ref: Design Patterns: Elements of Reusable Object-Oriented Software - E. Gamma, R. Helm, R. Johnson, J. Vlissides - Addison-Wesley Publishing 1995]
Q2: What is the purpose of transposing a neural network?
A2: The transposition of a neural network aims to automatically generate a mirrored version of its architecture, such as an encoder-decoder pair. This process enables the automatic construction of an Autoencoder using its encoder components.
For example, transposing a set of four 3D convolutional layers results in a corresponding set of four 3D deconvolutional layers.
Q3: Can you write a function, output_size, to compute the output size of an image processed by a convolutional block with a given kernel_size, padding and stride, given the size, input_size = (w, h), of the input data?
A3: You need to apply the resizing formula for a convolutional layer to each dimension: \( out = \lfloor (in + 2\,padding - kernel\_size)/stride \rfloor + 1 \).
The implementation is straightforward.
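A direct translation of the formula (pooling and dilation are assumed absent):

```python
def output_size(input_size: tuple[int, int],
                kernel_size: int,
                padding: int,
                stride: int) -> tuple[int, int]:
    """Output (width, height) of a convolutional layer."""
    w, h = input_size
    out_w = (w + 2 * padding - kernel_size) // stride + 1
    out_h = (h + 2 * padding - kernel_size) // stride + 1
    return out_w, out_h

assert output_size((28, 28), kernel_size=3, padding=1, stride=1) == (28, 28)
```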
Q4: Can you implement the __create_blocks method for the builder of a Multi-Layer Perceptron MLPBuilder using the dictionary of configuration parameters, attributes, initialized in Code Snippet 4?
A4: Here is an example of building a sequence of Multi-Layer Perceptron layers using the dictionary of attributes:
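A hypothetical sketch: the exact keys of the `attributes` dictionary from Code Snippet 4 are not reproduced here, so the names below (`in_features_list`, `activation`, `dropout`) are assumptions:

```python
import torch.nn as nn

def __create_blocks(self) -> list[nn.Module]:
    """Build the sequence of MLP layers from the attributes dictionary."""
    features = self.attributes['in_features_list']   # e.g., [128, 64, 32, 10]
    blocks: list[nn.Module] = []
    for in_features, out_features in zip(features[:-1], features[1:]):
        blocks.append(nn.Linear(in_features, out_features))
        blocks.append(self.attributes['activation'])
        if self.attributes.get('dropout', 0.0) > 0.0:
            blocks.append(nn.Dropout(self.attributes['dropout']))
    return blocks
```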
Einstein Summation in Geometric Deep Learning
Q1: What is the difference between implicit and explicit subscripts in Einstein summation notation?
A1: In explicit mode, the output subscripts are spelled out after the -> separator (e.g., 'ij,jk->ik'). In implicit mode, there is no ->: repeated subscripts are summed over, and the remaining subscripts, sorted alphabetically, define the output.
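A short NumPy illustration of the two modes:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)

implicit = np.einsum('ij,jk', a, b)      # no '->': output inferred as 'ik'
explicit = np.einsum('ij,jk->ik', a, b)  # '->' spells out the output indices
assert np.array_equal(implicit, explicit)
```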
Q2: Can you rewrite the following operations using NumPy's einsum function, given matrices a and b from Code Snippet 1?
A2:
a @ b:
[[0.7 2.25]
[2. 4.75]]
c.T:
[[0.7 2. ]
[2.25 4.75]]
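A sketch of the einsum rewrites; the matrices a and b from Code Snippet 1 are not reproduced here, so the stand-in values below will not reproduce the outputs listed above:

```python
import numpy as np

# Hypothetical stand-ins for the matrices a and b of Code Snippet 1
a = np.array([[0.1, 0.5], [0.4, 0.9]])
b = np.array([[2.0, 0.5], [1.0, 4.0]])

c = np.einsum('ij,jk->ik', a, b)   # matrix product, equivalent to a @ b
c_t = np.einsum('ij->ji', c)       # transpose, equivalent to c.T
```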
Q3: Can you use PyTorch's einsum function to compute the gradient of the function:
A3:
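Since the target function is not reproduced here, a stand-in quadratic form \( f(x) = x^{T} A x \) is used; its gradient is \( (A + A^{T})x \):

```python
import torch

a = torch.tensor([[1.0, 2.0], [0.5, 3.0]])
x = torch.tensor([0.5, 1.5], requires_grad=True)

f = torch.einsum('i,ij,j->', x, a, x)   # scalar x^T A x
f.backward()

print(x.grad)                           # equals (a + a.T) @ x
```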
Q4: What happens if you attempt to compute the dot product of two vectors of different sizes? (Referencing Code Snippet 3).
A4: Here are a few lines of code to capture the error message:
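A minimal sketch with two mismatched vectors (values hypothetical):

```python
import numpy as np

x = np.array([1.0, 0.5, 2.0])        # size 3
y = np.array([0.5, 1.5, 1.0, 2.0])   # size 4

try:
    np.einsum('i,i->', x, y)         # dot product requires equal sizes
except ValueError as e:
    print(f'Failed with {e}')
```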
The output is an error message
Failed with operands could not be broadcast together with remapped shapes [original->remapped]: (3,)->(3,) (4,)->(4,)
Q5: Can you implement a simple validation function to check whether a matrix transpose is correct?
A5: You need to remember that the transpose of the transpose of a matrix is the original matrix.
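A minimal sketch of that check, using einsum for the second transpose:

```python
import numpy as np

def validate_transpose(m: np.ndarray, m_t: np.ndarray) -> bool:
    """m_t is a correct transpose of m if transposing it again recovers m."""
    return np.array_equal(np.einsum('ij->ji', m_t), m)

a = np.array([[0.7, 2.25], [2.0, 4.75]])
assert validate_transpose(a, a.T)
```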
Taming PyTorch Geometric for Graph Neural Networks
Q1: What does edge_index represent in the context of graph data (type Data)?
A1: The edge_index attribute is a 2-row, multi-column tensor that represents the graph's edge connections in PyTorch Geometric. Each column defines an edge, where:
The first row contains the source nodes.
The second row contains the corresponding target nodes.
It serves as the sparse representation of the adjacency matrix, encoding the connectivity of the graph.
Q2: How can you create a Data instance for the given graph using PyTorch Geometric?
A2:
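The graph referenced in the question is not reproduced here; a sketch with a hypothetical 4-node graph (undirected edges 0-1, 1-2, 2-3, each listed in both directions):

```python
import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]], dtype=torch.long)
x = torch.randn(4, 16)               # one 16-dim feature vector per node
data = Data(x=x, edge_index=edge_index)
```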
Q3: What is the role of data.train_mask in graph-based learning?
A3: In PyTorch Geometric, the data.train_mask attribute is a boolean mask used to identify which nodes belong to the training set in node classification tasks.
train_mask[n] = True: the nth node in the dataset is used for training.
train_mask[n] = False: the nth node in the dataset is not used for training.
The same logic applies to the data.val_mask filter.
Q4: Between GraphConv and GCNConv, which Graph Convolutional operator offers greater stability?
A4: GCNConv is usually more stable: it applies symmetric normalization of the adjacency matrix, which keeps the scale of aggregated messages bounded, whereas GraphConv aggregates unnormalized neighbor features.
Practical Introduction to Lie Groups in Python
Q1: Which field(s) benefit the most from Lie geometry?
A1:
Robotics: Invariance in rigid body motion
Molecular research: Equivariance in translation and rotation of molecules
Computer vision: Translation, scaling and rotation invariance of frames and images
Q2: What are the two conditions that define a special orthogonal group?
A2:
Orthogonality: each element R satisfies RᵀR = I
Preservation of orientation: det(R) = +1
Q3: What is the size of an SE3 element?
A3: A 4 × 4 matrix combining a 3 × 3 rotation matrix R and a 3D translation vector t:
\[ T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \]
Q4: Can you write a code snippet to compute the SO3 element for a 3x3 matrix representing a unit generator of rotation along Z-axis and infer its corresponding algebra element? Can you verify that the computed algebra element is almost identical to the original matrix?
A4: Here is an example of implementation in Python
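The post's Lie group wrapper is not reproduced here, so this is a minimal sketch using SciPy's matrix exponential and logarithm; its numeric output will differ from the listing below, which comes from the original snippet:

```python
import numpy as np
from scipy.linalg import expm, logm

# Unit generator of rotation about the Z-axis (element of the so(3) algebra)
algebra_element = np.array([[0.0, -1.0, 0.0],
                            [1.0,  0.0, 0.0],
                            [0.0,  0.0, 0.0]])

so3_element = expm(algebra_element)     # exponential map: algebra -> group
recovered = np.real(logm(so3_element))  # logarithm map: group -> algebra

assert np.allclose(recovered, algebra_element, atol=1e-6)
```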
… and its output …
Algebra element:
[[ 0. -1. 0.]
[ 1. 0. 0.]
[ 0. 0. 0.]]
SO3 group element:
[[ 9.19666190e-01 -9.97270378e-01 -3.85335888e-17]
[ 9.97270378e-01 9.19666190e-01 -3.85335888e-17]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
Lie algebra:
[[-1.62568234e-17 -1.00000000e+00 0.00000000e+00]
[ 1.00000000e+00 -1.62568234e-17 2.15939404e-17]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
… and a visualization of the rotation (figure in the original post).
Q5: Can you implement the generation of a SO3 element from a combination of 3 unit generators of rotations along X, Y and Z axes?
A5: An example of a linear combination of the 3 fundamental generators of rotation (3 × 3 matrices) follows.
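A sketch with hypothetical weights for the linear combination; the exponential map then produces the SO(3) element:

```python
import numpy as np
from scipy.linalg import expm

# Fundamental generators of rotation about the X, Y and Z axes
l_x = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
l_y = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
l_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

weights = (0.5, 1.0, 0.25)   # hypothetical coefficients
algebra_element = weights[0] * l_x + weights[1] * l_y + weights[2] * l_z

so3_element = expm(algebra_element)   # exponential map to the group
assert np.allclose(so3_element @ so3_element.T, np.eye(3), atol=1e-9)
```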
Demystifying Graph Sampling & Walk Methods
Q1. Is a graph sampling-based inductive learning method conceptually similar to: Breadth-first search (BFS)? Depth-first search (DFS)?
A1: The method for graph sampling-based inductive learning is similar to depth-first search: random walks explore along a path of successive neighbors rather than expanding level by level.
Q2. In the context of edge prediction on a graph, what does batch_size represent?
A2: For tasks related to classification or regression of graph edges, batch_size is the number of edges contained in a mini-batch.
Q3. How can you differentiate between training and validation data when using a neighbor sampling loader?
A3: The training and validation sets are specified through the configuration attribute input_nodes: data.train_mask for training data and data.val_mask for validation data. It is common practice to shuffle training data, as illustrated below.
train_loader = NeighborLoader(self.data,
num_neighbors=num_neighbors,
batch_size=batch_size,
replace=replace,
drop_last=False,
shuffle=True,
num_workers=num_workers,
input_nodes=self.data.train_mask)
Q4. Which of the following num_neighbors configurations is most suitable for a neighbor sampler: [4, 8, 8], [8, 4, 2], [8, 4, 4, 2]?
A4:
[4, 8, 8]: The first hop contains fewer nodes than the subsequent hops, causing instability as it prioritizes long-range dependencies.
[8, 4, 4, 2]: The 4th hop in the random walk increases the computational cost of training while potentially degrading performance.
[8, 4, 2]: Best option, although [8, 4] may yield similar performance at lower computational cost.
Q5. Can you provide a code snippet to create a neighbor sampling loader for the Facebook dataset using PyTorch, following the structure of Code Snippet #2?
A5:
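A sketch following the structure of Code Snippet #2; the choice of FacebookPagePage as the Facebook dataset and the random split mask are assumptions, since that dataset ships without predefined masks:

```python
import torch
from torch_geometric.datasets import FacebookPagePage
from torch_geometric.loader import NeighborLoader

dataset = FacebookPagePage(root='./data/facebook')
data = dataset[0]

# FacebookPagePage has no predefined train_mask: create a random 80% split
train_mask = torch.rand(data.num_nodes) < 0.8

train_loader = NeighborLoader(data,
                              num_neighbors=[8, 4, 2],
                              batch_size=256,
                              replace=False,
                              shuffle=True,
                              input_nodes=train_mask)
```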
Plug & Play Training for Graph Convolutional Networks
Q1: What are the advantages of using a declarative format to define training configurations and model parameters?
A1: Some of the benefits …
Reduces the risk of introducing new bugs
Lowers the barrier for data scientists with limited Python programming skills
Eliminates the need to retest existing models and training implementation
Q2: Can you implement class weight computation to balance classes based on their instance counts, given graph data?
A2: An example of implementation follows.
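A hypothetical sketch computing inverse-frequency weights from node labels; the resulting tensor can be passed to a weighted loss such as nn.NLLLoss(weight=...):

```python
import torch
from torch_geometric.data import Data

def class_weights(data: Data) -> torch.Tensor:
    """Inverse-frequency class weights from the graph's node labels."""
    counts = torch.bincount(data.y)   # instances per class
    weights = counts.sum() / (len(counts) * counts.float())
    return weights
```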
Q3: Which additional hyper-parameters would you consider adding to the attribute list in Code Snippet 2?
A3:
residual: Use of residual (skip) connections
scheduler: Learning rate scheduler (e.g., cosine, step, plateau)
Q4: What alternative metric would you recommend for evaluating node classification performance on the Flickr dataset?
A4:
Area Under the ROC Curve: Measures the trade-off between true positive rate and false positive rate.
Area Under the Precision-Recall Curve: Based on precision and recall; useful for imbalanced datasets.
Average Precision: Area under the precision-recall curve, weighted by recall steps.
Brier Score: Measures the mean squared difference between predicted probabilities and actual outcomes.
Specificity: Measures the ability to correctly identify negatives.
Q5: Can you update the JSON model definition in Code Snippet 10 to include a pooling layer of type TopKPooling?
A5:
{
'model_id': 'MyModel',
'gconv_blocks': [
{
'block_id': 'G_Conv_1',
'conv_layer': GraphConv(in_channels=796,
out_channels=384),
'num_channels': 384,
'activation': nn.ReLU(),
'batch_norm': None,
'pooling': TopKPooling(hidden_channels, ratio=0.4), # Pooling
'dropout': 0.25
},
{
'block_id': 'G_Block_2',
'conv_layer': GraphConv(in_channels=384,
out_channels=384),
'num_channels': 384,
'activation': nn.ReLU(),
'batch_norm': None,
'pooling': TopKPooling(hidden_channels, ratio=0.4), # Pooling
'dropout': 0.25
}
],
'mlp_blocks': [
{
'block_id': 'MyMLP',
'in_features': hidden_channels,
'out_features': _num_classes,
'activation': nn.LogSoftmax(dim=-1),
'dropout': 0.0
}
]
}
How to Tune a Graph Convolutional Network
Q1 What are the two main categories of Bayesian Optimization?
A1:
Gaussian Process
Tree-based Parzen Estimator
Q2 What is the advantage of breaking down the hyperparameter search space into training parameters, model architecture, and node sampling methods?
A2: Breaking down the search space into distinct categories (training parameters, model architecture, and sampling strategies, for example) makes it possible to isolate the group of parameters that has the most influence on model performance, making the process more manageable.
Q3 Can you implement an __init_hpo method (referenced as code snippet 3) for the Graph Sampling Based Inductive Learning Method [ref 2]? As a reminder, the JSON string describing the sampling parameters is:
{
'id': 'GraphSAINTRandomWalkSampler',
'walk_length':3,
'num_steps': 12,
'sample_coverage': 100,
'batch_size': 4096,
'num_workers': 4
}
A3: Here is an example of implementation of __init_hpo
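A hypothetical sketch using Optuna's trial API to sample the GraphSAINT random-walk parameters; the ranges are assumptions:

```python
import optuna

def __init_hpo(self, trial: optuna.Trial) -> dict:
    """Sample GraphSAINT random-walk sampling parameters for one trial."""
    return {
        'id': 'GraphSAINTRandomWalkSampler',
        'walk_length': trial.suggest_int('walk_length', 2, 6),
        'num_steps': trial.suggest_int('num_steps', 8, 24),
        'sample_coverage': trial.suggest_categorical('sample_coverage', [50, 100, 200]),
        'batch_size': trial.suggest_categorical('batch_size', [1024, 2048, 4096]),
        'num_workers': 4,
    }
```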
Q4 Can you modify the objective method (referenced as code snippet 2) so that it minimizes the validation loss?
A4: The objective method must return the validation loss instead of the evaluation score, and you also need to modify the direction of the study, as sketched below.
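A minimal sketch; the toy loss stands in for the real training loop:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
    validation_loss = (lr - 0.01) ** 2   # stand-in for the real training loop
    return validation_loss               # return the loss, not the accuracy

# The study direction must match the returned quantity
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)
```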
Q5: Can you name one limitation of the Optuna hyperparameter optimization library?
A5: There are two known limitations:
The list of categorical values supported in Optuna is limited to discrete scalar values such as strings, numbers, or simple objects.
The objective function has to be implemented as either a global function or a class (static) method.
Designing Graph Neural Networks from Homophily Ratios
Q1: What distinguishes the node homophily ratio from the edge homophily ratio?
A1:
Edge homophily ratio: The fraction of edges in a graph that connect nodes with the same class label. It answers the question: how likely is it that a random edge connects same-label nodes?
Node homophily ratio: The same-label fraction computed over each node's neighborhood, then averaged across nodes. It answers: how homophilic is the neighborhood of a typical node?
Q2: What are the key attributes that define the complexity of a Graph Neural Network?
A2: Here are a few attributes:
Number of graph convolutional and attention layers
Graph pooling layer
Node or edge neighbor sampling method
Residual connections
Heterophilic nodes
Inclusion of isolated nodes
Aggregation method
Data transformation prior to aggregation
Q3: How can the node homophily ratio be implemented in code?
A3: Here is an example of computation of the node homophily ratio
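A sketch computing the per-node same-label fraction and averaging it over non-isolated nodes; PyTorch Geometric also ships a built-in, torch_geometric.utils.homophily(edge_index, y, method='node'):

```python
import torch
from torch_geometric.data import Data

def node_homophily(data: Data) -> float:
    """Average, over nodes, of the fraction of same-label neighbors."""
    src, dst = data.edge_index
    same_label = (data.y[src] == data.y[dst]).float()

    ratio = torch.zeros(data.num_nodes).scatter_add_(0, dst, same_label)
    degree = torch.zeros(data.num_nodes).scatter_add_(0, dst,
                                                      torch.ones_like(same_label))
    mask = degree > 0                     # exclude isolated nodes
    return (ratio[mask] / degree[mask]).mean().item()
```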
Q4: What factors contribute to the discrepancy between node and edge homophily ratios?
A4: There are several factors, among them:
Edge homophily tends to be biased toward high-degree nodes
Node homophily typically excludes isolated nodes
Edge homophily is overstated in case of class imbalance (one class contains most of the high-degree nodes)
You may find additional factors in technical literature and research papers
SE(3): The Lie Group That Moves the World
Q1: What are the two point types used in SE(3)?
A1: SE(3) is both a Lie group and a smooth manifold. Therefore, the two point types are 'matrix' for the Lie algebra and 'vector' for operations on the smooth manifold.
Q2: The exponential map generates SE(3) elements from a 4×4 Lie algebra matrix. How can we compute the corresponding Lie algebra matrix from a given SE(3) element?
A2: The logarithm map, the inverse of the exponential map, recovers the 4×4 Lie algebra matrix from a given SE(3) element.
Q3: Can you provide Python code, using numpy to construct a 4×4 SE(3) matrix from a given 3×3 spatial rotation matrix and a 3D translation vector?
A3:
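A minimal NumPy sketch that places the rotation in the top-left 3×3 block and the translation in the last column:

```python
import numpy as np

def make_se3(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 SE(3) matrix from a 3x3 rotation and a 3D translation."""
    se3 = np.eye(4)
    se3[:3, :3] = rotation
    se3[:3, 3] = translation
    return se3
```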
Q4: Additionally, can you provide code to verify SE(3) multiplication is not commutative?
A4:
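A sketch with two hypothetical SE(3) elements, a rotation about the Z-axis and a pure translation:

```python
import numpy as np

rot_z = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])

t1 = np.eye(4); t1[:3, :3] = rot_z                      # rotation, no translation
t2 = np.eye(4); t2[:3, 3] = np.array([1.0, 0.0, 0.0])   # translation only

# The two products differ, so SE(3) multiplication is not commutative
assert not np.allclose(t1 @ t2, t2 @ t1)
```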
Geometry of Closed-form Statistical Manifolds
Q1: What is the shape of the tensor representing a point on a statistical manifold defined by n parameters?
A1: The shape of the tensor matches the number of parameters. For instance, the exponential distribution has one parameter, the rate, so the tensor has one value. A point on the normal-distribution manifold would have two components.
Q2: What is the Fisher information metric of the normal distribution when the standard deviation is fixed and not treated as a parameter?
A2: Using formula [B] for the normal distribution N(μ, σ²) with σ fixed, the metric reduces to \( g(\mu) = 1/\sigma^{2} \).
Q3: Can you implement the Fisher information metric for a normal distribution with a constant, fixed standard deviation?
A3:
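A minimal sketch; with σ constant, the metric is the 1×1 matrix [1/σ²]:

```python
import numpy as np

def fisher_metric_fixed_sigma(sigma: float) -> np.ndarray:
    """Fisher information metric of N(mu, sigma^2) with sigma held constant."""
    return np.array([[1.0 / sigma ** 2]])

assert np.isclose(fisher_metric_fixed_sigma(2.0)[0, 0], 0.25)
```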
Q4: Can you compute the Fisher-Rao metric manually for the exponential distribution manifold?
A4: For the exponential distribution \( p(x;\lambda) = \lambda e^{-\lambda x} \), \( \log p = \log\lambda - \lambda x \), so \( \partial_\lambda^{2} \log p = -1/\lambda^{2} \) and the Fisher-Rao metric is \( g(\lambda) = -\mathbb{E}[\partial_\lambda^{2}\log p] = 1/\lambda^{2} \).