diffusion_map

Routines and Class definitions for the diffusion maps algorithm.

class pydiffmap.diffusion_map.DiffusionMap(kernel_object, alpha=0.5, n_evecs=1, weight_fxn=None, density_fxn=None, bandwidth_normalize=False, oos='nystroem')[source]

Diffusion Map object for data analysis

Parameters:
  • kernel_object (Kernel object.) – Kernel object that outputs the values of the kernel. Must have the method .fit(X) and .compute() methods. Any epsilon desired for normalization should be stored at kernel_object.epsilon_fitted and any bandwidths should be located at kernel_object.bandwidths.
  • alpha (scalar, optional) – Exponent to be used for the left normalization in constructing the diffusion map.
  • n_evecs (int, optional) – Number of diffusion map eigenvectors to return
  • weight_fxn (callable or None, optional) – Callable function that take in a point, and outputs the value of the weight matrix at those points.
  • density_fxn (callable or None, optional) – Callable function that take in X, and outputs the value of the density of X. Used instead of kernel density estimation in the normalisation.
  • bandwidth_normalize (boolean, optional) – If true, normalize the final constructed transition matrix by the bandwidth as described in Berry and Harlim. [1]_
  • oos (‘nystroem’ or ‘power’, optional) – Method to use for out-of-sample extension.

References

[1]T. Berry, and J. Harlim, Applied and Computational Harmonic Analysis 40, 68-96 (2016).
construct_Lmat(X)[source]

Builds the transition matrix, but does NOT compute the eigenvectors. This is useful for applications where the transition matrix itself is the object of interest.

Parameters:X (array-like, shape (n_query, n_features)) – Data upon which to construct the diffusion map.
Returns:self (the object itself)
fit(X)[source]

Fits the data.

Parameters:X (array-like, shape (n_query, n_features)) – Data upon which to construct the diffusion map.
Returns:self (the object itself)
fit_transform(X)[source]

Fits the data and returns diffusion coordinates. equivalent to calling dmap.fit(X).transform(x).

Parameters:X (array-like, shape (n_query, n_features)) – Data upon which to construct the diffusion map.
Returns:phi (numpy array, shape (n_query, n_eigenvectors)) – Transformed value of the given values.
classmethod from_sklearn(alpha=0.5, k=64, kernel_type='gaussian', epsilon='bgh', n_evecs=1, neighbor_params=None, metric='euclidean', metric_params=None, weight_fxn=None, density_fxn=None, bandwidth_type=None, bandwidth_normalize=False, oos='nystroem')[source]

Builds the diffusion map using a kernel constructed using the Scikit-learn nearest neighbor object. Parameters are largely the same as the constructor, but in place of the kernel object it take the following parameters.

Parameters:
  • k (int, optional) – Number of nearest neighbors over which to construct the kernel.
  • kernel_type (string, optional) – Type of kernel to construct. Currently the only option is ‘gaussian’, but more will be implemented.
  • epsilon (string or scalar, optional) – Method for choosing the epsilon. Currently, the only options are to provide a scalar (epsilon is set to the provided scalar) ‘bgh’ (Berry, Giannakis and Harlim), and ‘bgh_generous’ (‘bgh’ method, with answer multiplied by 2.
  • neighbor_params (dict or None, optional) – Optional parameters for the nearest Neighbor search. See scikit-learn NearestNeighbors class for details.
  • metric (string, optional) – Metric for distances in the kernel. Default is ‘euclidean’. The callable should take two arrays as input and return one value indicating the distance between them.
  • metric_params (dict or None, optional) – Optional parameters required for the metric given.
  • bandwidth_type (callable, number, string, or None, optional) – Type of bandwidth to use in the kernel. If None (default), a fixed bandwidth kernel is used. If a callable function, the data is passed to the function, and the bandwidth is output (note that the function must take in an entire dataset, not the points 1-by-1). If a number, e.g. -.25, a kernel density estimate is performed, and the bandwidth is taken to be q**(input_number). For a string input, the input is assumed to be an evaluatable expression in terms of the dimension d, e.g. “-1/(d+2)”. The dimension is then estimated, and the bandwidth is set to q**(evaluated input string).

Examples

# setup neighbor_params list with as many jobs as CPU cores and kd_tree neighbor search. >>> neighbor_params = {‘n_jobs’: -1, ‘algorithm’: ‘kd_tree’} # initialize diffusion map object with the top two eigenvalues being computed, epsilon set to 0.1 # and alpha set to 1.0. >>> mydmap = DiffusionMap.from_sklearn(n_evecs = 2, epsilon = .1, alpha = 1.0, neighbor_params = neighbor_params)

References

[1]T. Berry, and J. Harlim, Applied and Computational Harmonic Analysis 40, 68-96 (2016).
transform(Y)[source]

Performs Nystroem out-of-sample extension to calculate the values of the diffusion coordinates at each given point.

Parameters:Y (array-like, shape (n_query, n_features)) – Data for which to perform the out-of-sample extension.
Returns:phi (numpy array, shape (n_query, n_eigenvectors)) – Transformed value of the given values.
class pydiffmap.diffusion_map.TMDmap(alpha=0.5, k=64, kernel_type='gaussian', epsilon='bgh', n_evecs=1, neighbor_params=None, metric='euclidean', metric_params=None, change_of_measure=None, density_fxn=None, bandwidth_type=None, bandwidth_normalize=False, oos='nystroem')[source]

Implementation of the TargetMeasure diffusion map. This provides a more convenient interface for some hyperparameter selection for the general diffusion object. It takes the same parameters as the base Diffusion Map object. However, rather than taking a weight function, it takes as input a change of measure function.

Parameters:change_of_measure (callable, optional) – Function that takes in a point and evaluates the change-of-measure between the density otherwise stationary to the diffusion map and the desired density.
pydiffmap.diffusion_map.nystroem_oos(dmap_object, Y)[source]

Performs Nystroem out-of-sample extension to calculate the values of the diffusion coordinates at each given point.

Parameters:
  • dmap_object (DiffusionMap object) – Diffusion map upon which to perform the out-of-sample extension.
  • Y (array-like, shape (n_query, n_features)) – Data for which to perform the out-of-sample extension.
Returns:

phi (numpy array, shape (n_query, n_eigenvectors)) – Transformed value of the given values.

pydiffmap.diffusion_map.power_oos(dmap_object, Y)[source]

Performs out-of-sample extension to calculate the values of the diffusion coordinates at each given point using the power-like method.

Parameters:
  • dmap_object (DiffusionMap object) – Diffusion map upon which to perform the out-of-sample extension.
  • Y (array-like, shape (n_query, n_features)) – Data for which to perform the out-of-sample extension.
Returns:

phi (numpy array, shape (n_query, n_eigenvectors)) – Transformed value of the given values.