idtxl package¶
Submodules¶
idtxl.data module¶
Provide data structures for IDTxl analysis.
- class idtxl.data.Data(data=None, dim_order='psr', normalise=True, seed=None)[source]¶
Bases: object
Store data for information dynamics estimation.
Data takes a 1- to 3-dimensional array representing realisations of random variables in dimensions: processes, samples (over time), and replications. If necessary, the provided realisations are reshaped to fit the format expected by IDTxl, which is a 3-dimensional array with axes representing (process index, sample index, replication index). Indicate the actual order of dimensions in the provided array in a one- to three-character string, e.g. ‘spr’ for an array with realisations over (1) samples in time, (2) processes, (3) replications.
Example:
>>> data_mute = Data()              # initialise empty data object
>>> data_mute.generate_mute_data()  # simulate data from MuTE paper
>>>
>>> # Create data objects with data of various sizes
>>> d = np.arange(10000).reshape((2, 1000, 5))  # 2 procs.,
>>> data_1 = Data(d, dim_order='psr')           # 1000 samples, 5 repl.
>>>
>>> d = np.arange(3000).reshape((3, 1000))  # 3 procs.,
>>> data_2 = Data(d, dim_order='ps')        # 1000 samples
>>>
>>> # Overwrite data in existing object with random data
>>> d = np.arange(5000)
>>> data_2.set_data(d, 's')
- Note:
Realisations are stored as attribute ‘data’. This can only be set via the ‘set_data()’ method.
- Args:
- data : numpy array [optional]
1/2/3-dimensional array with raw data
- dim_order : string [optional]
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the data dimensionality, e.g., ‘ps’ for a two-dimensional array of data from several processes over time (default=’psr’)
- normalise : bool [optional]
if True, data gets normalised per process (default=True)
- seed : int [optional]
can be set to a fixed integer to obtain reproducible results over multiple analysis runs on the same data; otherwise a random seed is used (default=None)
- Attributes:
- data : numpy array
realisations, can only be set via ‘set_data’ method
- n_processes : int
number of processes
- n_replications : int
number of replications
- n_samples : int
number of samples in time
- normalise : bool
if True, all data gets z-standardised per process
- initial_state : array
initial state of the seed for shuffled permutations
- property data¶
Return data array.
- generate_logistic_maps_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]¶
Generate discrete-time coupled-logistic-maps time series.
Generate data and overwrite the instance’s current data.
The implemented logistic map function is f(x) = 4 * x * (1 - x).
- Args:
- n_samples : int [optional]
number of samples simulated for each process and replication
- n_replications : int [optional]
number of replications
- coefficient_matrices : numpy array [optional]
coefficient matrices: numpy array with dimensions (order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the stochastic process. (default=np.array([[[0.5, 0], [0.4, 0.5]]]))
- noise_std : float [optional]
standard deviation of uncorrelated Gaussian noise (default=0.1)
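For illustration, a minimal usage sketch (the parameter values are illustrative, not prescribed):
>>> from idtxl.data import Data
>>> data = Data(seed=0)
>>> # The default coefficient matrix couples process 0 to process 1 at
>>> # lag 1, yielding two coupled logistic-map processes.
>>> data.generate_logistic_maps_data(n_samples=500, n_replications=5)
>>> print(data.n_processes, data.n_samples, data.n_replications)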
- generate_mute_data(n_samples=1000, n_replications=10)[source]¶
Generate example data for a 5-process network.
Generate example data and overwrite the instance’s current data. The network is used as an example in the paper on the MuTE toolbox (Montalto, PLOS ONE, 2014, eq. 14) and was originally proposed by Baccala & Sameshima (2001). The network consists of five auto-regressive (AR) processes with model order 2 and the following (non-linear) couplings:
0 -> 1, u = 2 (non-linear)
0 -> 2, u = 3
0 -> 3, u = 2 (non-linear)
3 -> 4, u = 1
4 -> 3, u = 1
References:
Montalto, A., Faes, L., & Marinazzo, D. (2014) MuTE: A MATLAB toolbox to compare established and novel estimators of the multivariate transfer entropy. PLoS ONE 9(10): e109462. https://doi.org/10.1371/journal.pone.0109462
Baccala, L.A. & Sameshima, K. (2001). Partial directed coherence: a new concept in neural structure determination. Biol Cybern 84: 463–474. https://doi.org/10.1007/PL00007990
- Args:
- n_samples : int
number of samples simulated for each process and replication
- n_replications : int
number of replications
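For illustration, a minimal usage sketch:
>>> from idtxl.data import Data
>>> data = Data(seed=0)  # fix the seed for reproducible noise
>>> data.generate_mute_data(n_samples=1000, n_replications=10)
>>> print(data.n_processes)  # the MuTE example network has 5 processes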
- generate_var_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]¶
Generate discrete-time VAR (vector auto-regressive) time series.
Generate data and overwrite the instance’s current data.
- Args:
- n_samples : int [optional]
number of samples simulated for each process and replication
- n_replications : int [optional]
number of replications
- coefficient_matrices : numpy array [optional]
coefficient matrices: numpy array with dimensions (VAR order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the VAR process. (default=np.array([[[0.5, 0], [0.4, 0.5]]]))
- noise_std : float [optional]
standard deviation of uncorrelated Gaussian noise (default=0.1)
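For illustration, a sketch of an order-2 VAR process defined by two lag matrices (the coefficient values are illustrative):
>>> import numpy as np
>>> from idtxl.data import Data
>>> coefficient_matrices = np.array([
>>>     [[0.5, 0.0], [0.4, 0.5]],   # couplings at lag 1
>>>     [[0.1, 0.0], [0.0, 0.1]]])  # couplings at lag 2
>>> data = Data(seed=0)
>>> data.generate_var_data(n_samples=1000, n_replications=10,
>>>                        coefficient_matrices=coefficient_matrices)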
- get_realisations(current_value, idx_list, shuffle=False)[source]¶
Return realisations for a list of indices.
Return realisations for indices in list. Optionally, realisations can be shuffled to create surrogate data for statistical testing. For shuffling, data blocks are permuted over replications while their temporal order stays intact within replications:
- Original data:
repl. ind.:   1 1 1 1  2 2 2 2  3 3 3 3  4 4 4 4  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Shuffled data:
repl. ind.:   3 3 3 3  1 1 1 1  4 4 4 4  2 2 2 2  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Args:
- current_value : tuple
index of the current value in the current analysis, has to have the form (idx process, idx sample); if current_value == idx, all samples for a process are returned
- idx_list : list of tuples
variable indices
- shuffle : bool
if True, permute blocks of replications over trials
- Returns:
- numpy array
realisations with dimensions (no. samples * no. replications) x number of indices
- numpy array
replication index for each realisation, with dimension (no. samples * no. replications)
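For illustration, a minimal sketch (the indices are illustrative); it assumes a Data object data filled, e.g., via generate_mute_data():
>>> # current value at (process 0, sample 5); realisations of two past
>>> # variables of process 1, given as (process index, sample index)
>>> realisations, repl_idx = data.get_realisations(
>>>     current_value=(0, 5), idx_list=[(1, 1), (1, 3)])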
- n_realisations(current_value=None)[source]¶
Number of realisations over samples and replications.
- Args:
- current_value : tuple [optional]
reference point for the calculation of the number of realisations (e.g., when using an embedding of length k, we count realisations from the (k+1)-th sample because we lose the first k samples to the embedding); if no current_value is provided, the number of all samples is used
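For illustration (assuming realisations are counted from the current value’s sample index onward, over all replications): with 1000 samples, 10 replications, and a current value at sample 5, one would expect (1000 - 5) * 10 = 9950 realisations:
>>> data.n_realisations(current_value=(0, 5))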
- n_realisations_samples(current_value=None)[source]¶
Number of realisations over samples.
- Args:
- current_value : tuple [optional]
reference point for the calculation of the number of realisations (e.g., when using an embedding of length k, the current value is at sample k + 1; we thus count realisations from the (k+1)-th sample because we lose the first k samples to the embedding)
- permute_replications(current_value, idx_list)[source]¶
Return realisations with permuted replications (time stays intact).
Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return realisations for all indices in the list, where an index is expected to have the form (process index, sample index). Realisations are permuted block-wise by permuting the order of replications:
- Original data:
repl. ind.:   1 1 1 1  2 2 2 2  3 3 3 3  4 4 4 4  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Permuted data:
repl. ind.:   3 3 3 3  1 1 1 1  4 4 4 4  2 2 2 2  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Args:
- current_value : tuple
index of the current_value in the data
- idx_list : list of tuples
indices of variables
- Returns:
- numpy array
permuted realisations with dimensions replications x number of indices
- numpy array
replication index for each realisation
- Raises:
TypeError if idx_list is not a list
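For illustration, a minimal sketch (the indices are illustrative):
>>> # surrogate realisations with replications permuted while the
>>> # temporal order within each replication stays intact
>>> surrogates, repl_idx = data.permute_replications(
>>>     current_value=(0, 5), idx_list=[(1, 1), (1, 3)])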
- permute_samples(current_value, idx_list, perm_settings)[source]¶
Return realisations with permuted samples (repl. stays intact).
Create surrogate data by permuting realisations over samples (time) while keeping the order of replications intact. Surrogates can be created for multiple variables in parallel, where variables are provided as a list of indices. An index is expected to have the form (process index, sample index).
Permuting samples in time is the fall-back option for surrogate data creation. The default method for surrogate data creation is the permutation of replications, while keeping the order of samples in time intact. If the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data, permutation of samples in time is chosen instead.
Different permutation strategies can be chosen to permute realisations in time. Note that if data consists of multiple replications, within each replication, samples are shuffled following the same permutation pattern:
- Original data:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  …
- Circular shift by a random number of samples, e.g. 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 5 6 7 8 1 2 3 4  5 6 7 8 1 2 3 4  5 6 7 8 1 2 3 4  …
- Permute blocks of 3 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 5 6 7 8 1 2 3  4 5 6 7 8 1 2 3  4 5 6 7 8 1 2 3  …
- Permute data locally within a range of 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 4 3 8 5 6 7  1 2 4 3 8 5 6 7  1 2 4 3 8 5 6 7  …
- Random permutation:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 2 5 7 1 3 2 6  4 2 5 7 1 3 2 6  4 2 5 7 1 3 2 6  …
- Args:
- current_value : tuple
index of the current_value in the data
- idx_list : list of tuples
indices of variables
- perm_settings : dict
settings specifying the allowed permutations:
perm_type : str - permutation type, can be
‘random’: swaps samples at random,
‘circular’: shifts time series by a random number of samples,
‘block’: swaps blocks of samples,
‘local’: swaps samples within a given range
additional settings depending on the perm_type (n is the number of samples):
if perm_type == ‘circular’:
- ‘max_shift’ : int
the maximum number of samples for shifting (e.g., number of samples / 2)
if perm_type == ‘block’:
- ‘block_size’ : int
no. samples per block (e.g., number of samples / 10)
- ‘perm_range’ : int
range in which blocks can be swapped (e.g., number of samples / block_size)
if perm_type == ‘local’:
- ‘perm_range’ : int
range in samples over which realisations can be permuted (e.g., number of samples / 10)
- Returns:
- numpy array
permuted realisations with dimensions replications x number of indices
- numpy array
sample index for each realisation
- Raises:
TypeError if idx_list is not a list
- Note:
This permutation scheme is the fall-back option if surrogate data can not be created by shuffling replications because the number of replications is too small to generate the requested number of permutations.
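For illustration, a minimal sketch using the circular strategy (the settings values are illustrative):
>>> perm_settings = {'perm_type': 'circular', 'max_shift': 50}
>>> surrogates, sample_idx = data.permute_samples(
>>>     current_value=(0, 5), idx_list=[(1, 1), (1, 3)],
>>>     perm_settings=perm_settings)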
- set_data(data, dim_order)[source]¶
Overwrite data in an existing Data object.
- Args:
- data : numpy array
1- to 3-dimensional array of realisations
- dim_order : string
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the number of dimensions in data
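For illustration, a minimal sketch:
>>> import numpy as np
>>> d = np.random.rand(3, 1000)  # 3 processes, 1000 samples
>>> data.set_data(d, 'ps')       # reshaped internally to 3D format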
- slice_permute_replications(process)[source]¶
Return data slice with permuted replications (time stays intact).
Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return the data slice for the entry specified by ‘process’. Realisations are permuted block-wise by permuting the order of replications.
- slice_permute_samples(process, perm_settings)[source]¶
Return slice of data with permuted samples (repl. stays intact).
Create surrogate data by permuting data in a slice over samples (time) while keeping the order of replications intact. Return slice for the entry specified by ‘process’. Realisations are permuted according to the settings specified in perm_settings:
- Original data:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  …
- Circular shift by 2, 6, and 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 7 8 1 2 3 4 5 6  3 4 5 6 7 8 1 2  5 6 7 8 1 2 3 4  …
- Permute blocks of 3 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 5 6 7 8 1 2 3  1 2 3 7 8 4 5 6  7 8 4 5 6 1 2 3  …
- Permute data locally within a range of 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 4 3 8 5 6 7  4 1 2 3 5 7 8 6  3 1 2 4 8 5 6 7  …
- Random permutation:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 2 5 7 1 3 2 6  7 5 3 4 2 1 8 5  1 2 4 3 6 8 7 5  …
Permuting samples is the fall-back option for surrogate creation if the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data.
- Args:
- process : int
process for which to return data slice
- perm_settings : dict
settings specifying the allowed permutations:
perm_type : str - permutation type, can be
‘random’: swaps samples at random,
‘circular’: shifts time series by a random number of samples,
‘block’: swaps blocks of samples,
‘local’: swaps samples within a given range
additional settings depending on the perm_type (n is the number of samples):
if perm_type == ‘circular’:
- ‘max_shift’ : int
the maximum number of samples for shifting (default=n/2)
if perm_type == ‘block’:
- ‘block_size’ : int
no. samples per block (default=n/10)
- ‘perm_range’ : int
range in which blocks can be swapped (default=max)
if perm_type == ‘local’:
- ‘perm_range’ : int
range in samples over which realisations can be permuted (default=n/10)
- Returns:
- numpy array
data slice with data permuted over samples with dimensions samples x number of replications
- numpy array
index of permuted samples
- Note:
This permutation scheme is the fall-back option if the number of replications is too small to allow a sufficient number of permutations for the generation of surrogate data.
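For illustration, a minimal sketch using block permutation (the settings values are illustrative):
>>> perm_settings = {'perm_type': 'block', 'block_size': 100,
>>>                  'perm_range': 10}
>>> data_slice, perm_idx = data.slice_permute_samples(
>>>     process=0, perm_settings=perm_settings)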
idtxl.bivariate_te module¶
Perform network inference using bivariate transfer entropy.
Estimate bivariate transfer entropy (TE) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.bivariate_te.BivariateTE[source]¶
Bases: idtxl.network_inference.NetworkInferenceTE, idtxl.network_inference.NetworkInferenceBivariate
Perform network inference using bivariate transfer entropy.
Perform network inference using bivariate transfer entropy (TE). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate TE for a single target. See docstrings of the two functions for more information.
References:
Schreiber, T. (2000). Measuring Information Transfer. Phys Rev Lett, 85(2), 461–464. http://doi.org/10.1103/PhysRevLett.85.461
Vicente, R., Wibral, M., Lindner, M., & Pipa, G. (2011). Transfer entropy-a model-free measure of effective connectivity for the neurosciences. J Comp Neurosci, 30(1), 45–67. http://doi.org/10.1007/s10827-010-0262-3
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in TE estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- selected_vars_target : list of tuples
target samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for TE from individual sources to the target
- statistic_omnibus : float
joint TE from all sources to the target
- statistic_sign_sources : numpy array
raw TE values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall TE
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find bivariate transfer entropy between all nodes in the network.
Estimate bivariate transfer entropy (TE) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> network_analysis = BivariateTE()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find bivariate transfer entropy between sources and a target.
Find bivariate transfer entropy (TE) between all potential source processes and the target process. Uses bivariate, non-uniform embedding found through information maximisation.
Bivariate TE is calculated in four steps:
1. find all relevant variables in the target process’s own past, by iteratively adding candidate variables that have significant conditional mutual information (CMI) with the current value (conditional on all variables that were added previously)
2. find all relevant variables in each single source process’s past (again by finding all candidates with significant CMI); treat each potential source process separately, i.e., the CMI is calculated with respect to already selected variables from the target’s past and from the current source process’s past only
3. prune the final conditional set for each link (i.e., each process-target pairing): test the CMI between each variable in the final set and the current value, conditional on all other variables in the final set of the current link
4. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual variables in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = BivariateTE()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
max_lag_target : int [optional] - maximum temporal search depth for candidates in the target’s past in samples (default=same as max_lag_sources)
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
tau_target : int [optional] - spacing between candidates in the target’s past in samples (default=1)
n_perm_* : int - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, and ‘omnibus’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating TE; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
idtxl.bivariate_mi module¶
Perform network inference using bivariate mutual information.
Estimate bivariate mutual information (MI) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.bivariate_mi.BivariateMI[source]¶
Bases: idtxl.network_inference.NetworkInferenceMI, idtxl.network_inference.NetworkInferenceBivariate
Perform network inference using bivariate mutual information.
Perform network inference using bivariate mutual information (MI). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate MI for a single target. See docstrings of the two functions for more information.
References:
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in MI estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- selected_vars_target : list of tuples
target samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for MI from individual sources to the target
- mi_omnibus : float
joint MI from all sources to the target
- mi_sign_sources : numpy array
raw MI values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall MI
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find bivariate mutual information between all nodes in the network.
Estimate bivariate mutual information (MI) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> network_analysis = BivariateMI()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- dict
results for each target, see documentation of analyse_single_target()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find bivariate mutual information between sources and a target.
Find bivariate mutual information (MI) between all potential source processes and the target process. Uses bivariate, non-uniform embedding found through information maximisation.
MI is calculated in three steps:
1. find all relevant variables in a single source process’s past, by iteratively adding candidate variables that have significant conditional mutual information (CMI) with the current value (conditional on all variables that were added previously)
2. prune the final conditional set for each link (i.e., each process-target pairing): test the CMI between each variable in the final set and the current value, conditional on all other variables in the final set of the current link; treat each potential source process separately, i.e., the CMI is calculated with respect to already selected variables from the current process’s past only
3. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual variables in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = BivariateMI()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
n_perm_* : int - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, and ‘omnibus’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating MI; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- dict
results consisting of sets of selected variables as (full set, variables from the sources’ past), pvalues and MI for each selected variable, the current value for this analysis, results for omnibus test (joint MI between all selected source variables and the target, omnibus MI, p-value, and significance); NOTE that all variables are listed as tuples (process, lag wrt. current value)
idtxl.bivariate_pid module¶
Estimate partial information decomposition (PID).
Estimate PID for two source processes and one target process using different estimators.
- Note:
Written for Python 3.4+
- class idtxl.bivariate_pid.BivariatePID[source]¶
Bases: idtxl.single_process_analysis.SingleProcessAnalysis
Perform partial information decomposition for individual processes.
Perform partial information decomposition (PID) for two source processes and one target process in the network. Estimate unique, shared, and synergistic information in the two sources about the target. Call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate PID for a single process. See docstrings of the two functions for more information.
References:
Williams, P. L., & Beer, R. D. (2010). Nonnegative Decomposition of Multivariate Information, 1–14. Retrieved from http://arxiv.org/abs/1004.2515
Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., & Ay, N. (2014). Quantifying Unique Information. Entropy, 16(4), 2161–2183. http://doi.org/10.3390/e16042161
- Attributes:
- target : int
index of target process
- sources : array type
pair of indices of source processes
- settings : dict
analysis settings
- results : dict
estimated PID
- analyse_network(settings, data, targets, sources)[source]¶
Estimate partial information decomposition for network nodes.
Estimate partial information decomposition (PID) for multiple nodes in the network.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> n = 20
>>> alph = 2
>>> x = np.random.randint(0, alph, n)
>>> y = np.random.randint(0, alph, n)
>>> z = np.logical_xor(x, y).astype(int)
>>> data = Data(np.vstack((x, y, z)), 'ps', normalise=False)
>>> settings = {
>>>     'lags_pid': [[1, 1], [3, 2], [0, 0]],
>>>     'alpha': 0.1,
>>>     'alph_s1': alph,
>>>     'alph_s2': alph,
>>>     'alph_t': alph,
>>>     'max_unsuc_swaps_row_parm': 60,
>>>     'num_reps': 63,
>>>     'max_iters': 1000,
>>>     'pid_estimator': 'SydneyPID'}
>>> targets = [0, 1, 2]
>>> sources = [[1, 2], [0, 2], [0, 1]]
>>> pid_analysis = BivariatePID()
>>> results = pid_analysis.analyse_network(settings, data, targets,
>>>                                        sources)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, can contain
lags_pid : list of lists of ints [optional] - lags in samples between sources and target (default=[[1, 1], [1, 1], …])
- data : Data instance
raw data for analysis
- targets : list of int
indices of target processes
- sources : list of lists
indices of the two source processes for each target, e.g., [[0, 2], [1, 0]], must have the same length as targets
- Returns:
- ResultsPID instance
results of network inference, see documentation of ResultsPID()
- analyse_single_target(settings, data, target, sources)[source]¶
Estimate partial information decomposition for a network node.
Estimate partial information decomposition (PID) for a target node in the network.
- Note:
For a description of the algorithm and the method see references in the class and estimator docstrings.
Example:
>>> n = 20
>>> alph = 2
>>> x = np.random.randint(0, alph, n)
>>> y = np.random.randint(0, alph, n)
>>> z = np.logical_xor(x, y).astype(int)
>>> data = Data(np.vstack((x, y, z)), 'ps', normalise=False)
>>> settings = {
>>>     'alpha': 0.1,
>>>     'alph_s1': alph,
>>>     'alph_s2': alph,
>>>     'alph_t': alph,
>>>     'max_unsuc_swaps_row_parm': 60,
>>>     'num_reps': 63,
>>>     'max_iters': 1000,
>>>     'pid_estimator': 'SydneyPID',
>>>     'lags_pid': [2, 3]}
>>> pid_analysis = BivariatePID()
>>> results = pid_analysis.analyse_single_target(settings=settings,
>>>                                              data=data,
>>>                                              target=0,
>>>                                              sources=[1, 2])
- Args:
- settings : dict
parameters for estimator use and statistics:
pid_estimator : str - estimator to be used for PID estimation (for estimator settings see the documentation in the estimators_pid modules)
lags_pid : list of ints [optional] - lags in samples between sources and target (default=[1, 1])
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of ints
indices of the two source processes for the target
- Returns:
- ResultsPID instance
results of network inference, see documentation of ResultsPID()
idtxl.multivariate_te module¶
Perform network inference using multivariate transfer entropy.
Estimate multivariate transfer entropy (TE) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.multivariate_te.MultivariateTE[source]¶
Bases: idtxl.network_inference.NetworkInferenceTE, idtxl.network_inference.NetworkInferenceMultivariate
Perform network inference using multivariate transfer entropy.
Perform network inference using multivariate transfer entropy (TE). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate TE for a single target. See docstrings of the two functions for more information.
References:
Schreiber, T. (2000). Measuring Information Transfer. Phys Rev Lett, 85(2), 461–464. http://doi.org/10.1103/PhysRevLett.85.461
Vicente, R., Wibral, M., Lindner, M., & Pipa, G. (2011). Transfer entropy-a model-free measure of effective connectivity for the neurosciences. J Comp Neurosci, 30(1), 45–67. http://doi.org/10.1007/s10827-010-0262-3
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in TE estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- selected_vars_target : list of tuples
target samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for TE from individual sources to the target
- statistic_omnibus : float
joint TE from all sources to the target
- statistic_sign_sources : numpy array
raw TE values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall TE
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find multivariate transfer entropy between all nodes in the network.
Estimate multivariate transfer entropy (TE) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
- Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> network_analysis = MultivariateTE()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
fdr_correction : bool [optional] - correct results on the network level, see documentation of stats.network_fdr() for details (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find multivariate transfer entropy between sources and a target.
Find multivariate transfer entropy (TE) between all source processes and the target process. Uses multivariate, non-uniform embedding found through information maximisation. Multivariate TE is calculated in four steps:
1. find all relevant variables in the target process’s own past, by iteratively adding candidate variables that have significant conditional mutual information (CMI) with the current value (conditional on all variables that were added previously)
2. find all relevant variables in the source processes’ pasts (again by finding all candidates with significant CMI)
3. prune the final conditional set by testing the CMI between each variable in the final set and the current value, conditional on all other variables in the final set
4. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual variables in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = MultivariateTE()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
max_lag_target : int [optional] - maximum temporal search depth for candidates in the target’s past in samples (default=same as max_lag_sources)
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
tau_target : int [optional] - spacing between candidates in the target’s past in samples (default=1)
n_perm_* : int [optional] - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float [optional] - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, ‘omnibus’, and ‘max_seq’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating TE; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
idtxl.multivariate_mi module¶
Perform network inference using multivariate mutual information.
Estimate multivariate mutual information (MI) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.multivariate_mi.MultivariateMI[source]¶
Bases: idtxl.network_inference.NetworkInferenceMI, idtxl.network_inference.NetworkInferenceMultivariate
Perform network inference using multivariate mutual information.
Perform network inference using multivariate mutual information (MI). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate MI for a single target. See docstrings of the two functions for more information.
References:
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in MI estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for MI from individual sources to the target
- mi_omnibus : float
joint MI from all sources to the target
- mi_sign_sources : numpy array
raw MI values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall MI
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find multivariate mutual information between nodes in the network.
Estimate multivariate mutual information (MI) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> network_analysis = MultivariateMI()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
fdr_correction : bool [optional] - correct results on the network level, see documentation of stats.network_fdr() for details (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- dict
results for each target, see documentation of analyse_single_target(); results FDR-corrected, see documentation of stats.network_fdr()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find multivariate mutual information between sources and a target.
Find multivariate mutual information (MI) between all source processes and the target process. Uses multivariate, non-uniform embedding found through information maximisation.
Multivariate MI is calculated in three steps (see Lizier and Faes for details):
1. find all relevant samples in the source processes’ past, by iteratively adding candidate samples that have significant conditional mutual information (CMI) with the current value (conditional on all samples that were added previously)
2. prune the final conditional set by testing the CMI between each sample in the final set and the current value, conditional on all other samples in the final set
3. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual samples in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = MultivariateMI()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
n_perm_* : int [optional] - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float [optional] - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, ‘omnibus’, and ‘max_seq’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating MI; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- dict
results consisting of sets of selected variables as (full set, variables from the sources’ past), pvalues and MI for each selected variable, the current value for this analysis, results for omnibus test (joint MI between all selected source variables and the target, omnibus MI, p-value, and significance); NOTE that all variables are listed as tuples (process, lag wrt. current value)
idtxl.multivariate_pid module¶
Estimate partial information decomposition (PID).
Estimate PID for multiple source processes (up to 4 sources) and one target process using the SxPID estimator.
- Note:
Written for Python 3.4+
- class idtxl.multivariate_pid.MultivariatePID[source]¶
Bases: idtxl.single_process_analysis.SingleProcessAnalysis
Perform partial information decomposition for individual processes.
Perform partial information decomposition (PID) for multiple source processes (up to 4 sources) and a target process in the network. Estimate unique, shared, and synergistic information in the multiple sources about the target. Call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate PID for a single process. See docstrings of the two functions for more information.
References:
Williams, P. L., & Beer, R. D. (2010). Nonnegative Decomposition of Multivariate Information, 1–14. Retrieved from http://arxiv.org/abs/1004.2515
Makkeh, A., Gutknecht, A., & Wibral, M. (2020). A differentiable measure for shared information. 1–27. Retrieved from http://arxiv.org/abs/2002.03356
- Attributes:
- target : int
index of target process
- sources : array type
indices of the multiple source processes
- settings : dict
analysis settings
- results : dict
estimated PID
- analyse_network(settings, data, targets, sources)[source]¶
Estimate partial information decomposition for network nodes.
Estimate partial information decomposition (PID) between multiple source processes (up to 4 sources) and each of multiple target processes in the network.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> n = 20
>>> alph = 2
>>> s1 = np.random.randint(0, alph, n)
>>> s2 = np.random.randint(0, alph, n)
>>> s3 = np.random.randint(0, alph, n)
>>> target1 = np.logical_xor(s1, s2).astype(int)
>>> target = np.logical_xor(target1, s3).astype(int)
>>> data = Data(np.vstack((s1, s2, s3, target)), 'ps',
>>>             normalise=False)
>>> settings = {
>>>     'lags_pid': [[1, 1, 1], [3, 2, 7]],
>>>     'verbose': False,
>>>     'pid_estimator': 'SxPID'}
>>> targets = [0, 1]
>>> sources = [[1, 2, 3], [0, 2, 3]]
>>> pid_analysis = MultivariatePID()
>>> results = pid_analysis.analyse_network(settings, data, targets,
>>>                                        sources)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, can contain
lags_pid : list of lists of ints [optional] - lags in samples between sources and target (default=[[1, 1, …, 1], [1, 1, …, 1], …])
- data : Data instance
raw data for analysis
- targets : list of int
indices of target processes
- sources : list of lists
indices of the multiple source processes for each target, e.g., [[0, 1, 2], [1, 0, 3]]; all lists must be of the same length, and the list of lists must have the same length as targets
- Returns:
- ResultsMultivariatePID instance
results of network inference, see documentation of ResultsMultivariatePID()
- analyse_single_target(settings, data, target, sources)[source]¶
Estimate partial information decomposition for a network node.
Estimate partial information decomposition (PID) for multiple source processes (up to 4 sources) and a target process in the network.
- Note:
For a description of the algorithm and the method see references in the class and estimator docstrings.
Example:
>>> n = 20
>>> alph = 2
>>> s1 = np.random.randint(0, alph, n)
>>> s2 = np.random.randint(0, alph, n)
>>> s3 = np.random.randint(0, alph, n)
>>> target1 = np.logical_xor(s1, s2).astype(int)
>>> target = np.logical_xor(target1, s3).astype(int)
>>> data = Data(np.vstack((s1, s2, s3, target)), 'ps',
>>>             normalise=False)
>>> settings = {
>>>     'verbose': False,
>>>     'pid_estimator': 'SxPID',
>>>     'lags_pid': [2, 3, 1]}
>>> pid_analysis = MultivariatePID()
>>> results = pid_analysis.analyse_single_target(settings=settings,
>>>                                              data=data,
>>>                                              target=0,
>>>                                              sources=[1, 2, 3])
- Args:
- settings : dict
parameters for estimator use and statistics:
pid_estimator : str - estimator to be used for PID estimation (for estimator settings see the documentation in the estimators_pid modules)
lags_pid : list of ints [optional] - lags in samples between sources and target (default=[1, 1, …, 1])
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of ints
indices of the multiple source processes for the target
- Returns:
- ResultsMultivariatePID instance
results of network inference, see documentation of ResultsMultivariatePID()
idtxl.active_information_storage module¶
Analysis of AIS in a network of processes.
Analysis of active information storage (AIS) in individual processes of a network. The algorithm uses non-uniform embedding as described in Faes (2011).
- Note:
Written for Python 3.4+
- class idtxl.active_information_storage.ActiveInformationStorage[source]¶
Bases:
idtxl.single_process_analysis.SingleProcessAnalysis
Estimate active information storage in individual processes.
Estimate active information storage (AIS) in individual processes of the network. To perform AIS estimation call analyse_network() on the whole network or a set of nodes or call analyse_single_process() to estimate AIS for a single process. See docstrings of the two functions for more information.
References:
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2012). Local measures of information storage in complex distributed computation. Inform Sci, 208, 39–54. http://doi.org/10.1016/j.ins.2012.04.016
Wibral, M., Lizier, J. T., Vögler, S., Priesemann, V., & Galuske, R. (2014). Local active information storage as a tool to understand distributed neural information processing. Front Neuroinf, 8, 1. http://doi.org/10.3389/fninf.2014.00001
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- process_setlist
list with indices of analyzed processes
- settingsdict
analysis settings
- current_valuetuple
index of the current value in AIS estimation, (idx process, idx sample)
- selected_vars_fulllist of tuples
samples in the past state, (idx process, idx sample)
- aisfloat
raw AIS value
- signbool
true if AIS is significant
- pvalue: float
p-value of AIS
- analyse_network(settings, data, processes='all')[source]¶
Estimate active information storage for multiple network processes.
Estimate active information storage for all or a subset of processes in the network.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_process() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'max_lag': 5,
>>>     'tau': 1
>>>     }
>>> processes = [1, 2, 3]
>>> network_analysis = ActiveInformationStorage()
>>> results = network_analysis.analyse_network(settings, data,
>>>                                            processes)
- Args:
- settingsdict
parameters for estimation and statistical testing, see documentation of analyse_single_process() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
fdr_correction : bool [optional] - correct results on the network level, see documentation of stats.ais_fdr() for details (default=True)
- dataData instance
raw data for analysis
- processeslist of int | ‘all’
index of processes (default=’all’); if ‘all’, AIS is estimated for all processes; if list of int, AIS is estimated for processes specified in the list.
- Returns:
- ResultsSingleProcessAnalysis instance
results of network AIS estimation, see documentation of ResultsSingleProcessAnalysis()
- analyse_single_process(settings, data, process)[source]¶
Estimate active information storage for a single process.
Estimate active information storage for one process in the network. Uses non-uniform embedding found through information maximisation. This is done in three steps (see Lizier and Faes for details):
1. Find all relevant samples in the process's own past, by iteratively adding candidate samples that have significant conditional mutual information (CMI) with the current value (conditional on all samples that were added previously)
2. Prune the final conditional set by testing the CMI between each sample in the final set and the current value, conditional on all other samples in the final set
3. Calculate AIS using the final set of candidates as the past state (calculate MI between samples in the past and the current value); test for statistical significance using a permutation test
- Note:
For a further description of the algorithm see references in the class docstring.
- Args:
- settingsdict
parameters for estimator use and statistics:
cmi_estimator : str - estimator to be used for CMI and MI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag : int - maximum temporal search depth for candidates in the processes’ past in samples
tau : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
n_perm_* : int [optional] - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘mi’ (default=500)
alpha_* : float [optional] - critical alpha level for statistical significance, where * can be ‘max_stat’, ‘min_stat’, ‘mi’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating AIS; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: 'faes' for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- dataData instance
raw data for analysis
- processint
index of process
- Returns:
- ResultsSingleProcessAnalysis instance
results of AIS estimation, see documentation of ResultsSingleProcessAnalysis()
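A minimal usage sketch for a single process, mirroring the analyse_network() example above (the Kraskov CMI estimator and the parameter values shown are illustrative choices, not defaults):
>>> data = Data()
>>> data.generate_mute_data(1000, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'max_lag': 5,
>>>     'tau': 1}
>>> process_analysis = ActiveInformationStorage()
>>> results = process_analysis.analyse_single_process(settings, data,
>>>                                                   process=0)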
idtxl.embedding_optimization_ais_Rudelt module¶
Optimization of embedding parameters of spike times using the history dependence estimators
- class idtxl.embedding_optimization_ais_Rudelt.OptimizationRudelt(settings=None)[source]¶
Bases:
object
Optimization of embedding parameters of spike times using the history dependence estimators
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- estimation_methodstring
The method to be used to estimate the history dependence: 'bbc' or 'shuffling'.
- embedding_step_sizefloat
Step size delta t (in seconds) with which the window is slid through the data. (default: 0.005)
- embedding_number_of_bins_setlist of integer values
Set of values for d, the number of bins in the embedding. (default: [1, 2, 3, 4, 5])
- embedding_past_range_setlist of floating-point values
Set of values for T, the past range (in seconds) to be used for embeddings. (default: [0.005, 0.00561, 0.00629, 0.00706, 0.00792, 0.00889, 0.00998, 0.01119, 0.01256, 0.01409, 0.01581, 0.01774, 0.01991, 0.02233, 0.02506, 0.02812, 0.03155, 0.0354, 0.03972, 0.04456, 0.05, 0.0561, 0.06295, 0.07063, 0.07924, 0.08891, 0.09976, 0.11194, 0.12559, 0.14092, 0.15811, 0.17741, 0.19905, 0.22334, 0.25059, 0.28117, 0.31548, 0.35397, 0.39716, 0.44563, 0.5, 0.56101, 0.62946, 0.70627, 0.79245, 0.88914, 0.99763, 1.11936, 1.25594, 1.40919, 1.58114, 1.77407, 1.99054, 2.23342, 2.50594, 2.81171, 3.15479, 3.53973, 3.97164, 4.45625, 5.0])
- embedding_scaling_exponent_setdict
Set of values for kappa, the scaling exponent for the bins in the embedding. Should be a Python dictionary with the three entries 'number_of_scalings', 'min_first_bin_size' and 'min_step_for_scaling'. (default: {'number_of_scalings': 10, 'min_first_bin_size': 0.005, 'min_step_for_scaling': 0.01})
- bbc_tolerancefloat
The tolerance for the Bayesian Bias Criterion. Influences which embeddings are discarded from the analysis. (default: 0.05)
- return_averaged_Rbool
Return R_tot as the average over R(T) for T in [T_D, T_max], instead of R_tot = R(T_D). If set to True, the setting for number_of_bootstraps_R_tot (see below) is ignored and set to 0 and CI bounds are not calculated. (default: True)
- timescale_minimum_past_rangefloat
Minimum past range T_0 (in seconds) to take into consideration for the estimation of the information timescale tau_R. (default: 0.01)
- number_of_bootstraps_R_maxint
The number of bootstrap re-shuffles that should be used to determine the optimal embedding. (Bootstrap the estimates of R_max to determine R_tot.) These are computed during the ‘history-dependence’ task because they are essential to obtain R_tot. (default: 250)
- number_of_bootstraps_R_totint
The number of bootstrap re-shuffles that should be used to estimate the confidence interval of the optimal embedding. (Bootstrap the estimates of R_tot = R(T_D) to obtain a confidence interval for R_tot.) These are computed during the 'confidence-intervals' task. The setting return_averaged_R (see above) needs to be set to False for this setting to take effect. (default: 250)
- number_of_bootstraps_nonessentialint
The number of bootstrap re-shuffles that should be used to estimate the confidence intervals for embeddings other than the optimal one. (Bootstrap the estimates of R(T) for all other T.) (These are not necessary for the main analysis and therefore default to 0.)
- symbol_block_lengthint
The number of symbols that should be drawn in each block for bootstrap resampling. If set to None (recommended), the length is chosen automatically based on heuristics. (default: None)
- bootstrap_CI_use_sdbool
Most of the time we observed normally-distributed bootstrap replications, so it is sufficient (and more efficient) to compute confidence intervals based on the standard deviation (default: True)
- bootstrap_CI_percentile_lofloat
The lower percentile for the confidence interval. This has no effect if bootstrap_CI_use_sd is set to True (default: 2.5)
- bootstrap_CI_percentile_hifloat
The upper percentile for the confidence interval. This has no effect if bootstrap_CI_use_sd is set to True (default: 97.5)
- analyse_auto_MIbool
Perform calculation of the auto mutual information of the spike train (default: True). If set to True:
- auto_MI_bin_size_setlist of floating-point values
Set of values for the sizes of the bins (in seconds). (default: [0.005, 0.01, 0.025, 0.05, 0.25, 0.5])
- auto_MI_max_delayint
The maximum delay (in seconds) between the past bin and the response. (default: 5)
- visualizationbool
Create .eps output images showing the optimization values and graphs for the history dependence and the auto mutual information (default: False). If set to True:
- output_pathString
Path where the .eps images should be saved
- output_prefixString
Prefix of the output images e.g. <output_prefix>_process0.eps
- debug: bool
show values while calculating (default: False)
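A minimal instantiation sketch based on the settings above (illustrative values; data_spk stands for a Data_spiketime instance loaded elsewhere):
>>> settings = {
>>>     'estimation_method': 'shuffling',  # or 'bbc'
>>>     'embedding_step_size': 0.005,
>>>     'embedding_number_of_bins_set': [1, 2, 3],
>>>     'analyse_auto_MI': True}
>>> optimization = OptimizationRudelt(settings)
>>> results = optimization.optimize(data_spk, processes=[0])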
- analyse_auto_MI(spike_times)[source]¶
Get the auto MI for the spike times. If it is available from file, load it, else compute it.
- compute_CIs(data, target_R='R_max', symbol_block_length=None)[source]¶
Compute bootstrap replications of the history dependence estimate which can be used to obtain confidence intervals.
- Args:
- datadata_spiketime object
Input data
- target_RString
One of ‘R_max’, ‘R_tot’ or ‘nonessential’. If set to R_max, replications of R are produced for the T at which R is maximised. If set to R_tot, replications of R are produced for T = T_D (cf get_temporal_depth_T_D). If set to nonessential, replications of R are produced for each T (one embedding per T, cf get_embeddings_that_maximise_R). These are not otherwise used in the analysis and are probably only useful if the resulting plot is visually inspected, so in most cases it can be set to zero.
- symbol_block_lengthint
The number of symbols that should be drawn in each block for bootstrap resampling. If set to None (recommended), the length is chosen automatically based on heuristics
- get_auto_MI(spike_times, bin_size, number_of_delays)[source]¶
Compute the auto mutual information in the neuron’s activity, a measure closely related to history dependence.
- get_bootstrap_history_dependence(data, embedding, number_of_bootstraps, symbol_block_length=None)[source]¶
For a given embedding, return bootstrap replications for R.
- get_embeddings(embedding_past_range_set, embedding_number_of_bins_set, embedding_scaling_exponent_set)[source]¶
Get all combinations of parameters T, d, k, based on the sets of selected parameters.
- get_embeddings_that_maximise_R(bbc_tolerance=None, dependent_var='T', get_as_list=False)[source]¶
For each T (or d), get the embedding for which R is maximised.
For the bbc estimator, the bbc_tolerance is applied here, i.e., get the unbiased embeddings that maximise R.
- get_history_dependence(data, process)[source]¶
Estimate the history dependence for each embedding for the given process.
- get_information_timescale_tau_R()[source]¶
Get the information timescale tau_R, a characteristic timescale of history dependence similar to an autocorrelation time.
- get_past_range(number_of_bins_d, first_bin_size, scaling_k)[source]¶
Get the past range T of the embedding, based on the parameters d, tau_1 and k.
- get_set_of_scalings(past_range_T, number_of_bins_d, number_of_scalings, min_first_bin_size, min_step_for_scaling)[source]¶
Get scaling exponents that produce the uniform embedding, the embedding for which the first bin has a length of min_first_bin_size (in seconds), and linearly spaced scaling factors in between, such that number_of_scalings scalings are obtained in total.
- get_temporal_depth_T_D(get_R_thresh=False)[source]¶
Get the temporal depth T_D, the past range for the ‘optimal’ embedding parameters.
Given the maximal history dependence R at each past range T, (cf get_embeddings_that_maximise_R), first find the smallest T at which R is maximised (cf get_max_R_T). If bootstrap replications for this R are available, get the smallest T at which this R minus one standard deviation of the bootstrap estimates is attained.
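An illustrative sketch of this selection rule (hypothetical values, not the library implementation):
>>> import numpy as np
>>> T = np.array([0.01, 0.05, 0.1, 0.5])    # candidate past ranges
>>> R = np.array([0.10, 0.18, 0.20, 0.20])  # max. history dependence per T
>>> T_D = T[np.argmax(R)]                   # smallest T at which R is maximised
>>> sd = 0.01                               # std. dev. of bootstrap replications
>>> T_D = T[np.argmax(R >= R.max() - sd)]   # smallest T reaching R_max - sd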
- optimize(data, processes='all')[source]¶
Optimize the embedding parameters of spike time data using the Rudelt history dependence estimator.
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- dataData_spiketime instance
raw data for analysis
- processeslist of int | 'all'
indices of processes (default='all'); spike times are optimised separately for each process specified in the list.
- Returns:
- ResultsSingleProcessRudelt instance
results of Rudelt optimization, see documentation of ResultsSingleProcessRudelt()
- if visualization in settings was set to True (see class OptimizationRudelt):
- .eps images are created for each optimized process containing:
optimized values for the process
graph for the history dependence
graph for auto mutual information (if calculated)
- optimize_single_run(data, process)[source]¶
Optimize a single realisation of spike time data given the process number.
- Args:
- dataData_spiketime instance
raw data for analysis
- processint
index of process
- Returns:
- DotDict
with the following keys
- Processint
Process that was optimized
- estimation_methodString
Estimation method that was used for optimization
- T_Dfloat
Estimated optimal value for the temporal depth T_D
- tau_R :
Information timescale tau_R, a characteristic timescale of history dependence similar to an autocorrelation time.
- R_totfloat
Estimated value for the total history dependence R_tot
- AIS_totfloat
Estimated value for the total active information storage
- opt_number_of_bins_dint
Number of bins d for the embedding that yields (R̂tot, T̂D)
- opt_scaling_kint
Scaling exponent κ for the embedding that yields (R̂tot, T̂D)
- opt_first_bin_sizeint
Size of the first bin τ1 for the embedding that yields (R̂tot, T̂D)
- history_dependencearray with floating-point values
Estimated history dependence for each embedding
- firing_ratefloat
Firing rate of the neuron/spike train
- recording_lengthfloat
Length of the recording (in seconds)
- H_spikingfloat
Entropy of the spike times
- if analyse_auto_MI was set to True, additionally:
- auto_MIdict
numpy arrays of MI values for each delay, one entry per auto_MI bin size
- auto_MI_delayslist of int
list of delays depending on the given auto_MI_bin_sizes and auto_MI_max_delay
idtxl.estimators_Rudelt module¶
Provide HDE estimators.
- class idtxl.estimators_Rudelt.RudeltAbstractEstimator(settings=None)[source]¶
Bases:
idtxl.estimator.Estimator
Abstract class for implementation of nsb and plugin estimators from Rudelt.
Abstract class for implementation of nsb and plugin estimators; child classes implement estimators for mutual information (MI).
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- get_median_number_of_spikes_per_bin(raw_symbols)[source]¶
Given raw symbols (in which the number of spikes per bin is counted, i.e., not necessarily a binary quantity), get the median number of spikes for each bin, among all symbols obtained by the embedding.
- get_multiplicities(symbol_counts, alphabet_size)[source]¶
Get the multiplicities of some given symbol counts.
To estimate the entropy of a system, it is only important how often a symbol/ event occurs (the probability that it occurs), not what it represents. Therefore, computations can be simplified by summarizing symbols by their frequency, as represented by the multiplicities.
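For illustration (a sketch, not the library implementation), the multiplicities can be obtained by counting how many distinct symbols share each occurrence count:
>>> from collections import Counter
>>> symbol_counts = {'00': 7, '01': 4, '10': 4, '11': 1}  # hypothetical counts
>>> multiplicities = Counter(symbol_counts.values())
>>> multiplicities  # Counter({4: 2, 7: 1, 1: 1}): two symbols occur four times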
- get_past_range(number_of_bins_d, first_bin_size, scaling_k)[source]¶
Get the past range T of the embedding, based on the parameters d, tau_1 and k.
- get_raw_symbols(spike_times, embedding, first_bin_size)[source]¶
Get the raw symbols (in which the number of spikes per bin is counted, i.e., not necessarily a binary quantity), as obtained by applying the embedding.
- get_window_delimiters(number_of_bins_d, scaling_k, first_bin_size)[source]¶
Get delimiters of the window, used to describe the embedding. The window includes both the past embedding and the response.
The delimiters are times, relative to the first bin, that separate two consecutive bins.
- is_analytic_null_estimator()[source]¶
Indicate if estimator supports analytic surrogates.
Return true if the estimator implements estimate_surrogates_analytic() where data is formatted as per the estimate method for this estimator.
- Returns:
bool
- is_parallel()[source]¶
Indicate if estimator supports parallel estimation over chunks.
Return true if the estimator supports parallel estimation over chunks, where a chunk is one independent data set.
- Returns:
bool
- class idtxl.estimators_Rudelt.RudeltAbstractNSBEstimator(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Abstract class for implementation of NSB estimators from Rudelt.
Abstract class for implementation of Nemenman-Shafee-Bialek (NSB) estimators, child classes implement nsb estimators for mutual information (MI).
implemented in idtxl by Michael Lindner, Göttingen 2021
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
- [2]: I. Nemenman, F. Shafee, W. Bialek: Entropy and inference, revisited. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- H1(beta, mk, K, N)[source]¶
Compute the first moment (expectation value) of the entropy H.
H is the entropy one obtains with a symmetric Dirichlet prior with concentration parameter beta and a multinomial likelihood.
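For reference, the posterior expectation of the entropy under a symmetric Dirichlet prior has a known closed form (stated here as the standard Wolpert-Wolf result, which is assumed to be the quantity computed by this method): with symbol counts n_k, alphabet size K, total count N and psi_0 the digamma function,
E[H | beta] = psi_0(N + K*beta + 1) - sum_k ((n_k + beta) / (N + K*beta)) * psi_0(n_k + beta + 1)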
- alpha_ML(mk, K1, N)[source]¶
Compute first guess for the beta_MAP (cf get_beta_MAP) parameter via the posterior of a Dirichlet process.
- d2_log_rho(beta, mk, K, N)[source]¶
Second derivate of the logarithm of the Dirichlet multinomial likelihood.
- d2_log_rho_xi(beta, mk, K, N)[source]¶
Second derivative of the logarithm of the nsb (unnormalized) posterior.
- d_log_rho(beta, mk, K, N)[source]¶
First derivate of the logarithm of the Dirichlet multinomial likelihood.
- d_log_rho_xi(beta, mk, K, N)[source]¶
First derivative of the logarithm of the nsb (unnormalized) posterior.
- d_xi(beta, K)[source]¶
First derivative of xi(beta).
xi(beta) is the entropy of the system when no data has been observed. d_xi is the prior for the nsb estimator.
- get_beta_MAP(mk, K, N)[source]¶
Get the maximum a posteriori (MAP) value for beta.
Provides the location of the peak, around which we integrate.
beta_MAP is the value for beta for which the posterior of the NSB estimator is maximised (or, equivalently, of the logarithm thereof, as computed here).
- get_integration_bounds(mk, K, N)[source]¶
Find the integration bounds for the estimator.
Typically the posterior is a delta-like distribution, so it is sufficient to integrate around this peak. (If not, this function is not called.)
- log_likelihood_DP_alpha(a, K1, N)[source]¶
Alpha-dependent terms of the log-likelihood of a Dirichlet Process.
- nsb_entropy(mk, K, N)[source]¶
Estimate the entropy of a system using the NSB estimator.
- Parameters
mk – multiplicities
K – number of possible symbols/ state space of the system
N – total number of observed symbols
- class idtxl.estimators_Rudelt.RudeltBBCEstimator(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Bayesian bias criterion (BBC) estimator using the NSB and plugin estimators
Calculate the mutual information (MI) of one variable depending on its past using the NSB and plugin estimators and check whether the bias criterion is passed. See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- bayesian_bias_criterion(R_nsb, R_plugin, bbc_tolerance)[source]¶
Get whether the Bayesian bias criterion (bbc) is passed.
- Parameters
R_nsb – history dependence computed with NSB estimator
R_plugin – history dependence computed with plugin estimator
bbc_tolerance – tolerance for the Bayesian bias criterion
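A sketch of the check under the assumption that the BBC term is the relative difference between the NSB and plug-in estimates (the exact form is defined in Rudelt et al. [1]):
>>> def bbc_passed(R_nsb, R_plugin, bbc_tolerance):
>>>     # assumed form of the criterion: relative deviation below tolerance
>>>     return abs(R_nsb - R_plugin) / R_nsb < bbc_tolerance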
- estimate(symbol_array, past_symbol_array, current_symbol_array, bbc_tolerance=None)[source]¶
Calculate the mutual information (MI) of one variable depending on its past using the NSB and plugin estimators and check whether the bias criterion is passed.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- past_symbol_arraynumpy array
realisations of symbols based on past states (output of get_realisations_symbol from data_spiketimes object)
- current_symbol_arraynumpy array
realisations of symbols based on current states (output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- bbc_term (float)
bbc tolerance-independent term of the Bayesian bias criterion (bbc)
- class idtxl.estimators_Rudelt.RudeltNSBEstimatorSymbolsMI(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractNSBEstimator
History dependence NSB estimator
Calculate the mutual information (MI) of one variable depending on its past using NSB estimator. See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- estimate(symbol_array, past_symbol_array, current_symbol_array)[source]¶
Estimate mutual information using NSB estimator.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- past_symbol_arraynumpy array
realisations of symbols based on past states (output of get_realisations_symbol from data_spiketimes object)
- current_symbol_arraynumpy array
realisations of symbols based on current states (output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- class idtxl.estimators_Rudelt.RudeltPluginEstimatorSymbolsMI(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Plugin History dependence estimator
Calculate the mutual information (MI) of one variable depending on its past using plugin estimator. See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- estimate(symbol_array, past_symbol_array, current_symbol_array)[source]¶
Estimate mutual information using plugin estimator.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- past_symbol_arraynumpy array
realisations of symbols based on past states (output of get_realisations_symbol from data_spiketimes object)
- current_symbol_arraynumpy array
realisations of symbols based on current states (output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- plugin_entropy(mk, N)[source]¶
Estimate the entropy of a system using the Plugin estimator.
(In principle this is the same function as utl.get_shannon_entropy, only here it is a function of the multiplicities, not the probabilities.)
- Parameters
mk – multiplicities
N – total number of observed symbols
- class idtxl.estimators_Rudelt.RudeltShufflingEstimator(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Estimate the history dependence in a spike train using the shuffling estimator.
See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- estimate(symbol_array)[source]¶
Estimate the history dependence in a spike train using the shuffling estimator.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- get_H0_X_past_cond_X(marginal_probabilities, number_of_bins_d, P_X_uncond)[source]¶
Compute H_0(X_past | X), the estimate of the entropy for the past symbols given a response, under the assumption that activity in the past contributes independently towards the response.
- get_H0_X_past_cond_X_eq_x(marginal_probabilities, number_of_bins_d)[source]¶
Compute H_0(X_past | X = x), cf get_H0_X_past_cond_X.
- get_H_X_past_cond_X(P_X_uncond, P_X_past_cond_X)[source]¶
Compute H(X_past | X), the plug-in estimate of the conditional entropy for the past symbols, conditioned on the response X, given their probabilities.
- get_H_X_past_uncond(P_X_past_uncond)[source]¶
Compute H(X_past), the plug-in estimate of the entropy for the past symbols, given their probabilities.
- get_P_X_past_cond_X(past_symbol_counts, number_of_symbols)[source]¶
Compute P(X_past | X), the probability of the past activity conditioned on the response X using the plug-in estimator.
- get_P_X_past_uncond(past_symbol_counts, number_of_symbols)[source]¶
Compute P(X_past), the probability of the past activity using the plug-in estimator.
- get_P_X_uncond(number_of_symbols)[source]¶
Compute P(X), the probability of the current activity using the plug-in estimator.
- get_marginal_frequencies_of_spikes_in_bins(symbol_counts, number_of_bins_d)[source]¶
Compute for each past bin 1…d the sum of spikes found in that bin across all observed symbols.
- get_shuffled_symbol_counts(symbol_counts, past_symbol_counts, number_of_bins_d, number_of_symbols)[source]¶
Simulate new data by, for each past bin 1…d, permuting the activity across all observed past_symbols (for a given response X). The marginal probability of observing a spike given the response is thus preserved for each past bin.
- shuffling_MI(symbol_counts, number_of_bins_d)[source]¶
Estimate the mutual information between current and past activity in a spike train using the shuffling estimator.
To obtain the shuffling estimate, compute the plug-in estimate and a correction term to reduce its bias.
For the plug-in estimate:
Extract the past_symbol_counts from the symbol_counts.
I_plugin = H(X_past) - H(X_past | X)
Notation:
X: current activity, aka response
X_past: past activity
P_X_uncond: P(X)
P_X_past_uncond: P(X_past)
P_X_past_cond_X: P(X_past | X)
H_X_past_uncond: H(X_past)
H_X_past_cond_X: H(X_past | X)
I_plugin: plugin estimate of I(X_past; X)
For the correction term:
- Simulate additional data under the assumption that activity in the past contributes independently towards the current activity.
- Compute the entropy under the assumptions of the model, which, due to its simplicity, is easy to sample, so that the estimate is unbiased.
- Compute the entropy using the plug-in estimate, whose bias is similar to that of the plug-in estimate on the original data.
- Compute the correction term as the difference between the unbiased and biased terms (see the summary after the notation below).
Notation:
P0_sh_X_past_cond_X: P_0,sh(X_past | X), equiv. to P(X_past | X) on the shuffled data
H0_X_past_cond_X: H_0(X_past | X), based on the model of independent contributions
H0_sh_X_past_cond_X: H_0,sh(X_past | X), based on P0_sh_X_past_cond_X, i.e., the plug-in estimate
I_corr: the correction term to reduce the bias of I_plugin
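Putting the pieces together in the notation above (sign convention inferred from the description of the correction term):
I_corr = H0_X_past_cond_X - H0_sh_X_past_cond_X
I_shuffling = I_plugin - I_corr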
- Args:
- symbol_countsiterable
the activity of a spike train is embedded into symbols, whose occurrences are counted (cf emb.get_symbol_counts)
- number_of_bins_dint
the number of bins of the embedding
idtxl.estimators_jidt module¶
Provide JIDT estimators.
- class idtxl.estimators_jidt.JidtDiscrete(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtEstimator
Abstract class for implementation of discrete JIDT-estimators.
Abstract class for implementation of plug-in JIDT-estimators for discrete data. Child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), and transfer entropy (TE). See parent class for references.
Set common estimation parameters for discrete JIDT-estimators. For usage of these estimators see documentation for the child classes.
- Args:
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
- Note:
Discrete JIDT estimators require the data's alphabet size for instantiation. Hence, as opposed to the Kraskov and Gaussian estimators, the JAVA class itself is added to the object instance, while for Kraskov/Gaussian estimators an instance of that class is added (because the latter can be instantiated independently of data properties).
- estimate_surrogates_analytic(n_perm=200, **data)[source]¶
Return estimate of the analytical surrogate distribution.
This method must be implemented because this class’ is_analytic_null_estimator() method returns true.
- Args:
- n_permint [optional]
number of permutations (default=200)
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per the estimate method for this estimator.
- Returns:
- float | numpy array
n_perm surrogates of the average MI/CMI/TE over all samples under the null hypothesis of no relationship between var1 and var2 (in the context of conditional)
- abstract get_analytic_distribution(**data)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per the estimate method for this estimator.
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteAIS(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate AIS with JIDT’s discrete-variable implementation.
Calculate the active information storage (AIS) for one process. Call JIDT via jpype and use the discrete estimator. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict
set estimator parameters:
history : int - number of samples in the target’s past used as embedding (>= 0)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local AIS instead of average AIS (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph. (>= 2)
alph : int [optional] - number of discrete bins/levels for var1 (default=2 , or the value set for n_discrete_bins). (>= 2)
- estimate(process, return_calc=False)[source]¶
Estimate active information storage.
- Args:
- processnumpy array
realisations as either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average AIS over all samples or local AIS for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
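A minimal usage sketch (assumes a working JIDT/jpype setup; parameter values are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtDiscreteAIS
>>> settings = {'history': 2, 'alph': 2}     # embedding length 2, binary data
>>> est = JidtDiscreteAIS(settings)
>>> process = np.random.randint(0, 2, 1000)  # 1D array of int realisations
>>> ais = est.estimate(process)              # average AIS in bits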
- get_analytic_distribution(process)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- processnumpy array
realisations as either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteCMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate CMI with JIDT’s implementation for discrete variables.
Calculate the conditional mutual information between two variables given the third. Call JIDT via jpype and use the discrete estimator. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local CMI instead of average CMI (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph1, alph2 and alphc
alph1 : int [optional] - number of discrete bins/levels for var1 (default=2, or the value set for n_discrete_bins)
alph2 : int [optional] - number of discrete bins/levels for var2 (default=2, or the value set for n_discrete_bins)
alphc : int [optional] - number of discrete bins/levels for conditional (default=2, or the value set for n_discrete_bins)
- estimate(var1, var2, conditional=None, return_calc=False)[source]¶
Estimate conditional mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1), if no conditional is provided, return MI between var1 and var2
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
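A minimal usage sketch (assumes a working JIDT/jpype setup; values are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtDiscreteCMI
>>> est = JidtDiscreteCMI({'alph1': 2, 'alph2': 2, 'alphc': 2})
>>> var1 = np.random.randint(0, 2, 1000)
>>> var2 = np.random.randint(0, 2, 1000)
>>> cond = np.random.randint(0, 2, 1000)
>>> cmi = est.estimate(var1, var2, cond)  # average CMI in bits
>>> mi = est.estimate(var1, var2)         # omit conditional to obtain MI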
- get_analytic_distribution(var1, var2, conditional=None)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var), if no conditional is provided, return MI between var1 and var2
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate MI with JIDT’s discrete-variable implementation.
Calculate the mutual information (MI) between two variables. Call JIDT via jpype and use the discrete estimator. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local MI instead of average MI (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph1 and alph2
alph1 : int [optional] - number of discrete bins/levels for var1 (default=2, or the value set for n_discrete_bins)
alph2 : int [optional] - number of discrete bins/levels for var2 (default=2, or the value set for n_discrete_bins)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- estimate(var1, var2, return_calc=False)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
- get_analytic_distribution(var1, var2)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteTE(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate TE with JIDT’s implementation for discrete variables.
Calculate the transfer entropy between two time series processes. Call JIDT via jpype and use the discrete estimator. Transfer entropy is defined as the conditional mutual information between the source’s past state and the target’s current value, conditional on the target’s past. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict
sets estimation parameters:
history_target : int - number of samples in the target’s past used as embedding. (>= 0)
history_source : int [optional] - number of samples in the source’s past used as embedding (default=same as the target history). (>= 1)
tau_source : int [optional] - source’s embedding delay (default=1). (>= 1)
tau_target : int [optional] - target’s embedding delay (default=1). (>= 1)
source_target_delay : int [optional] - information transfer delay between source and target (default=1) (>= 0)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph1 and alph2. (>= 2)
alph1 : int [optional] - number of discrete bins/levels for source (default=2, or the value set for n_discrete_bins). (>= 2)
alph2 : int [optional] - number of discrete bins/levels for target (default=2, or the value set for n_discrete_bins). (>= 2)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- estimate(source, target, return_calc=False)[source]¶
Estimate transfer entropy from a source to a target variable.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- targetnumpy array
realisations of target variable (similar to source)
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average TE over all samples or local TE for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
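A minimal usage sketch (assumes a working JIDT/jpype setup; the lagged-copy target makes the expected TE close to one bit):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtDiscreteTE
>>> settings = {'history_target': 1, 'source_target_delay': 1,
>>>             'alph1': 2, 'alph2': 2}
>>> est = JidtDiscreteTE(settings)
>>> source = np.random.randint(0, 2, 1000)
>>> target = np.roll(source, 1)            # target repeats source with delay 1
>>> te = est.estimate(source, target)      # average TE in bits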
- get_analytic_distribution(source, target)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- targetnumpy array
realisations of target variable (similar to source)
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtEstimator(settings=None)[source]¶
Bases:
idtxl.estimator.Estimator
Abstract class for implementation of JIDT estimators.
Abstract class for implementation of JIDT estimators, child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), transfer entropy (TE) using the Kraskov-Grassberger-Stoegbauer estimator for continuous data, plug-in estimators for discrete data, and Gaussian estimators for continuous Gaussian data.
References:
Lizier, Joseph T. (2014). JIDT: an information-theoretic toolkit for studying the dynamics of complex systems. Front Robot AI, 1(11).
Kraskov, A., Stoegbauer, H., & Grassberger, P. (2004). Estimating mutual information. Phys Rev E, 69(6), 066138.
Lizier, Joseph T., Mikhail Prokopenko, and Albert Y. Zomaya. (2012). Local measures of information storage in complex distributed computation. Inform Sci, 208, 39-54.
Schreiber, T. (2000). Measuring information transfer. Phys Rev Lett, 85(2), 461.
Set common estimation parameters for JIDT estimators. For usage of these estimators see documentation for the child classes.
- Args:
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- class idtxl.estimators_jidt.JidtGaussian(CalcClass, settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtEstimator
Abstract class for implementation of JIDT Gaussian-estimators.
Abstract class for implementation of JIDT Gaussian-estimators, child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), transfer entropy (TE) using JIDT's Gaussian estimator for continuous data. See parent class for references.
Set common estimation parameters for JIDT Gaussian-estimators. For usage of these estimators see documentation for the child classes.
Results are returned in nats.
- Args:
- CalcClassJAVA class
JAVA class returned by jpype.JPackage
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- estimate_surrogates_analytic(n_perm=200, **data)[source]¶
Estimate the surrogate distribution analytically. This method must be implemented because this class' is_analytic_null_estimator() method returns true.
- Args:
- n_permint
number of permutations (default=200)
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per estimate_parallel for this estimator.
- Returns:
- float | numpy array
n_perm surrogates of the average MI/CMI/TE over all samples under the null hypothesis of no relationship between var1 and var2 (in the context of conditional)
- get_analytic_distribution(**data)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per the estimate method for this estimator.
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtGaussianAIS(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate active information storage with JIDT’s Gaussian implementation.
Calculate active information storage (AIS) for some process using JIDT’s implementation of the Gaussian estimator. AIS is defined as the mutual information between the processes’ past state and current value.
The past state needs to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a processes’ past, tau describes the embedding delay, i.e., the spacing between every two samples from the processes’ past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history : int - number of samples in the processes’ past used as embedding
tau : int [optional] - the processes’ embedding delay (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local AIS instead of average AIS (default=False)
- Note:
Some technical details: JIDT normalises over realisations, IDTxl normalises over raw data once, outside the AIS estimator to save computation time. The Theiler window ignores trial boundaries. The AIS estimator does add noise to the data as a default. To make analysis runs replicable set noise_level to 0.
- estimate(process)[source]¶
Estimate active information storage.
- Args:
- processnumpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- Returns:
- float | numpy array
average AIS over all samples or local AIS for individual samples if ‘local_values’=True
- class idtxl.estimators_jidt.JidtGaussianCMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate conditional mutual information with JIDT's Gaussian implementation.
Computes the differential conditional mutual information of two multivariate sets of observations, conditioned on another, assuming that the probability distribution function for these observations is a multivariate Gaussian distribution. Call JIDT via jpype and use ConditionalMutualInfoCalculatorMultiVariateGaussian estimator. If no conditional is given (is None), the function returns the mutual information between var1 and var2.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local CMI instead of average CMI (default=False)
- Note:
Some technical details: JIDT normalises over realisations, IDTxl normalises over raw data once, outside the CMI estimator to save computation time. The Theiler window ignores trial boundaries. The CMI estimator does add noise to the data as a default. To make analysis runs replicable set noise_level to 0.
- estimate(var1, var2, conditional=None)[source]¶
Estimate conditional mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1), if no conditional is provided, return MI between var1 and var2
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
- get_analytic_distribution(var1, var2, conditional=None)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1), if no conditional is provided, return MI between var1 and var2
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtGaussianMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate mutual information with JIDT’s Gaussian implementation.
Calculate the mutual information between two variables. Call JIDT via jpype and use the Gaussian estimator. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local MI instead of average MI (default=False)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- Note:
Some technical details: JIDT normalises over realisations, IDTxl normalises over raw data once, outside the MI estimator to save computation time. The Theiler window ignores trial boundaries. The MI estimator does add noise to the data as a default. To make analysis runs replicable set noise_level to 0.
- estimate(var1, var2)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
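A minimal usage sketch for linearly coupled Gaussian variables (assumes a working JIDT/jpype setup):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtGaussianMI
>>> est = JidtGaussianMI()                      # default settings
>>> var1 = np.random.randn(1000)
>>> var2 = 0.7 * var1 + np.random.randn(1000)   # linear coupling
>>> mi = est.estimate(var1, var2)               # average MI in nats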
- class idtxl.estimators_jidt.JidtGaussianTE(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate transfer entropy with JIDT’s Gaussian implementation.
Calculate transfer entropy between a source and a target variable using JIDT’s implementation of the Gaussian estimator. Transfer entropy is defined as the conditional mutual information between the source’s past state and the target’s current value, conditional on the target’s past.
Past states need to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a variable's past, tau describes the embedding delay, i.e., the spacing between every two samples from the processes' past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history_target : int - number of samples in the target’s past used as embedding
history_source : int [optional] - number of samples in the source’s past used as embedding (default=same as the target history)
tau_source : int [optional] - source’s embedding delay (default=1)
tau_target : int [optional] - target’s embedding delay (default=1)
source_target_delay : int [optional] - information transfer delay between source and target (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the CMI estimator, to save computation time. The Theiler window ignores trial boundaries. The CMI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(source, target)[source]¶
Estimate transfer entropy from a source to a target variable.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- targetnumpy array
realisations of the target variable (similar to source)
- Returns:
- float | numpy array
average TE over all samples or local TE for individual samples if ‘local_values’=True
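Example (a minimal sketch; assumes a working JIDT/jpype installation, with an illustrative source -> target coupling at delay 1):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtGaussianTE
>>> rng = np.random.default_rng(0)
>>> source = rng.normal(size=1000)
>>> target = rng.normal(size=1000)
>>> target[1:] += 0.5 * source[:-1]  # coupling with a delay of one sample
>>> settings = {'history_target': 1, 'source_target_delay': 1}
>>> est = JidtGaussianTE(settings)
>>> te = est.estimate(source, target)  # average TE in nats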
- class idtxl.estimators_jidt.JidtKraskov(CalcClass, settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtEstimator
Abstract class for implementation of JIDT Kraskov-estimators.
Abstract class for the implementation of JIDT Kraskov-estimators; child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), and transfer entropy (TE) using the Kraskov-Grassberger-Stoegbauer estimator for continuous data. See parent class for references.
Set common estimation parameters for JIDT Kraskov-estimators. For usage of these estimators see documentation for the child classes.
- Args:
- CalcClassJAVA class
JAVA class returned by jpype.JPackage
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local values instead of averages (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1). Only applied by this method for TE and AIS (it is already applied for MI/CMI). Note that the default algorithm of 1 here differs from the default ALG_NUM argument of the JIDT AIS KSG estimator.
- class idtxl.estimators_jidt.JidtKraskovAIS(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate active information storage with JIDT’s Kraskov implementation.
Calculate active information storage (AIS) for some process using JIDT’s implementation of the Kraskov type 1 estimator. AIS is defined as the mutual information between the processes’ past state and current value.
The past state needs to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a processes’ past, tau describes the embedding delay, i.e., the spacing between every two samples from the processes’ past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history : int - number of samples in the processes’ past used as embedding
tau : int [optional] - the processes’ embedding delay (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local AIS instead of average AIS (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the AIS estimator, to save computation time. The Theiler window ignores trial boundaries. The AIS estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(process)[source]¶
Estimate active information storage.
- Args:
- processnumpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- Returns:
- float | numpy array
average AIS over all samples or local AIS for individual samples if ‘local_values’=True
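Example (a minimal sketch; assumes a working JIDT/jpype installation; the AR(1) toy signal is illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtKraskovAIS
>>> rng = np.random.default_rng(0)
>>> noise = rng.normal(size=1000)
>>> process = np.zeros(1000)
>>> for t in range(1, 1000):
...     process[t] = 0.6 * process[t - 1] + noise[t]  # AR(1) process with memory
>>> est = JidtKraskovAIS({'history': 1, 'tau': 1})
>>> ais = est.estimate(process)  # average AIS in nats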
- class idtxl.estimators_jidt.JidtKraskovCMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate conditional mutual information with JIDT’s Kraskov implementation.
Calculate the conditional mutual information (CMI) between three variables. Call JIDT via jpype and use the Kraskov 1 estimator. If no conditional is given (is None), the function returns the mutual information between var1 and var2. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local CMI instead of average CMI (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the CMI estimator, to save computation time. The Theiler window ignores trial boundaries. The CMI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(var1, var2, conditional=None)[source]¶
Estimate conditional mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1); if no conditional is provided, the MI between var1 and var2 is returned
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
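Example (a minimal sketch; assumes a working JIDT/jpype installation; the common-driver structure is illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtKraskovCMI
>>> rng = np.random.default_rng(0)
>>> cond = rng.normal(size=1000)  # common driver
>>> var1 = cond + rng.normal(size=1000)
>>> var2 = cond + rng.normal(size=1000)
>>> est = JidtKraskovCMI()
>>> mi = est.estimate(var1, var2)  # conditional=None, returns MI (clearly positive)
>>> cmi = est.estimate(var1, var2, conditional=cond)  # CMI, close to zero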
- class idtxl.estimators_jidt.JidtKraskovMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate mutual information with JIDT’s Kraskov implementation.
Calculate the mutual information between two variables. Call JIDT via jpype and use the Kraskov 1 estimator. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local MI instead of average MI (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the MI estimator, to save computation time. The Theiler window ignores trial boundaries. The MI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(var1, var2)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
- class idtxl.estimators_jidt.JidtKraskovTE(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate transfer entropy with JIDT’s Kraskov implementation.
Calculate transfer entropy between a source and a target variable using JIDT’s implementation of the Kraskov type 1 estimator. Transfer entropy is defined as the conditional mutual information between the source’s past state and the target’s current value, conditional on the target’s past.
Past states need to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a variable’s past, tau describes the embedding delay, i.e., the spacing between every two samples from the variable’s past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history_target : int - number of samples in the target’s past used as embedding
history_source : int [optional] - number of samples in the source’s past used as embedding (default=same as the target history)
tau_source : int [optional] - source’s embedding delay (default=1)
tau_target : int [optional] - target’s embedding delay (default=1)
source_target_delay : int [optional] - information transfer delay between source and target (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the CMI estimator, to save computation time. The Theiler window ignores trial boundaries. The CMI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(source, target)[source]¶
Estimate transfer entropy from a source to a target variable.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- targetnumpy array
realisations of the target variable (similar to source)
- Returns:
- float | numpy array
average TE over all samples or local TE for individual samples if ‘local_values’=True
- idtxl.estimators_jidt.common_estimate_surrogates_analytic(estimator, n_perm=200, **data)[source]¶
Estimate the surrogate distribution analytically for JidtEstimator.
Estimate the surrogate distribution analytically for a JidtEstimator whose is_analytic_null_estimator() method returns True, by sampling estimates at random p-values in the analytic distribution.
- Args:
- estimatorJidtEstimator object
estimator that returns True to a call to its is_analytic_null_estimator() method
- n_permint
number of permutations (default=200)
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI)
- Returns:
- float | numpy array
n_perm surrogates of the average MI/CMI/TE over all samples under the null hypothesis of no relationship between var1 and var2 (in the context of conditional)
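Example (a minimal sketch; uses the Gaussian MI estimator, which supports analytic null distributions; data are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtGaussianMI, common_estimate_surrogates_analytic
>>> rng = np.random.default_rng(0)
>>> var1 = rng.normal(size=1000)
>>> var2 = 0.5 * var1 + rng.normal(size=1000)
>>> est = JidtGaussianMI()
>>> est.is_analytic_null_estimator()  # True for this estimator
>>> surrogates = common_estimate_surrogates_analytic(est, n_perm=200, var1=var1, var2=var2)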
idtxl.estimators_opencl module¶
- class idtxl.estimators_opencl.OpenCLKraskov(settings=None)[source]¶
Bases:
idtxl.estimator.Estimator
Abstract class for implementation of OpenCL estimators.
Abstract class for implementation of OpenCL estimators, child classes implement estimators for mutual information (MI) and conditional mutual information (CMI) using the Kraskov-Grassberger-Stoegbauer estimator for continuous data.
References:
Kraskov, A., Stoegbauer, H., & Grassberger, P. (2004). Estimating mutual information. Phys Rev E, 69(6), 066138.
Lizier, Joseph T., Mikhail Prokopenko, and Albert Y. Zomaya. (2012). Local measures of information storage in complex distributed computation. Inform Sci, 208, 39-54.
Schreiber, T. (2000). Measuring information transfer. Phys Rev Lett, 85(2), 461.
Estimators can be used to perform multiple, independent searches in parallel. Each of these parallel searches is called a ‘chunk’. To search multiple chunks, provide point sets as 2D arrays, where the first dimension represents samples or points, and the second dimension represents the points’ dimensions. Concatenate chunk data in the first dimension and pass the number of chunks to the estimators. Chunks must be of equal size.
Set common estimation parameters for OpenCL estimators. For usage of these estimators see documentation for the child classes.
- Args:
- settingsdict [optional]
set estimator parameters:
gpuid : int [optional] - device ID used for estimation (if more than one device is available on the current platform) (default=0)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
padding : bool [optional] - pad data to a length that is a multiple of 1024 (workaround for an OpenCL-implementation issue)
debug : bool [optional] - calculate intermediate results, i.e. neighbour counts from range searches and KNN distances, print debug output to console (default=False)
return_counts : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
- class idtxl.estimators_opencl.OpenCLKraskovCMI(settings=None)[source]¶
Bases:
idtxl.estimators_opencl.OpenCLKraskov
Calculate conditional mutual information with OpenCL Kraskov implementation.
Calculate the conditional mutual information (CMI) between three variables using OpenCL GPU-code. If no conditional is given (is None), the function returns the mutual information between var1 and var2. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
set estimator parameters:
gpuid : int [optional] - device ID used for estimation (if more than one device is available on the current platform) (default=0)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
debug : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
return_counts : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
- estimate(var1, var2, conditional=None, n_chunks=1)[source]¶
Estimate conditional mutual information.
If conditional is None, the mutual information between var1 and var2 is calculated.
- Args:
- var1numpy array
realisations of the first variable, either a 2D numpy array where array dimensions represent [(realisations * n_chunks) x variable dimension] or a 1D array representing [realisations]; the array should be of type float32
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array
realisations of conditioning variable (similar to var1)
- n_chunksint
number of data chunks; the number of data points must be the same for each chunk
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
- numpy arrays
distances and neighborhood counts for var1 and var2 if debug=True and return_counts=True
- class idtxl.estimators_opencl.OpenCLKraskovMI(settings=None)[source]¶
Bases:
idtxl.estimators_opencl.OpenCLKraskov
Calculate mutual information with OpenCL Kraskov implementation.
Calculate the mutual information (MI) between two variables using OpenCL GPU-code. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
set estimator parameters:
gpuid : int [optional] - device ID used for estimation (if more than one device is available on the current platform) (default=0)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
debug : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
return_counts : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- estimate(var1, var2, n_chunks=1)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of the first variable, either a 2D numpy array where array dimensions represent [(realisations * n_chunks) x variable dimension] or a 1D array representing [realisations]; the array should be of type float32
- var2numpy array
realisations of the second variable (similar to var1)
- n_chunksint
number of data chunks; the number of data points must be the same for each chunk
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
- numpy arrays
distances and neighborhood counts for var1 and var2 if debug=True and return_counts=True
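Example (a minimal sketch; requires PyOpenCL and a suitable OpenCL device; chunk sizes and coupling are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_opencl import OpenCLKraskovMI
>>> rng = np.random.default_rng(0)
>>> n_points, n_chunks = 500, 4
>>> var1 = rng.normal(size=(n_points * n_chunks, 1))  # chunks concatenated along the first axis
>>> var2 = var1 + rng.normal(size=(n_points * n_chunks, 1))
>>> est = OpenCLKraskovMI({'kraskov_k': 4})
>>> mi_per_chunk = est.estimate(var1, var2, n_chunks=n_chunks)  # one estimate per chunk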
idtxl.estimators_mpi module¶
- class idtxl.estimators_mpi.MPIEstimator(est, settings)[source]¶
Bases:
idtxl.estimator.Estimator
MPI wrapper for arbitrary Estimator implementations.
Make sure to have an if __name__ == '__main__': guard in your main script to avoid infinite recursion!
To use MPI, add MPI=True to the Estimator settings dictionary and optionally provide max_workers.
- Call using mpiexec:
>>> mpiexec -n 1 -usize <max workers + 1> python <python script>
- or, if MPI does not support spawning new workers (i.e. MPI version < 2)
>>> mpiexec -n <max workers + 1> python -m mpi4py.futures <python script>
- Call using slurm:
>>> srun -n $SLURM_NTASKS --mpi=pmi2 python -m mpi4py.futures <python script>
- estimate(*, n_chunks=1, **data)[source]¶
Distributes the given chunks of a task to Estimators on worker ranks using MPI.
Needs to be called with kwargs only.
- Args:
- n_chunksint [optional]
Number of chunks to split the data into, default=1.
- datadict[str, Sequence]
Dictionary of random variable realizations
- Returns:
- numpy array
Estimates of information-theoretic quantities as np.double values
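Example (a minimal sketch of enabling the wrapper through analysis settings; estimator choice and worker count are illustrative):
>>> # contents of the main script, guarded against re-execution on worker ranks
>>> if __name__ == '__main__':
...     settings = {'cmi_estimator': 'JidtKraskovCMI',
...                 'MPI': True,
...                 'max_workers': 4}
...     # pass settings to an analysis class, e.g., MultivariateTE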
idtxl.estimators_python module¶
- class idtxl.estimators_python.PythonKraskovCMI(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate conditional mutual information using Kraskov’s first estimator.
- Args:
- settingsdict [optional]
set estimator parameters:
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
base : float - base of returned values (default=np.e)
normalise : bool [optional] - z-standardise data (default=False)
noise_level : float [optional] - random noise added to the data (default=1e-8)
rng_seed : int | None [optional] - random seed if noise level > 0
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
knn_finder : str [optional] - knn algorithm to use, can be ‘scipy_kdtree’ (default), ‘sklearn_kdtree’, or ‘sklearn_balltree’
- estimate(var1: numpy.ndarray, var2: numpy.ndarray, conditional=None)[source]¶
Estimate conditional mutual information between var1 and var2, given conditional.
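Example (a minimal sketch; the common-driver data are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_python import PythonKraskovCMI
>>> rng = np.random.default_rng(0)
>>> cond = rng.normal(size=1000)  # common driver
>>> var1 = cond + rng.normal(size=1000)
>>> var2 = cond + rng.normal(size=1000)
>>> est = PythonKraskovCMI({'kraskov_k': 4, 'knn_finder': 'scipy_kdtree'})
>>> cmi = est.estimate(var1, var2, conditional=cond)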
idtxl.estimators_multivariate_pid module¶
Multivariate partial information decomposition for discrete random variables.
This module provides an estimator for multivariate partial information decomposition as proposed in
Makkeh, A., Gutknecht, A., & Wibral, M. (2020). A differentiable measure for shared information, 1-27. Retrieved from http://arxiv.org/abs/2002.03356
- class idtxl.estimators_multivariate_pid.SxPID(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate partial information decomposition for multiple inputs.
Implementation of the multivariate partial information decomposition (PID) estimator for discrete data with up to 4 inputs and one output. The estimator finds shared information, unique information and synergistic information between the multiple inputs s1, s2, …, sn with respect to the output t for each realization (t, s1, …, sn) and then averages them according to their distribution weights p(t, s1, …, sn). Both the pointwise (on the realization level) PID and the averaged PID are returned (see the ‘return’ of ‘estimate()’).
The algorithm uses recursion to compute the partial information decomposition.
References:
Makkeh, A. & Wibral, M. (2020). A differentiable pointwise partial information decomposition estimator. https://github.com/Abzinger/SxPID.
- Args:
- settingsdict
estimation parameters (with default parameters)
verbose : bool [optional] - print output to console (default=False)
- estimate(s, t)[source]¶
Estimate SxPID from a list of sources and a target.
- Args:
- slist of numpy arrays
1D arrays containing realizations of a discrete random variable
- tnumpy array
1D array containing realizations of a discrete random variable
- Returns:
- dict
- SxPID results, with entries
‘ptw’ -> { realization -> {alpha -> [float, float, float]}}: pointwise decomposition
‘avg’ -> {alpha -> [float, float, float]}: average decomposition
the list of floats is ordered [informative, misinformative, informative - misinformative]
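Example (a minimal sketch; an XOR target is used because its information is purely synergistic):
>>> import numpy as np
>>> from idtxl.estimators_multivariate_pid import SxPID
>>> rng = np.random.default_rng(0)
>>> s1 = rng.integers(0, 2, 1000)
>>> s2 = rng.integers(0, 2, 1000)
>>> t = np.logical_xor(s1, s2).astype(int)  # XOR of the two sources
>>> est = SxPID({'verbose': False})
>>> results = est.estimate([s1, s2], t)
>>> avg = results['avg']  # {alpha -> [informative, misinformative, difference]}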
idtxl.estimators_pid module¶
Partial information decomposition for discrete random variables.
This module provides an estimator for partial information decomposition as proposed in
Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., & Ay, N. (2014). Quantifying Unique Information. Entropy, 16(4), 2161–2183. http://doi.org/10.3390/e16042161
- class idtxl.estimators_pid.SydneyPID(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate partial information decomposition of discrete variables.
Fast implementation of the BROJA partial information decomposition (PID) estimator for discrete data (Bertschinger, 2014). The estimator does not require JAVA or GPU modules to run.
The estimator finds shared information, unique information and synergistic information between the two inputs s1 and s2 with respect to the output t.
Improved version with larger initial swaps and checking for convergence of the unique information from both sources 1 and 2. The function counts the empirical observations, calculates probabilities and the initial CMI, then performs virtualised swaps until convergence, and finally calculates the PID. The virtualised-swaps stage contains two loops: an inner loop which performs the virtualised swapping, keeping the changes if the CMI decreases, and an outer loop which decreases the size of the probability mass increment used by the virtualised swapping.
References
Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., & Ay, N. (2014). Quantifying unique information. Entropy, 16(4), 2161–2183. http://doi.org/10.3390/e16042161
- Args:
- settingsdict
estimation parameters
alph_s1 : int - alphabet size of s1
alph_s2 : int - alphabet size of s2
alph_t : int - alphabet size of t
max_unsuc_swaps_row_parm : int - soft limit for virtualised swaps based on the number of unsuccessful swaps attempted in a row. If there are too many unsuccessful swaps in a row, the inner swap loop breaks; the outer loop then decrements the size of the probability mass increment and attempts virtualised swaps again with the smaller increment. The exact number of unsuccessful swaps allowed before breaking is the total number of possible swaps (given the alphabet sizes) times the control parameter max_unsuc_swaps_row_parm, e.g., if the parameter is set to 3, this gives a high degree of confidence that nearly (if not) all of the possible swaps have been attempted before this soft limit breaks the swap loop.
num_reps : int - number of times the outer loop will halve the size of the probability increment used for the virtualised swaps. This corresponds directly to the number of times the empirical data were replicated in the original implementation.
max_iters : int - hard upper bound on the number of virtualised swaps attempted in the inner loop. In practice this hard limit is rarely reached, as the soft limit defined above is hit first (parameter may be removed in the future).
verbose : bool [optional] - print output to console (default=False)
- estimate(s1, s2, t)[source]¶
- Args:
- s1numpy array
1D array containing realizations of a discrete random variable
- s2numpy array
1D array containing realizations of a discrete random variable
- tnumpy array
1D array containing realizations of a discrete random variable
- Returns:
- dict
estimated decomposition, contains the joint distribution, unique, shared, and synergistic information
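Example (a minimal sketch; alphabet sizes match the binary toy data, the remaining parameter values are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_pid import SydneyPID
>>> rng = np.random.default_rng(0)
>>> s1 = rng.integers(0, 2, 1000)
>>> s2 = rng.integers(0, 2, 1000)
>>> t = np.logical_xor(s1, s2).astype(int)
>>> settings = {'alph_s1': 2, 'alph_s2': 2, 'alph_t': 2,
...             'max_unsuc_swaps_row_parm': 3,
...             'num_reps': 63, 'max_iters': 10000}
>>> est = SydneyPID(settings)
>>> pid = est.estimate(s1, s2, t)  # XOR yields mostly synergistic information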
- class idtxl.estimators_pid.TartuPID(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate partial information decomposition for two inputs and one output.
Implementation of the partial information decomposition (PID) estimator for discrete data. The estimator finds shared information, unique information and synergistic information between the two inputs s1 and s2 with respect to the output t.
The algorithm uses exponential cone programming and requires the Python package for ECOS: Embedded Cone Solver (https://pypi.python.org/pypi/ecos).
References:
Makkeh, A., Theis, D.O., & Vicente, R. (2017). Bivariate Partial Information Decomposition: The Optimization Perspective. Entropy, 19(10), 530.
Makkeh, A., Theis, D.O., & Vicente, R. (2018). BROJA-2PID: A cone programming based Partial Information Decomposition estimator. Entropy, 20(271), https://github.com/Abzinger/BROJA_2PID.
- Args:
- settingsdict
estimation parameters (with default parameters)
verbose : bool [optional] - print output to console (default=False)
cone_solver : str [optional] - which cone solver to use (default=’ECOS’)
solver_args : dict [optional] - solver arguments (default={})
- estimate(s1, s2, t)[source]¶
- Args:
- s1numpy array
1D array containing realizations of a discrete random variable
- s2numpy array
1D array containing realizations of a discrete random variable
- tnumpy array
1D array containing realizations of a discrete random variable
- Returns:
- dict
estimated decomposition, solver used, numerical error
idtxl.idtxl_exceptions module¶
Provide error handling and warnings.
- exception idtxl.idtxl_exceptions.AlgorithmExhaustedError(message)[source]¶
Bases:
Exception
Exception raised to signal that the estimators can no longer be used for this particular target (e.g. because of memory errors in high dimensions) but that the estimation could continue for others.
- Attributes:
message – explanation of the error
- exception idtxl.idtxl_exceptions.JidtOutOfMemoryError(message)[source]¶
Bases:
idtxl.idtxl_exceptions.AlgorithmExhaustedError
Exception raised to signal a Java OutOfMemoryException.
It is a child class of AlgorithmExhaustedError.
- Attributes:
message – explanation of the error
idtxl.idtxl_io module¶
Provide I/O functionality.
Provide functions to load and save IDTxl data, provide import functions (e.g., mat-files, FieldTrip) and export functions (e.g., networkx, BrainNet Viewer).
- idtxl.idtxl_io.export_brain_net_viewer(adjacency_matrix, mni_coord, file_name, **kwargs)[source]¶
Export network to BrainNet Viewer.
Export networks to BrainNet Viewer (project home page: http://www.nitrc.org/projects/bnv/). BrainNet Viewer is a MATLAB toolbox offering brain network visualisation (e.g., ‘glass’ brains). The function creates text files [file_name].node and [file_name].edge, containing information on node location (in MNI coordinates), directed edges, node color and size.
References:
Xia, M., Wang, J., & He, Y. (2013). BrainNet Viewer: A Network Visualization Tool for Human Brain Connectomics. PLoS ONE 8(7):e68910. https://doi.org/10.1371/journal.pone.0068910
- Args:
- adjacency_matrixAdjacencyMatrix instance
adjacency matrix to be exported, returned by get_adjacency_matrix() method of Results() class
- mni_coordnumpy array
MNI coordinates (x, y, z) of the sources, array with size [n x 3], where n is the number of nodes
- file_namestr
file name for output files including the file path
- labelsarray type of str [optional]
list of node labels of length n, description or label for each node. Note that labels can’t contain spaces (they cause BrainNet to crash); the function removes any spaces from labels (default=no labels)
- node_colorarray type of colors [optional]
BrainNet gives you the option to color nodes according to the values in this vector (length n), see BrainNet Manual
- node_sizearray type of int [optional]
BrainNet gives you the option to size nodes according to the values in this array (length n), see BrainNet Manual
- idtxl.idtxl_io.export_networkx_graph(adjacency_matrix, weights)[source]¶
Export networkx graph object for an inferred network.
Export a weighted, directed graph object from the network of inferred (multivariate) interactions (e.g., multivariate TE), using the networkx class for directed graphs (DiGraph). Multiple options for the weight are available (see documentation of method get_adjacency_matrix for details).
- Args:
- adjacency_matrixAdjacencyMatrix instance
adjacency matrix to be exported, returned by get_adjacency_matrix() method of Results() class
- weightsstr
weights for the adjacency matrix (see documentation of method get_adjacency_matrix for details)
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- DiGraph instance
directed graph of networkx package’s DiGraph() class
- idtxl.idtxl_io.export_networkx_source_graph(results, target, sign_sources=True, fdr=True)[source]¶
Export graph object of source variables for a single target.
Export graph object from the network of (multivariate) interactions (e.g., multivariate TE) between single source variables and a target process using the networkx class for directed graphs (DiGraph). The graph shows the information transfer between individual source variables and the target. Each node is a tuple with the following format: (process index, sample index).
- Args:
- resultsResults() instance
network analysis results
- targetint
target index
- sign_sourcesbool [optional]
add sources with significant information contribution only (default=True)
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- DiGraph instance
directed graph of networkx package’s DiGraph() class
- idtxl.idtxl_io.import_fieldtrip(file_name, ft_struct_name, file_version, normalise=True)[source]¶
Convert FieldTrip-style MATLAB-file into an IDTxl Data object.
Import a MATLAB structure with fields “trial” (data), “label” (channel labels), “time” (time stamps for data samples), and “fsample” (sampling rate). This structure is the standard file format of the MATLAB toolbox FieldTrip and is commonly used to represent neurophysiological data (see also http://www.fieldtriptoolbox.org/reference/ft_datatype_raw). The data is returned as an IDTxl Data() object.
The structure is assumed to be saved as a MATLAB hdf5 file (‘-v7.3’ or higher, .mat) with a SINGLE FieldTrip data structure inside.
- Args:
- file_namestring
full (matlab) file_name on disk
- ft_struct_namestring
variable name of the MATLAB structure that is in FieldTrip format (autodetect will hopefully be possible later …)
- file_versionstring
version of the file, e.g. ‘v7.3’ for MATLAB’s 7.3 format
- normalisebool [optional]
normalise data after import (default=True)
- Returns:
- Data() instance
instance of IDTxl Data object, containing data from the ‘trial’ field
- list of strings
list of channel labels, corresponding to the ‘label’ field
- numpy array
time stamps for samples, corresponding to one entry in the ‘time’ field
- int
sampling rate, corresponding to the ‘fsample’ field
- idtxl.idtxl_io.import_matarray(file_name, array_name, file_version, dim_order, normalise=True)[source]¶
Read Matlab hdf5 file into IDTxl.
Reads a MATLAB hdf5 file (‘-v7.3’ or higher, .mat) or non-hdf5 files with a SINGLE array inside and returns an IDTxl Data() object.
- Note:
The import function squeezes the loaded mat-file, i.e., any singleton dimension will be removed. Hence do not enter singleton dimension into the ‘dim_order’, e.g., don’t pass dim_order=’ps’ but dim_order=’s’ if you want to load a 1D-array where entries represent samples recorded from a single channel.
- Args:
- file_namestring
full (matlab) file_name on disk
- array_namestring
variable name of the MATLAB structure to be read
- file_versionstring
version of the file, e.g. ‘v7.3’ for MATLAB’s 7.3 format; currently versions ‘v4’, ‘v6’, ‘v7’, and ‘v7.3’ are supported
- dim_orderstring
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the data dimensionality, e.g., ‘ps’ for a two-dimensional array of data from several processes over time
- normalisebool [optional]
normalise data after import (default=True)
- Returns:
- Data() instance
instance of IDTxl Data object, containing the data from the specified array
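Example (a minimal sketch; file name and variable name are hypothetical):
>>> from idtxl.idtxl_io import import_matarray
>>> data = import_matarray(file_name='my_data.mat', array_name='d',
...                        file_version='v7.3', dim_order='ps',
...                        normalise=True)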
- idtxl.idtxl_io.load_json(file_path)[source]¶
Load dictionary saved as JSON file from disk.
- Args:
- file_pathstr
path to file (including extension)
- Returns:
dict
Note: JSON does not recognize numpy data structures and types. Numpy arrays and data types (float, int) are converted to Python types and lists when saving. The loaded dictionary may therefore contain different data types than the saved one.
- idtxl.idtxl_io.load_pickle(name)[source]¶
Load objects that have been saved using Python’s pickle module.
- idtxl.idtxl_io.save_json(d, file_path)[source]¶
Save dictionary to disk as JSON file.
Writes dictionary to disk at the specified file path.
- Args:
- ddict
dictionary to be written to disk
- file_pathstr
path to file (including extension)
Note: JSON does not recognize numpy data types, those are converted to basic Python data types first.
idtxl.idtxl_utils module¶
Provide IDTxl utility functions.
- idtxl.idtxl_utils.argsort_descending(a)[source]¶
Sort array in descending order and return the sorting indices.
- idtxl.idtxl_utils.calculate_mi(corr)[source]¶
Calculate mutual information from correlation coefficient.
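A sketch of the identity this function evaluates, assuming the Gaussian relationship I = -0.5 * ln(1 - r^2) in nats:
>>> from idtxl.idtxl_utils import calculate_mi
>>> calculate_mi(0.7)  # -0.5 * ln(1 - 0.49), approx. 0.337 nats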
- idtxl.idtxl_utils.combine_discrete_dimensions(a, numBins)[source]¶
Combine multi-dimensional discrete variable into a single dimension.
Combine all dimensions of a discrete variable into a single dimensional value for each sample. This is done by multiplying each dimension by a different power of the base (numBins).
Adapted from infodynamics.utils.MatrixUtils.computeCombinedValues() from JIDT by J.Lizier.
- Args:
- anumpy array
data to be combined across all variable dimensions. Dimensions are realisations (samples) x variable dimension
- numBinsint
number of discrete levels or bins for each variable dimension
- Returns:
- numpy array
a univariate array – one entry now for each sample, with all dimensions of the data now combined for that sample
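Example (a minimal sketch; each row of binary data is mapped to one combined integer):
>>> import numpy as np
>>> from idtxl.idtxl_utils import combine_discrete_dimensions
>>> a = np.array([[0, 1], [1, 0], [1, 1]])  # 3 samples, 2 binary dimensions
>>> combined = combine_discrete_dimensions(a, 2)  # one base-2 value per sample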
- idtxl.idtxl_utils.conflicting_entries(dict_1, dict_2)[source]¶
Test two dictionaries for unequal entries.
Note that only keys that are present in both dicts are compared. If one dictionary contains an entry not present in the other dictionary, the test passes.
- idtxl.idtxl_utils.discretise(a, numBins)[source]¶
Discretise continuous data.
Discretise continuous data into discrete values (with 0 as lowest) by evenly partitioning the range of the data, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretise() from JIDT by J. Lizier.
- Args:
- anumpy array
data to be discretised. Dimensions are realisations x variable dimension
- numBinsint
number of discrete levels or bins to partition the data into
- Returns:
- numpy array
discretised data
- idtxl.idtxl_utils.discretise_max_ent(a, numBins)[source]¶
Discretise continuous data using maximum entropy partitioning.
Discretise continuous data into discrete values (with 0 as lowest) by making a maximum entropy partitioning, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretiseMaxEntropy() from JIDT by J. Lizier.
- Args:
- anumpy array
data to be discretised. Dimensions are realisations x variable dimension
- numBinsint
number of discrete levels or bins to partition the data into
- Returns:
- numpy array
discretised data
- idtxl.idtxl_utils.print_dict(d, indent=4)[source]¶
Use Python’s pretty printer to print dictionaries to the console.
- idtxl.idtxl_utils.remove_column(a, j)[source]¶
Remove a column from a numpy array.
This is faster than logical indexing (approximately 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips
- Args:
- anumpy array
2-dimensional numpy array
- jint
column index to be removed
- idtxl.idtxl_utils.remove_row(a, i)[source]¶
Remove a row from a numpy array.
This is faster than logical indexing (approximately 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips
- Args:
- anumpy array
2-dimensional numpy array
- iint
row index to be removed
- idtxl.idtxl_utils.separate_arrays(idx_all, idx_single, a)[source]¶
Separate a single column from all other columns in a 2D-array.
Return the separated single column and the remaining columns of a 2D-array.
- Args:
- idx_alllist<Object>
list of variables indicating the full set
- idx_single<Object>
single variable indicating the column to be separated, variable must be contained in idx_all
- anumpy array
2D-array with the same length along axis 1 as idx_all (.shape[1] == len(idx_all))
- Returns:
- numpy array
remaining columns in full array
- numpy array
column at single index
- idtxl.idtxl_utils.standardise(a, dimension=0, df=1)[source]¶
Z-standardise a numpy array along a given dimension.
Standardise array along the axis defined in dimension using the denominator (N - df) for the calculation of the standard deviation.
- Args:
- anumpy array
data to be standardised
- dimensionint [optional]
dimension along which array should be standardised
- dfint [optional]
degrees of freedom for the denominator of the standard deviation
- Returns:
- numpy array
standardised data
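Example (a minimal sketch; standardises each process along the sample axis):
>>> import numpy as np
>>> from idtxl.idtxl_utils import standardise
>>> a = 3 * np.random.randn(5, 1000) + 7  # 5 processes x 1000 samples
>>> z = standardise(a, dimension=1)  # zero mean, unit SD along axis 1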
- idtxl.idtxl_utils.swap_chars(s, i_1, i_2)[source]¶
Swap two characters in a string.
- Example:
>>> print(swap_chars('heLlotHere', 2, 6))
'heHlotLere'
idtxl.network_analysis module¶
Parent class for network inference and network comparison.
- class idtxl.network_analysis.NetworkAnalysis[source]¶
Bases:
object
Provide an analysis setup for network inference or comparison.
The class provides routines to check user input and set defaults.
- property current_value¶
Get index of the current_value.
- resume_checkpoint(file_path)[source]¶
Resume analysis from a checkpoint saved to disk.
- Args:
- file_pathstr
path to checkpoint file (excluding extension: .ckp)
- property selected_vars_full¶
List of indices of the full conditional set.
- property selected_vars_sources¶
List of indices of source samples in the conditional set.
- property selected_vars_target¶
List of indices of target samples in the conditional set.
idtxl.network_inference module¶
Parent class for all network inference.
- class idtxl.network_inference.NetworkInference[source]¶
Bases:
idtxl.network_analysis.NetworkAnalysis
Parent class for network inference algorithms.
Hold variables that are relevant for network inference using for example bivariate and multivariate transfer entropy.
- Attributes:
- settingsdict
settings for estimation of information theoretic measures and statistical testing, see child classes for documentation
- targetint
target process of analysis
- current_valuetuple
index of the current value
- selected_vars_fulllist of tuples
indices of the full set of random variables to be conditioned on
- selected_vars_targetlist of tuples
indices of the set of conditionals coming from the target process
- selected_vars_sourceslist of tuples
indices of the set of conditionals coming from source processes
- class idtxl.network_inference.NetworkInferenceBivariate[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for bivariate network inference algorithms.
- class idtxl.network_inference.NetworkInferenceMI[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for mutual information network inference algorithms.
- class idtxl.network_inference.NetworkInferenceMultivariate[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for multivariate network inference algorithms.
- class idtxl.network_inference.NetworkInferenceTE[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for transfer entropy network inference algorithms.
idtxl.single_process_analysis module¶
Parent class for analysis of single processes in the network.
idtxl.network_comparison module¶
Perform inference statistics on groups of data.
- class idtxl.network_comparison.NetworkComparison[source]¶
Bases:
idtxl.network_analysis.NetworkAnalysis
Set up network comparison between two experimental conditions.
The class provides methods for the comparison of networks inferred from data recorded under two experimental conditions A and B. Four statistical tests are implemented:
units of observation / comparison type, stats_type, and example:
- replications / within a subject:
dependent - baseline (A) vs. task (B)
independent - detect house (A) vs. face (B)
- sets of data / between subjects:
dependent - patients (A) vs. matched controls (B)
independent - male (A) vs. female (B) participants
Depending on the units of observation, one of two methods is used: compare_within() or compare_between(). The stats_type is passed as an analysis setting, see the documentation of the two methods for details.
Note that for network inference methods that use an embedding, i.e., a collection of variables in the source, the joint information in all variables about the target is used as a test statistic.
- calculate_link_te(data, target, sources='all')[source]¶
Calculate the information transfer for whole links into a target.
Calculate the information transfer for whole links as the joint information transfer from all variables selected for a single source process into the target. The information transfer is calculated conditional on the target’s past and, for multivariate TE, conditional on selected variables from further sources in the network.
If sources is set to ‘all’, a list of information transfer values is returned. If sources is set to a single source index, the information transfer from this source to the target is returned.
- Args:
- dataData instance
raw data for analysis
- targetint
index of target process
- sourceslist of ints | ‘all’ [optional]
return estimates for links from selected or all sources into the target (default=’all’)
- Returns:
- numpy array
information transfer estimate for each link
- compare_between(settings, network_set_a, network_set_b, data_set_a, data_set_b)[source]¶
Compare networks inferred under two conditions between subjects.
Compare two sets of networks inferred from two sets of data recorded under different experimental conditions within multiple subjects, i.e., data have been recorded from subjects assigned to one of two experimental conditions (units of observations are subjects).
- Args:
- settingsdict
parameters for estimation and statistical testing, see documentation of compare_within() for details
- network_set_anumpy array of dicts
results from network inference for multiple subjects observed under condition a
- network_set_bnumpy array of dicts
results from network inference for multiple subjects observed under condition b
- data_set_anumpy array of Data objects
set of data from which network_set_a was inferred
- data_set_bnumpy array of Data objects
set of data from which network_set_b was inferred
- Returns:
- ResultsNetworkComparison object
results of network comparison, see documentation of ResultsNetworkComparison()
- compare_links_within(settings, link_a, link_b, network, data)[source]¶
Compare two links within the same network.
Check whether the information transfer in the first link differs from the information transfer in a second link within the same network.
Note that both links have to be part of the inferred network, i.e., there has to be significant effective connectivity for both links.
- Args:
- settingsdict
parameters for estimation and statistical testing
stats_type : str - ‘dependent’ or ‘independent’ for dependent or independent units of observation
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
tail_comp : str [optional] - test tail, ‘one’ for one-sided test A > B, ‘two’ for two-sided test (default=’two’)
n_perm_comp : int [optional] - number of permutations (default=500)
alpha_comp : float - critical alpha level for statistical significance (default=0.05)
permute_in_time : bool [optional] - if True, create surrogates by shuffling data over time. See Data.permute_samples() for further settings for surrogate creation
verbose : bool [optional] - toggle console output (default=True)
- link_aarray type
first link, array type with two entries [source target]
- link_barray type
second link, array type with two entries [source target]
- networkdict
results from network inference
- dataData object
data from which network was inferred
- Returns:
- ResultsNetworkComparison object
results of network comparison, see documentation of ResultsNetworkComparison()
- compare_within(settings, network_a, network_b, data_a, data_b)[source]¶
Compare networks inferred under two conditions within one subject.
Compare two networks inferred from data recorded under two different experimental conditions within one subject (units of observations are replications of one experimental condition within one subject).
- Args:
- settingsdict
parameters for estimation and statistical testing
stats_type : str - ‘dependent’ or ‘independent’ for dependent or independent units of observation
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
tail_comp : str [optional] - test tail, ‘one’ for one-sided test A > B, ‘two’ for two-sided test (default=’two’)
n_perm_comp : int [optional] - number of permutations (default=500)
alpha_comp : float - critical alpha level for statistical significance (default=0.05)
permute_in_time : bool [optional] - if True, create surrogates by shuffling data over time. See Data.permute_samples() for further settings for surrogate creation
verbose : bool [optional] - toggle console output (default=True)
- network_adict
results from network inference, condition a
- network_bdict
results from network inference, condition b
- data_aData object
data from which network_a was inferred
- data_bData object
data from which network_b was inferred
- Returns:
- ResultsNetworkComparison object
results of network comparison, see documentation of ResultsNetworkComparison()
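Example (a minimal sketch; network_a/network_b and data_a/data_b are placeholders for results and data from two prior inference runs under conditions A and B):
>>> from idtxl.network_comparison import NetworkComparison
>>> comp_settings = {'cmi_estimator': 'JidtKraskovCMI',
...                  'stats_type': 'dependent',
...                  'n_perm_comp': 500,
...                  'alpha_comp': 0.05,
...                  'tail_comp': 'two'}
>>> comp = NetworkComparison()
>>> results_comp = comp.compare_within(comp_settings, network_a, network_b, data_a, data_b)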
idtxl.results module¶
Provide results class for IDTxl network analysis.
- class idtxl.results.AdjacencyMatrix(n_nodes, weight_type)[source]¶
Bases:
object
Adjacency matrix representing inferred networks.
- add_edge_list(i_list, j_list, weights)[source]¶
Add multiple weighted edges (i, j) to adjacency matrix.
- class idtxl.results.DotDict[source]¶
Bases:
dict
Dictionary with dot-notation access to values.
Provides the same functionality as a regular dict, but also allows accessing values using dot-notation.
Example:
>>> from idtxl.results import DotDict
>>> d = DotDict({'a': 1, 'b': 2})
>>> d.a
>>> # Out: 1
>>> d['a']
>>> # Out: 1
- class idtxl.results.Results(n_nodes, n_realisations, normalised)[source]¶
Bases:
object
Parent class for results of network analysis algorithms.
Provide a container for results of network analysis algorithms, e.g., MultivariateTE or ActiveInformationStorage.
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- combine_results(*results)[source]¶
Combine multiple (partial) results objects.
Combine a list of partial network analysis results into a single results object (e.g., results from analysis parallelized over processes). Raise an error if duplicate processes occur in partial results, or if analysis settings are not equal.
Note that only conflicting settings cause an error (i.e., settings with equal keys but different values). If additional settings are included in partial results (i.e., settings with different keys) these settings are added to the common settings dictionary.
Remove FDR-corrections from partial results before combining them. FDR-correction performed on the basis of parts of the network is not valid for the combined network.
- Args:
- resultslist of Results objects
single process analysis results from .analyse_network or .analyse_single_process methods, where each object contains partial results for one or multiple processes
- Returns:
- dict
combined results object
- class idtxl.results.ResultsMultivariatePID(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of Multivariate Partial Information Decomposition (PID) analysis.
Provide a container for results of Multivariate Partial Information Decomposition (PID) algorithms.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_pid._single_target[2].source_1
or
>>> res_pid._single_target[2]['source_1']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- targets_analysedlist
list of analysed targets
- get_single_target(target)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
source_i : tuple - source variable i
selected_vars_sources : list of tuples - source variables used in PID estimation
avg : dict - avg pid {alpha -> float} where alpha is a redundancy lattice node
ptw : dict of dicts - ptw pid {rlz -> {alpha -> float} } where rlz is a single realisation of the random variables and alpha is a redundancy lattice node
current_value : tuple - current value used for analysis, described by target and sample index in the data
[estimator-specific settings]
- Args:
- targetint
target id
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
- class idtxl.results.ResultsNetworkAnalysis(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.Results
- get_single_target(target, fdr=True)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
omnibus_te : float - TE-value for joint information transfer from all sources into the target
omnibus_pval : float - p-value of omnibus information transfer into the target
omnibus_sign : bool - significance of omnibus information transfer wrt. to the alpha_omnibus specified in the settings
selected_vars_sources : list of tuples - source variables with significant information about the current value
selected_vars_target : list of tuples - target variables with significant information about the current value
selected_sources_pval : array of floats - p-value for each selected variable
selected_sources_te : array of floats - TE-value for each selected variable
sources_tested : list of int - list of sources tested for the current target
current_value : tuple - current value used for analysis, described by target and sample index in the data
Setting fdr to True returns FDR-corrected results (Benjamini, 1995).
- Args:
- targetint
target id
- fdrbool [optional]
return FDR-corrected results, see documentation of network inference algorithms and stats.network_fdr (default=True)
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
- get_target_sources(target, fdr=True)[source]¶
Return list of sources (parents) for given target.
- Args:
- targetint
target index
- fdrbool [optional]
if True, sources are returned for FDR-corrected results (default=True)
- property targets_analysed¶
Return the list of analysed targets.
- class idtxl.results.ResultsNetworkComparison(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of network comparison.
Provide a container for results of network comparison algorithms.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation: res_network.settings.cmi_estimator or res_network.settings[‘cmi_estimator’].
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- surrogate_distributiondict
for each target, surrogate distributions used for testing of each link into the target
- targets_analysedlist
list of analysed targets
- abdict
for each target, list of comparison results for all links into the target; True if link in condition A > link in condition B
- pvaldict
for each target, list of p-values for all compared links
- cmi_diff_absdict
for each target, list of absolute difference in interaction measure for all compared links
- get_adjacency_matrix(weights='comparison')[source]¶
Return adjacency matrix.
Return adjacency matrix resulting from network inference. Multiple options for the weights are available.
- Args:
- weightsstr [optional]
can either be
‘union’: all links in the union network, i.e., all links that were tested for a difference
or return information for links with a significant difference:
‘comparison’: True for links with a significant difference in inferred effective connectivity (default)
‘pvalue’: p-values for links with a significant difference
‘diff_abs’: absolute differences in inferred effective connectivity for links with a significant difference
- Returns:
AdjacencyMatrix instance
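A sketch of retrieving the comparison matrix, assuming res_comp is a ResultsNetworkComparison instance returned by a completed network comparison (the variable name is hypothetical):
>>> adj = res_comp.get_adjacency_matrix(weights='comparison')
>>> adj.print_matrix()  # print the AdjacencyMatrix to console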
- get_single_target(target)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
sources : list of ints - list of sources inferred for the current target (union of sources from both data sets entering the comparison)
selected_vars_sources : list of tuples - source variables with significant information about the current value (union of both conditions)
selected_vars_target : list of tuples - target variables with significant information about the current value (union of both conditions)
- Args:
- targetint
target id
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
- get_target_sources(target)[source]¶
Return list of sources (parents) for given target.
- Args:
- targetint
target index
- print_edge_list(weights='comparison')[source]¶
Print results of network comparison to console.
Print results of network comparison to console. Output looks like this:
>>> 0 -> 1, diff_abs = 0.2
>>> 0 -> 2, diff_abs = 0.5
>>> 0 -> 3, diff_abs = 0.7
>>> 3 -> 4, diff_abs = 1.3
>>> 4 -> 3, diff_abs = 0.4
indicating differences in the network inference measure for a link source -> target.
- Args:
- weightsstr [optional]
weights for the adjacency matrix (see documentation of method get_adjacency_matrix for details)
- class idtxl.results.ResultsNetworkInference(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of network inference.
Provide a container for results of network inference algorithms, e.g., MultivariateTE or BivariateTE.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_network.settings.cmi_estimator
or
>>> res_network.settings['cmi_estimator']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before estimation
- targets_analysedlist
list of analysed targets
- get_adjacency_matrix(weights, fdr=True)[source]¶
Return adjacency matrix.
Return adjacency matrix resulting from network inference. The adjacency matrix can either be generated from FDR-corrected results or uncorrected results. Multiple options for the weight are available.
- Args:
- weightsstr
can either be
‘max_te_lag’: the weights represent the source -> target lag corresponding to the maximum transfer entropy value (see documentation for method get_target_delays for details)
‘max_p_lag’: the weights represent the source -> target lag corresponding to the smallest p-value, i.e., the highest statistical significance (see documentation for method get_target_delays for details)
‘vars_count’: the weights represent the number of statistically-significant source -> target lags
‘binary’: return unweighted adjacency matrix with binary entries; 1 = significant information transfer, 0 = no significant information transfer
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
AdjacencyMatrix instance
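Assuming results holds a completed whole-network inference (e.g., from MultivariateTE.analyse_network()), the adjacency matrix might be obtained as in this sketch:
>>> # weights='max_te_lag': matrix entries hold inferred source -> target delays
>>> adj = results.get_adjacency_matrix(weights='max_te_lag', fdr=False)
>>> adj.print_matrix()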
- get_source_variables(fdr=True)[source]¶
Return list of inferred past source variables for all targets.
Return a list of dictionaries, where each dictionary holds the selected past source variables for one analysed target. The list may be used as an input to significant subgraph mining in the postprocessing module.
- Args:
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- list of dicts
selected past source variables for each target
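For example, continuing a completed analysis (a sketch):
>>> # One dict of selected past source variables per analysed target
>>> source_vars = results.get_source_variables(fdr=False)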
- get_target_delays(target, criterion='max_te', fdr=True)[source]¶
Return list of information-transfer delays for a given target.
Return a list of information-transfer delays for a given target. Information-transfer delays are determined by the lag of the variable in a source's past that has the highest information transfer into the target process. There are two ways of identifying the variable with maximum information transfer:
use the variable with the highest absolute TE value (highest information transfer),
use the variable with the smallest p-value (highest statistical significance).
- Args:
- targetint
target index
- criterionstr [optional]
use maximum TE value (‘max_te’) or p-value (‘max_p’) to determine the source-target delay (default=’max_te’)
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- numpy array
information-transfer delays for each source
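A sketch, again assuming a completed analysis stored in results:
>>> # Delays for target 1, determined by maximum TE, uncorrected results
>>> delays = results.get_target_delays(target=1, criterion='max_te', fdr=False)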
- print_edge_list(weights, fdr=True)[source]¶
Print results of network inference to console.
Print edge list resulting from network inference to console. Output may look like this:
>>> 0 -> 1, max_te_lag = 2
>>> 0 -> 2, max_te_lag = 3
>>> 0 -> 3, max_te_lag = 2
>>> 3 -> 4, max_te_lag = 1
>>> 4 -> 3, max_te_lag = 1
The edge list can either be generated from FDR-corrected results or uncorrected results. Multiple options for the weight are available (see documentation of method get_adjacency_matrix for details).
- Args:
- weightsstr
link weights (see documentation of method get_adjacency_matrix for details)
- fdrbool [optional]
return FDR-corrected results (default=True)
- class idtxl.results.ResultsPID(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of Partial Information Decomposition (PID) analysis.
Provide a container for results of Partial Information Decomposition (PID) algorithms.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_pid._single_target[2].source_1
or
>>> res_pid._single_target[2]['source_1']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- targets_analysedlist
list of analysed targets
- get_single_target(target)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
source_1 : tuple - source variable 1
source_2 : tuple - source variable 2
selected_vars_sources : list of tuples - source variables used in PID estimation
s1_unq : float - unique information in source 1
s2_unq : float - unique information in source 2
syn_s1_s2 : float - synergistic information in sources 1 and 2
shd_s1_s2 : float - shared information in sources 1 and 2
current_value : tuple - current value used for analysis, described by target and sample index in the data
[estimator-specific settings]
- Args:
- targetint
target id
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
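Assuming res_pid is a ResultsPID instance from a completed PID analysis (the variable name is hypothetical), single-target estimates might be read out like this:
>>> pid_target = res_pid.get_single_target(target=2)
>>> # keyword access and dot-notation are equivalent
>>> print(pid_target.s1_unq, pid_target['syn_s1_s2'])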
- class idtxl.results.ResultsSingleProcessAnalysis(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.Results
Store results of single process analysis.
Provide a container for the results of algorithms for the analysis of individual processes (nodes) in a multivariate stochastic process, e.g., estimation of active information storage.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_network.settings.cmi_estimator
or
>>> res_network.settings['cmi_estimator']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before estimation
- processes_analysedlist
list of analysed processes
- get_significant_processes(fdr=True)[source]¶
Return statistically-significant processes.
Indicates for each process whether AIS is statistically significant (equivalent to the adjacency matrix returned for network inference)
- Args:
- fdrbool [optional]
return FDR-corrected results, see documentation of network inference algorithms and stats.network_fdr (default=True)
- Returns:
- numpy array
Statistical significance for each process
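A sketch of an AIS analysis followed by a significance query, reusing the data object from the earlier sketch; estimator and lag settings are illustrative:
>>> from idtxl.active_information_storage import ActiveInformationStorage
>>> settings = {'cmi_estimator': 'JidtGaussianCMI', 'max_lag': 5}
>>> res_ais = ActiveInformationStorage().analyse_network(settings=settings, data=data, processes=[0, 1])
>>> res_ais.get_significant_processes(fdr=False)  # one boolean per process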
- get_single_process(process, fdr=True)[source]¶
Return results for a single process in the network.
Return results for individual processes, contains for each process
ais : float - AIS-value for current process
ais_pval : float - p-value of AIS estimate
ais_sign : bool - significance of the AIS estimate with respect to the alpha_mi specified in the settings
selected_vars : list of tuples - variables with significant information about the current value of the process that have been added to the process's past state; a variable is described by the index of the process in the data and its lag in samples
current_value : tuple - current value used for analysis, described by process and sample index in the data
Setting fdr to True returns FDR-corrected results (Benjamini, 1995).
- Args:
- processint
process id
- fdrbool [optional]
return FDR-corrected results, see documentation of network inference algorithms and stats.network_fdr (default=True)
- Returns:
- dict
results for single process. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars’]) or via dot-notation (result.selected_vars).
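Continuing the AIS sketch above, per-process results might be accessed as follows:
>>> proc_res = res_ais.get_single_process(process=0, fdr=False)
>>> print(proc_res.ais, proc_res['ais_pval'], proc_res.selected_vars)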
- property processes_analysed¶
Return list of analysed processes.
- class idtxl.results.ResultsSingleProcessRudelt(processes)[source]¶
Bases:
object
Store results of single process analysis.
Provides a container for the results of the Rudelt optimization algorithm. To obtain results for individual processes, call the .get_single_process() method (see docstring for details).
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_network.settings.estimation_method
or
>>> res_network.settings['estimation_method']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures
- data_propertiesdict
data properties, contains
n_processes : int - total number of processes analysed
- processes_analysedlist
list of analysed processes
- get_single_process(process)[source]¶
Return results for a single process.
Return results for individual processes; the keys contained in the returned dictionary are listed under Returns below.
- Args:
- processint
process id
- Returns:
- dict
results for single process. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars’]) or via dot-notation (result.selected_vars). Contains keys
- Processint
Process that was optimized
- estimation_methodString
Estimation method that was used for optimization
- T_Dfloat
Estimated optimal value for the temporal depth TD
- tau_Rfloat
Information timescale tau_R, a characteristic timescale of history dependence similar to an autocorrelation time.
- R_totfloat
Estimated value for the total history dependence Rtot.
- AIS_totfloat
Estimated value for the total active information storage
- opt_number_of_bins_dint
Number of bins d for the embedding that yields (R̂tot, T̂D)
- opt_scaling_kint
Scaling exponent κ for the embedding that yields (R̂tot, T̂D)
- opt_first_bin_sizeint
Size of the first bin τ1 for the embedding that yields (R̂tot, T̂D).
- history_dependencearray with floating-point values
Estimated history dependence for each embedding
- firing_ratefloat
Firing rate of the neuron/spike train
- recording_lengthfloat
Length of the recording (in seconds)
- H_spikingfloat
Entropy of the spike times
if analyse_auto_MI was set to True additionally:
- auto_MIdict
numpy array of MI values for each delay
- auto_MI_delayslist of int
list of delays depending on the given auto_MI_bin_sizes and auto_MI_max_delay
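Assuming res_rudelt is a ResultsSingleProcessRudelt instance from a completed optimization (the variable name is hypothetical), individual keys might be read out like this:
>>> proc = res_rudelt.get_single_process(process=0)
>>> # keyword access and dot-notation are equivalent
>>> print(proc.T_D, proc['R_tot'], proc.firing_rate)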
- property processes_analysed¶
Return list of analysed processes.
idtxl.stats module¶
Provide statistics functions.
- idtxl.stats.ais_fdr(settings=None, *results)[source]¶
Perform FDR-correction on results of network AIS estimation.
Perform correction of the false discovery rate (FDR) after estimation of active information storage (AIS) for all processes in the network. FDR correction is applied by correcting the AIS estimate’s omnibus p-values for individual processes/nodes in the network.
Input can be a list of partial results to combine results from parallel analysis.
References:
Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.
- Args:
- settingsdict [optional]
parameters for statistical testing with entries:
alpha_fdr : float [optional] - critical alpha level (default=0.05)
fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)
- resultsinstances of ResultsSingleProcessAnalysis
results of network AIS estimation, see documentation of ResultsSingleProcessAnalysis()
- Returns:
- ResultsSingleProcessAnalysis instance
input results objects pruned of non-significant estimates
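A sketch of combining partial results and applying the correction; the inputs res_ais_a and res_ais_b are assumed to come from parallel AIS analyses:
>>> from idtxl.stats import ais_fdr
>>> # the settings dict is optional; alpha_fdr shown for illustration
>>> res_pruned = ais_fdr({'alpha_fdr': 0.05}, res_ais_a, res_ais_b)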
- idtxl.stats.check_n_perm(n_perm, alpha)[source]¶
Check if the number of permutations is large enough to obtain the requested alpha level.
- Note:
The number of permutations must be large enough to theoretically allow for the detection of a p-value smaller than the critical alpha level; otherwise the permutation test is pointless. The smallest possible p-value is 1/n_perm.
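Because the smallest attainable p-value is 1/n_perm, a test at alpha = 0.05 requires more than 20 permutations (1/20 = 0.05 is not strictly smaller than alpha). A sketch, assuming the function raises an error when the number of permutations is insufficient:
>>> from idtxl.stats import check_n_perm
>>> check_n_perm(500, 0.05)  # sufficient: 1/500 = 0.002 < 0.05
>>> check_n_perm(10, 0.05)   # expected to raise: 1/10 = 0.1 > 0.05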
- idtxl.stats.max_statistic(analysis_setup, data, candidate_set, te_max_candidate, conditional)[source]¶
Perform maximum statistics for one candidate source.
Test if a transfer entropy value is significantly bigger than the maximum values obtained from surrogates of all remaining candidates.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_max_stat : int [optional] - number of permutations (default=200)
alpha_max_stat : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- candidate_setlist of tuples
list of indices of remaining candidates
- te_max_candidatefloat
transfer entropy value to be tested
- conditionalnumpy array
realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension]
- Returns:
- bool
statistical significance
- float
the test’s p-value
- numpy array
surrogate table
- Raises:
- ex.AlgorithmExhaustedError
Raised from _create_surrogate_table() when calculation cannot be made
- idtxl.stats.max_statistic_sequential(analysis_setup, data)[source]¶
Perform sequential maximum statistics for a set of candidate sources.
Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.
The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).
This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:
n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)
alpha_max_seq : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- numpy array, bool
statistical significance of each source
- numpy array, float
the test’s p-values for each source
- numpy array, float
TE values for individual sources
- idtxl.stats.max_statistic_sequential_bivariate(analysis_setup, data)[source]¶
Perform sequential maximum statistics for a set of candidate sources.
Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.
The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).
This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:
n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)
alpha_max_seq : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- numpy array, bool
statistical significance of each source
- numpy array, float
the test’s p-values for each source
- numpy array, float
TE values for individual sources
- idtxl.stats.mi_against_surrogates(analysis_setup, data)[source]¶
Test estimated mutual information for significance against surrogate data.
Shuffle realisations of the current value (the point to be predicted) and re-calculate mutual information (MI) for the shuffled data. The actual estimated MI is then compared against this distribution of MI values from surrogate data.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_mi : int [optional] - number of permutations (default=500)
alpha_mi : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- float
estimated MI value
- bool
statistical significance
- float
p_value for estimated MI value
- Raises:
- ex.AlgorithmExhaustedError
Raised from estimate() methods when calculation cannot be made
- idtxl.stats.min_statistic(analysis_setup, data, candidate_set, te_min_candidate, conditional=None)[source]¶
Perform minimum statistics for one candidate source.
Test if a transfer entropy value is significantly bigger than the minimum values obtained from surrogates of all remaining candidates.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_min_stat : int [optional] - number of permutations (default=500)
alpha_min_stat : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- candidate_setlist of tuples
list of indices of remaining candidates
- te_min_candidatefloat
transfer entropy value to be tested
- conditionalnumpy array [optional]
realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension] (default=None, no conditioning performed)
- Returns:
- bool
statistical significance
- float
the test’s p-value
- numpy array
surrogate table
- Raises:
- ex.AlgorithmExhaustedError
Raised from _create_surrogate_table() when calculation cannot be made
- idtxl.stats.network_fdr(settings=None, *results)[source]¶
Perform FDR-correction on results of network inference.
Perform correction of the false discovery rate (FDR) after network analysis. FDR correction can either be applied at the target level (by correcting omnibus p-values) or at the single-link level (by correcting p-values of individual links between single samples and the target).
Input can be a list of partial results to combine results from parallel analysis.
References:
Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.
- Args:
- settingsdict [optional]
parameters for statistical testing with entries:
alpha_fdr : float [optional] - critical alpha level (default=0.05)
correct_by_target : bool [optional] - if True, correct p-values on the target level (omnibus test p-values), otherwise correct p-values for individual variables (sequential max stats p-values) (default=True)
fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)
- resultsinstances of ResultsNetworkInference
results of network inference, see documentation of ResultsNetworkInference()
- Returns:
- ResultsNetworkInference instance
input object pruned of non-significant links
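A sketch of combining partial inference results and pruning by FDR; the inputs res_t0 and res_t1 are assumed to come from parallel single-target analyses:
>>> from idtxl.stats import network_fdr
>>> settings = {'alpha_fdr': 0.05, 'correct_by_target': True}
>>> res_pruned = network_fdr(settings, res_t0, res_t1)
>>> res_pruned.get_adjacency_matrix(weights='binary', fdr=True)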
- idtxl.stats.omnibus_test(analysis_setup, data)[source]¶
Perform an omnibus test on identified conditional variables.
Test the joint information transfer from all identified sources to the current value conditional on candidates in the target’s past. To test for significance, this is repeated for shuffled realisations of the sources. The distribution of values from shuffled data is then used as test distribution.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_omnibus : int [optional] - number of permutations (default=500)
alpha_omnibus : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- bool
statistical significance
- float
the test’s p-value
- float
the estimated test statistic, i.e., the information transfer from all sources into the target
- Raises:
- ex.AlgorithmExhaustedError
Raised from estimate() calls when calculation cannot be made
- idtxl.stats.syn_shd_against_surrogates(analysis_setup, data)[source]¶
Test the shared/synergistic information in the PID estimate.
Shuffle realisations of the target and re-calculate PID, in particular the synergistic and shared information from shuffled data. The original shared and synergistic information are then compared against the distribution of values calculated from surrogate data.
- Args:
- analysis_setupPartial_information_decomposition instance
information on the current analysis, should have an attribute ‘settings’, a dict with optional fields
n_perm : int [optional] - number of permutations (default=500)
alpha : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- dict
PID estimate from original data
- bool
statistical significance of the shared information
- float
p-value of the shared information
- bool
statistical significance of the synergistic information
- float
p-value of the synergistic information
- idtxl.stats.unq_against_surrogates(analysis_setup, data)[source]¶
Test the unique information in the PID estimate against surrogate data.
Shuffle realisations of both sources individually and re-calculate PID, in particular the unique information from shuffled data. The original unique information is then compared against the distribution of values calculated from surrogate data.
- Args:
- analysis_setupPartial_information_decomposition instance
information on the current analysis, should have an attribute ‘settings’, a dict with optional fields
n_perm : int [optional] - number of permutations (default=500)
alpha : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- dict
PID estimate from original data
- bool
statistical significance of the unique information in source 1
- float
p-value of the unique information in source 1
- bool
statistical significance of the unique information in source 2
- float
p-value of the unique information in source 2
idtxl.visualise_graph module¶
Plot results of network inference.
- idtxl.visualise_graph.plot_mute_graph()[source]¶
Plot MuTE example network.
Network of 5 AR processes, which is used as an example in the paper on the MuTE toolbox (Montalto, PLOS ONE, 2014, eq. 14). The network consists of five autoregressive (AR) processes with model orders of 2 and the following (non-linear) couplings:
>>> 0 -> 1, u = 2
>>> 0 -> 2, u = 3
>>> 0 -> 3, u = 2 (non-linear)
>>> 3 -> 4, u = 1
>>> 4 -> 3, u = 1
- Returns:
- Figure handle
Figure object from the matplotlib package
- idtxl.visualise_graph.plot_network(results, weights, fdr=True)[source]¶
Plot network of multivariate TE between processes.
Plot graph of the network of (multivariate) interactions between processes (e.g., multivariate TE). The function uses the networkx class for directed graphs (DiGraph) internally. Plots a network and adjacency matrix.
- Args:
- resultsResultsNetworkInference() instance
output of a network inference algorithm
- weightsstr
for single network inference, it can either be
‘max_te_lag’: the weights represent the source -> target lag corresponding to the maximum transfer entropy value (see documentation for method get_target_delays for details)
‘max_p_lag’: the weights represent the source -> target lag corresponding to the smallest p-value, i.e., the highest statistical significance (see documentation for method get_target_delays for details)
‘vars_count’: the weights represent the number of statistically-significant source -> target lags
‘binary’: return unweighted adjacency matrix with binary entries; 1 = significant information transfer, 0 = no significant information transfer
for network comparison, it can either be
‘union’: all links in the union network, i.e., all links that were tested for a difference
‘comparison’: True for links with a significant difference in inferred effective connectivity (default)
‘pvalue’: p-values for links with a significant difference
‘diff_abs’: absolute differences in inferred effective connectivity for links with a significant difference
- fdrbool [optional]
print FDR-corrected results (default=True)
- Returns:
- DiGraph
instance of a directed graph class from the networkx package
- Figure
figure handle, Figure object from the matplotlib package
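A sketch of plotting an inferred network, assuming results holds a completed whole-network inference:
>>> import matplotlib.pyplot as plt
>>> from idtxl.visualise_graph import plot_network
>>> graph, fig = plot_network(results=results, weights='max_te_lag', fdr=False)
>>> plt.show()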
- idtxl.visualise_graph.plot_network_comparison(results)[source]¶
Plot results of network comparison.
Plot results of network comparison. Produces a figure with five subplots, where the first plot shows the network graph of the union network, the second plot shows the adjacency matrix of the union network, the third plot shows the qualitative results of the comparison of each link, the fourth plot shows the absolute differences in CMI per link, and the fifth plot shows p-values for each link.
- Args:
- resultsResultsNetworkComparison() instance
network comparison results
- Returns:
- DiGraph
instance of a directed graph class from the networkx package
- Figure
figure handle, Figure object from the matplotlib package
- idtxl.visualise_graph.plot_selected_vars(results, target, sign_sources=True, display_edge_labels=False, fdr=True)[source]¶
Plot network of a target process and single variables.
Plot graph of the network of (multivariate) interactions between source variables and the target. The function uses the networkx class for directed graphs (DiGraph) internally. Plots a network and reduced adjacency matrix.
- Args:
- resultsResultsNetworkInference() instance
output of a network inference algorithm
- targetint
index of target process
- sign_sourcesbool [optional]
plot sources with significant information contribution only (default=True)
- display_edge_labelsbool [optional]
display TE values on edge labels (default=False)
- fdrbool [optional]
print FDR-corrected results (default=True)
- Returns:
- DiGraph
instance of a directed graph class from the networkx package
- Figure
figure handle, Figure object from the matplotlib package
Module contents¶
IDTxl: Information Dynamics Toolkit xl.
IDTxl is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory. IDTxl provides functionality to estimate the following measures:
For network inference:
multivariate transfer entropy (TE)/Granger causality (GC)
multivariate mutual information (MI)
bivariate TE/GC
bivariate MI
For analysis of node dynamics:
active information storage (AIS)
partial information decomposition (PID)
IDTxl implements estimators for discrete and continuous data with parallel computing engines for both GPU and CPU platforms. Written for Python 3.4.3+.