idtxl package¶
Submodules¶
idtxl.data module¶
Provide data structures for IDTxl analysis.
- class idtxl.data.Data(data=None, dim_order='psr', normalise=True, seed=None)[source]¶
Bases: object
Store data for information dynamics estimation.
Data takes a 1- to 3-dimensional array representing realisations of random variables in dimensions: processes, samples (over time), and replications. If necessary, the provided realisations are reshaped to fit the format expected by IDTxl, which is a 3-dimensional array with axes representing (process index, sample index, replication index). Indicate the actual order of dimensions in the provided array in a one- to three-character string, e.g. ‘spr’ for an array with realisations over (1) samples in time, (2) processes, (3) replications.
Example:
>>> data_mute = Data()              # initialise empty data object
>>> data_mute.generate_mute_data()  # simulate data from MuTE paper
>>>
>>> # Create data objects with data of various sizes
>>> d = np.arange(10000).reshape((2, 1000, 5))  # 2 procs.,
>>> data_1 = Data(d, dim_order='psr')           # 1000 samples, 5 repl.
>>>
>>> d = np.arange(3000).reshape((3, 1000))  # 3 procs.,
>>> data_2 = Data(d, dim_order='ps')        # 1000 samples
>>>
>>> # Overwrite data in existing object with random data
>>> d = np.arange(5000)
>>> data_2.set_data(d, 's')
- Note:
Realisations are stored as attribute ‘data’. This can only be set via the ‘set_data()’ method.
- Args:
- data : numpy array [optional]
1/2/3-dimensional array with raw data
- dim_order : string [optional]
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the data dimensionality, e.g., ‘ps’ for a two-dimensional array of data from several processes over time (default=’psr’)
- normalise : bool [optional]
if True, data gets normalised per process (default=True)
- seed : int [optional]
can be set to a fixed integer to obtain reproducible results over multiple analysis runs on the same data; otherwise a random seed is used (default=None)
- Attributes:
- data : numpy array
realisations, can only be set via ‘set_data’ method
- n_processes : int
number of processes
- n_replications : int
number of replications
- n_samples : int
number of samples in time
- normalise : bool
if True, all data gets z-standardised per process
- initial_state : array
initial state of the seed for shuffled permutations
- property data¶
Return data array.
- generate_logistic_maps_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]¶
Generate discrete-time coupled-logistic-maps time series.
Generate data and overwrite the instance’s current data.
The implemented logistic map function is f(x) = 4 * x * (1 - x).
- Args:
- n_samples : int [optional]
number of samples simulated for each process and replication
- n_replications : int [optional]
number of replications
- coefficient_matrices : numpy array [optional]
coefficient matrices: numpy array with dimensions (order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the stochastic process. (default=np.array([[[0.5, 0], [0.4, 0.5]]]))
- noise_std : float [optional]
standard deviation of uncorrelated Gaussian noise (default=0.1)
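For illustration, a minimal usage sketch (the parameter values are illustrative, not prescribed):
>>> from idtxl.data import Data
>>> data = Data(seed=0)
>>> # The default coefficient matrix couples process 0 to process 1 at
>>> # lag 1, yielding two coupled logistic-map processes.
>>> data.generate_logistic_maps_data(n_samples=500, n_replications=5)
>>> print(data.n_processes, data.n_samples, data.n_replications)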
- generate_mute_data(n_samples=1000, n_replications=10)[source]¶
Generate example data for a 5-process network.
Generate example data and overwrite the instance’s current data. The network is used as an example in the paper on the MuTE toolbox (Montalto, PLOS ONE, 2014, eq. 14) and was originally proposed by Baccala & Sameshima (2001). The network consists of five auto-regressive (AR) processes with model order 2 and the following (non-linear) couplings:
0 -> 1, u = 2 (non-linear)
0 -> 2, u = 3
0 -> 3, u = 2 (non-linear)
3 -> 4, u = 1
4 -> 3, u = 1
References:
Montalto, A., Faes, L., & Marinazzo, D. (2014) MuTE: A MATLAB toolbox to compare established and novel estimators of the multivariate transfer entropy. PLoS ONE 9(10): e109462. https://doi.org/10.1371/journal.pone.0109462
Baccala, L.A. & Sameshima, K. (2001). Partial directed coherence: a new concept in neural structure determination. Biol Cybern 84: 463–474. https://doi.org/10.1007/PL00007990
- Args:
- n_samples : int
number of samples simulated for each process and replication
- n_replications : int
number of replications
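For illustration, a minimal usage sketch:
>>> from idtxl.data import Data
>>> data = Data(seed=0)  # fix the seed for reproducible noise
>>> data.generate_mute_data(n_samples=1000, n_replications=10)
>>> print(data.n_processes)  # the MuTE example network has 5 processes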
- generate_var_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]¶
Generate discrete-time VAR (vector auto-regressive) time series.
Generate data and overwrite the instance’s current data.
- Args:
- n_samples : int [optional]
number of samples simulated for each process and replication
- n_replications : int [optional]
number of replications
- coefficient_matrices : numpy array [optional]
coefficient matrices: numpy array with dimensions (VAR order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the VAR process. (default=np.array([[[0.5, 0], [0.4, 0.5]]]))
- noise_std : float [optional]
standard deviation of uncorrelated Gaussian noise (default=0.1)
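For illustration, a sketch of an order-2 VAR process defined by two lag matrices (the coefficient values are illustrative):
>>> import numpy as np
>>> from idtxl.data import Data
>>> coefficient_matrices = np.array([
>>>     [[0.5, 0.0], [0.4, 0.5]],   # couplings at lag 1
>>>     [[0.1, 0.0], [0.0, 0.1]]])  # couplings at lag 2
>>> data = Data(seed=0)
>>> data.generate_var_data(n_samples=1000, n_replications=10,
>>>                        coefficient_matrices=coefficient_matrices)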
- get_realisations(current_value, idx_list, shuffle=False)[source]¶
Return realisations for a list of indices.
Return realisations for indices in list. Optionally, realisations can be shuffled to create surrogate data for statistical testing. For shuffling, data blocks are permuted over replications while their temporal order stays intact within replications:
- Original data:
repl. ind.:   1 1 1 1  2 2 2 2  3 3 3 3  4 4 4 4  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Shuffled data:
repl. ind.:   3 3 3 3  1 1 1 1  4 4 4 4  2 2 2 2  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Args:
- current_value : tuple
index of the current value in the current analysis, has to have the form (idx process, idx sample); if current_value == idx, all samples for a process are returned
- idx_list : list of tuples
variable indices
- shuffle : bool
if True, permute blocks of replications over trials
- Returns:
- numpy array
realisations with dimensions (no. samples * no. replications) x number of indices
- numpy array
replication index for each realisation, with dimension (no. samples * no. replications)
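For illustration, a minimal sketch (the indices are illustrative); it assumes a Data object data filled, e.g., via generate_mute_data():
>>> # current value at (process 0, sample 5); realisations of two past
>>> # variables of process 1, given as (process index, sample index)
>>> realisations, repl_idx = data.get_realisations(
>>>     current_value=(0, 5), idx_list=[(1, 1), (1, 3)])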
- n_realisations(current_value=None)[source]¶
Number of realisations over samples and replications.
- Args:
- current_value : tuple [optional]
reference point for the calculation of the number of realisations (e.g., when using an embedding of length k, we count realisations from the (k+1)-th sample because we lose the first k samples to the embedding); if no current_value is provided, the number of all samples is used
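For illustration (assuming realisations are counted from the current value’s sample index onward, over all replications): with 1000 samples, 10 replications, and a current value at sample 5, one would expect (1000 - 5) * 10 = 9950 realisations:
>>> data.n_realisations(current_value=(0, 5))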
- n_realisations_samples(current_value=None)[source]¶
Number of realisations over samples.
- Args:
- current_value : tuple [optional]
reference point for the calculation of the number of realisations (e.g., when using an embedding of length k, the current value is at sample k + 1; we thus count realisations from the (k+1)-th sample because we lose the first k samples to the embedding)
- permute_replications(current_value, idx_list)[source]¶
Return realisations with permuted replications (time stays intact).
Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return realisations for all indices in the list, where an index is expected to have the form (process index, sample index). Realisations are permuted block-wise by permuting the order of replications:
- Original data:
repl. ind.:   1 1 1 1  2 2 2 2  3 3 3 3  4 4 4 4  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Permuted data:
repl. ind.:   3 3 3 3  1 1 1 1  4 4 4 4  2 2 2 2  5 5 5 5  …
sample index: 1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  1 2 3 4  …
- Args:
- current_value : tuple
index of the current_value in the data
- idx_list : list of tuples
indices of variables
- Returns:
- numpy array
permuted realisations with dimensions replications x number of indices
- numpy array
replication index for each realisation
- Raises:
TypeError if idx_list is not a list
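For illustration, a minimal sketch (the indices are illustrative):
>>> # surrogate realisations with replications permuted while the
>>> # temporal order within each replication stays intact
>>> surrogates, repl_idx = data.permute_replications(
>>>     current_value=(0, 5), idx_list=[(1, 1), (1, 3)])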
- permute_samples(current_value, idx_list, perm_settings)[source]¶
Return realisations with permuted samples (repl. stays intact).
Create surrogate data by permuting realisations over samples (time) while keeping the order of replications intact. Surrogates can be created for multiple variables in parallel, where variables are provided as a list of indices. An index is expected to have the form (process index, sample index).
Permuting samples in time is the fall-back option for surrogate data creation. The default method for surrogate data creation is the permutation of replications, while keeping the order of samples in time intact. If the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data, permutation of samples in time is chosen instead.
Different permutation strategies can be chosen to permute realisations in time. Note that if data consists of multiple replications, within each replication, samples are shuffled following the same permutation pattern:
- Original data:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  …
- Circular shift by a random number of samples, e.g. 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 5 6 7 8 1 2 3 4  5 6 7 8 1 2 3 4  5 6 7 8 1 2 3 4  …
- Permute blocks of 3 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 5 6 7 8 1 2 3  4 5 6 7 8 1 2 3  4 5 6 7 8 1 2 3  …
- Permute data locally within a range of 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 4 3 8 5 6 7  1 2 4 3 8 5 6 7  1 2 4 3 8 5 6 7  …
- Random permutation:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 2 5 7 1 3 2 6  4 2 5 7 1 3 2 6  4 2 5 7 1 3 2 6  …
- Args:
- current_value : tuple
index of the current_value in the data
- idx_list : list of tuples
indices of variables
- perm_settings : dict
settings specifying the allowed permutations:
perm_type : str - permutation type, can be
‘random’: swaps samples at random,
‘circular’: shifts time series by a random number of samples,
‘block’: swaps blocks of samples,
‘local’: swaps samples within a given range
additional settings depending on the perm_type (n is the number of samples):
if perm_type == ‘circular’:
- ‘max_shift’ : int
the maximum number of samples for shifting (e.g., number of samples / 2)
if perm_type == ‘block’:
- ‘block_size’ : int
no. samples per block (e.g., number of samples / 10)
- ‘perm_range’ : int
range in which blocks can be swapped (e.g., number of samples / block_size)
if perm_type == ‘local’:
- ‘perm_range’ : int
range in samples over which realisations can be permuted (e.g., number of samples / 10)
- Returns:
- numpy array
permuted realisations with dimensions replications x number of indices
- numpy array
sample index for each realisation
- Raises:
TypeError if idx_list is not a list
- Note:
This permutation scheme is the fall-back option if surrogate data can not be created by shuffling replications because the number of replications is too small to generate the requested number of permutations.
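For illustration, a minimal sketch using the circular strategy (the settings values are illustrative):
>>> perm_settings = {'perm_type': 'circular', 'max_shift': 50}
>>> surrogates, sample_idx = data.permute_samples(
>>>     current_value=(0, 5), idx_list=[(1, 1), (1, 3)],
>>>     perm_settings=perm_settings)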
- set_data(data, dim_order)[source]¶
Overwrite data in an existing Data object.
- Args:
- data : numpy array
1- to 3-dimensional array of realisations
- dim_order : string
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the number of dimensions in data
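For illustration, a minimal sketch:
>>> import numpy as np
>>> d = np.random.rand(3, 1000)  # 3 processes, 1000 samples
>>> data.set_data(d, 'ps')       # reshaped internally to 3D format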
- slice_permute_replications(process)[source]¶
Return data slice with permuted replications (time stays intact).
Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return the data slice for the entry specified by ‘process’. Realisations are permuted block-wise by permuting the order of replications.
- slice_permute_samples(process, perm_settings)[source]¶
Return slice of data with permuted samples (repl. stays intact).
Create surrogate data by permuting data in a slice over samples (time) while keeping the order of replications intact. Return slice for the entry specified by ‘process’. Realisations are permuted according to the settings specified in perm_settings:
- Original data:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  1 2 3 4 5 6 7 8  …
- Circular shift by 2, 6, and 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 7 8 1 2 3 4 5 6  3 4 5 6 7 8 1 2  5 6 7 8 1 2 3 4  …
- Permute blocks of 3 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 5 6 7 8 1 2 3  1 2 3 7 8 4 5 6  7 8 4 5 6 1 2 3  …
- Permute data locally within a range of 4 samples:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 1 2 4 3 8 5 6 7  4 1 2 3 5 7 8 6  3 1 2 4 8 5 6 7  …
- Random permutation:
repl. ind.:   1 1 1 1 1 1 1 1  2 2 2 2 2 2 2 2  3 3 3 3 3 3 3 3  …
sample index: 4 2 5 7 1 3 2 6  7 5 3 4 2 1 8 5  1 2 4 3 6 8 7 5  …
Permuting samples is the fall-back option for surrogate creation if the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data.
- Args:
- process : int
process for which to return data slice
- perm_settings : dict
settings specifying the allowed permutations:
perm_type : str - permutation type, can be
‘random’: swaps samples at random,
‘circular’: shifts time series by a random number of samples,
‘block’: swaps blocks of samples,
‘local’: swaps samples within a given range
additional settings depending on the perm_type (n is the number of samples):
if perm_type == ‘circular’:
- ‘max_shift’ : int
the maximum number of samples for shifting (default=n/2)
if perm_type == ‘block’:
- ‘block_size’ : int
no. samples per block (default=n/10)
- ‘perm_range’ : int
range in which blocks can be swapped (default=max)
if perm_type == ‘local’:
- ‘perm_range’ : int
range in samples over which realisations can be permuted (default=n/10)
- Returns:
- numpy array
data slice with data permuted over samples with dimensions samples x number of replications
- numpy array
index of permuted samples
- Note:
This permutation scheme is the fall-back option if the number of replications is too small to allow a sufficient number of permutations for the generation of surrogate data.
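For illustration, a minimal sketch using block permutation (the settings values are illustrative):
>>> perm_settings = {'perm_type': 'block', 'block_size': 100,
>>>                  'perm_range': 10}
>>> data_slice, perm_idx = data.slice_permute_samples(
>>>     process=0, perm_settings=perm_settings)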
idtxl.bivariate_te module¶
Perform network inference using bivariate transfer entropy.
Estimate bivariate transfer entropy (TE) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.bivariate_te.BivariateTE[source]¶
Bases: idtxl.network_inference.NetworkInferenceTE, idtxl.network_inference.NetworkInferenceBivariate
Perform network inference using bivariate transfer entropy.
Perform network inference using bivariate transfer entropy (TE). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate TE for a single target. See docstrings of the two functions for more information.
References:
Schreiber, T. (2000). Measuring Information Transfer. Phys Rev Lett, 85(2), 461–464. http://doi.org/10.1103/PhysRevLett.85.461
Vicente, R., Wibral, M., Lindner, M., & Pipa, G. (2011). Transfer entropy-a model-free measure of effective connectivity for the neurosciences. J Comp Neurosci, 30(1), 45–67. http://doi.org/10.1007/s10827-010-0262-3
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in TE estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- selected_vars_target : list of tuples
target samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for TE from individual sources to the target
- statistic_omnibus : float
joint TE from all sources to the target
- statistic_sign_sources : numpy array
raw TE values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall TE
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find bivariate transfer entropy between all nodes in the network.
Estimate bivariate transfer entropy (TE) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> network_analysis = BivariateTE()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find bivariate transfer entropy between sources and a target.
Find bivariate transfer entropy (TE) between all potential source processes and the target process. Uses bivariate, non-uniform embedding found through information maximisation.
Bivariate TE is calculated in four steps:
1. find all relevant variables in the target process’s own past, by iteratively adding candidate variables that have significant conditional mutual information (CMI) with the current value (conditional on all variables that were added previously)
2. find all relevant variables in each single source process’s past (again by finding all candidates with significant CMI); treat each potential source process separately, i.e., the CMI is calculated with respect to already selected variables from the target’s past and from the current source process’s past only
3. prune the final conditional set for each link (i.e., each process-target pairing): test the CMI between each variable in the final set and the current value, conditional on all other variables in the final set of the current link
4. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual variables in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = BivariateTE()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
max_lag_target : int [optional] - maximum temporal search depth for candidates in the target’s past in samples (default=same as max_lag_sources)
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
tau_target : int [optional] - spacing between candidates in the target’s past in samples (default=1)
n_perm_* : int - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, and ‘omnibus’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating TE; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
idtxl.bivariate_mi module¶
Perform network inference using bivariate mutual information.
Estimate bivariate mutual information (MI) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.bivariate_mi.BivariateMI[source]¶
Bases: idtxl.network_inference.NetworkInferenceMI, idtxl.network_inference.NetworkInferenceBivariate
Perform network inference using bivariate mutual information.
Perform network inference using bivariate mutual information (MI). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate MI for a single target. See docstrings of the two functions for more information.
References:
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in MI estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- selected_vars_target : list of tuples
target samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for MI from individual sources to the target
- mi_omnibus : float
joint MI from all sources to the target
- mi_sign_sources : numpy array
raw MI values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall MI
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find bivariate mutual information between all nodes in the network.
Estimate bivariate mutual information (MI) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> network_analysis = BivariateMI()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- dict
results for each target, see documentation of analyse_single_target()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find bivariate mutual information between sources and a target.
Find bivariate mutual information (MI) between all potential source processes and the target process. Uses bivariate, non-uniform embedding found through information maximisation.
MI is calculated in three steps:
1. find all relevant variables in a single source process’s past, by iteratively adding candidate variables that have significant conditional mutual information (CMI) with the current value (conditional on all variables that were added previously)
2. prune the final conditional set for each link (i.e., each process-target pairing): test the CMI between each variable in the final set and the current value, conditional on all other variables in the final set of the current link; treat each potential source process separately, i.e., the CMI is calculated with respect to already selected variables from the current process’s past only
3. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual variables in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 4
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = BivariateMI()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
n_perm_* : int - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, and ‘omnibus’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating MI; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- dict
results consisting of sets of selected variables as (full set, variables from the sources’ past), pvalues and MI for each selected variable, the current value for this analysis, results for omnibus test (joint MI between all selected source variables and the target, omnibus MI, p-value, and significance); NOTE that all variables are listed as tuples (process, lag wrt. current value)
idtxl.bivariate_pid module¶
Estimate partial information decomposition (PID).
Estimate PID for two source processes and one target process using different estimators.
- Note:
Written for Python 3.4+
- class idtxl.bivariate_pid.BivariatePID[source]¶
Bases: idtxl.single_process_analysis.SingleProcessAnalysis
Perform partial information decomposition for individual processes.
Perform partial information decomposition (PID) for two source processes and one target process in the network. Estimate unique, shared, and synergistic information in the two sources about the target. Call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate PID for a single process. See docstrings of the two functions for more information.
References:
Williams, P. L., & Beer, R. D. (2010). Nonnegative Decomposition of Multivariate Information, 1–14. Retrieved from http://arxiv.org/abs/1004.2515
Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., & Ay, N. (2014). Quantifying Unique Information. Entropy, 16(4), 2161–2183. http://doi.org/10.3390/e16042161
- Attributes:
- target : int
index of target process
- sources : array type
pair of indices of source processes
- settings : dict
analysis settings
- results : dict
estimated PID
- analyse_network(settings, data, targets, sources)[source]¶
Estimate partial information decomposition for network nodes.
Estimate partial information decomposition (PID) for multiple nodes in the network.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> n = 20
>>> alph = 2
>>> x = np.random.randint(0, alph, n)
>>> y = np.random.randint(0, alph, n)
>>> z = np.logical_xor(x, y).astype(int)
>>> data = Data(np.vstack((x, y, z)), 'ps', normalise=False)
>>> settings = {
>>>     'lags_pid': [[1, 1], [3, 2], [0, 0]],
>>>     'alpha': 0.1,
>>>     'alph_s1': alph,
>>>     'alph_s2': alph,
>>>     'alph_t': alph,
>>>     'max_unsuc_swaps_row_parm': 60,
>>>     'num_reps': 63,
>>>     'max_iters': 1000,
>>>     'pid_estimator': 'SydneyPID'}
>>> targets = [0, 1, 2]
>>> sources = [[1, 2], [0, 2], [0, 1]]
>>> pid_analysis = BivariatePID()
>>> results = pid_analysis.analyse_network(settings, data, targets,
>>>                                        sources)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, can contain
lags_pid : list of lists of ints [optional] - lags in samples between sources and target (default=[[1, 1], [1, 1], …])
- data : Data instance
raw data for analysis
- targets : list of int
indices of target processes
- sources : list of lists
indices of the two source processes for each target, e.g., [[0, 2], [1, 0]], must have the same length as targets
- Returns:
- ResultsPID instance
results of network inference, see documentation of ResultsPID()
- analyse_single_target(settings, data, target, sources)[source]¶
Estimate partial information decomposition for a network node.
Estimate partial information decomposition (PID) for a target node in the network.
- Note:
For a description of the algorithm and the method see references in the class and estimator docstrings.
Example:
>>> n = 20
>>> alph = 2
>>> x = np.random.randint(0, alph, n)
>>> y = np.random.randint(0, alph, n)
>>> z = np.logical_xor(x, y).astype(int)
>>> data = Data(np.vstack((x, y, z)), 'ps', normalise=False)
>>> settings = {
>>>     'alpha': 0.1,
>>>     'alph_s1': alph,
>>>     'alph_s2': alph,
>>>     'alph_t': alph,
>>>     'max_unsuc_swaps_row_parm': 60,
>>>     'num_reps': 63,
>>>     'max_iters': 1000,
>>>     'pid_estimator': 'SydneyPID',
>>>     'lags_pid': [2, 3]}
>>> pid_analysis = BivariatePID()
>>> results = pid_analysis.analyse_single_target(settings=settings,
>>>                                              data=data,
>>>                                              target=0,
>>>                                              sources=[1, 2])
- Args:
- settings : dict
parameters for estimator use and statistics:
pid_estimator : str - estimator to be used for PID estimation (for estimator settings see the documentation in the estimators_pid modules)
lags_pid : list of ints [optional] - lags in samples between sources and target (default=[1, 1])
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of ints
indices of the two source processes for the target
- Returns:
- ResultsPID instance
results of network inference, see documentation of ResultsPID()
idtxl.multivariate_te module¶
Perform network inference using multivariate transfer entropy.
Estimate multivariate transfer entropy (TE) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.multivariate_te.MultivariateTE[source]¶
Bases: idtxl.network_inference.NetworkInferenceTE, idtxl.network_inference.NetworkInferenceMultivariate
Perform network inference using multivariate transfer entropy.
Perform network inference using multivariate transfer entropy (TE). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate TE for a single target. See docstrings of the two functions for more information.
References:
Schreiber, T. (2000). Measuring Information Transfer. Phys Rev Lett, 85(2), 461–464. http://doi.org/10.1103/PhysRevLett.85.461
Vicente, R., Wibral, M., Lindner, M., & Pipa, G. (2011). Transfer entropy-a model-free measure of effective connectivity for the neurosciences. J Comp Neurosci, 30(1), 45–67. http://doi.org/10.1007/s10827-010-0262-3
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in TE estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- selected_vars_target : list of tuples
target samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for TE from individual sources to the target
- statistic_omnibus : float
joint TE from all sources to the target
- statistic_sign_sources : numpy array
raw TE values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall TE
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find multivariate transfer entropy between all nodes in the network.
Estimate multivariate transfer entropy (TE) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
- Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> network_analysis = MultivariateTE()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
fdr_correction : bool [optional] - correct results on the network level, see documentation of stats.network_fdr() for details (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find multivariate transfer entropy between sources and a target.
Find multivariate transfer entropy (TE) between all source processes and the target process. Uses multivariate, non-uniform embedding found through information maximisation. Multivariate TE is calculated in four steps:
1. find all relevant variables in the target process’s own past, by iteratively adding candidate variables that have significant conditional mutual information (CMI) with the current value (conditional on all variables that were added previously)
2. find all relevant variables in the source processes’ pasts (again by finding all candidates with significant CMI)
3. prune the final conditional set by testing the CMI between each variable in the final set and the current value, conditional on all other variables in the final set
4. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual variables in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = MultivariateTE()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
max_lag_target : int [optional] - maximum temporal search depth for candidates in the target’s past in samples (default=same as max_lag_sources)
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
tau_target : int [optional] - spacing between candidates in the target’s past in samples (default=1)
n_perm_* : int [optional] - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float [optional] - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, ‘omnibus’, and ‘max_seq’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating TE; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- ResultsNetworkInference instance
results of network inference, see documentation of ResultsNetworkInference()
idtxl.multivariate_mi module¶
Perform network inference using multivariate mutual information.
Estimate multivariate mutual information (MI) for network inference using a greedy approach with maximum statistics to generate a non-uniform embedding (Faes, 2011; Lizier, 2012).
- Note:
Written for Python 3.4+
- class idtxl.multivariate_mi.MultivariateMI[source]¶
Bases: idtxl.network_inference.NetworkInferenceMI, idtxl.network_inference.NetworkInferenceMultivariate
Perform network inference using multivariate mutual information.
Perform network inference using multivariate mutual information (MI). To perform network inference call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate MI for a single target. See docstrings of the two functions for more information.
References:
Lizier, J. T., & Rubinov, M. (2012). Multivariate construction of effective computational networks from observational data. Max Planck Institute: Preprint. Retrieved from http://www.mis.mpg.de/preprints/2012/preprint2012_25.pdf
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- source_set : list
indices of source processes tested for their influence on the target
- target : list
index of target process
- settings : dict
analysis settings
- current_value : tuple
index of the current value in MI estimation, (idx process, idx sample)
- selected_vars_full : list of tuples
samples in the full conditional set, (idx process, idx sample)
- selected_vars_sources : list of tuples
source samples in the conditional set, (idx process, idx sample)
- pvalue_omnibus : float
p-value of the omnibus test
- pvalues_sign_sources : numpy array
array of p-values for MI from individual sources to the target
- mi_omnibus : float
joint MI from all sources to the target
- mi_sign_sources : numpy array
raw MI values from individual sources to the target
- sign_omnibus : bool
statistical significance of the overall MI
- analyse_network(settings, data, targets='all', sources='all')[source]¶
Find multivariate mutual information between nodes in the network.
Estimate multivariate mutual information (MI) between all nodes in the network or between selected sources and targets.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> network_analysis = MultivariateMI()
>>> results = network_analysis.analyse_network(settings, data)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
fdr_correction : bool [optional] - correct results on the network level, see documentation of stats.network_fdr() for details (default=True)
- data : Data instance
raw data for analysis
- targets : list of int | ‘all’ [optional]
indices of target processes (default=’all’)
- sources : list of int | list of list | ‘all’ [optional]
indices of source processes for each target (default=’all’); if ‘all’, all network nodes excluding the target node are considered as potential sources and tested; if list of int, the source specified by each int is tested as a potential source for the target with the same index or a single target; if list of list, sources specified in each inner list are tested for the target with the same index
- Returns:
- dict
results for each target, see documentation of analyse_single_target(); results FDR-corrected, see documentation of stats.network_fdr()
- analyse_single_target(settings, data, target, sources='all')[source]¶
Find multivariate mutual information between sources and a target.
Find multivariate mutual information (MI) between all source processes and the target process. Uses multivariate, non-uniform embedding found through information maximisation.
Multivariate MI is calculated in three steps (see Lizier and Faes for details):
1. find all relevant samples in the source processes’ past, by iteratively adding candidate samples that have significant conditional mutual information (CMI) with the current value (conditional on all samples that were added previously)
2. prune the final conditional set by testing the CMI between each sample in the final set and the current value, conditional on all other samples in the final set
3. perform statistics on the final set of sources (test for overall transfer between the final conditional set and the current value, and for significant transfer of all individual samples in the set)
- Note:
For a further description of the algorithm see references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> # The algorithm uses a conditional mutual information to
>>> # construct a non-uniform embedding, hence a CMI- not MI-
>>> # estimator has to be specified:
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'n_perm_omnibus': 500,
>>>     'n_perm_max_seq': 500,
>>>     'max_lag_sources': 5,
>>>     'min_lag_sources': 2
>>>     }
>>> target = 0
>>> sources = [1, 2, 3]
>>> network_analysis = MultivariateMI()
>>> results = network_analysis.analyse_single_target(settings,
>>>                                                  data, target,
>>>                                                  sources)
- Args:
- settings : dict
parameters for estimation and statistical testing:
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag_sources : int - maximum temporal search depth for candidates in the sources’ past in samples
min_lag_sources : int - minimum temporal search depth for candidates in the sources’ past in samples
tau_sources : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
n_perm_* : int [optional] - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘omnibus’, and ‘max_seq’ (default=500)
alpha_* : float [optional] - critical alpha level for statistical significance, where * can be ‘max_stats’, ‘min_stats’, ‘omnibus’, and ‘max_seq’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating MI; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: ‘faes’ for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of int | int | ‘all’ [optional]
single index or list of indices of source processes (default=’all’), if ‘all’, all network nodes excluding the target node are considered as potential sources
- Returns:
- dict
results consisting of sets of selected variables as (full set, variables from the sources’ past), pvalues and MI for each selected variable, the current value for this analysis, results for omnibus test (joint MI between all selected source variables and the target, omnibus MI, p-value, and significance); NOTE that all variables are listed as tuples (process, lag wrt. current value)
idtxl.multivariate_pid module¶
Estimate partial information decomposition (PID).
Estimate PID for multiple source processes (up to 4 sources) and one target process using the SxPID estimator.
- Note:
Written for Python 3.4+
- class idtxl.multivariate_pid.MultivariatePID[source]¶
Bases: idtxl.single_process_analysis.SingleProcessAnalysis
Perform partial information decomposition for individual processes.
Perform partial information decomposition (PID) for multiple source processes (up to 4 sources) and a target process in the network. Estimate unique, shared, and synergistic information in the multiple sources about the target. Call analyse_network() on the whole network or a set of nodes or call analyse_single_target() to estimate PID for a single process. See docstrings of the two functions for more information.
References:
Williams, P. L., & Beer, R. D. (2010). Nonnegative Decomposition of Multivariate Information, 1–14. Retrieved from http://arxiv.org/abs/1004.2515
Makkeh, A., Gutknecht, A., & Wibral, M. (2020). A differentiable measure for shared information. 1–27. Retrieved from http://arxiv.org/abs/2002.03356
- Attributes:
- target : int
index of target process
- sources : array type
indices of the multiple source processes
- settings : dict
analysis settings
- results : dict
estimated PID
- analyse_network(settings, data, targets, sources)[source]¶
Estimate partial information decomposition for network nodes.
Estimate partial information decomposition (PID) between multiple source processes (up to 4 sources) and each of multiple target processes in the network.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_target() method and references in the class docstring.
Example:
>>> n = 20
>>> alph = 2
>>> s1 = np.random.randint(0, alph, n)
>>> s2 = np.random.randint(0, alph, n)
>>> s3 = np.random.randint(0, alph, n)
>>> target1 = np.logical_xor(s1, s2).astype(int)
>>> target = np.logical_xor(target1, s3).astype(int)
>>> data = Data(np.vstack((s1, s2, s3, target)), 'ps',
>>>             normalise=False)
>>> settings = {
>>>     'lags_pid': [[1, 1, 1], [3, 2, 7]],
>>>     'verbose': False,
>>>     'pid_estimator': 'SxPID'}
>>> targets = [0, 1]
>>> sources = [[1, 2, 3], [0, 2, 3]]
>>> pid_analysis = MultivariatePID()
>>> results = pid_analysis.analyse_network(settings, data, targets,
>>>                                        sources)
- Args:
- settings : dict
parameters for estimation and statistical testing, see documentation of analyse_single_target() for details, can contain
lags_pid : list of lists of ints [optional] - lags in samples between sources and target (default=[[1, 1, …, 1], [1, 1, …, 1], …])
- data : Data instance
raw data for analysis
- targets : list of int
indices of target processes
- sources : list of lists
indices of the multiple source processes for each target, e.g., [[0, 1, 2], [1, 0, 3]]; all lists must be of the same length, and the list of lists must have the same length as targets
- Returns:
- ResultsMultivariatePID instance
results of network inference, see documentation of ResultsMultivariatePID()
- analyse_single_target(settings, data, target, sources)[source]¶
Estimate partial information decomposition for a network node.
Estimate partial information decomposition (PID) for multiple source processes (up to 4 sources) and a target process in the network.
- Note:
For a description of the algorithm and the method see references in the class and estimator docstrings.
Example:
>>> n = 20
>>> alph = 2
>>> s1 = np.random.randint(0, alph, n)
>>> s2 = np.random.randint(0, alph, n)
>>> s3 = np.random.randint(0, alph, n)
>>> target1 = np.logical_xor(s1, s2).astype(int)
>>> target = np.logical_xor(target1, s3).astype(int)
>>> data = Data(np.vstack((s1, s2, s3, target)), 'ps',
>>>             normalise=False)
>>> settings = {
>>>     'verbose': False,
>>>     'pid_estimator': 'SxPID',
>>>     'lags_pid': [2, 3, 1]}
>>> pid_analysis = MultivariatePID()
>>> results = pid_analysis.analyse_single_target(settings=settings,
>>>                                              data=data,
>>>                                              target=0,
>>>                                              sources=[1, 2, 3])
- Args:
- settings : dict
parameters for estimator use and statistics:
pid_estimator : str - estimator to be used for PID estimation (for estimator settings see the documentation in the estimators_pid modules)
lags_pid : list of ints [optional] - lags in samples between sources and target (default=[1, 1, …, 1])
verbose : bool [optional] - toggle console output (default=True)
- data : Data instance
raw data for analysis
- target : int
index of target process
- sources : list of ints
indices of the multiple source processes for the target
- Returns:
- ResultsMultivariatePID instance
results of network inference, see documentation of ResultsMultivariatePID()
idtxl.active_information_storage module¶
Analysis of AIS in a network of processes.
Analysis of active information storage (AIS) in individual processes of a network. The algorithm uses non-uniform embedding as described in Faes (2011).
- Note:
Written for Python 3.4+
- class idtxl.active_information_storage.ActiveInformationStorage[source]¶
Bases:
idtxl.single_process_analysis.SingleProcessAnalysis
Estimate active information storage in individual processes.
Estimate active information storage (AIS) in individual processes of the network. To perform AIS estimation call analyse_network() on the whole network or a set of nodes or call analyse_single_process() to estimate AIS for a single process. See docstrings of the two functions for more information.
References:
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2012). Local measures of information storage in complex distributed computation. Inform Sci, 208, 39–54. http://doi.org/10.1016/j.ins.2012.04.016
Wibral, M., Lizier, J. T., Vögler, S., Priesemann, V., & Galuske, R. (2014). Local active information storage as a tool to understand distributed neural information processing. Front Neuroinf, 8, 1. http://doi.org/10.3389/fninf.2014.00001
Faes, L., Nollo, G., & Porta, A. (2011). Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E, 83, 1–15. http://doi.org/10.1103/PhysRevE.83.051112
- Attributes:
- process_setlist
list with indices of analyzed processes
- settingsdict
analysis settings
- current_valuetuple
index of the current value in AIS estimation, (idx process, idx sample)
- selected_vars_fulllist of tuples
samples in the past state, (idx process, idx sample)
- aisfloat
raw AIS value
- signbool
true if AIS is significant
- pvalue: float
p-value of AIS
- analyse_network(settings, data, processes='all')[source]¶
Estimate active information storage for multiple network processes.
Estimate active information storage for all or a subset of processes in the network.
- Note:
For a detailed description of the algorithm and settings see documentation of the analyse_single_process() method and references in the class docstring.
Example:
>>> data = Data()
>>> data.generate_mute_data(100, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'max_lag': 5,
>>>     'tau': 1
>>>     }
>>> processes = [1, 2, 3]
>>> network_analysis = ActiveInformationStorage()
>>> results = network_analysis.analyse_network(settings, data,
>>>                                            processes)
- Args:
- settingsdict
parameters for estimation and statistical testing, see documentation of analyse_single_process() for details, settings can further contain
verbose : bool [optional] - toggle console output (default=True)
fdr_correction : bool [optional] - correct results on the network level, see documentation of stats.ais_fdr() for details (default=True)
- dataData instance
raw data for analysis
- processeslist of int | ‘all’
index of processes (default=’all’); if ‘all’, AIS is estimated for all processes; if list of int, AIS is estimated for processes specified in the list.
- Returns:
- ResultsSingleProcessAnalysis instance
results of network AIS estimation, see documentation of ResultsSingleProcessAnalysis()
- analyse_single_process(settings, data, process)[source]¶
Estimate active information storage for a single process.
Estimate active information storage for one process in the network. Uses non-uniform embedding found through information maximisation. This is done in three steps (see Lizier and Faes for details):
1. Find all relevant samples in the process's own past, by iteratively adding candidate samples that have significant conditional mutual information (CMI) with the current value (conditional on all samples that were added previously)
2. Prune the final conditional set by testing the CMI between each sample in the final set and the current value, conditional on all other samples in the final set
3. Calculate AIS using the final set of candidates as the past state (calculate MI between samples in the past and the current value); test for statistical significance using a permutation test
- Note:
For a further description of the algorithm see references in the class docstring.
- Args:
- settingsdict
parameters for estimator use and statistics:
cmi_estimator : str - estimator to be used for CMI and MI calculation (for estimator settings see the documentation in the estimators_* modules)
max_lag : int - maximum temporal search depth for candidates in the processes’ past in samples
tau : int [optional] - spacing between candidates in the sources’ past in samples (default=1)
n_perm_* : int [optional] - number of permutations, where * can be ‘max_stat’, ‘min_stat’, ‘mi’ (default=500)
alpha_* : float [optional] - critical alpha level for statistical significance, where * can be ‘max_stat’, ‘min_stat’, ‘mi’ (default=0.05)
add_conditionals : list of tuples | str [optional] - force the estimator to add these conditionals when estimating AIS; can either be a list of variables, where each variable is described as (idx process, lag with respect to the current value), or a string: 'faes' for the Faes method (see references)
permute_in_time : bool [optional] - force surrogate creation by shuffling realisations in time instead of shuffling replications; see documentation of Data.permute_samples() for further settings (default=False)
verbose : bool [optional] - toggle console output (default=True)
write_ckp : bool [optional] - enable checkpointing, writes analysis state to disk every time a variable is selected; resume crashed analysis using network_analysis.resume_checkpoint() (default=False)
filename_ckp : str [optional] - checkpoint file name (without extension) (default=’./idtxl_checkpoint’)
- dataData instance
raw data for analysis
- processint
index of process
- Returns:
- ResultsSingleProcessAnalysis instance
results of AIS estimation, see documentation of ResultsSingleProcessAnalysis()
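A minimal usage sketch for a single process, mirroring the analyse_network() example above (the Kraskov CMI estimator and the parameter values shown are illustrative choices, not defaults):
>>> data = Data()
>>> data.generate_mute_data(1000, 5)
>>> settings = {
>>>     'cmi_estimator': 'JidtKraskovCMI',
>>>     'n_perm_max_stat': 200,
>>>     'n_perm_min_stat': 200,
>>>     'max_lag': 5,
>>>     'tau': 1}
>>> process_analysis = ActiveInformationStorage()
>>> results = process_analysis.analyse_single_process(settings, data,
>>>                                                   process=0)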
idtxl.embedding_optimization_ais_Rudelt module¶
Optimization of embedding parameters of spike times using the history dependence estimators
- class idtxl.embedding_optimization_ais_Rudelt.OptimizationRudelt(settings=None)[source]¶
Bases:
object
Optimization of embedding parameters of spike times using the history dependence estimators
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- estimation_methodstring
The method to be used to estimate the history dependence: 'bbc' or 'shuffling'.
- embedding_step_sizefloat
Step size delta t (in seconds) with which the window is slid through the data. (default: 0.005)
- embedding_number_of_bins_setlist of integer values
Set of values for d, the number of bins in the embedding. (default: [1, 2, 3, 4, 5])
- embedding_past_range_setlist of floating-point values
Set of values for T, the past range (in seconds) to be used for embeddings. (default: [0.005, 0.00561, 0.00629, 0.00706, 0.00792, 0.00889, 0.00998, 0.01119, 0.01256, 0.01409, 0.01581, 0.01774, 0.01991, 0.02233, 0.02506, 0.02812, 0.03155, 0.0354, 0.03972, 0.04456, 0.05, 0.0561, 0.06295, 0.07063, 0.07924, 0.08891, 0.09976, 0.11194, 0.12559, 0.14092, 0.15811, 0.17741, 0.19905, 0.22334, 0.25059, 0.28117, 0.31548, 0.35397, 0.39716, 0.44563, 0.5, 0.56101, 0.62946, 0.70627, 0.79245, 0.88914, 0.99763, 1.11936, 1.25594, 1.40919, 1.58114, 1.77407, 1.99054, 2.23342, 2.50594, 2.81171, 3.15479, 3.53973, 3.97164, 4.45625, 5.0])
- embedding_scaling_exponent_setdict
Set of values for kappa, the scaling exponent for the bins in the embedding. Should be a Python dictionary with the three entries 'number_of_scalings', 'min_first_bin_size' and 'min_step_for_scaling'. (default: {'number_of_scalings': 10, 'min_first_bin_size': 0.005, 'min_step_for_scaling': 0.01})
- bbc_tolerancefloat
The tolerance for the Bayesian Bias Criterion. Influences which embeddings are discarded from the analysis. (default: 0.05)
- return_averaged_Rbool
Return R_tot as the average over R(T) for T in [T_D, T_max], instead of R_tot = R(T_D). If set to True, the setting for number_of_bootstraps_R_tot (see below) is ignored and set to 0 and CI bounds are not calculated. (default: True)
- timescale_minimum_past_rangefloat
Minimum past range T_0 (in seconds) to take into consideration for the estimation of the information timescale tau_R. (default: 0.01)
- number_of_bootstraps_R_maxint
The number of bootstrap re-shuffles that should be used to determine the optimal embedding. (Bootstrap the estimates of R_max to determine R_tot.) These are computed during the ‘history-dependence’ task because they are essential to obtain R_tot. (default: 250)
- number_of_bootstraps_R_totint
The number of bootstrap re-shuffles that should be used to estimate the confidence interval of the optimal embedding. (Bootstrap the estimates of R_tot = R(T_D) to obtain a confidence interval for R_tot.) These are computed during the 'confidence-intervals' task. The setting return_averaged_R (see above) needs to be set to False for this setting to take effect. (default: 250)
- number_of_bootstraps_nonessentialint
The number of bootstrap re-shuffles that should be used to estimate the confidence intervals for embeddings other than the optimal one. (Bootstrap the estimates of R(T) for all other T.) (These are not necessary for the main analysis and therefore default to 0.)
- symbol_block_lengthint
The number of symbols that should be drawn in each block for bootstrap resampling. If set to None (recommended), the length is chosen automatically based on heuristics. (default: None)
- bootstrap_CI_use_sdbool
Most of the time we observed normally-distributed bootstrap replications, so it is sufficient (and more efficient) to compute confidence intervals based on the standard deviation (default: True)
- bootstrap_CI_percentile_lofloat
The lower percentile for the confidence interval. This has no effect if bootstrap_CI_use_sd is set to True (default: 2.5)
- bootstrap_CI_percentile_hifloat
The upper percentile for the confidence interval. This has no effect if bootstrap_CI_use_sd is set to True (default: 97.5)
- analyse_auto_MIbool
Perform calculation of the auto mutual information of the spike train (default: True). If set to True:
- auto_MI_bin_size_setlist of floating-point values
Set of values for the sizes of the bins (in seconds). (default: [0.005, 0.01, 0.025, 0.05, 0.25, 0.5])
- auto_MI_max_delayint
The maximum delay (in seconds) between the past bin and the response. (default: 5)
- visualizationbool
Create .eps output images showing the optimization values and graphs for the history dependence and the auto mutual information (default: False). If set to True:
- output_pathString
Path where the .eps images should be saved
- output_prefixString
Prefix of the output images e.g. <output_prefix>_process0.eps
- debug: bool
show values while calculating (default: False)
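A minimal instantiation sketch based on the settings above (illustrative values; data_spk stands for a Data_spiketime instance loaded elsewhere):
>>> settings = {
>>>     'estimation_method': 'shuffling',  # or 'bbc'
>>>     'embedding_step_size': 0.005,
>>>     'embedding_number_of_bins_set': [1, 2, 3],
>>>     'analyse_auto_MI': True}
>>> optimization = OptimizationRudelt(settings)
>>> results = optimization.optimize(data_spk, processes=[0])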
- analyse_auto_MI(spike_times)[source]¶
Get the auto MI for the spike times. If it is available from file, load it, else compute it.
- compute_CIs(data, target_R='R_max', symbol_block_length=None)[source]¶
Compute bootstrap replications of the history dependence estimate which can be used to obtain confidence intervals.
- Args:
- datadata_spiketime object
Input data
- target_RString
One of ‘R_max’, ‘R_tot’ or ‘nonessential’. If set to R_max, replications of R are produced for the T at which R is maximised. If set to R_tot, replications of R are produced for T = T_D (cf get_temporal_depth_T_D). If set to nonessential, replications of R are produced for each T (one embedding per T, cf get_embeddings_that_maximise_R). These are not otherwise used in the analysis and are probably only useful if the resulting plot is visually inspected, so in most cases it can be set to zero.
- symbol_block_lengthint
The number of symbols that should be drawn in each block for bootstrap resampling. If set to None (recommended), the length is chosen automatically based on heuristics
- get_auto_MI(spike_times, bin_size, number_of_delays)[source]¶
Compute the auto mutual information in the neuron’s activity, a measure closely related to history dependence.
- get_bootstrap_history_dependence(data, embedding, number_of_bootstraps, symbol_block_length=None)[source]¶
For a given embedding, return bootstrap replications for R.
- get_embeddings(embedding_past_range_set, embedding_number_of_bins_set, embedding_scaling_exponent_set)[source]¶
Get all combinations of parameters T, d, k, based on the sets of selected parameters.
- get_embeddings_that_maximise_R(bbc_tolerance=None, dependent_var='T', get_as_list=False)[source]¶
For each T (or d), get the embedding for which R is maximised.
For the bbc estimator, the bbc_tolerance is applied here, i.e., get the unbiased embeddings that maximise R.
- get_history_dependence(data, process)[source]¶
Estimate the history dependence for each embedding for the given process.
- get_information_timescale_tau_R()[source]¶
Get the information timescale tau_R, a characteristic timescale of history dependence similar to an autocorrelation time.
- get_past_range(number_of_bins_d, first_bin_size, scaling_k)[source]¶
Get the past range T of the embedding, based on the parameters d, tau_1 and k.
- get_set_of_scalings(past_range_T, number_of_bins_d, number_of_scalings, min_first_bin_size, min_step_for_scaling)[source]¶
Get scaling exponents that produce the uniform embedding, the embedding for which the first bin has a length of min_first_bin_size (in seconds), and linearly spaced scaling factors in between, such that number_of_scalings scalings are obtained in total.
- get_temporal_depth_T_D(get_R_thresh=False)[source]¶
Get the temporal depth T_D, the past range for the ‘optimal’ embedding parameters.
Given the maximal history dependence R at each past range T, (cf get_embeddings_that_maximise_R), first find the smallest T at which R is maximised (cf get_max_R_T). If bootstrap replications for this R are available, get the smallest T at which this R minus one standard deviation of the bootstrap estimates is attained.
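An illustrative sketch of this selection rule (hypothetical values, not the library implementation):
>>> import numpy as np
>>> T = np.array([0.01, 0.05, 0.1, 0.5])    # candidate past ranges
>>> R = np.array([0.10, 0.18, 0.20, 0.20])  # max. history dependence per T
>>> T_D = T[np.argmax(R)]                   # smallest T at which R is maximised
>>> sd = 0.01                               # std. dev. of bootstrap replications
>>> T_D = T[np.argmax(R >= R.max() - sd)]   # smallest T reaching R_max - sd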
- optimize(data, processes='all')[source]¶
Optimize the embedding parameters of spike time data using the Rudelt history dependence estimator.
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- dataData_spiketime instance
raw data for analysis
- processeslist of int | 'all'
indices of processes (default='all'); spike times are optimised separately for each process specified in the list.
- Returns:
- ResultsSingleProcessRudelt instance
results of Rudelt optimization, see documentation of ResultsSingleProcessRudelt()
- if visualization in settings was set to True (see class OptimizationRudelt):
- .eps images are created for each optimized process containing:
optimized values for the process
graph for the history dependence
graph for auto mutual information (if calculated)
- optimize_single_run(data, process)[source]¶
Optimize a single realisation of spike time data given the process number.
- Args:
- dataData_spiketime instance
raw data for analysis
- processint
index of process
- Returns:
- DotDict
with the following keys
- Processint
Process that was optimized
- estimation_methodString
Estimation method that was used for optimization
- T_Dfloat
Estimated optimal value for the temporal depth T_D
- tau_R :
Information timescale tau_R, a characteristic timescale of history dependence similar to an autocorrelation time.
- R_totfloat
Estimated value for the total history dependence R_tot
- AIS_totfloat
Estimated value for the total active information storage
- opt_number_of_bins_dint
Number of bins d for the embedding that yields (R̂tot, T̂D)
- opt_scaling_kint
Scaling exponent κ for the embedding that yields (R̂tot, T̂D)
- opt_first_bin_sizeint
Size of the first bin τ1 for the embedding that yields (R̂tot, T̂D)
- history_dependencearray with floating-point values
Estimated history dependence for each embedding
- firing_ratefloat
Firing rate of the neuron/spike train
- recording_lengthfloat
Length of the recording (in seconds)
- H_spikingfloat
Entropy of the spike times
- if analyse_auto_MI was set to True, additionally:
- auto_MIdict
numpy arrays of MI values for each delay, one entry per auto_MI bin size
- auto_MI_delayslist of int
list of delays depending on the given auto_MI_bin_sizes and auto_MI_max_delay
idtxl.estimators_Rudelt module¶
Provide HDE estimators.
- class idtxl.estimators_Rudelt.RudeltAbstractEstimator(settings=None)[source]¶
Bases:
idtxl.estimator.Estimator
Abstract class for implementation of nsb and plugin estimators from Rudelt.
Abstract class for implementation of nsb and plugin estimators; child classes implement estimators for mutual information (MI).
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- get_median_number_of_spikes_per_bin(raw_symbols)[source]¶
Given raw symbols (in which the number of spikes per bin is counted, i.e., not necessarily a binary quantity), get the median number of spikes for each bin, among all symbols obtained by the embedding.
- get_multiplicities(symbol_counts, alphabet_size)[source]¶
Get the multiplicities of some given symbol counts.
To estimate the entropy of a system, it is only important how often a symbol/ event occurs (the probability that it occurs), not what it represents. Therefore, computations can be simplified by summarizing symbols by their frequency, as represented by the multiplicities.
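For illustration (a sketch, not the library implementation), the multiplicities can be obtained by counting how many distinct symbols share each occurrence count:
>>> from collections import Counter
>>> symbol_counts = {'00': 7, '01': 4, '10': 4, '11': 1}  # hypothetical counts
>>> multiplicities = Counter(symbol_counts.values())
>>> multiplicities  # Counter({4: 2, 7: 1, 1: 1}): two symbols occur four times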
- get_past_range(number_of_bins_d, first_bin_size, scaling_k)[source]¶
Get the past range T of the embedding, based on the parameters d, tau_1 and k.
- get_raw_symbols(spike_times, embedding, first_bin_size)[source]¶
Get the raw symbols (in which the number of spikes per bin is counted, i.e., not necessarily a binary quantity), as obtained by applying the embedding.
- get_window_delimiters(number_of_bins_d, scaling_k, first_bin_size)[source]¶
Get delimiters of the window, used to describe the embedding. The window includes both the past embedding and the response.
The delimiters are times, relative to the first bin, that separate two consecutive bins.
- is_analytic_null_estimator()[source]¶
Indicate if estimator supports analytic surrogates.
Return true if the estimator implements estimate_surrogates_analytic() where data is formatted as per the estimate method for this estimator.
- Returns:
bool
- is_parallel()[source]¶
Indicate if estimator supports parallel estimation over chunks.
Return true if the estimator supports parallel estimation over chunks, where a chunk is one independent data set.
- Returns:
bool
- class idtxl.estimators_Rudelt.RudeltAbstractNSBEstimator(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Abstract class for implementation of NSB estimators from Rudelt.
Abstract class for implementation of Nemenman-Shafee-Bialek (NSB) estimators, child classes implement nsb estimators for mutual information (MI).
implemented in idtxl by Michael Lindner, Göttingen 2021
References:
- [1]: L. Rudelt, D. G. Marx, M. Wibral, V. Priesemann: Embedding optimization reveals long-lasting history dependence in neural spiking activity, 2021, PLOS Computational Biology, 17(6)
- [2]: I. Nemenman, F. Shafee, W. Bialek: Entropy and inference, revisited. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- H1(beta, mk, K, N)[source]¶
Compute the first moment (expectation value) of the entropy H.
H is the entropy one obtains with a symmetric Dirichlet prior with concentration parameter beta and a multinomial likelihood.
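For reference, the posterior expectation of the entropy under a symmetric Dirichlet prior has a known closed form (stated here as the standard Wolpert-Wolf result, which is assumed to be the quantity computed by this method): with symbol counts n_k, alphabet size K, total count N and psi_0 the digamma function,
E[H | beta] = psi_0(N + K*beta + 1) - sum_k ((n_k + beta) / (N + K*beta)) * psi_0(n_k + beta + 1)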
- alpha_ML(mk, K1, N)[source]¶
Compute first guess for the beta_MAP (cf get_beta_MAP) parameter via the posterior of a Dirichlet process.
- d2_log_rho(beta, mk, K, N)[source]¶
Second derivate of the logarithm of the Dirichlet multinomial likelihood.
- d2_log_rho_xi(beta, mk, K, N)[source]¶
Second derivative of the logarithm of the nsb (unnormalized) posterior.
- d_log_rho(beta, mk, K, N)[source]¶
First derivate of the logarithm of the Dirichlet multinomial likelihood.
- d_log_rho_xi(beta, mk, K, N)[source]¶
First derivative of the logarithm of the nsb (unnormalized) posterior.
- d_xi(beta, K)[source]¶
First derivative of xi(beta).
xi(beta) is the entropy of the system when no data has been observed. d_xi is the prior for the nsb estimator.
- get_beta_MAP(mk, K, N)[source]¶
Get the maximum a posteriori (MAP) value for beta.
Provides the location of the peak, around which we integrate.
beta_MAP is the value for beta for which the posterior of the NSB estimator is maximised (or, equivalently, of the logarithm thereof, as computed here).
- get_integration_bounds(mk, K, N)[source]¶
Find the integration bounds for the estimator.
Typically the posterior is a delta-like distribution, so it is sufficient to integrate around this peak. (If not, this function is not called.)
- log_likelihood_DP_alpha(a, K1, N)[source]¶
Alpha-dependent terms of the log-likelihood of a Dirichlet Process.
- nsb_entropy(mk, K, N)[source]¶
Estimate the entropy of a system using the NSB estimator.
- Parameters
mk – multiplicities
K – number of possible symbols/ state space of the system
N – total number of observed symbols
- class idtxl.estimators_Rudelt.RudeltBBCEstimator(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Bayesian bias criterion (BBC) estimator using the NSB and plugin estimators
Calculate the mutual information (MI) of one variable depending on its past using the NSB and plugin estimators and check whether the bias criterion is passed. See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- bayesian_bias_criterion(R_nsb, R_plugin, bbc_tolerance)[source]¶
Get whether the Bayesian bias criterion (bbc) is passed.
- Parameters
R_nsb – history dependence computed with NSB estimator
R_plugin – history dependence computed with plugin estimator
bbc_tolerance – tolerance for the Bayesian bias criterion
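A sketch of the check under the assumption that the BBC term is the relative difference between the NSB and plug-in estimates (the exact form is defined in Rudelt et al. [1]):
>>> def bbc_passed(R_nsb, R_plugin, bbc_tolerance):
>>>     # assumed form of the criterion: relative deviation below tolerance
>>>     return abs(R_nsb - R_plugin) / R_nsb < bbc_tolerance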
- estimate(symbol_array, past_symbol_array, current_symbol_array, bbc_tolerance=None)[source]¶
Calculate the mutual information (MI) of one variable depending on its past using the NSB and plugin estimators and check whether the bias criterion is passed.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- past_symbol_arraynumpy array
realisations of symbols based on past states (output of get_realisations_symbol from data_spiketimes object)
- current_symbol_arraynumpy array
realisations of symbols based on current states (output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- bbc_term (float)
bbc tolerance-independent term of the Bayesian bias criterion (bbc)
- class idtxl.estimators_Rudelt.RudeltNSBEstimatorSymbolsMI(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractNSBEstimator
History dependence NSB estimator
Calculate the mutual information (MI) of one variable depending on its past using NSB estimator. See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- estimate(symbol_array, past_symbol_array, current_symbol_array)[source]¶
Estimate mutual information using NSB estimator.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- past_symbol_arraynumpy array
realisations of symbols based on past states (output of get_realisations_symbol from data_spiketimes object)
- current_symbol_arraynumpy array
realisations of symbols based on current states (output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- class idtxl.estimators_Rudelt.RudeltPluginEstimatorSymbolsMI(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Plugin History dependence estimator
Calculate the mutual information (MI) of one variable depending on its past using plugin estimator. See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- Args:
- settingsdict
- embedding_step_sizefloat [optional]
Step size delta t (in seconds) with which the window is slid through the data (default = 0.005).
- normalisebool [optional]
rebase spike times to zero (default=True)
- return_averaged_Rbool [optional]
If set to True, compute R̂tot as the average over R̂(T) for T ∈ [T̂D, Tmax] instead of R̂tot = R(T̂D). If set to True, the setting for number_of_bootstraps_R_tot is ignored and set to 0 (default=True)
- estimate(symbol_array, past_symbol_array, current_symbol_array)[source]¶
Estimate mutual information using plugin estimator.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- past_symbol_arraynumpy array
realisations of symbols based on past states (output of get_realisations_symbol from data_spiketimes object)
- current_symbol_arraynumpy array
realisations of symbols based on current states (output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- plugin_entropy(mk, N)[source]¶
Estimate the entropy of a system using the Plugin estimator.
(In principle this is the same function as utl.get_shannon_entropy, only here it is a function of the multiplicities, not the probabilities.)
- Parameters
mk – multiplicities
N – total number of observed symbols
- class idtxl.estimators_Rudelt.RudeltShufflingEstimator(settings=None)[source]¶
Bases:
idtxl.estimators_Rudelt.RudeltAbstractEstimator
Estimate the history dependence in a spike train using the shuffling estimator.
See parent class for references.
implemented in idtxl by Michael Lindner, Göttingen 2021
- estimate(symbol_array)[source]¶
Estimate the history dependence in a spike train using the shuffling estimator.
- Args:
- symbol_array1D numpy array
realisations of symbols based on current and past states. (first output of get_realisations_symbol from data_spiketimes object)
- Returns:
- I (float)
MI (AIS)
- R (float)
MI / H_uncond (History dependence)
- get_H0_X_past_cond_X(marginal_probabilities, number_of_bins_d, P_X_uncond)[source]¶
Compute H_0(X_past | X), the estimate of the entropy for the past symbols given a response, under the assumption that activity in the past contributes independently towards the response.
- get_H0_X_past_cond_X_eq_x(marginal_probabilities, number_of_bins_d)[source]¶
Compute H_0(X_past | X = x), cf get_H0_X_past_cond_X.
- get_H_X_past_cond_X(P_X_uncond, P_X_past_cond_X)[source]¶
Compute H(X_past | X), the plug-in estimate of the conditional entropy for the past symbols, conditioned on the response X, given their probabilities.
- get_H_X_past_uncond(P_X_past_uncond)[source]¶
Compute H(X_past), the plug-in estimate of the entropy for the past symbols, given their probabilities.
- get_P_X_past_cond_X(past_symbol_counts, number_of_symbols)[source]¶
Compute P(X_past | X), the probability of the past activity conditioned on the response X using the plug-in estimator.
- get_P_X_past_uncond(past_symbol_counts, number_of_symbols)[source]¶
Compute P(X_past), the probability of the past activity using the plug-in estimator.
- get_P_X_uncond(number_of_symbols)[source]¶
Compute P(X), the probability of the current activity using the plug-in estimator.
- get_marginal_frequencies_of_spikes_in_bins(symbol_counts, number_of_bins_d)[source]¶
Compute for each past bin 1…d the sum of spikes found in that bin across all observed symbols.
- get_shuffled_symbol_counts(symbol_counts, past_symbol_counts, number_of_bins_d, number_of_symbols)[source]¶
Simulate new data by, for each past bin 1…d, permuting the activity across all observed past_symbols (for a given response X). The marginal probability of observing a spike given the response is thus preserved for each past bin.
- shuffling_MI(symbol_counts, number_of_bins_d)[source]¶
Estimate the mutual information between current and past activity in a spike train using the shuffling estimator.
To obtain the shuffling estimate, compute the plug-in estimate and a correction term to reduce its bias.
For the plug-in estimate:
Extract the past_symbol_counts from the symbol_counts.
I_plugin = H(X_past) - H(X_past | X)
Notation:
X: current activity, aka response
X_past: past activity
P_X_uncond: P(X)
P_X_past_uncond: P(X_past)
P_X_past_cond_X: P(X_past | X)
H_X_past_uncond: H(X_past)
H_X_past_cond_X: H(X_past | X)
I_plugin: plugin estimate of I(X_past; X)
For the correction term:
- Simulate additional data under the assumption that activity in the past contributes independently towards the current activity.
- Compute the entropy under the assumptions of the model, which, due to its simplicity, is easy to sample, so that the estimate is unbiased.
- Compute the entropy using the plug-in estimate, whose bias is similar to that of the plug-in estimate on the original data.
- Compute the correction term as the difference between the unbiased and biased terms (see the summary after the notation below).
Notation:
P0_sh_X_past_cond_X: P_0,sh(X_past | X), equiv. to P(X_past | X) on the shuffled data
H0_X_past_cond_X: H_0(X_past | X), based on the model of independent contributions
H0_sh_X_past_cond_X: H_0,sh(X_past | X), based on P0_sh_X_past_cond_X, i.e., the plug-in estimate
I_corr: the correction term to reduce the bias of I_plugin
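Putting the pieces together in the notation above (sign convention inferred from the description of the correction term):
I_corr = H0_X_past_cond_X - H0_sh_X_past_cond_X
I_shuffling = I_plugin - I_corr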
- Args:
- symbol_countsiterable
the activity of a spike train is embedded into symbols, whose occurrences are counted (cf emb.get_symbol_counts)
- number_of_bins_dint
the number of bins of the embedding
idtxl.estimators_jidt module¶
Provide JIDT estimators.
- class idtxl.estimators_jidt.JidtDiscrete(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtEstimator
Abstract class for implementation of discrete JIDT-estimators.
Abstract class for implementation of plug-in JIDT-estimators for discrete data. Child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), and transfer entropy (TE). See parent class for references.
Set common estimation parameters for discrete JIDT-estimators. For usage of these estimators see documentation for the child classes.
- Args:
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
- Note:
Discrete JIDT estimators require the data's alphabet size for instantiation. Hence, as opposed to the Kraskov and Gaussian estimators, the JAVA class itself is added to the object instance, while for Kraskov/Gaussian estimators an instance of that class is added (because the latter can be instantiated independently of data properties).
- estimate_surrogates_analytic(n_perm=200, **data)[source]¶
Return estimate of the analytical surrogate distribution.
This method must be implemented because this class’ is_analytic_null_estimator() method returns true.
- Args:
- n_permint [optional]
number of permutations (default=200)
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per the estimate method for this estimator.
- Returns:
- float | numpy array
n_perm surrogates of the average MI/CMI/TE over all samples under the null hypothesis of no relationship between var1 and var2 (in the context of conditional)
- abstract get_analytic_distribution(**data)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per the estimate method for this estimator.
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteAIS(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate AIS with JIDT’s discrete-variable implementation.
Calculate the active information storage (AIS) for one process. Call JIDT via jpype and use the discrete estimator. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict
set estimator parameters:
history : int - number of samples in the target’s past used as embedding (>= 0)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local AIS instead of average AIS (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph. (>= 2)
alph : int [optional] - number of discrete bins/levels for var1 (default=2 , or the value set for n_discrete_bins). (>= 2)
- estimate(process, return_calc=False)[source]¶
Estimate active information storage.
- Args:
- processnumpy array
realisations as either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average AIS over all samples or local AIS for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
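A minimal usage sketch (assumes a working JIDT/jpype setup; parameter values are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtDiscreteAIS
>>> settings = {'history': 2, 'alph': 2}     # embedding length 2, binary data
>>> est = JidtDiscreteAIS(settings)
>>> process = np.random.randint(0, 2, 1000)  # 1D array of int realisations
>>> ais = est.estimate(process)              # average AIS in bits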
- get_analytic_distribution(process)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- processnumpy array
realisations as either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteCMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate CMI with JIDT’s implementation for discrete variables.
Calculate the conditional mutual information between two variables given the third. Call JIDT via jpype and use the discrete estimator. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local CMI instead of average CMI (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph1, alph2 and alphc
alph1 : int [optional] - number of discrete bins/levels for var1 (default=2, or the value set for n_discrete_bins)
alph2 : int [optional] - number of discrete bins/levels for var2 (default=2, or the value set for n_discrete_bins)
alphc : int [optional] - number of discrete bins/levels for conditional (default=2, or the value set for n_discrete_bins)
- estimate(var1, var2, conditional=None, return_calc=False)[source]¶
Estimate conditional mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1), if no conditional is provided, return MI between var1 and var2
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
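A minimal usage sketch (assumes a working JIDT/jpype setup; values are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtDiscreteCMI
>>> est = JidtDiscreteCMI({'alph1': 2, 'alph2': 2, 'alphc': 2})
>>> var1 = np.random.randint(0, 2, 1000)
>>> var2 = np.random.randint(0, 2, 1000)
>>> cond = np.random.randint(0, 2, 1000)
>>> cmi = est.estimate(var1, var2, cond)  # average CMI in bits
>>> mi = est.estimate(var1, var2)         # omit conditional to obtain MI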
- get_analytic_distribution(var1, var2, conditional=None)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var), if no conditional is provided, return MI between var1 and var2
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate MI with JIDT’s discrete-variable implementation.
Calculate the mutual information (MI) between two variables. Call JIDT via jpype and use the discrete estimator. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local MI instead of average MI (default=False)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph1 and alph2
alph1 : int [optional] - number of discrete bins/levels for var1 (default=2, or the value set for n_discrete_bins)
alph2 : int [optional] - number of discrete bins/levels for var2 (default=2, or the value set for n_discrete_bins)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- estimate(var1, var2, return_calc=False)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
- get_analytic_distribution(var1, var2)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- var2numpy array
realisations of the second variable (similar to var1)
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtDiscreteTE(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtDiscrete
Calculate TE with JIDT’s implementation for discrete variables.
Calculate the transfer entropy between two time series processes. Call JIDT via jpype and use the discrete estimator. Transfer entropy is defined as the conditional mutual information between the source’s past state and the target’s current value, conditional on the target’s past. See parent class for references.
Results are returned in bits.
- Args:
- settingsdict
sets estimation parameters:
history_target : int - number of samples in the target’s past used as embedding. (>= 0)
history_source : int [optional] - number of samples in the source’s past used as embedding (default=same as the target history). (>= 1)
tau_source : int [optional] - source’s embedding delay (default=1). (>= 1)
tau_target : int [optional] - target’s embedding delay (default=1). (>= 1)
source_target_delay : int [optional] - information transfer delay between source and target (default=1) (>= 0)
discretise_method : str [optional] - if and how to discretise incoming continuous data, can be ‘max_ent’ for maximum entropy binning, ‘equal’ for equal size bins, and ‘none’ if no binning is required (default=’none’)
n_discrete_bins : int [optional] - number of discrete bins/ levels or the base of each dimension of the discrete variables (default=2). If set, this parameter overwrites/sets alph1 and alph2. (>= 2)
alph1 : int [optional] - number of discrete bins/levels for source (default=2, or the value set for n_discrete_bins). (>= 2)
alph2 : int [optional] - number of discrete bins/levels for target (default=2, or the value set for n_discrete_bins). (>= 2)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- estimate(source, target, return_calc=False)[source]¶
Estimate transfer entropy from a source to a target variable.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- targetnumpy array
realisations of target variable (similar to source)
- return_calcboolean
return the calculator used here as well as the numeric calculated value(s)
- Returns:
- float | numpy array
average TE over all samples or local TE for individual samples if ‘local_values’=True
- Java object
JIDT calculator that was used here. Only returned if return_calc was set.
- Raises:
- ex.JidtOutOfMemoryError
Raised when JIDT object cannot be instantiated due to mem error
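A minimal usage sketch (assumes a working JIDT/jpype setup; the lagged-copy target makes the expected TE close to one bit):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtDiscreteTE
>>> settings = {'history_target': 1, 'source_target_delay': 1,
>>>             'alph1': 2, 'alph2': 2}
>>> est = JidtDiscreteTE(settings)
>>> source = np.random.randint(0, 2, 1000)
>>> target = np.roll(source, 1)            # target repeats source with delay 1
>>> te = est.estimate(source, target)      # average TE in bits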
- get_analytic_distribution(source, target)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations], array type can be float (requires discretisation) or int
- targetnumpy array
realisations of target variable (similar to source)
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtEstimator(settings=None)[source]¶
Bases:
idtxl.estimator.Estimator
Abstract class for implementation of JIDT estimators.
Abstract class for implementation of JIDT estimators, child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), transfer entropy (TE) using the Kraskov-Grassberger-Stoegbauer estimator for continuous data, plug-in estimators for discrete data, and Gaussian estimators for continuous Gaussian data.
References:
Lizier, Joseph T. (2014). JIDT: an information-theoretic toolkit for studying the dynamics of complex systems. Front Robot AI, 1(11).
Kraskov, A., Stoegbauer, H., & Grassberger, P. (2004). Estimating mutual information. Phys Rev E, 69(6), 066138.
Lizier, Joseph T., Mikhail Prokopenko, and Albert Y. Zomaya. (2012). Local measures of information storage in complex distributed computation. Inform Sci, 208, 39-54.
Schreiber, T. (2000). Measuring information transfer. Phys Rev Lett, 85(2), 461.
Set common estimation parameters for JIDT estimators. For usage of these estimators see documentation for the child classes.
- Args:
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- class idtxl.estimators_jidt.JidtGaussian(CalcClass, settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtEstimator
Abstract class for implementation of JIDT Gaussian-estimators.
Abstract class for implementation of JIDT Gaussian-estimators, child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), transfer entropy (TE) using JIDT's Gaussian estimator for continuous data. See parent class for references.
Set common estimation parameters for JIDT Gaussian-estimators. For usage of these estimators see documentation for the child classes.
Results are returned in nats.
- Args:
- CalcClassJAVA class
JAVA class returned by jpype.JPackage
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- estimate_surrogates_analytic(n_perm=200, **data)[source]¶
Estimate the surrogate distribution analytically. This method must be implemented because this class' is_analytic_null_estimator() method returns true.
- Args:
- n_permint
number of permutations (default=200)
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per estimate_parallel for this estimator.
- Returns:
- float | numpy array
n_perm surrogates of the average MI/CMI/TE over all samples under the null hypothesis of no relationship between var1 and var2 (in the context of conditional)
- get_analytic_distribution(**data)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI). Formatted as per the estimate method for this estimator.
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtGaussianAIS(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate active information storage with JIDT’s Gaussian implementation.
Calculate active information storage (AIS) for some process using JIDT’s implementation of the Gaussian estimator. AIS is defined as the mutual information between the processes’ past state and current value.
The past state needs to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a processes’ past, tau describes the embedding delay, i.e., the spacing between every two samples from the processes’ past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history : int - number of samples in the processes’ past used as embedding
tau : int [optional] - the processes’ embedding delay (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local AIS instead of average AIS (default=False)
- Note:
Some technical details: JIDT normalises over realisations, IDTxl normalises over raw data once, outside the AIS estimator to save computation time. The Theiler window ignores trial boundaries. The AIS estimator does add noise to the data as a default. To make analysis runs replicable set noise_level to 0.
- estimate(process)[source]¶
Estimate active information storage.
- Args:
- processnumpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- Returns:
- float | numpy array
average AIS over all samples or local AIS for individual samples if ‘local_values’=True
- class idtxl.estimators_jidt.JidtGaussianCMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate conditional mutual information with JIDT's Gaussian implementation.
Computes the differential conditional mutual information of two multivariate sets of observations, conditioned on another, assuming that the probability distribution function for these observations is a multivariate Gaussian distribution. Call JIDT via jpype and use ConditionalMutualInfoCalculatorMultiVariateGaussian estimator. If no conditional is given (is None), the function returns the mutual information between var1 and var2.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local CMI instead of average CMI (default=False)
- Note:
Some technical details: JIDT normalises over realisations, IDTxl normalises over raw data once, outside the CMI estimator to save computation time. The Theiler window ignores trial boundaries. The CMI estimator does add noise to the data as a default. To make analysis runs replicable set noise_level to 0.
- estimate(var1, var2, conditional=None)[source]¶
Estimate conditional mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1), if no conditional is provided, return MI between var1 and var2
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
- get_analytic_distribution(var1, var2, conditional=None)[source]¶
Return a JIDT AnalyticNullDistribution object.
Required so that our estimate_surrogates_analytic method can use the common_estimate_surrogates_analytic() method, where data is formatted as per the estimate method for this estimator.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1), if no conditional is provided, return MI between var1 and var2
- Returns:
- Java object
JIDT calculator that was used here
- class idtxl.estimators_jidt.JidtGaussianMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate mutual information with JIDT’s Gaussian implementation.
Calculate the mutual information between two variables. Call JIDT via jpype and use the Gaussian estimator. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local MI instead of average MI (default=False)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- Note:
Some technical details: JIDT normalises over realisations, IDTxl normalises over raw data once, outside the MI estimator to save computation time. The Theiler window ignores trial boundaries. The MI estimator does add noise to the data as a default. To make analysis runs replicable set noise_level to 0.
- estimate(var1, var2)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
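A minimal usage sketch for linearly coupled Gaussian variables (assumes a working JIDT/jpype setup):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtGaussianMI
>>> est = JidtGaussianMI()                      # default settings
>>> var1 = np.random.randn(1000)
>>> var2 = 0.7 * var1 + np.random.randn(1000)   # linear coupling
>>> mi = est.estimate(var1, var2)               # average MI in nats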
- class idtxl.estimators_jidt.JidtGaussianTE(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtGaussian
Calculate transfer entropy with JIDT’s Gaussian implementation.
Calculate transfer entropy between a source and a target variable using JIDT’s implementation of the Gaussian estimator. Transfer entropy is defined as the conditional mutual information between the source’s past state and the target’s current value, conditional on the target’s past.
Past states need to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a variable's past, tau describes the embedding delay, i.e., the spacing between every two samples from the processes' past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history_target : int - number of samples in the target’s past used as embedding
history_source : int [optional] - number of samples in the source’s past used as embedding (default=same as the target history)
tau_source : int [optional] - source’s embedding delay (default=1)
tau_target : int [optional] - target’s embedding delay (default=1)
source_target_delay : int [optional] - information transfer delay between source and target (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the CMI estimator, to save computation time. The Theiler window ignores trial boundaries. The CMI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(source, target)[source]¶
Estimate transfer entropy from a source to a target variable.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- targetnumpy array
realisations of the target variable (similar to source)
- Returns:
- float | numpy array
average TE over all samples or local TE for individual samples if ‘local_values’=True
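Example (a minimal sketch; assumes a working JIDT/jpype installation, with an illustrative source -> target coupling at delay 1):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtGaussianTE
>>> rng = np.random.default_rng(0)
>>> source = rng.normal(size=1000)
>>> target = rng.normal(size=1000)
>>> target[1:] += 0.5 * source[:-1]  # coupling with a delay of one sample
>>> settings = {'history_target': 1, 'source_target_delay': 1}
>>> est = JidtGaussianTE(settings)
>>> te = est.estimate(source, target)  # average TE in nats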
- class idtxl.estimators_jidt.JidtKraskov(CalcClass, settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtEstimator
Abstract class for implementation of JIDT Kraskov-estimators.
Abstract class for the implementation of JIDT Kraskov-estimators; child classes implement estimators for mutual information (MI), conditional mutual information (CMI), active information storage (AIS), and transfer entropy (TE) using the Kraskov-Grassberger-Stoegbauer estimator for continuous data. See parent class for references.
Set common estimation parameters for JIDT Kraskov-estimators. For usage of these estimators see documentation for the child classes.
- Args:
- CalcClassJAVA class
JAVA class returned by jpype.JPackage
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local values instead of averages (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1). Only applied by this method for TE and AIS (it is already applied for MI/CMI). Note that the default algorithm of 1 here differs from the default ALG_NUM argument of the JIDT AIS KSG estimator.
- class idtxl.estimators_jidt.JidtKraskovAIS(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate active information storage with JIDT’s Kraskov implementation.
Calculate active information storage (AIS) for some process using JIDT’s implementation of the Kraskov type 1 estimator. AIS is defined as the mutual information between the processes’ past state and current value.
The past state needs to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a processes’ past, tau describes the embedding delay, i.e., the spacing between every two samples from the processes’ past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history : int - number of samples in the processes’ past used as embedding
tau : int [optional] - the processes’ embedding delay (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local AIS instead of average AIS (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the AIS estimator, to save computation time. The Theiler window ignores trial boundaries. The AIS estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(process)[source]¶
Estimate active information storage.
- Args:
- processnumpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- Returns:
- float | numpy array
average AIS over all samples or local AIS for individual samples if ‘local_values’=True
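Example (a minimal sketch; assumes a working JIDT/jpype installation; the AR(1) toy signal is illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtKraskovAIS
>>> rng = np.random.default_rng(0)
>>> noise = rng.normal(size=1000)
>>> process = np.zeros(1000)
>>> for t in range(1, 1000):
...     process[t] = 0.6 * process[t - 1] + noise[t]  # AR(1) process with memory
>>> est = JidtKraskovAIS({'history': 1, 'tau': 1})
>>> ais = est.estimate(process)  # average AIS in nats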
- class idtxl.estimators_jidt.JidtKraskovCMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate conditional mutual information with JIDT’s Kraskov implementation.
Calculate the conditional mutual information (CMI) between three variables. Call JIDT via jpype and use the Kraskov 1 estimator. If no conditional is given (is None), the function returns the mutual information between var1 and var2. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
set estimator parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local CMI instead of average CMI (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the CMI estimator, to save computation time. The Theiler window ignores trial boundaries. The CMI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(var1, var2, conditional=None)[source]¶
Estimate conditional mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array [optional]
realisations of the conditioning variable (similar to var1); if no conditional is provided, the MI between var1 and var2 is returned
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
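Example (a minimal sketch; assumes a working JIDT/jpype installation; the common-driver structure is illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtKraskovCMI
>>> rng = np.random.default_rng(0)
>>> cond = rng.normal(size=1000)  # common driver
>>> var1 = cond + rng.normal(size=1000)
>>> var2 = cond + rng.normal(size=1000)
>>> est = JidtKraskovCMI()
>>> mi = est.estimate(var1, var2)  # conditional=None, returns MI (clearly positive)
>>> cmi = est.estimate(var1, var2, conditional=cond)  # CMI, close to zero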
- class idtxl.estimators_jidt.JidtKraskovMI(settings=None)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate mutual information with JIDT’s Kraskov implementation.
Calculate the mutual information between two variables. Call JIDT via jpype and use the Kraskov 1 estimator. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
sets estimation parameters:
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local MI instead of average MI (default=False)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the MI estimator, to save computation time. The Theiler window ignores trial boundaries. The MI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(var1, var2)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of first variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- var2numpy array
realisations of the second variable (similar to var1)
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
- class idtxl.estimators_jidt.JidtKraskovTE(settings)[source]¶
Bases:
idtxl.estimators_jidt.JidtKraskov
Calculate transfer entropy with JIDT’s Kraskov implementation.
Calculate transfer entropy between a source and a target variable using JIDT’s implementation of the Kraskov type 1 estimator. Transfer entropy is defined as the conditional mutual information between the source’s past state and the target’s current value, conditional on the target’s past.
Past states need to be defined in the settings dictionary, where a past state is defined as a uniform embedding with parameters history and tau. The history describes the number of samples taken from a variable’s past, tau describes the embedding delay, i.e., the spacing between every two samples from the variable’s past.
See parent class for references. Results are returned in nats.
- Args:
- settingsdict
sets estimation parameters:
history_target : int - number of samples in the target’s past used as embedding
history_source : int [optional] - number of samples in the source’s past used as embedding (default=same as the target history)
tau_source : int [optional] - source’s embedding delay (default=1)
tau_target : int [optional] - target’s embedding delay (default=1)
source_target_delay : int [optional] - information transfer delay between source and target (default=1)
debug : bool [optional] - return debug information when calling JIDT (default=False)
local_values : bool [optional] - return local TE instead of average TE (default=False)
algorithm_num : int [optional] - which Kraskov algorithm (1 or 2) to use (default=1)
- Note:
Some technical details: JIDT normalises over realisations, while IDTxl normalises the raw data once, outside the CMI estimator, to save computation time. The Theiler window ignores trial boundaries. The CMI estimator adds noise to the data by default; to make analysis runs replicable, set noise_level to 0.
- estimate(source, target)[source]¶
Estimate transfer entropy from a source to a target variable.
- Args:
- sourcenumpy array
realisations of source variable, either a 2D numpy array where array dimensions represent [realisations x variable dimension] or a 1D array representing [realisations]
- targetnumpy array
realisations of the target variable (similar to source)
- Returns:
- float | numpy array
average TE over all samples or local TE for individual samples if ‘local_values’=True
- idtxl.estimators_jidt.common_estimate_surrogates_analytic(estimator, n_perm=200, **data)[source]¶
Estimate the surrogate distribution analytically for JidtEstimator.
Estimate the surrogate distribution analytically for a JidtEstimator whose is_analytic_null_estimator() method returns True, by sampling estimates at random p-values in the analytic distribution.
- Args:
- estimatorJidtEstimator object
estimator that returns True to a call to its is_analytic_null_estimator() method
- n_permint
number of permutations (default=200)
- datanumpy arrays
realisations of random variables required for the calculation (varies between estimators, e.g. 2 variables for MI, 3 for CMI)
- Returns:
- float | numpy array
n_perm surrogates of the average MI/CMI/TE over all samples under the null hypothesis of no relationship between var1 and var2 (in the context of conditional)
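Example (a minimal sketch; uses the Gaussian MI estimator, which supports analytic null distributions; data are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_jidt import JidtGaussianMI, common_estimate_surrogates_analytic
>>> rng = np.random.default_rng(0)
>>> var1 = rng.normal(size=1000)
>>> var2 = 0.5 * var1 + rng.normal(size=1000)
>>> est = JidtGaussianMI()
>>> est.is_analytic_null_estimator()  # True for this estimator
>>> surrogates = common_estimate_surrogates_analytic(est, n_perm=200, var1=var1, var2=var2)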
idtxl.estimators_opencl module¶
- class idtxl.estimators_opencl.OpenCLKraskov(settings=None)[source]¶
Bases:
idtxl.estimator.Estimator
Abstract class for implementation of OpenCL estimators.
Abstract class for implementation of OpenCL estimators, child classes implement estimators for mutual information (MI) and conditional mutual information (CMI) using the Kraskov-Grassberger-Stoegbauer estimator for continuous data.
References:
Kraskov, A., Stoegbauer, H., & Grassberger, P. (2004). Estimating mutual information. Phys Rev E, 69(6), 066138.
Lizier, Joseph T., Mikhail Prokopenko, and Albert Y. Zomaya. (2012). Local measures of information storage in complex distributed computation. Inform Sci, 208, 39-54.
Schreiber, T. (2000). Measuring information transfer. Phys Rev Lett, 85(2), 461.
Estimators can be used to perform multiple, independent searches in parallel. Each of these parallel searches is called a ‘chunk’. To search multiple chunks, provide point sets as 2D arrays, where the first dimension represents samples or points, and the second dimension represents the points’ dimensions. Concatenate chunk data in the first dimension and pass the number of chunks to the estimators. Chunks must be of equal size.
Set common estimation parameters for OpenCL estimators. For usage of these estimators see documentation for the child classes.
- Args:
- settingsdict [optional]
set estimator parameters:
gpuid : int [optional] - device ID used for estimation (if more than one device is available on the current platform) (default=0)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
padding : bool [optional] - pad data to a length that is a multiple of 1024 (workaround for an OpenCL-implementation issue)
debug : bool [optional] - calculate intermediate results, i.e. neighbour counts from range searches and KNN distances, print debug output to console (default=False)
return_counts : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
- class idtxl.estimators_opencl.OpenCLKraskovCMI(settings=None)[source]¶
Bases:
idtxl.estimators_opencl.OpenCLKraskov
Calculate conditional mutual information with OpenCL Kraskov implementation.
Calculate the conditional mutual information (CMI) between three variables using OpenCL GPU-code. If no conditional is given (is None), the function returns the mutual information between var1 and var2. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
set estimator parameters:
gpuid : int [optional] - device ID used for estimation (if more than one device is available on the current platform) (default=0)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
debug : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
return_counts : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
- estimate(var1, var2, conditional=None, n_chunks=1)[source]¶
Estimate conditional mutual information.
If conditional is None, the mutual information between var1 and var2 is calculated.
- Args:
- var1numpy array
realisations of the first variable, either a 2D numpy array where array dimensions represent [(realisations * n_chunks) x variable dimension] or a 1D array representing [realisations]; the array should be of type float32
- var2numpy array
realisations of the second variable (similar to var1)
- conditionalnumpy array
realisations of conditioning variable (similar to var1)
- n_chunksint
number of data chunks; the number of data points must be the same for each chunk
- Returns:
- float | numpy array
average CMI over all samples or local CMI for individual samples if ‘local_values’=True
- numpy arrays
distances and neighborhood counts for var1 and var2 if debug=True and return_counts=True
- class idtxl.estimators_opencl.OpenCLKraskovMI(settings=None)[source]¶
Bases:
idtxl.estimators_opencl.OpenCLKraskov
Calculate mutual information with OpenCL Kraskov implementation.
Calculate the mutual information (MI) between two variables using OpenCL GPU-code. See parent class for references.
Results are returned in nats.
- Args:
- settingsdict [optional]
set estimator parameters:
gpuid : int [optional] - device ID used for estimation (if more than one device is available on the current platform) (default=0)
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
normalise : bool [optional] - z-standardise data (default=False)
theiler_t : int [optional] - no. next temporal neighbours ignored in KNN and range searches (default=0)
noise_level : float [optional] - random noise added to the data (default=1e-8)
debug : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
return_counts : bool [optional] - return intermediate results, i.e. neighbour counts from range searches and KNN distances (default=False)
lag_mi : int [optional] - time difference in samples to calculate the lagged MI between processes (default=0)
- estimate(var1, var2, n_chunks=1)[source]¶
Estimate mutual information.
- Args:
- var1numpy array
realisations of the first variable, either a 2D numpy array where array dimensions represent [(realisations * n_chunks) x variable dimension] or a 1D array representing [realisations]; the array should be of type float32
- var2numpy array
realisations of the second variable (similar to var1)
- n_chunksint
number of data chunks; the number of data points must be the same for each chunk
- Returns:
- float | numpy array
average MI over all samples or local MI for individual samples if ‘local_values’=True
- numpy arrays
distances and neighborhood counts for var1 and var2 if debug=True and return_counts=True
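Example (a minimal sketch; requires PyOpenCL and a suitable OpenCL device; chunk sizes and coupling are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_opencl import OpenCLKraskovMI
>>> rng = np.random.default_rng(0)
>>> n_points, n_chunks = 500, 4
>>> var1 = rng.normal(size=(n_points * n_chunks, 1))  # chunks concatenated along the first axis
>>> var2 = var1 + rng.normal(size=(n_points * n_chunks, 1))
>>> est = OpenCLKraskovMI({'kraskov_k': 4})
>>> mi_per_chunk = est.estimate(var1, var2, n_chunks=n_chunks)  # one estimate per chunk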
idtxl.estimators_mpi module¶
- class idtxl.estimators_mpi.MPIEstimator(est, settings)[source]¶
Bases:
idtxl.estimator.Estimator
MPI wrapper for arbitrary Estimator implementations.
Make sure to have an if __name__ == '__main__': guard in your main script to avoid infinite recursion!
To use MPI, add MPI=True to the Estimator settings dictionary and optionally provide max_workers.
- Call using mpiexec:
>>> mpiexec -n 1 -usize <max workers + 1> python <python script>
- or, if MPI does not support spawning new workers (i.e. MPI version < 2)
>>> mpiexec -n <max workers + 1> python -m mpi4py.futures <python script>
- Call using slurm:
>>> srun -n $SLURM_NTASKS --mpi=pmi2 python -m mpi4py.futures <python script>
- estimate(*, n_chunks=1, **data)[source]¶
Distributes the given chunks of a task to Estimators on worker ranks using MPI.
Needs to be called with kwargs only.
- Args:
- n_chunksint [optional]
Number of chunks to split the data into, default=1.
- datadict[str, Sequence]
Dictionary of random variable realizations
- Returns:
- numpy array
Estimates of information-theoretic quantities as np.double values
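Example (a minimal sketch of enabling the wrapper through analysis settings; estimator choice and worker count are illustrative):
>>> # contents of the main script, guarded against re-execution on worker ranks
>>> if __name__ == '__main__':
...     settings = {'cmi_estimator': 'JidtKraskovCMI',
...                 'MPI': True,
...                 'max_workers': 4}
...     # pass settings to an analysis class, e.g., MultivariateTE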
idtxl.estimators_python module¶
- class idtxl.estimators_python.PythonKraskovCMI(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate conditional mutual information using Kraskov’s first estimator.
- Args:
- settingsdict [optional]
set estimator parameters:
kraskov_k : int [optional] - no. nearest neighbours for KNN search (default=4)
base : float - base of returned values (default=np.e)
normalise : bool [optional] - z-standardise data (default=False)
noise_level : float [optional] - random noise added to the data (default=1e-8)
rng_seed : int | None [optional] - random seed if noise level > 0
num_threads : int | str [optional] - number of threads used for estimation (default=’USE_ALL’, note that this uses all available threads on the current machine)
knn_finder : str [optional] - knn algorithm to use, can be ‘scipy_kdtree’ (default), ‘sklearn_kdtree’, or ‘sklearn_balltree’
- estimate(var1: numpy.ndarray, var2: numpy.ndarray, conditional=None)[source]¶
Estimate conditional mutual information between var1 and var2, given conditional.
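Example (a minimal sketch; the common-driver data are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_python import PythonKraskovCMI
>>> rng = np.random.default_rng(0)
>>> cond = rng.normal(size=1000)  # common driver
>>> var1 = cond + rng.normal(size=1000)
>>> var2 = cond + rng.normal(size=1000)
>>> est = PythonKraskovCMI({'kraskov_k': 4, 'knn_finder': 'scipy_kdtree'})
>>> cmi = est.estimate(var1, var2, conditional=cond)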
idtxl.estimators_multivariate_pid module¶
Multivariate partial information decomposition for discrete random variables.
This module provides an estimator for multivariate partial information decomposition as proposed in
Makkeh, A., Gutknecht, A., & Wibral, M. (2020). A differentiable measure for shared information, 1-27. Retrieved from http://arxiv.org/abs/2002.03356
- class idtxl.estimators_multivariate_pid.SxPID(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate partial information decomposition for multiple inputs.
Implementation of the multivariate partial information decomposition (PID) estimator for discrete data with up to 4 inputs and one output. The estimator finds shared information, unique information and synergistic information between the multiple inputs s1, s2, …, sn with respect to the output t for each realization (t, s1, …, sn) and then averages them according to their distribution weights p(t, s1, …, sn). Both the pointwise (on the realization level) PID and the averaged PID are returned (see the ‘return’ of ‘estimate()’).
The algorithm uses recursion to compute the partial information decomposition.
References:
Makkeh, A. & Wibral, M. (2020). A differentiable pointwise partial information decomposition estimator. https://github.com/Abzinger/SxPID.
- Args:
- settingsdict
estimation parameters (with default parameters)
verbose : bool [optional] - print output to console (default=False)
- estimate(s, t)[source]¶
Estimate SxPID from a list of sources and a target.
- Args:
- slist of numpy arrays
1D arrays containing realizations of a discrete random variable
- tnumpy array
1D array containing realizations of a discrete random variable
- Returns:
- dict
- SxPID results, with entries
‘ptw’ -> { realization -> {alpha -> [float, float, float]}}: pointwise decomposition
‘avg’ -> {alpha -> [float, float, float]}: average decomposition
the list of floats is ordered [informative, misinformative, informative - misinformative]
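Example (a minimal sketch; an XOR target is used because its information is purely synergistic):
>>> import numpy as np
>>> from idtxl.estimators_multivariate_pid import SxPID
>>> rng = np.random.default_rng(0)
>>> s1 = rng.integers(0, 2, 1000)
>>> s2 = rng.integers(0, 2, 1000)
>>> t = np.logical_xor(s1, s2).astype(int)  # XOR of the two sources
>>> est = SxPID({'verbose': False})
>>> results = est.estimate([s1, s2], t)
>>> avg = results['avg']  # {alpha -> [informative, misinformative, difference]}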
idtxl.estimators_pid module¶
Partial information decomposition for discrete random variables.
This module provides an estimator for partial information decomposition as proposed in
Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., & Ay, N. (2014). Quantifying Unique Information. Entropy, 16(4), 2161–2183. http://doi.org/10.3390/e16042161
- class idtxl.estimators_pid.SydneyPID(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate partial information decomposition of discrete variables.
Fast implementation of the BROJA partial information decomposition (PID) estimator for discrete data (Bertschinger, 2014). The estimator does not require JAVA or GPU modules to run.
The estimator finds shared information, unique information and synergistic information between the two inputs s1 and s2 with respect to the output t.
Improved version with larger initial swaps and checking for convergence of the unique information from both sources 1 and 2. The function counts the empirical observations, calculates probabilities and the initial CMI, then performs virtualised swaps until convergence, and finally calculates the PID. The virtualised-swaps stage contains two loops: an inner loop which performs the virtualised swapping, keeping the changes if the CMI decreases, and an outer loop which decreases the size of the probability mass increment used by the virtualised swapping.
References
Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., & Ay, N. (2014). Quantifying unique information. Entropy, 16(4), 2161–2183. http://doi.org/10.3390/e16042161
- Args:
- settingsdict
estimation parameters
alph_s1 : int - alphabet size of s1
alph_s2 : int - alphabet size of s2
alph_t : int - alphabet size of t
max_unsuc_swaps_row_parm : int - soft limit for virtualised swaps based on the number of unsuccessful swaps attempted in a row. If there are too many unsuccessful swaps in a row, the inner swap loop breaks; the outer loop then decrements the size of the probability mass increment and attempts virtualised swaps again with the smaller increment. The exact number of unsuccessful swaps allowed before breaking is the total number of possible swaps (given the alphabet sizes) times the control parameter max_unsuc_swaps_row_parm, e.g., if the parameter is set to 3, this gives a high degree of confidence that nearly (if not) all of the possible swaps have been attempted before this soft limit breaks the swap loop.
num_reps : int - number of times the outer loop will halve the size of the probability increment used for the virtualised swaps. This corresponds directly to the number of times the empirical data were replicated in the original implementation.
max_iters : int - hard upper bound on the number of virtualised swaps attempted in the inner loop. In practice this hard limit is rarely reached, as the soft limit defined above is hit first (parameter may be removed in the future).
verbose : bool [optional] - print output to console (default=False)
- estimate(s1, s2, t)[source]¶
- Args:
- s1numpy array
1D array containing realizations of a discrete random variable
- s2numpy array
1D array containing realizations of a discrete random variable
- tnumpy array
1D array containing realizations of a discrete random variable
- Returns:
- dict
estimated decomposition, contains the joint distribution, unique, shared, and synergistic information
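Example (a minimal sketch; alphabet sizes match the binary toy data, the remaining parameter values are illustrative):
>>> import numpy as np
>>> from idtxl.estimators_pid import SydneyPID
>>> rng = np.random.default_rng(0)
>>> s1 = rng.integers(0, 2, 1000)
>>> s2 = rng.integers(0, 2, 1000)
>>> t = np.logical_xor(s1, s2).astype(int)
>>> settings = {'alph_s1': 2, 'alph_s2': 2, 'alph_t': 2,
...             'max_unsuc_swaps_row_parm': 3,
...             'num_reps': 63, 'max_iters': 10000}
>>> est = SydneyPID(settings)
>>> pid = est.estimate(s1, s2, t)  # XOR yields mostly synergistic information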
- class idtxl.estimators_pid.TartuPID(settings)[source]¶
Bases:
idtxl.estimator.Estimator
Estimate partial information decomposition for two inputs and one output.
Implementation of the partial information decomposition (PID) estimator for discrete data. The estimator finds shared information, unique information and synergistic information between the two inputs s1 and s2 with respect to the output t.
The algorithm uses exponential cone programming and requires the Python package for ECOS: Embedded Cone Solver (https://pypi.python.org/pypi/ecos).
References:
Makkeh, A., Theis, D.O., & Vicente, R. (2017). Bivariate Partial Information Decomposition: The Optimization Perspective. Entropy, 19(10), 530.
Makkeh, A., Theis, D.O., & Vicente, R. (2018). BROJA-2PID: A cone programming based Partial Information Decomposition estimator. Entropy, 20(271), https://github.com/Abzinger/BROJA_2PID.
- Args:
- settingsdict
estimation parameters (with default parameters)
verbose : bool [optional] - print output to console (default=False)
cone_solver : str [optional] - which cone solver to use (default=’ECOS’)
solver_args : dict [optional] - solver arguments (default={})
- estimate(s1, s2, t)[source]¶
- Args:
- s1numpy array
1D array containing realizations of a discrete random variable
- s2numpy array
1D array containing realizations of a discrete random variable
- tnumpy array
1D array containing realizations of a discrete random variable
- Returns:
- dict
estimated decomposition, solver used, numerical error
idtxl.idtxl_exceptions module¶
Provide error handling and warnings.
- exception idtxl.idtxl_exceptions.AlgorithmExhaustedError(message)[source]¶
Bases:
Exception
Exception raised to signal that the estimators can no longer be used for this particular target (e.g. because of memory errors in high dimensions) but that the estimation could continue for others.
- Attributes:
message – explanation of the error
- exception idtxl.idtxl_exceptions.JidtOutOfMemoryError(message)[source]¶
Bases:
idtxl.idtxl_exceptions.AlgorithmExhaustedError
Exception raised to signal a Java OutOfMemoryException.
It is a child class of AlgorithmExhaustedError.
- Attributes:
message – explanation of the error
idtxl.idtxl_io module¶
Provide I/O functionality.
Provide functions to load and save IDTxl data, provide import functions (e.g., mat-files, FieldTrip) and export functions (e.g., networkx, BrainNet Viewer).
- idtxl.idtxl_io.export_brain_net_viewer(adjacency_matrix, mni_coord, file_name, **kwargs)[source]¶
Export network to BrainNet Viewer.
Export networks to BrainNet Viewer (project home page: http://www.nitrc.org/projects/bnv/). BrainNet Viewer is a MATLAB toolbox offering brain network visualisation (e.g., ‘glass’ brains). The function creates text files [file_name].node and [file_name].edge, containing information on node location (in MNI coordinates), directed edges, node color and size.
References:
Xia, M., Wang, J., & He, Y. (2013). BrainNet Viewer: A Network Visualization Tool for Human Brain Connectomics. PLoS ONE 8(7):e68910. https://doi.org/10.1371/journal.pone.0068910
- Args:
- adjacency_matrixAdjacencyMatrix instance
adjacency matrix to be exported, returned by get_adjacency_matrix() method of Results() class
- mni_coordnumpy array
MNI coordinates (x, y, z) of the sources, array with size [n x 3], where n is the number of nodes
- file_namestr
file name for output files including the file path
- labelsarray type of str [optional]
list of node labels of length n, description or label for each node. Note that labels can’t contain spaces (they cause BrainNet to crash); the function removes any spaces from labels (default=no labels)
- node_colorarray type of colors [optional]
BrainNet gives you the option to color nodes according to the values in this vector (length n), see BrainNet Manual
- node_sizearray type of int [optional]
BrainNet gives you the option to size nodes according to the values in this array (length n), see BrainNet Manual
- idtxl.idtxl_io.export_networkx_graph(adjacency_matrix, weights)[source]¶
Export networkx graph object for an inferred network.
Export a weighted, directed graph object from the network of inferred (multivariate) interactions (e.g., multivariate TE), using the networkx class for directed graphs (DiGraph). Multiple options for the weight are available (see documentation of method get_adjacency_matrix for details).
- Args:
- adjacency_matrixAdjacencyMatrix instance
adjacency matrix to be exported, returned by get_adjacency_matrix() method of Results() class
- weightsstr
weights for the adjacency matrix (see documentation of method get_adjacency_matrix for details)
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- DiGraph instance
directed graph of networkx package’s DiGraph() class
- idtxl.idtxl_io.export_networkx_source_graph(results, target, sign_sources=True, fdr=True)[source]¶
Export graph object of source variables for a single target.
Export graph object from the network of (multivariate) interactions (e.g., multivariate TE) between single source variables and a target process using the networkx class for directed graphs (DiGraph). The graph shows the information transfer between individual source variables and the target. Each node is a tuple with the following format: (process index, sample index).
- Args:
- resultsResults() instance
network analysis results
- targetint
target index
- sign_sourcesbool [optional]
add sources with significant information contribution only (default=True)
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- DiGraph instance
directed graph of networkx package’s DiGraph() class
- idtxl.idtxl_io.import_fieldtrip(file_name, ft_struct_name, file_version, normalise=True)[source]¶
Convert FieldTrip-style MATLAB-file into an IDTxl Data object.
Import a MATLAB structure with fields “trial” (data), “label” (channel labels), “time” (time stamps for data samples), and “fsample” (sampling rate). This structure is the standard file format of the MATLAB toolbox FieldTrip and is commonly used to represent neurophysiological data (see also http://www.fieldtriptoolbox.org/reference/ft_datatype_raw). The data is returned as an IDTxl Data() object.
The structure is assumed to be saved as a MATLAB hdf5 file (‘-v7.3’ or higher, .mat) with a SINGLE FieldTrip data structure inside.
- Args:
- file_namestring
full (matlab) file_name on disk
- ft_struct_namestring
variable name of the MATLAB structure that is in FieldTrip format (autodetect will hopefully be possible later …)
- file_versionstring
version of the file, e.g. ‘v7.3’ for MATLAB’s 7.3 format
- normalisebool [optional]
normalise data after import (default=True)
- Returns:
- Data() instance
instance of IDTxl Data object, containing data from the ‘trial’ field
- list of strings
list of channel labels, corresponding to the ‘label’ field
- numpy array
time stamps for samples, corresponding to one entry in the ‘time’ field
- int
sampling rate, corresponding to the ‘fsample’ field
- idtxl.idtxl_io.import_matarray(file_name, array_name, file_version, dim_order, normalise=True)[source]¶
Read Matlab hdf5 file into IDTxl.
Reads a MATLAB hdf5 file (‘-v7.3’ or higher, .mat) or non-hdf5 files with a SINGLE array inside and returns an IDTxl Data() object.
- Note:
The import function squeezes the loaded mat-file, i.e., any singleton dimension will be removed. Hence do not enter singleton dimension into the ‘dim_order’, e.g., don’t pass dim_order=’ps’ but dim_order=’s’ if you want to load a 1D-array where entries represent samples recorded from a single channel.
- Args:
- file_namestring
full (matlab) file_name on disk
- array_namestring
variable name of the MATLAB structure to be read
- file_versionstring
version of the file, e.g. ‘v7.3’ for MATLAB’s 7.3 format; currently versions ‘v4’, ‘v6’, ‘v7’, and ‘v7.3’ are supported
- dim_orderstring
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the data dimensionality, e.g., ‘ps’ for a two-dimensional array of data from several processes over time
- normalisebool [optional]
normalise data after import (default=True)
- Returns:
- Data() instance
instance of IDTxl Data object, containing the data from the specified array
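Example (a minimal sketch; file name and variable name are hypothetical):
>>> from idtxl.idtxl_io import import_matarray
>>> data = import_matarray(file_name='my_data.mat', array_name='d',
...                        file_version='v7.3', dim_order='ps',
...                        normalise=True)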
- idtxl.idtxl_io.load_json(file_path)[source]¶
Load dictionary saved as JSON file from disk.
- Args:
- file_pathstr
path to file (including extension)
- Returns:
dict
Note: JSON does not recognize numpy data structures and types. Numpy arrays and data types (float, int) are converted to Python types and lists when saving. The loaded dictionary may therefore contain different data types than the saved one.
- idtxl.idtxl_io.load_pickle(name)[source]¶
Load objects that have been saved using Python’s pickle module.
- idtxl.idtxl_io.save_json(d, file_path)[source]¶
Save dictionary to disk as JSON file.
Writes dictionary to disk at the specified file path.
- Args:
- ddict
dictionary to be written to disk
- file_pathstr
path to file (including extension)
Note: JSON does not recognize numpy data types, those are converted to basic Python data types first.
idtxl.idtxl_utils module¶
Provide IDTxl utility functions.
- idtxl.idtxl_utils.argsort_descending(a)[source]¶
Sort array in descending order and return the sorting indices.
- idtxl.idtxl_utils.calculate_mi(corr)[source]¶
Calculate mutual information from correlation coefficient.
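A sketch of the identity this function evaluates, assuming the Gaussian relationship I = -0.5 * ln(1 - r^2) in nats:
>>> from idtxl.idtxl_utils import calculate_mi
>>> calculate_mi(0.7)  # -0.5 * ln(1 - 0.49), approx. 0.337 nats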
- idtxl.idtxl_utils.combine_discrete_dimensions(a, numBins)[source]¶
Combine multi-dimensional discrete variable into a single dimension.
Combine all dimensions of a discrete variable into a single dimensional value for each sample. This is done by multiplying each dimension by a different power of the base (numBins).
Adapted from infodynamics.utils.MatrixUtils.computeCombinedValues() from JIDT by J.Lizier.
- Args:
- anumpy array
data to be combined across all variable dimensions. Dimensions are realisations (samples) x variable dimension
- numBinsint
number of discrete levels or bins for each variable dimension
- Returns:
- numpy array
a univariate array – one entry now for each sample, with all dimensions of the data now combined for that sample
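Example (a minimal sketch; each row of binary data is mapped to one combined integer):
>>> import numpy as np
>>> from idtxl.idtxl_utils import combine_discrete_dimensions
>>> a = np.array([[0, 1], [1, 0], [1, 1]])  # 3 samples, 2 binary dimensions
>>> combined = combine_discrete_dimensions(a, 2)  # one base-2 value per sample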
- idtxl.idtxl_utils.conflicting_entries(dict_1, dict_2)[source]¶
Test two dictionaries for unequal entries.
Note that only keys that are present in both dicts are compared. If one dictionary contains an entry not present in the other dictionary, the test passes.
- idtxl.idtxl_utils.discretise(a, numBins)[source]¶
Discretise continuous data.
Discretise continuous data into discrete values (with 0 as lowest) by evenly partitioning the range of the data, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretise() from JIDT by J. Lizier.
- Args:
- anumpy array
data to be discretised. Dimensions are realisations x variable dimension
- numBinsint
number of discrete levels or bins to partition the data into
- Returns:
- numpy array
discretised data
- idtxl.idtxl_utils.discretise_max_ent(a, numBins)[source]¶
Discretise continuous data using maximum entropy partitioning.
Discretise continuous data into discrete values (with 0 as lowest) by making a maximum entropy partitioning, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretiseMaxEntropy() from JIDT by J. Lizier.
- Args:
- anumpy array
data to be discretised. Dimensions are realisations x variable dimension
- numBinsint
number of discrete levels or bins to partition the data into
- Returns:
- numpy array
discretised data
- idtxl.idtxl_utils.print_dict(d, indent=4)[source]¶
Use Python’s pretty printer to print dictionaries to the console.
- idtxl.idtxl_utils.remove_column(a, j)[source]¶
Remove a column from a numpy array.
This is faster than logical indexing (approximately 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips
- Args:
- anumpy array
2-dimensional numpy array
- jint
column index to be removed
- idtxl.idtxl_utils.remove_row(a, i)[source]¶
Remove a row from a numpy array.
This is faster than logical indexing (approximately 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips
- Args:
- anumpy array
2-dimensional numpy array
- iint
row index to be removed
- idtxl.idtxl_utils.separate_arrays(idx_all, idx_single, a)[source]¶
Separate a single column from all other columns in a 2D-array.
Return the separated single column and the remaining columns of a 2D-array.
- Args:
- idx_alllist<Object>
list of variables indicating the full set
- idx_single<Object>
single variable indicating the column to be separated, variable must be contained in idx_all
- anumpy array
2D-array with the same length along axis 1 as idx_all (.shape[1] == len(idx_all))
- Returns:
- numpy array
remaining columns in full array
- numpy array
column at single index
- idtxl.idtxl_utils.standardise(a, dimension=0, df=1)[source]¶
Z-standardise a numpy array along a given dimension.
Standardise array along the axis defined in dimension using the denominator (N - df) for the calculation of the standard deviation.
- Args:
- anumpy array
data to be standardised
- dimensionint [optional]
dimension along which array should be standardised
- dfint [optional]
degrees of freedom for the denominator of the standard deviation
- Returns:
- numpy array
standardised data
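Example (a minimal sketch; standardises each process along the sample axis):
>>> import numpy as np
>>> from idtxl.idtxl_utils import standardise
>>> a = 3 * np.random.randn(5, 1000) + 7  # 5 processes x 1000 samples
>>> z = standardise(a, dimension=1)  # zero mean, unit SD along axis 1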
- idtxl.idtxl_utils.swap_chars(s, i_1, i_2)[source]¶
Swap two characters in a string.
- Example:
>>> print(swap_chars('heLlotHere', 2, 6))
'heHlotLere'
idtxl.network_analysis module¶
Parent class for network inference and network comparison.
- class idtxl.network_analysis.NetworkAnalysis[source]¶
Bases:
object
Provide an analysis setup for network inference or comparison.
The class provides routines to check user input and set defaults.
- property current_value¶
Get index of the current_value.
- resume_checkpoint(file_path)[source]¶
Resume analysis from a checkpoint saved to disk.
- Args:
- file_pathstr
path to checkpoint file (excluding extension: .ckp)
- property selected_vars_full¶
List of indices of the full conditional set.
- property selected_vars_sources¶
List of indices of source samples in the conditional set.
- property selected_vars_target¶
List of indices of target samples in the conditional set.
idtxl.network_inference module¶
Parent class for all network inference.
- class idtxl.network_inference.NetworkInference[source]¶
Bases:
idtxl.network_analysis.NetworkAnalysis
Parent class for network inference algorithms.
Hold variables that are relevant for network inference using for example bivariate and multivariate transfer entropy.
- Attributes:
- settingsdict
settings for estimation of information theoretic measures and statistical testing, see child classes for documentation
- targetint
target process of analysis
- current_valuetuple
index of the current value
- selected_vars_fulllist of tuples
indices of the full set of random variables to be conditioned on
- selected_vars_targetlist of tuples
indices of the set of conditionals coming from the target process
- selected_vars_sourceslist of tuples
indices of the set of conditionals coming from source processes
- class idtxl.network_inference.NetworkInferenceBivariate[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for bivariate network inference algorithms.
- class idtxl.network_inference.NetworkInferenceMI[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for mutual information network inference algorithms.
- class idtxl.network_inference.NetworkInferenceMultivariate[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for multivariate network inference algorithms.
- class idtxl.network_inference.NetworkInferenceTE[source]¶
Bases:
idtxl.network_inference.NetworkInference
Parent class for transfer entropy network inference algorithms.
idtxl.single_process_analysis module¶
Parent class for analysis of single processes in the network.
idtxl.network_comparison module¶
Perform inference statistics on groups of data.
- class idtxl.network_comparison.NetworkComparison[source]¶
Bases:
idtxl.network_analysis.NetworkAnalysis
Set up network comparison between two experimental conditions.
The class provides methods for the comparison of networks inferred from data recorded under two experimental conditions A and B. Four statistical tests are implemented:
units of observation / comparison type, stats_type, and example:
- replications / within a subject:
dependent - baseline (A) vs. task (B)
independent - detect house (A) vs. face (B)
- sets of data / between subjects:
dependent - patients (A) vs. matched controls (B)
independent - male (A) vs. female (B) participants
Depending on the units of observation, one of two methods is used: compare_within() or compare_between(). The stats_type is passed as an analysis setting, see the documentation of the two methods for details.
Note that for network inference methods that use an embedding, i.e., a collection of variables in the source, the joint information in all variables about the target is used as a test statistic.
- calculate_link_te(data, target, sources='all')[source]¶
Calculate the information transfer for whole links into a target.
Calculate the information transfer for whole links as the joint information transfer from all variables selected for a single source process into the target. The information transfer is calculated conditional on the target’s past and, for multivariate TE, conditional on selected variables from further sources in the network.
If sources is set to ‘all’, a list of information transfer values is returned. If sources is set to a single source index, the information transfer from this source to the target is returned.
- Args:
- dataData instance
raw data for analysis
- targetint
index of target process
- sourceslist of ints | ‘all’ [optional]
return estimates for links from selected or all sources into the target (default=’all’)
- Returns:
- numpy array
information transfer estimate for each link
- compare_between(settings, network_set_a, network_set_b, data_set_a, data_set_b)[source]¶
Compare networks inferred under two conditions between subjects.
Compare two sets of networks inferred from two sets of data recorded under different experimental conditions within multiple subjects, i.e., data have been recorded from subjects assigned to one of two experimental conditions (units of observations are subjects).
- Args:
- settingsdict
parameters for estimation and statistical testing, see documentation of compare_within() for details
- network_set_anumpy array of dicts
results from network inference for multiple subjects observed under condition a
- network_set_bnumpy array of dicts
results from network inference for multiple subjects observed under condition b
- data_set_anumpy array of Data objects
set of data from which network_set_a was inferred
- data_set_bnumpy array of Data objects
set of data from which network_set_b was inferred
- Returns:
- ResultsNetworkComparison object
results of network comparison, see documentation of ResultsNetworkComparison()
- compare_links_within(settings, link_a, link_b, network, data)[source]¶
Compare two links within the same network.
Check whether the information transfer in the first link differs from the information transfer in a second link within the same network.
Note that both links have to be part of the inferred network, i.e., there has to be significant effective connectivity for both links.
- Args:
- settingsdict
parameters for estimation and statistical testing
stats_type : str - ‘dependent’ or ‘independent’ for dependent or independent units of observation
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
tail_comp : str [optional] - test tail, ‘one’ for one-sided test A > B, ‘two’ for two-sided test (default=’two’)
n_perm_comp : int [optional] - number of permutations (default=500)
alpha_comp : float - critical alpha level for statistical significance (default=0.05)
permute_in_time : bool [optional] - if True, create surrogates by shuffling data over time. See Data.permute_samples() for further settings for surrogate creation
verbose : bool [optional] - toggle console output (default=True)
- link_aarray type
first link, array type with two entries [source target]
- link_barray type
second link, array type with two entries [source target]
- networkdict
results from network inference
- dataData object
data from which network was inferred
- Returns:
- ResultsNetworkComparison object
results of network comparison, see documentation of ResultsNetworkComparison()
- compare_within(settings, network_a, network_b, data_a, data_b)[source]¶
Compare networks inferred under two conditions within one subject.
Compare two networks inferred from data recorded under two different experimental conditions within one subject (units of observations are replications of one experimental condition within one subject).
- Args:
- settingsdict
parameters for estimation and statistical testing
stats_type : str - ‘dependent’ or ‘independent’ for dependent or independent units of observation
cmi_estimator : str - estimator to be used for CMI calculation (for estimator settings see the documentation in the estimators_* modules)
tail_comp : str [optional] - test tail, ‘one’ for one-sided test A > B, ‘two’ for two-sided test (default=’two’)
n_perm_comp : int [optional] - number of permutations (default=500)
alpha_comp : float - critical alpha level for statistical significance (default=0.05)
permute_in_time : bool [optional] - if True, create surrogates by shuffling data over time. See Data.permute_samples() for further settings for surrogate creation
verbose : bool [optional] - toggle console output (default=True)
- network_adict
results from network inference, condition a
- network_bdict
results from network inference, condition b
- data_aData object
data from which network_a was inferred
- data_bData object
data from which network_b was inferred
- Returns:
- ResultsNetworkComparison object
results of network comparison, see documentation of ResultsNetworkComparison()
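Example (a minimal sketch; network_a/network_b and data_a/data_b are placeholders for results and data from two prior inference runs under conditions A and B):
>>> from idtxl.network_comparison import NetworkComparison
>>> comp_settings = {'cmi_estimator': 'JidtKraskovCMI',
...                  'stats_type': 'dependent',
...                  'n_perm_comp': 500,
...                  'alpha_comp': 0.05,
...                  'tail_comp': 'two'}
>>> comp = NetworkComparison()
>>> results_comp = comp.compare_within(comp_settings, network_a, network_b, data_a, data_b)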
idtxl.results module¶
Provide results class for IDTxl network analysis.
- class idtxl.results.AdjacencyMatrix(n_nodes, weight_type)[source]¶
Bases:
object
Adjacency matrix representing inferred networks.
- add_edge_list(i_list, j_list, weights)[source]¶
Add multiple weighted edges (i, j) to adjacency matrix.
- class idtxl.results.DotDict[source]¶
Bases:
dict
Dictionary with dot-notation access to values.
Provides the same functionality as a regular dict, but also allows accessing values using dot-notation.
Example:
>>> from idtxl.results import DotDict
>>> d = DotDict({'a': 1, 'b': 2})
>>> d.a
>>> # Out: 1
>>> d['a']
>>> # Out: 1
- class idtxl.results.Results(n_nodes, n_realisations, normalised)[source]¶
Bases:
object
Parent class for results of network analysis algorithms.
Provide a container for results of network analysis algorithms, e.g., MultivariateTE or ActiveInformationStorage.
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- combine_results(*results)[source]¶
Combine multiple (partial) results objects.
Combine a list of partial network analysis results into a single results object (e.g., results from analysis parallelized over processes). Raise an error if duplicate processes occur in partial results, or if analysis settings are not equal.
Note that only conflicting settings cause an error (i.e., settings with equal keys but different values). If additional settings are included in partial results (i.e., settings with different keys) these settings are added to the common settings dictionary.
Remove FDR-corrections from partial results before combining them. FDR-correction performed on the basis of parts of the network is not valid for the combined network.
- Args:
- resultslist of Results objects
single process analysis results from .analyse_network or .analyse_single_process methods, where each object contains partial results for one or multiple processes
- Returns:
- dict
combined results object
- class idtxl.results.ResultsMultivariatePID(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of Multivariate Partial Information Decomposition (PID) analysis.
Provide a container for results of Multivariate Partial Information Decomposition (PID) algorithms.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_pid._single_target[2].source_1
or
>>> res_pid._single_target[2]['source_1']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- targets_analysedlist
list of analysed targets
- get_single_target(target)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
source_i : tuple - source variable i
selected_vars_sources : list of tuples - source variables used in PID estimation
avg : dict - avg pid {alpha -> float} where alpha is a redundancy lattice node
ptw : dict of dicts - ptw pid {rlz -> {alpha -> float} } where rlz is a single realisation of the random variables and alpha is a redundancy lattice node
current_value : tuple - current value used for analysis, described by target and sample index in the data
[estimator-specific settings]
- Args:
- targetint
target id
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
- class idtxl.results.ResultsNetworkAnalysis(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.Results
- get_single_target(target, fdr=True)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
omnibus_te : float - TE-value for joint information transfer from all sources into the target
omnibus_pval : float - p-value of omnibus information transfer into the target
omnibus_sign : bool - significance of omnibus information transfer wrt. to the alpha_omnibus specified in the settings
selected_vars_sources : list of tuples - source variables with significant information about the current value
selected_vars_target : list of tuples - target variables with significant information about the current value
selected_sources_pval : array of floats - p-value for each selected variable
selected_sources_te : array of floats - TE-value for each selected variable
sources_tested : list of int - list of sources tested for the current target
current_value : tuple - current value used for analysis, described by target and sample index in the data
Setting fdr to True returns FDR-corrected results (Benjamini, 1995).
- Args:
- targetint
target id
- fdrbool [optional]
return FDR-corrected results, see documentation of network inference algorithms and stats.network_fdr (default=True)
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
- get_target_sources(target, fdr=True)[source]¶
Return list of sources (parents) for given target.
- Args:
- targetint
target index
- fdrbool [optional]
if True, sources are returned for FDR-corrected results (default=True)
- property targets_analysed¶
Return the list of analysed targets.
- class idtxl.results.ResultsNetworkComparison(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of network comparison.
Provide a container for results of network comparison algorithms.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation: res_network.settings.cmi_estimator or res_network.settings[‘cmi_estimator’].
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- surrogate_distributiondict
for each target, surrogate distributions used for testing of each link into the target
- targets_analysedlist
list of analysed targets
- abdict
for each target, list of comparison results for all links into the target; True if link in condition A > link in condition B
- pvaldict
for each target, list of p-values for all compared links
- cmi_diff_absdict
for each target, list of absolute difference in interaction measure for all compared links
- get_adjacency_matrix(weights='comparison')[source]¶
Return adjacency matrix.
Return adjacency matrix resulting from network inference. Multiple options for the weights are available.
- Args:
- weightsstr [optional]
can either be
‘union’: all links in the union network, i.e., all links that were tested for a difference
or return information for links with a significant difference:
‘comparison’: True for links with a significant difference in inferred effective connectivity (default)
‘pvalue’: p-values for links with a significant difference
‘diff_abs’: absolute differences in inferred effective connectivity for links with a significant difference
- Returns:
AdjacencyMatrix instance
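A sketch of retrieving the comparison matrix, assuming res_comp is a ResultsNetworkComparison instance returned by a completed network comparison (the variable name is hypothetical):
>>> adj = res_comp.get_adjacency_matrix(weights='comparison')
>>> adj.print_matrix()  # print the AdjacencyMatrix to console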
- get_single_target(target)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
sources : list of ints - list of sources inferred for the current target (union of sources from both data sets entering the comparison)
selected_vars_sources : list of tuples - source variables with significant information about the current value (union of both conditions)
selected_vars_target : list of tuples - target variables with significant information about the current value (union of both conditions)
- Args:
- targetint
target id
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
- get_target_sources(target)[source]¶
Return list of sources (parents) for given target.
- Args:
- targetint
target index
- print_edge_list(weights='comparison')[source]¶
Print results of network comparison to console.
Print results of network comparison to console. Output looks like this:
>>> 0 -> 1, diff_abs = 0.2
>>> 0 -> 2, diff_abs = 0.5
>>> 0 -> 3, diff_abs = 0.7
>>> 3 -> 4, diff_abs = 1.3
>>> 4 -> 3, diff_abs = 0.4
indicating differences in the network inference measure for a link source -> target.
- Args:
- weightsstr [optional]
weights for the adjacency matrix (see documentation of method get_adjacency_matrix for details)
- class idtxl.results.ResultsNetworkInference(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of network inference.
Provide a container for results of network inference algorithms, e.g., MultivariateTE or BivariateTE.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_network.settings.cmi_estimator
or
>>> res_network.settings['cmi_estimator']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before estimation
- targets_analysedlist
list of analysed targets
- get_adjacency_matrix(weights, fdr=True)[source]¶
Return adjacency matrix.
Return adjacency matrix resulting from network inference. The adjacency matrix can either be generated from FDR-corrected results or uncorrected results. Multiple options for the weight are available.
- Args:
- weightsstr
can either be
‘max_te_lag’: the weights represent the source -> target lag corresponding to the maximum transfer entropy value (see documentation for method get_target_delays for details)
‘max_p_lag’: the weights represent the source -> target lag corresponding to the smallest p-value, i.e., the highest statistical significance (see documentation for method get_target_delays for details)
‘vars_count’: the weights represent the number of statistically-significant source -> target lags
‘binary’: return unweighted adjacency matrix with binary entries; 1 = significant information transfer, 0 = no significant information transfer
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
AdjacencyMatrix instance
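Assuming results holds a completed whole-network inference (e.g., from MultivariateTE.analyse_network()), the adjacency matrix might be obtained as in this sketch:
>>> # weights='max_te_lag': matrix entries hold inferred source -> target delays
>>> adj = results.get_adjacency_matrix(weights='max_te_lag', fdr=False)
>>> adj.print_matrix()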
- get_source_variables(fdr=True)[source]¶
Return list of inferred past source variables for all targets.
Return a list of dictionaries, where each dictionary holds the selected past source variables for one analysed target. The list may be used as an input to significant subgraph mining in the postprocessing module.
- Args:
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- list of dicts
selected past source variables for each target
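For example, continuing a completed analysis (a sketch):
>>> # One dict of selected past source variables per analysed target
>>> source_vars = results.get_source_variables(fdr=False)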
- get_target_delays(target, criterion='max_te', fdr=True)[source]¶
Return list of information-transfer delays for a given target.
Return a list of information-transfer delays for a given target. Information-transfer delays are determined by the lag of the variable in a source's past that has the highest information transfer into the target process. There are two ways of identifying the variable with maximum information transfer:
use the variable with the highest absolute TE value (highest information transfer),
use the variable with the smallest p-value (highest statistical significance).
- Args:
- targetint
target index
- criterionstr [optional]
use maximum TE value (‘max_te’) or p-value (‘max_p’) to determine the source-target delay (default=’max_te’)
- fdrbool [optional]
return FDR-corrected results (default=True)
- Returns:
- numpy array
information-transfer delays for each source
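A sketch, again assuming a completed analysis stored in results:
>>> # Delays for target 1, determined by maximum TE, uncorrected results
>>> delays = results.get_target_delays(target=1, criterion='max_te', fdr=False)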
- print_edge_list(weights, fdr=True)[source]¶
Print results of network inference to console.
Print edge list resulting from network inference to console. Output may look like this:
>>> 0 -> 1, max_te_lag = 2
>>> 0 -> 2, max_te_lag = 3
>>> 0 -> 3, max_te_lag = 2
>>> 3 -> 4, max_te_lag = 1
>>> 4 -> 3, max_te_lag = 1
The edge list can either be generated from FDR-corrected results or uncorrected results. Multiple options for the weight are available (see documentation of method get_adjacency_matrix for details).
- Args:
- weightsstr
link weights (see documentation of method get_adjacency_matrix for details)
- fdrbool [optional]
return FDR-corrected results (default=True)
- class idtxl.results.ResultsPID(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.ResultsNetworkAnalysis
Store results of Partial Information Decomposition (PID) analysis.
Provide a container for results of Partial Information Decomposition (PID) algorithms.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_pid._single_target[2].source_1
or
>>> res_pid._single_target[2]['source_1']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before the estimation
- targets_analysedlist
list of analysed targets
- get_single_target(target)[source]¶
Return results for a single target in the network.
Results for single targets include for each target
source_1 : tuple - source variable 1
source_2 : tuple - source variable 2
selected_vars_sources : list of tuples - source variables used in PID estimation
s1_unq : float - unique information in source 1
s2_unq : float - unique information in source 2
syn_s1_s2 : float - synergistic information in sources 1 and 2
shd_s1_s2 : float - shared information in sources 1 and 2
current_value : tuple - current value used for analysis, described by target and sample index in the data
[estimator-specific settings]
- Args:
- targetint
target id
- Returns:
- dict
Results for single target. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars_sources’]) or via dot-notation (result.selected_vars_sources).
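Assuming res_pid is a ResultsPID instance from a completed PID analysis (the variable name is hypothetical), single-target estimates might be read out like this:
>>> pid_target = res_pid.get_single_target(target=2)
>>> # keyword access and dot-notation are equivalent
>>> print(pid_target.s1_unq, pid_target['syn_s1_s2'])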
- class idtxl.results.ResultsSingleProcessAnalysis(n_nodes, n_realisations, normalised)[source]¶
Bases:
idtxl.results.Results
Store results of single process analysis.
Provide a container for the results of algorithms for the analysis of individual processes (nodes) in a multivariate stochastic process, e.g., estimation of active information storage.
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_network.settings.cmi_estimator
or
>>> res_network.settings['cmi_estimator']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures and statistical testing
- data_propertiesdict
data properties, contains
n_nodes : int - total number of nodes in the network
n_realisations : int - number of samples available for analysis given the settings (e.g., a high maximum lag used in network inference results in fewer data points available for estimation)
normalised : bool - indicates if data were z-standardised before estimation
- processes_analysedlist
list of analysed processes
- get_significant_processes(fdr=True)[source]¶
Return statistically-significant processes.
Indicates for each process whether AIS is statistically significant (equivalent to the adjacency matrix returned for network inference)
- Args:
- fdrbool [optional]
return FDR-corrected results, see documentation of network inference algorithms and stats.network_fdr (default=True)
- Returns:
- numpy array
Statistical significance for each process
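A sketch of an AIS analysis followed by a significance query, reusing the data object from the earlier sketch; estimator and lag settings are illustrative:
>>> from idtxl.active_information_storage import ActiveInformationStorage
>>> settings = {'cmi_estimator': 'JidtGaussianCMI', 'max_lag': 5}
>>> res_ais = ActiveInformationStorage().analyse_network(settings=settings, data=data, processes=[0, 1])
>>> res_ais.get_significant_processes(fdr=False)  # one boolean per process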
- get_single_process(process, fdr=True)[source]¶
Return results for a single process in the network.
Return results for individual processes, contains for each process
ais : float - AIS-value for current process
ais_pval : float - p-value of AIS estimate
ais_sign : bool - significance of the AIS estimate with respect to the alpha_mi specified in the settings
selected_vars : list of tuples - variables with significant information about the current value of the process that have been added to the process's past state; a variable is described by the index of the process in the data and its lag in samples
current_value : tuple - current value used for analysis, described by process and sample index in the data
Setting fdr to True returns FDR-corrected results (Benjamini, 1995).
- Args:
- processint
process id
- fdrbool [optional]
return FDR-corrected results, see documentation of network inference algorithms and stats.network_fdr (default=True)
- Returns:
- dict
results for single process. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars’]) or via dot-notation (result.selected_vars).
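Continuing the AIS sketch above, per-process results might be accessed as follows:
>>> proc_res = res_ais.get_single_process(process=0, fdr=False)
>>> print(proc_res.ais, proc_res['ais_pval'], proc_res.selected_vars)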
- property processes_analysed¶
Return list of analysed processes.
- class idtxl.results.ResultsSingleProcessRudelt(processes)[source]¶
Bases:
object
Store results of single process analysis.
Provides a container for the results of the Rudelt optimization algorithm. To obtain results for individual processes, call the .get_single_process() method (see docstring for details).
Note that for convenience all dictionaries in this class can additionally be accessed using dot-notation:
>>> res_network.settings.estimation_method
or
>>> res_network.settings['estimation_method']
- Attributes:
- settingsdict
settings used for estimation of information theoretic measures
- data_propertiesdict
data properties, contains
n_processes : int - total number of processes analysed
- processes_analysedlist
list of analysed processes
- get_single_process(process)[source]¶
Return results for a single process.
Return results for individual processes; the keys contained in the returned dictionary are listed under Returns below.
- Args:
- processint
process id
- Returns:
- dict
results for single process. Note that for convenience dictionary entries can either be accessed via keywords (result[‘selected_vars’]) or via dot-notation (result.selected_vars). Contains keys
- Processint
Process that was optimized
- estimation_methodString
Estimation method that was used for optimization
- T_Dfloat
Estimated optimal value for the temporal depth TD
- tau_Rfloat
Information timescale tau_R, a characteristic timescale of history dependence similar to an autocorrelation time.
- R_totfloat
Estimated value for the total history dependence Rtot.
- AIS_totfloat
Estimated value for the total active information storage
- opt_number_of_bins_dint
Number of bins d for the embedding that yields (R̂tot, T̂D)
- opt_scaling_kint
Scaling exponent κ for the embedding that yields (R̂tot, T̂D)
- opt_first_bin_sizeint
Size of the first bin τ1 for the embedding that yields (R̂tot, T̂D).
- history_dependencearray with floating-point values
Estimated history dependence for each embedding
- firing_ratefloat
Firing rate of the neuron/spike train
- recording_lengthfloat
Length of the recording (in seconds)
- H_spikingfloat
Entropy of the spike times
if analyse_auto_MI was set to True additionally:
- auto_MIdict
numpy array of MI values for each delay
- auto_MI_delayslist of int
list of delays depending on the given auto_MI_bin_sizes and auto_MI_max_delay
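Assuming res_rudelt is a ResultsSingleProcessRudelt instance from a completed optimization (the variable name is hypothetical), individual keys might be read out like this:
>>> proc = res_rudelt.get_single_process(process=0)
>>> # keyword access and dot-notation are equivalent
>>> print(proc.T_D, proc['R_tot'], proc.firing_rate)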
- property processes_analysed¶
Return list of analysed processes.
idtxl.stats module¶
Provide statistics functions.
- idtxl.stats.ais_fdr(settings=None, *results)[source]¶
Perform FDR-correction on results of network AIS estimation.
Perform correction of the false discovery rate (FDR) after estimation of active information storage (AIS) for all processes in the network. FDR correction is applied by correcting the AIS estimate’s omnibus p-values for individual processes/nodes in the network.
Input can be a list of partial results to combine results from parallel analysis.
References:
Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.
- Args:
- settingsdict [optional]
parameters for statistical testing with entries:
alpha_fdr : float [optional] - critical alpha level (default=0.05)
fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)
- resultsinstances of ResultsSingleProcessAnalysis
results of network AIS estimation, see documentation of ResultsSingleProcessAnalysis()
- Returns:
- ResultsSingleProcessAnalysis instance
input results objects pruned of non-significant estimates
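A sketch of combining partial results and applying the correction; the inputs res_ais_a and res_ais_b are assumed to come from parallel AIS analyses:
>>> from idtxl.stats import ais_fdr
>>> # the settings dict is optional; alpha_fdr shown for illustration
>>> res_pruned = ais_fdr({'alpha_fdr': 0.05}, res_ais_a, res_ais_b)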
- idtxl.stats.check_n_perm(n_perm, alpha)[source]¶
Check if the number of permutations is large enough to obtain the requested alpha level.
- Note:
The number of permutations must be large enough to theoretically allow for the detection of a p-value smaller than the critical alpha level; otherwise the permutation test is pointless. The smallest possible p-value is 1/n_perm.
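Because the smallest attainable p-value is 1/n_perm, a test at alpha = 0.05 requires more than 20 permutations (1/20 = 0.05 is not strictly smaller than alpha). A sketch, assuming the function raises an error when the number of permutations is insufficient:
>>> from idtxl.stats import check_n_perm
>>> check_n_perm(500, 0.05)  # sufficient: 1/500 = 0.002 < 0.05
>>> check_n_perm(10, 0.05)   # expected to raise: 1/10 = 0.1 > 0.05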
- idtxl.stats.max_statistic(analysis_setup, data, candidate_set, te_max_candidate, conditional)[source]¶
Perform maximum statistics for one candidate source.
Test if a transfer entropy value is significantly bigger than the maximum values obtained from surrogates of all remaining candidates.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_max_stat : int [optional] - number of permutations (default=200)
alpha_max_stat : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- candidate_setlist of tuples
list of indices of remaining candidates
- te_max_candidatefloat
transfer entropy value to be tested
- conditionalnumpy array
realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension]
- Returns:
- bool
statistical significance
- float
the test’s p-value
- numpy array
surrogate table
- Raises:
- ex.AlgorithmExhaustedError
Raised from _create_surrogate_table() when calculation cannot be made
- idtxl.stats.max_statistic_sequential(analysis_setup, data)[source]¶
Perform sequential maximum statistics for a set of candidate sources.
Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.
The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).
This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:
n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)
alpha_max_seq : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- numpy array, bool
statistical significance of each source
- numpy array, float
the test’s p-values for each source
- numpy array, float
TE values for individual sources
- idtxl.stats.max_statistic_sequential_bivariate(analysis_setup, data)[source]¶
Perform sequential maximum statistics for a set of candidate sources.
Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.
The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).
This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:
n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)
alpha_max_seq : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- numpy array, bool
statistical significance of each source
- numpy array, float
the test’s p-values for each source
- numpy array, float
TE values for individual sources
- idtxl.stats.mi_against_surrogates(analysis_setup, data)[source]¶
Test estimated mutual information for significance against surrogate data.
Shuffle realisations of the current value (the point to be predicted) and re-calculate mutual information (MI) for the shuffled data. The actual estimated MI is then compared against this distribution of MI values from surrogate data.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_mi : int [optional] - number of permutations (default=500)
alpha_mi : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- float
estimated MI value
- bool
statistical significance
- float
p_value for estimated MI value
- Raises:
- ex.AlgorithmExhaustedError
Raised from estimate() methods when calculation cannot be made
- idtxl.stats.min_statistic(analysis_setup, data, candidate_set, te_min_candidate, conditional=None)[source]¶
Perform minimum statistics for one candidate source.
Test if a transfer entropy value is significantly bigger than the minimum values obtained from surrogates of all remaining candidates.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_min_stat : int [optional] - number of permutations (default=500)
alpha_min_stat : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- candidate_setlist of tuples
list of indices of remaining candidates
- te_min_candidatefloat
transfer entropy value to be tested
- conditionalnumpy array [optional]
realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension] (default=None, no conditioning performed)
- Returns:
- bool
statistical significance
- float
the test’s p-value
- numpy array
surrogate table
- Raises:
- ex.AlgorithmExhaustedError
Raised from _create_surrogate_table() when calculation cannot be made
- idtxl.stats.network_fdr(settings=None, *results)[source]¶
Perform FDR-correction on results of network inference.
Perform correction of the false discovery rate (FDR) after network analysis. FDR correction can either be applied at the target level (by correcting omnibus p-values) or at the single-link level (by correcting p-values of individual links between single samples and the target).
Input can be a list of partial results to combine results from parallel analysis.
References:
Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.
- Args:
- settingsdict [optional]
parameters for statistical testing with entries:
alpha_fdr : float [optional] - critical alpha level (default=0.05)
correct_by_target : bool [optional] - if True, correct p-values on the target level (omnibus test p-values), otherwise correct p-values for individual variables (sequential max stats p-values) (default=True)
fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)
- resultsinstances of ResultsNetworkInference
results of network inference, see documentation of ResultsNetworkInference()
- Returns:
- ResultsNetworkInference instance
input object pruned of non-significant links
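A sketch of combining partial inference results and pruning by FDR; the inputs res_t0 and res_t1 are assumed to come from parallel single-target analyses:
>>> from idtxl.stats import network_fdr
>>> settings = {'alpha_fdr': 0.05, 'correct_by_target': True}
>>> res_pruned = network_fdr(settings, res_t0, res_t1)
>>> res_pruned.get_adjacency_matrix(weights='binary', fdr=True)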
- idtxl.stats.omnibus_test(analysis_setup, data)[source]¶
Perform an omnibus test on identified conditional variables.
Test the joint information transfer from all identified sources to the current value conditional on candidates in the target’s past. To test for significance, this is repeated for shuffled realisations of the sources. The distribution of values from shuffled data is then used as test distribution.
- Args:
- analysis_setupMultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_omnibus : int [optional] - number of permutations (default=500)
alpha_omnibus : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- bool
statistical significance
- float
the test’s p-value
- float
the estimated test statistic, i.e., the information transfer from all sources into the target
- Raises:
- ex.AlgorithmExhaustedError
Raised from estimate() calls when calculation cannot be made
- idtxl.stats.syn_shd_against_surrogates(analysis_setup, data)[source]¶
Test the shared/synergistic information in the PID estimate.
Shuffle realisations of the target and re-calculate PID, in particular the synergistic and shared information from shuffled data. The original shared and synergistic information are then compared against the distribution of values calculated from surrogate data.
- Args:
- analysis_setupPartial_information_decomposition instance
information on the current analysis, should have an attribute ‘settings’, a dict with optional fields
n_perm : int [optional] - number of permutations (default=500)
alpha : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- dict
PID estimate from original data
- bool
statistical significance of the shared information
- float
p-value of the shared information
- bool
statistical significance of the synergistic information
- float
p-value of the synergistic information
- idtxl.stats.unq_against_surrogates(analysis_setup, data)[source]¶
Test the unique information in the PID estimate against surrogate data.
Shuffle realisations of both sources individually and re-calculate PID, in particular the unique information from shuffled data. The original unique information is then compared against the distribution of values calculated from surrogate data.
- Args:
- analysis_setupPartial_information_decomposition instance
information on the current analysis, should have an attribute ‘settings’, a dict with optional fields
n_perm : int [optional] - number of permutations (default=500)
alpha : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- dataData instance
raw data
- Returns:
- dict
PID estimate from original data
- bool
statistical significance of the unique information in source 1
- float
p-value of the unique information in source 1
- bool
statistical significance of the unique information in source 2
- float
p-value of the unique information in source 2
idtxl.visualise_graph module¶
Plot results of network inference.
- idtxl.visualise_graph.plot_mute_graph()[source]¶
Plot MuTE example network.
Network of 5 AR processes, which is used as an example in the paper on the MuTE toolbox (Montalto, PLOS ONE, 2014, eq. 14). The network consists of five autoregressive (AR) processes with model orders of 2 and the following (non-linear) couplings:
>>> 0 -> 1, u = 2
>>> 0 -> 2, u = 3
>>> 0 -> 3, u = 2 (non-linear)
>>> 3 -> 4, u = 1
>>> 4 -> 3, u = 1
- Returns:
- Figure handle
Figure object from the matplotlib package
- idtxl.visualise_graph.plot_network(results, weights, fdr=True)[source]¶
Plot network of multivariate TE between processes.
Plot graph of the network of (multivariate) interactions between processes (e.g., multivariate TE). The function uses the networkx class for directed graphs (DiGraph) internally. Plots a network and adjacency matrix.
- Args:
- resultsResultsNetworkInference() instance
output of a network inference algorithm
- weightsstr
for single network inference, it can either be
‘max_te_lag’: the weights represent the source -> target lag corresponding to the maximum transfer entropy value (see documentation for method get_target_delays for details)
‘max_p_lag’: the weights represent the source -> target lag corresponding to the smallest p-value, i.e., the highest statistical significance (see documentation for method get_target_delays for details)
‘vars_count’: the weights represent the number of statistically-significant source -> target lags
‘binary’: return unweighted adjacency matrix with binary entries; 1 = significant information transfer, 0 = no significant information transfer
for network comparison, it can either be
‘union’: all links in the union network, i.e., all links that were tested for a difference
‘comparison’: True for links with a significant difference in inferred effective connectivity (default)
‘pvalue’: p-values for links with a significant difference
‘diff_abs’: absolute differences in inferred effective connectivity for links with a significant difference
- fdrbool [optional]
print FDR-corrected results (default=True)
- Returns:
- DiGraph
instance of a directed graph class from the networkx package
- Figure
figure handle, Figure object from the matplotlib package
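A sketch of plotting an inferred network, assuming results holds a completed whole-network inference:
>>> import matplotlib.pyplot as plt
>>> from idtxl.visualise_graph import plot_network
>>> graph, fig = plot_network(results=results, weights='max_te_lag', fdr=False)
>>> plt.show()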
- idtxl.visualise_graph.plot_network_comparison(results)[source]¶
Plot results of network comparison.
Plot results of network comparison. Produces a figure with five subplots, where the first plot shows the network graph of the union network, the second plot shows the adjacency matrix of the union network, the third plot shows the qualitative results of the comparison of each link, the fourth plot shows the absolute differences in CMI per link, and the fifth plot shows p-values for each link.
- Args:
- resultsResultsNetworkComparison() instance
network comparison results
- Returns:
- DiGraph
instance of a directed graph class from the networkx package
- Figure
figure handle, Figure object from the matplotlib package
- idtxl.visualise_graph.plot_selected_vars(results, target, sign_sources=True, display_edge_labels=False, fdr=True)[source]¶
Plot network of a target process and single variables.
Plot graph of the network of (multivariate) interactions between source variables and the target. The function uses the networkx class for directed graphs (DiGraph) internally. Plots a network and reduced adjacency matrix.
- Args:
- resultsResultsNetworkInference() instance
output of a network inference algorithm
- targetint
index of target process
- sign_sourcesbool [optional]
plot sources with significant information contribution only (default=True)
- display_edge_labelsbool [optional]
display TE values on edge labels (default=False)
- fdrbool [optional]
print FDR-corrected results (default=True)
- Returns:
- DiGraph
instance of a directed graph class from the networkx package
- Figure
figure handle, Figure object from the matplotlib package
Module contents¶
IDTxl: Information Dynamics Toolkit xl.
IDTxl is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory. IDTxl provides functionality to estimate the following measures:
For network inference:
multivariate transfer entropy (TE)/Granger causality (GC)
multivariate mutual information (MI)
bivariate TE/GC
bivariate MI
For analysis of node dynamics:
active information storage (AIS)
partial information decomposition (PID)
IDTxl implements estimators for discrete and continuous data with parallel computing engines for both GPU and CPU platforms. Written for Python 3.4.3+.