Helper functions
utils module
Provide IDTxl utility functions.
- idtxl.idtxl_utils.argsort_descending(a)[source]
Sort array in descending order and return sorting indices.
- idtxl.idtxl_utils.autocorrelation(x)[source]
Calculate autocorrelation of a vector.
- idtxl.idtxl_utils.calculate_mi(corr)[source]
Calculate mutual information from correlation coefficient.
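For jointly Gaussian variables, mutual information and the Pearson correlation r are related by I = -0.5 ln(1 - r^2) (in nats). A minimal sketch, assuming this Gaussian convention is what the function implements (the helper name is illustrative)::

    import numpy as np

    def mi_from_correlation(corr):
        # Gaussian relation: I(X;Y) = -0.5 * ln(1 - r**2), in nats.
        return -0.5 * np.log(1 - np.asarray(corr) ** 2)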
- idtxl.idtxl_utils.combine_discrete_dimensions(a, numBins)[source]
Combine multi-dimensional discrete variable into a single dimension.
Combine all dimensions of a discrete variable into a single one-dimensional value for each sample. This is done by multiplying each dimension by a different power of the base (numBins).
Adapted from infodynamics.utils.MatrixUtils.computeCombinedValues() from JIDT by J.Lizier.
- Args:
- a : numpy array
data to be combined across all variable dimensions. Dimensions are realisations (samples) x variable dimension
- numBins : int
number of discrete levels or bins for each variable dimension
- Returns:
- numpy array
a univariate array – one entry now for each sample, with all dimensions of the data now combined for that sample
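A minimal sketch of the base-power combination described above (illustrative, not the library code)::

    import numpy as np

    def combine_dims(a, num_bins):
        # Interpret each row as digits of a base-`num_bins` number:
        # combined = sum_d a[:, d] * num_bins**(n_dims - 1 - d)
        a = np.asarray(a, dtype=int)
        powers = num_bins ** np.arange(a.shape[1] - 1, -1, -1)
        return a @ powers

For example, with num_bins=2 the row [1, 0] maps to 2 and [0, 1] maps to 1.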
- idtxl.idtxl_utils.conflicting_entries(dict_1, dict_2)[source]
Test two dictionaries for unequal entries.
Note that only keys that are present in both dicts are compared. If one dictionary contains an entry not present in the other dictionary, the test passes.
- idtxl.idtxl_utils.discretise(a, numBins)[source]
Discretise continuous data.
Discretise continuous data into discrete values (with 0 as lowest) by evenly partitioning the range of the data, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretise() from JIDT by J. Lizier.
- Args:
- a : numpy array
data to be discretised. Dimensions are realisations x variable dimension
- numBins : int
number of discrete levels or bins to partition the data into
- Returns:
- numpy array
discretised data
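A sketch of the even-width binning this describes, with a hypothetical helper name, assuming each column is binned independently::

    import numpy as np

    def discretise_even(a, num_bins):
        # Evenly partition each column's range into num_bins bins,
        # with 0 as the lowest level.
        a = np.asarray(a, dtype=float)
        lo, hi = a.min(axis=0), a.max(axis=0)
        bins = np.floor((a - lo) / (hi - lo) * num_bins).astype(int)
        return np.minimum(bins, num_bins - 1)  # maxima land in the top bin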
- idtxl.idtxl_utils.discretise_max_ent(a, numBins)[source]
Discretise continuous data using maximum entropy partitioning.
Discretise continuous data into discrete values (with 0 as lowest) by making a maximum entropy partitioning, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretiseMaxEntropy() from JIDT by J. Lizier.
- Args:
- a : numpy array
data to be discretised. Dimensions are realisations x variable dimension
- numBins : int
number of discrete levels or bins to partition the data into
- Returns:
- numpy array
discretised data
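Maximum entropy partitioning places bin edges at quantiles so that each bin holds roughly the same number of samples, which maximises the entropy of the discretised variable. A sketch of this idea (illustrative, not the library code)::

    import numpy as np

    def discretise_max_entropy(a, num_bins):
        # Equal-frequency bins: edges at the inner quantiles of each column.
        a = np.asarray(a)
        out = np.empty(a.shape, dtype=int)
        for d in range(a.shape[1]):
            edges = np.quantile(a[:, d], np.linspace(0, 1, num_bins + 1)[1:-1])
            out[:, d] = np.searchsorted(edges, a[:, d], side='right')
        return out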
- idtxl.idtxl_utils.equal_dicts(dict_1, dict_2)[source]
Test two dictionaries for equality.
- idtxl.idtxl_utils.print_dict(d, indent=4)[source]
Use Python’s pretty printer to print dictionaries to the console.
- idtxl.idtxl_utils.remove_column(a, j)[source]
Remove a column from a numpy array.
This is faster than logical indexing (reportedly about 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips.
- Args:
- a : numpy array
2-dimensional numpy array
- j : int
column index to be removed
- idtxl.idtxl_utils.remove_row(a, i)[source]
Remove a row from a numpy array.
This is faster than logical indexing (reportedly about 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips.
- Args:
- a : numpy array
2-dimensional numpy array
- i : int
row index to be removed
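One way to achieve copy-free removal in the spirit of the linked performance tips is to shift the trailing rows or columns in place and return a view; a sketch covering both helpers (not necessarily IDTxl's exact implementation, and note the input array is modified)::

    import numpy as np

    def remove_column_view(a, j):
        # Shift columns right of j one step to the left, then return a view
        # that drops the now-stale last column. No new array is allocated.
        a[:, j:-1] = a[:, j + 1:]
        return a[:, :-1]

    def remove_row_view(a, i):
        # Same idea along axis 0.
        a[i:-1, :] = a[i + 1:, :]
        return a[:-1, :]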
- idtxl.idtxl_utils.separate_arrays(idx_all, idx_single, a)[source]
Separate a single column from all other columns in a 2D-array.
Return the separated single column and the remaining columns of a 2D array.
- Args:
- idx_all : list<Object>
list of variables indicating the full set
- idx_single : <Object>
single variable indicating the column to be separated; the variable must be contained in idx_all
- a : numpy array
2D array with the same length along axis 1 as idx_all (a.shape[1] == len(idx_all))
- Returns:
- numpy array
remaining columns in full array
- numpy array
column at single index
- idtxl.idtxl_utils.sort_descending(a)[source]
Sort array in descending order.
- idtxl.idtxl_utils.standardise(a, dimension=0, df=1)[source]
Z-standardise a numpy array along a given dimension.
Standardise array along the axis defined in dimension using the denominator (N - df) for the calculation of the standard deviation.
- Args:
- a : numpy array
data to be standardised
- dimension : int [optional]
dimension along which array should be standardised
- df : int [optional]
degrees of freedom for the denominator of the standard deviation
- Returns:
- numpy array
standardised data
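A minimal numpy sketch of this behaviour; numpy's ddof argument implements the (N - df) denominator directly::

    import numpy as np

    def standardise(a, dimension=0, df=1):
        # Z-score along `dimension`, dividing by the standard deviation
        # computed with denominator N - df (numpy's ddof).
        mean = a.mean(axis=dimension, keepdims=True)
        std = a.std(axis=dimension, ddof=df, keepdims=True)
        return (a - mean) / std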
- idtxl.idtxl_utils.swap_chars(s, i_1, i_2)[source]
Swap two characters in a string.
- Example:
>>> swap_chars('heLlotHere', 2, 6)
'heHlotLere'
- class idtxl.idtxl_utils.timeout(timeout_duration, exception_message='Timeout')[source]
Context manager for a timeout using threading module.
- Args:
- timeout_duration : float
number of seconds to wait before the timeout is triggered
- exception_message : string
message to put in the exception
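Typical usage as a context manager (the wrapped call is a hypothetical stand-in)::

    from idtxl.idtxl_utils import timeout

    with timeout(timeout_duration=10, exception_message='Timeout'):
        run_long_estimation()  # hypothetical long-running call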
stats module
Provide statistics functions.
- idtxl.stats.ais_fdr(settings=None, *results)[source]
Perform FDR-correction on results of network AIS estimation.
Perform correction of the false discovery rate (FDR) after estimation of active information storage (AIS) for all processes in the network. FDR correction is applied by correcting the AIS estimate’s omnibus p-values for individual processes/nodes in the network.
Input can be a list of partial results to combine results from parallel analysis.
References:
Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.
- Args:
- settings : dict [optional]
parameters for statistical testing with entries:
alpha_fdr : float [optional] - critical alpha level (default=0.05)
fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)
- results : instances of ResultsSingleProcessAnalysis
results of network AIS estimation, see documentation of ResultsSingleProcessAnalysis()
- Returns:
- ResultsSingleProcessAnalysis instance
input results objects pruned of non-significant estimates
- idtxl.stats.check_n_perm(n_perm, alpha)[source]
Check if no. permutations is big enough to obtain the requested alpha.
- Note:
The no. permutations must be big enough to theoretically allow for the detection of a p-value that is smaller than the critical alpha level. Otherwise the permutation test is pointless. The smallest possible p-value is 1/n_perm.
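The check therefore reduces to requiring 1/n_perm <= alpha; a sketch::

    def enough_permutations(n_perm, alpha):
        # The smallest achievable p-value is 1/n_perm, so significance at
        # level alpha is only reachable if 1/n_perm <= alpha.
        return 1.0 / n_perm <= alpha

    enough_permutations(200, 0.05)  # True:  1/200 = 0.005 <= 0.05
    enough_permutations(10, 0.05)   # False: 1/10  = 0.1   >  0.05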
- idtxl.stats.max_statistic(analysis_setup, data, candidate_set, te_max_candidate, conditional)[source]
Perform maximum statistics for one candidate source.
Test if a transfer entropy value is significantly bigger than the maximum values obtained from surrogates of all remaining candidates.
- Args:
- analysis_setup : MultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_max_stat : int [optional] - number of permutations (default=200)
alpha_max_stat : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- candidate_set : list of tuples
list of indices of remaining candidates
- te_max_candidate : float
transfer entropy value to be tested
- conditional : numpy array
realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension]
- Returns:
- bool
statistical significance
- float
the test’s p-value
- numpy array
surrogate table
- Raises:
- ex.AlgorithmExhaustedError
Raised from _create_surrogate_table() when calculation cannot be made
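The core comparison can be sketched as follows, assuming a surrogate table of shape [n_candidates x n_permutations]; the minimum statistic further below works analogously with min in place of max::

    import numpy as np

    def max_stat_test(test_value, surrogate_table, alpha=0.05):
        # Null distribution: the maximum over all candidates, per permutation.
        null_dist = surrogate_table.max(axis=0)
        p_value = (null_dist >= test_value).mean()
        return p_value < alpha, p_value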
- idtxl.stats.max_statistic_sequential(analysis_setup, data)[source]
Perform sequential maximum statistics for a set of candidate sources.
Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.
The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).
This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.
- Args:
- analysis_setup : MultivariateTE instance
information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:
n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)
alpha_max_seq : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- Returns:
- numpy array, bool
statistical significance of each source
- numpy array, float
the test’s p-values for each source
- numpy array, float
TE values for individual sources
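A sketch of the rank-wise sequential comparison described above (illustrative only)::

    import numpy as np

    def sequential_max_stat(values, surrogate_table, alpha=0.05):
        # values: observed statistic per candidate; surrogate_table:
        # [n_candidates x n_permutations]. Sort observed values descending
        # and surrogates per permutation, then compare rank by rank.
        order = np.argsort(values)[::-1]
        surr_sorted = np.sort(surrogate_table, axis=0)[::-1]  # descending
        significance = np.zeros(len(values), dtype=bool)
        p_values = np.ones(len(values))
        for rank, idx in enumerate(order):
            p = (surr_sorted[rank] >= values[idx]).mean()
            p_values[idx] = p
            if p >= alpha:
                break  # all smaller values are non-significant, too
            significance[idx] = True
        return significance, p_values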
- idtxl.stats.max_statistic_sequential_bivariate(analysis_setup, data)[source]
Perform sequential maximum statistics for a set of candidate sources.
Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.
The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).
This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.
- Args:
- analysis_setup : MultivariateTE instance
information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:
n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)
alpha_max_seq : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- Returns:
- numpy array, bool
statistical significance of each source
- numpy array, float
the test’s p-values for each source
- numpy array, float
TE values for individual sources
- idtxl.stats.mi_against_surrogates(analysis_setup, data)[source]
Test estimated mutual information for significance against surrogate data.
Shuffle realisations of the current value (point to be predicted) and re-calculate mutual information (MI) for shuffled data. The actual estimated MI is then compared against this distribution of MI values from surrogate data.
- Args:
- analysis_setup : MultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_mi : int [optional] - number of permutations (default=500)
alpha_mi : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- Returns:
- float
estimated MI value
- bool
statistical significance
- float
p_value for estimated MI value
- Raises:
- ex.AlgorithmExhaustedError
Raised from estimate() methods when calculation cannot be made
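The surrogate comparison amounts to estimating a permutation p-value; a sketch where estimate_mi is a hypothetical stand-in for the configured IDTxl estimator::

    import numpy as np

    def mi_surrogate_test(mi_value, estimate_mi, current_value_realisations,
                          n_perm=500, alpha=0.05):
        # Build the null distribution by shuffling realisations of the
        # current value and re-estimating MI for each permutation.
        rng = np.random.default_rng()
        surrogates = np.array([
            estimate_mi(rng.permutation(current_value_realisations))
            for _ in range(n_perm)
        ])
        p_value = (surrogates >= mi_value).mean()
        return mi_value, p_value < alpha, p_value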
- idtxl.stats.min_statistic(analysis_setup, data, candidate_set, te_min_candidate, conditional=None)[source]
Perform minimum statistics for one candidate source.
Test if a transfer entropy value is significantly bigger than the minimum values obtained from surrogates of all remaining candidates.
- Args:
- analysis_setup : MultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_min_stat : int [optional] - number of permutations (default=500)
alpha_min_stat : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- candidate_set : list of tuples
list of indices of remaining candidates
- te_min_candidate : float
transfer entropy value to be tested
- conditional : numpy array [optional]
realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension] (default=None, no conditioning performed)
- Returns:
- bool
statistical significance
- float
the test’s p-value
- numpy array
surrogate table
- Raises:
- ex.AlgorithmExhaustedError
Raised from _create_surrogate_table() when calculation cannot be made
- idtxl.stats.network_fdr(settings=None, *results)[source]
Perform FDR-correction on results of network inference.
Perform correction of the false discovery rate (FDR) after network analysis. FDR correction can either be applied at the target level (by correcting omnibus p-values) or at the single-link level (by correcting p-values of individual links between single samples and the target).
Input can be a list of partial results to combine results from parallel analysis.
References:
Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.
- Args:
- settings : dict [optional]
parameters for statistical testing with entries:
alpha_fdr : float [optional] - critical alpha level (default=0.05)
correct_by_target : bool [optional] - if True, correct p-values at the target level (omnibus test p-values), otherwise correct p-values for individual variables (sequential max stats p-values) (default=True)
fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)
- results : instances of ResultsNetworkInference
results of network inference, see documentation of ResultsNetworkInference()
- Returns:
- ResultsNetworkInference instance
input object pruned of non-significant links
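The FDR thresholding follows Genovese et al. (2002); a sketch of the threshold computation with both constants (illustrative, not the library code)::

    import numpy as np

    def fdr_threshold(p_values, alpha=0.05, fdr_constant=2):
        # Benjamini-Hochberg-style thresholds: p_(i) <= i * alpha / (m * c),
        # where c = 1 (constant 1) or c = sum_i 1/i (constant 2, valid
        # under arbitrary dependence between tests).
        p = np.sort(np.asarray(p_values))
        m = len(p)
        c = 1.0 if fdr_constant == 1 else np.sum(1.0 / np.arange(1, m + 1))
        thresholds = np.arange(1, m + 1) * alpha / (m * c)
        significant = p <= thresholds
        if not significant.any():
            return 0.0  # nothing survives correction
        # All p-values up to the largest one below its threshold pass.
        return p[np.flatnonzero(significant).max()]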
- idtxl.stats.omnibus_test(analysis_setup, data)[source]
Perform an omnibus test on identified conditional variables.
Test the joint information transfer from all identified sources to the current value conditional on candidates in the target’s past. To test for significance, this is repeated for shuffled realisations of the sources. The distribution of values from shuffled data is then used as test distribution.
- Args:
- analysis_setup : MultivariateTE instance
information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:
n_perm_omnibus : int [optional] - number of permutations (default=500)
alpha_omnibus : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- Returns:
- bool
statistical significance
- float
the test’s p-value
- float
the estimated test statistic, i.e., the information transfer from all sources into the target
- Raises:
- ex.AlgorithmExhaustedError
Raised from estimate() calls when calculation cannot be made
- idtxl.stats.syn_shd_against_surrogates(analysis_setup, data)[source]
Test the shared/synergistic information in the PID estimate.
Shuffle realisations of the target and re-calculate PID, in particular the synergistic and shared information from shuffled data. The original shared and synergistic information are then compared against the distribution of values calculated from surrogate data.
- Args:
- analysis_setup : Partial_information_decomposition instance
information on the current analysis, should have an attribute ‘settings’, a dict with optional fields
n_perm : int [optional] - number of permutations (default=500)
alpha : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- Returns:
- dict
PID estimate from original data
- bool
statistical significance of the shared information
- float
p-value of the shared information
- bool
statistical significance of the synergistic information
- float
p-value of the synergistic information
- idtxl.stats.unq_against_surrogates(analysis_setup, data)[source]
Test the unique information in the PID estimate against surrogate data.
Shuffle realisations of both sources individually and re-calculate PID, in particular the unique information from shuffled data. The original unique information is then compared against the distribution of values calculated from surrogate data.
- Args:
- analysis_setup : Partial_information_decomposition instance
information on the current analysis, should have an attribute ‘settings’, a dict with optional fields
n_perm : int [optional] - number of permutations (default=500)
alpha : float [optional] - critical alpha level (default=0.05)
permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)
- data : Data instance
raw data
- Returns:
- dict
PID estimate from original data
- bool
statistical significance of the unique information in source 1
- float
p-value of the unique information in source 1
- bool
statistical significance of the unique information in source 2
- float
p-value of the unique information in source 2