Helper functions

utils module

Provide IDTxl utility functions.

idtxl.idtxl_utils.argsort_descending(a)[source]

Sort array in descending order and return the sorting indices.
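A minimal sketch of such a helper (a plausible one-liner, not necessarily IDTxl's exact implementation):

```python
import numpy as np

def argsort_descending(a):
    """Indices that would sort ``a`` from largest to smallest."""
    return np.argsort(a)[::-1]

idx = argsort_descending(np.array([3.0, 1.0, 2.0]))  # largest value first
```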

idtxl.idtxl_utils.autocorrelation(x)[source]

Calculate autocorrelation of a vector.

idtxl.idtxl_utils.calculate_mi(corr)[source]

Calculate mutual information from correlation coefficient.
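For jointly Gaussian variables with correlation coefficient r, mutual information (in nats) is given by the standard relation I = -0.5 ln(1 - r^2). A sketch under the assumption that this is the relation used:

```python
import numpy as np

def calculate_mi(corr):
    """MI (in nats) between two jointly Gaussian variables
    with correlation coefficient ``corr``."""
    return -0.5 * np.log(1 - corr ** 2)

mi_zero = calculate_mi(0.0)    # uncorrelated variables share no information
mi_strong = calculate_mi(0.9)  # stronger correlation yields higher MI
```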

idtxl.idtxl_utils.combine_discrete_dimensions(a, numBins)[source]

Combine multi-dimensional discrete variable into a single dimension.

Combine all dimensions of a discrete variable into a single value per sample. This is done by multiplying each dimension by a different power of the base (numBins).

Adapted from infodynamics.utils.MatrixUtils.computeCombinedValues() from JIDT by J. Lizier.

Args:
a : numpy array

data to be combined across all variable dimensions. Dimensions are realisations (samples) x variable dimension

numBins : int

number of discrete levels or bins for each variable dimension

Returns:
numpy array

a univariate array – one entry now for each sample, with all dimensions of the data now combined for that sample
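The positional-notation idea above can be sketched as follows (the digit ordering, most significant dimension first, is an assumption; JIDT's routine may order dimensions differently):

```python
import numpy as np

def combine_dims(a, num_bins):
    """Collapse a (samples x dims) discrete array into one value per
    sample by treating each row as digits of a base-``num_bins`` number."""
    a = np.asarray(a)
    n_dims = a.shape[1]
    # Most significant digit first, mirroring positional notation.
    powers = num_bins ** np.arange(n_dims - 1, -1, -1)
    return a @ powers

# Two binary dimensions: (0,0)->0, (0,1)->1, (1,0)->2, (1,1)->3
codes = combine_dims([[0, 0], [0, 1], [1, 0], [1, 1]], num_bins=2)
```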

idtxl.idtxl_utils.conflicting_entries(dict_1, dict_2)[source]

Test two dictionaries for unequal entries.

Note that only keys that are present in both dicts are compared. If one dictionary contains an entry not present in the other dictionary, the test passes.

idtxl.idtxl_utils.discretise(a, numBins)[source]

Discretise continuous data.

Discretise continuous data into discrete values (with 0 as lowest) by evenly partitioning the range of the data, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretise() from JIDT by J. Lizier.

Args:
a : numpy array

data to be discretised. Dimensions are realisations x variable dimension

numBins : int

number of discrete levels or bins to partition the data into

Returns:
numpy array

discretised data
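Even partitioning of a column's range can be sketched as below; the handling of the maximum value (which would otherwise fall into an extra bin) is an implementation detail assumed here, not taken from IDTxl:

```python
import numpy as np

def discretise(a, num_bins):
    """Evenly partition the range of each column into ``num_bins`` bins,
    returning integer bin labels starting at 0."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    lo = a.min(axis=0)
    width = (a.max(axis=0) - lo) / num_bins
    # Guard against zero-width columns (constant data).
    width[width == 0] = 1
    bins = np.floor((a - lo) / width).astype(int)
    # The column maximum would land in bin ``num_bins``; clip it back.
    return np.clip(bins, 0, num_bins - 1)

labels = discretise(np.array([[0.0], [0.4], [0.6], [1.0]]), num_bins=2)
```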

idtxl.idtxl_utils.discretise_max_ent(a, numBins)[source]

Discretise continuous data using maximum entropy partitioning.

Discretise continuous data into discrete values (with 0 as lowest) by making a maximum entropy partitioning, one dimension at a time. Adapted from infodynamics.utils.MatrixUtils.discretiseMaxEntropy() from JIDT by J. Lizier.

Args:
a : numpy array

data to be discretised. Dimensions are realisations x variable dimension

numBins : int

number of discrete levels or bins to partition the data into

Returns:
numpy array

discretised data

idtxl.idtxl_utils.equal_dicts(dict_1, dict_2)[source]

Test two dictionaries for equality.

idtxl.idtxl_utils.print_dict(d, indent=4)[source]

Use Python’s pretty printer to print dictionaries to the console.

idtxl.idtxl_utils.remove_column(a, j)[source]

Remove a column from a numpy array.

This is faster than logical indexing (reportedly about 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips

Args:
a : numpy array

2-dimensional numpy array

j : int

column index to be removed

idtxl.idtxl_utils.remove_row(a, i)[source]

Remove a row from a numpy array.

This is faster than logical indexing (reportedly about 25 times faster) because it does not make copies; see http://scipy.github.io/old-wiki/pages/PerformanceTips

Args:
a : numpy array

2-dimensional numpy array

i : int

row index to be removed

idtxl.idtxl_utils.separate_arrays(idx_all, idx_single, a)[source]

Separate a single column from all other columns in a 2D-array.

Return the separated single column and the remaining columns of a 2D array.

Args:
idx_all : list<Object>

list of variables indicating the full set

idx_single : <Object>

single variable indicating the column to be separated, variable must be contained in idx_all

a : numpy array

2D-array with the same length along axis 1 as idx_all (.shape[1] == len(idx_all))

Returns:
numpy array

remaining columns in full array

numpy array

column at single index
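The separation described above can be sketched with plain numpy slicing (helper name and interface are illustrative):

```python
import numpy as np

def separate_arrays(idx_all, idx_single, a):
    """Split the column matching ``idx_single`` out of ``a``;
    return (remaining columns, separated column)."""
    col = idx_all.index(idx_single)  # position of the variable in the full set
    single = a[:, col]
    rest = np.hstack((a[:, :col], a[:, col + 1:]))
    return rest, single

a = np.arange(12).reshape(3, 4)
idx_all = ['v0', 'v1', 'v2', 'v3']
rest, single = separate_arrays(idx_all, 'v2', a)
```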

idtxl.idtxl_utils.sort_descending(a)[source]

Sort array in descending order.

idtxl.idtxl_utils.standardise(a, dimension=0, df=1)[source]

Z-standardise a numpy array along a given dimension.

Standardise array along the axis defined in dimension using the denominator (N - df) for the calculation of the standard deviation.

Args:
a : numpy array

data to be standardised

dimension : int [optional]

dimension along which array should be standardised

df : int [optional]

degrees of freedom for the denominator of the standard deviation

Returns:
numpy array

standardised data
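Z-standardisation with the (N - df) denominator maps directly onto numpy's `ddof` parameter. A sketch under that assumption:

```python
import numpy as np

def standardise(a, dimension=0, df=1):
    """Z-standardise ``a`` along ``dimension`` using denominator N - df
    for the standard deviation (numpy's ``ddof``)."""
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=dimension, keepdims=True)
    sd = a.std(axis=dimension, ddof=df, keepdims=True)
    return (a - mean) / sd

z = standardise(np.array([1.0, 2.0, 3.0, 4.0]))
```

After standardisation the data have zero mean and unit (sample) standard deviation along the chosen axis.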

idtxl.idtxl_utils.swap_chars(s, i_1, i_2)[source]

Swap two characters in a string.

Example:
>>> print(swap_chars('heLlotHere', 2, 6))
'heHlotLere'
class idtxl.idtxl_utils.timeout(timeout_duration, exception_message='Timeout')[source]

Context manager for a timeout using threading module.

Args:
timeout_duration : float

number of seconds to wait before timeout is triggered

exception_message : string

message to put in the exception
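The class wraps a computation with a time limit via the threading module. Since a context manager cannot pre-empt its own body, a simplified function-based sketch of the same idea (names and behaviour are illustrative, not IDTxl's exact implementation):

```python
import threading
import time

def run_with_timeout(func, timeout_duration, exception_message='Timeout'):
    """Run ``func`` in a worker thread; raise TimeoutError on expiry.

    Note: the worker cannot be forcibly killed, only abandoned, a
    general limitation of thread-based timeouts in Python.
    """
    result = {}

    def worker():
        result['value'] = func()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_duration)
    if thread.is_alive():
        raise TimeoutError(exception_message)
    return result['value']

fast = run_with_timeout(lambda: 42, timeout_duration=1.0)

try:
    run_with_timeout(lambda: time.sleep(1.0), timeout_duration=0.1)
    timed_out = False
except TimeoutError:
    timed_out = True
```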

stats module

Provide statistics functions.

idtxl.stats.ais_fdr(settings=None, *results)[source]

Perform FDR-correction on results of network AIS estimation.

Perform correction of the false discovery rate (FDR) after estimation of active information storage (AIS) for all processes in the network. FDR correction is applied by correcting the AIS estimate’s omnibus p-values for individual processes/nodes in the network.

Input can be a list of partial results to combine results from parallel analysis.

References:

  • Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.

Args:
settings : dict [optional]

parameters for statistical testing with entries:

  • alpha_fdr : float [optional] - critical alpha level (default=0.05)

  • fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)

results : instances of ResultsSingleProcessAnalysis

results of network AIS estimation, see documentation of ResultsSingleProcessAnalysis()

Returns:
ResultsSingleProcessAnalysis instance

input results objects pruned of non-significant estimates

idtxl.stats.check_n_perm(n_perm, alpha)[source]

Check if the number of permutations is large enough to obtain the requested alpha level.

Note:

The number of permutations must be large enough to theoretically allow detection of a p-value smaller than the critical alpha level; otherwise the permutation test is pointless. The smallest possible p-value is 1/n_perm.
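The note above translates directly into a one-line check. A sketch (the strictness of the comparison is an assumption):

```python
def check_n_perm(n_perm, alpha):
    """Return True if ``n_perm`` permutations can produce p < alpha.

    The smallest attainable p-value in a permutation test is 1 / n_perm,
    so the test is only meaningful when that value is below alpha.
    """
    return 1.0 / n_perm < alpha

ok = check_n_perm(200, 0.05)      # 1/200 = 0.005, below alpha
too_few = check_n_perm(10, 0.05)  # 1/10  = 0.1, cannot reach alpha
```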

idtxl.stats.max_statistic(analysis_setup, data, candidate_set, te_max_candidate, conditional)[source]

Perform maximum statistics for one candidate source.

Test if a transfer entropy value is significantly larger than the maximum values obtained from surrogates of all remaining candidates.

Args:
analysis_setup : MultivariateTE instance

information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:

  • n_perm_max_stat : int [optional] - number of permutations (default=200)

  • alpha_max_stat : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

candidate_set : list of tuples

list of indices of remaining candidates

te_max_candidate : float

transfer entropy value to be tested

conditional : numpy array

realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension]

Returns:
bool

statistical significance

float

the test’s p-value

numpy array

surrogate table

Raises:
ex.AlgorithmExhaustedError

Raised from _create_surrogate_table() when calculation cannot be made
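The comparison behind maximum statistics can be sketched as follows, assuming a surrogate table (candidates x permutations) has already been computed; this is an illustration of the principle, not the exact IDTxl routine:

```python
import numpy as np

def max_statistic_test(value, surrogate_table, alpha=0.05):
    """Compare ``value`` against the distribution of per-permutation
    maxima of ``surrogate_table`` (shape: candidates x permutations)."""
    max_dist = surrogate_table.max(axis=0)    # max over candidates per perm
    p_value = (max_dist >= value).mean()      # one-sided permutation p-value
    return p_value < alpha, p_value

rng = np.random.default_rng(0)
table = rng.normal(size=(5, 200))                 # 5 candidates, 200 perms
significant, p = max_statistic_test(10.0, table)  # far above surrogate maxima
```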

idtxl.stats.max_statistic_sequential(analysis_setup, data)[source]

Perform sequential maximum statistics for a set of candidate sources.

Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.

The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).

This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.

Args:
analysis_setup : MultivariateTE instance

information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:

  • n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)

  • alpha_max_seq : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

Returns:
numpy array, bool

statistical significance of each source

numpy array, float

the test’s p-values for each source

numpy array, float

TE values for individual sources

idtxl.stats.max_statistic_sequential_bivariate(analysis_setup, data)[source]

Perform sequential maximum statistics for a set of candidate sources.

Test multivariate/bivariate MI/TE values against surrogates. Test highest TE/MI value against distribution of highest surrogate values, second highest against distribution of second highest, and so forth. Surrogates are created from each candidate in the candidate set, including the candidate that is currently tested. Surrogates are then sorted over candidates. This is repeated n_perm_max_seq times. Stop comparison if a TE/MI value is not significant compared to the distribution of surrogate values of the same rank. All smaller values are considered non-significant as well.

The conditional for estimation of MI/TE is taken from the current set of conditional variables in the analysis setup. For multivariate MI or TE surrogate creation, the full set of conditional variables is used. For bivariate MI or TE surrogate creation, the conditioning set has to be restricted to a subset of the current set of conditional variables: for bivariate MI no conditioning set is required, for bivariate TE only the past variables from the target are required (not the variables selected from other relevant sources).

This function will re-use the surrogate table created in the last min-stats round if that table is in the analysis_setup. This saves the complete calculation of surrogates for this statistic.

Args:
analysis_setup : MultivariateTE instance

information on the current analysis, can have an optional attribute settings, a dictionary with parameters for statistical testing:

  • n_perm_max_seq : int [optional] - number of permutations (default=n_perm_min_stat|500)

  • alpha_max_seq : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

Returns:
numpy array, bool

statistical significance of each source

numpy array, float

the test’s p-values for each source

numpy array, float

TE values for individual sources

idtxl.stats.mi_against_surrogates(analysis_setup, data)[source]

Test estimated mutual information for significance against surrogate data.

Shuffle realisations of the current value (the point to be predicted) and re-calculate mutual information (MI) for the shuffled data. The actual estimated MI is then compared against this distribution of MI values from surrogate data.

Args:
analysis_setup : MultivariateTE instance

information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:

  • n_perm_mi : int [optional] - number of permutations (default=500)

  • alpha_mi : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

Returns:
float

estimated MI value

bool

statistical significance

float

p_value for estimated MI value

Raises:
ex.AlgorithmExhaustedError

Raised from estimate() methods when calculation cannot be made

idtxl.stats.min_statistic(analysis_setup, data, candidate_set, te_min_candidate, conditional=None)[source]

Perform minimum statistics for one candidate source.

Test if a transfer entropy value is significantly larger than the minimum values obtained from surrogates of all remaining candidates.

Args:
analysis_setup : MultivariateTE instance

information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:

  • n_perm_min_stat : int [optional] - number of permutations (default=500)

  • alpha_min_stat : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

candidate_set : list of tuples

list of indices of remaining candidates

te_min_candidate : float

transfer entropy value to be tested

conditional : numpy array [optional]

realisations of conditional, 2D numpy array where array dimensions represent [realisations x variable dimension] (default=None, no conditioning performed)

Returns:
bool

statistical significance

float

the test’s p-value

numpy array

surrogate table

Raises:
ex.AlgorithmExhaustedError

Raised from _create_surrogate_table() when calculation cannot be made

idtxl.stats.network_fdr(settings=None, *results)[source]

Perform FDR-correction on results of network inference.

Perform correction of the false discovery rate (FDR) after network analysis. FDR correction can either be applied at the target level (by correcting omnibus p-values) or at the single-link level (by correcting p-values of individual links between single samples and the target).

Input can be a list of partial results to combine results from parallel analysis.

References:

  • Genovese, C.R., Lazar, N.A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4), 870-878.

Args:
settings : dict [optional]

parameters for statistical testing with entries:

  • alpha_fdr : float [optional] - critical alpha level (default=0.05)

  • correct_by_target : bool [optional] - if True, correct p-values on the target level (omnibus test p-values), otherwise correct p-values for individual variables (sequential max stats p-values) (default=True)

  • fdr_constant : int [optional] - choose one of two constants used for calculating the FDR-thresholds according to Genovese (2002): 1 will divide alpha by 1, 2 will divide alpha by the sum_i(1/i); see the paper for details on the assumptions (default=2)

results : instances of ResultsNetworkInference

results of network inference, see documentation of ResultsNetworkInference()

Returns:
ResultsNetworkInference instance

input object pruned of non-significant links
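The FDR thresholding used here (Genovese et al., 2002) can be sketched for a flat list of p-values; the helper name and interface are illustrative, not IDTxl's internal API:

```python
import numpy as np

def fdr_threshold(pvals, alpha=0.05, constant=2):
    """Return a boolean mask of p-values surviving FDR correction.

    constant=1 uses alpha as-is; constant=2 divides alpha by sum_i(1/i),
    which is valid under arbitrary dependence (Genovese et al., 2002).
    """
    pvals = np.asarray(pvals)
    n = len(pvals)
    if constant == 2:
        alpha = alpha / np.sum(1.0 / np.arange(1, n + 1))
    order = np.argsort(pvals)
    # Find the largest rank i with p_(i) <= (i / n) * alpha; all
    # p-values at or below that rank are declared significant.
    thresholds = alpha * np.arange(1, n + 1) / n
    below = pvals[order] <= thresholds
    mask = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        mask[order[: k + 1]] = True
    return mask

mask = fdr_threshold([0.001, 0.2, 0.03, 0.9], alpha=0.05, constant=1)
```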

idtxl.stats.omnibus_test(analysis_setup, data)[source]

Perform an omnibus test on identified conditional variables.

Test the joint information transfer from all identified sources to the current value conditional on candidates in the target’s past. To test for significance, this is repeated for shuffled realisations of the sources. The distribution of values from shuffled data is then used as test distribution.

Args:
analysis_setup : MultivariateTE instance

information on the current analysis, can have an optional attribute ‘settings’, a dictionary with parameters for statistical testing:

  • n_perm_omnibus : int [optional] - number of permutations (default=500)

  • alpha_omnibus : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

Returns:
bool

statistical significance

float

the test’s p-value

float

the estimated test statistic, i.e., the information transfer from all sources into the target

Raises:
ex.AlgorithmExhaustedError

Raised from estimate() calls when calculation cannot be made

idtxl.stats.syn_shd_against_surrogates(analysis_setup, data)[source]

Test the shared/synergistic information in the PID estimate.

Shuffle realisations of the target and re-calculate PID, in particular the synergistic and shared information from shuffled data. The original shared and synergistic information are then compared against the distribution of values calculated from surrogate data.

Args:
analysis_setup : Partial_information_decomposition instance

information on the current analysis, should have an attribute ‘settings’, a dict with optional fields:

  • n_perm : int [optional] - number of permutations (default=500)

  • alpha : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

Returns:
dict

PID estimate from original data

bool

statistical significance of the shared information

float

p-value of the shared information

bool

statistical significance of the synergistic information

float

p-value of the synergistic information

idtxl.stats.unq_against_surrogates(analysis_setup, data)[source]

Test the unique information in the PID estimate against surrogate data.

Shuffle realisations of both sources individually and re-calculate PID, in particular the unique information from shuffled data. The original unique information is then compared against the distribution of values calculated from surrogate data.

Args:
analysis_setup : Partial_information_decomposition instance

information on the current analysis, should have an attribute ‘settings’, a dict with optional fields:

  • n_perm : int [optional] - number of permutations (default=500)

  • alpha : float [optional] - critical alpha level (default=0.05)

  • permute_in_time : bool [optional] - generate surrogates by shuffling samples in time instead of shuffling whole replications (default=False)

data : Data instance

raw data

Returns:
dict

PID estimate from original data

bool

statistical significance of the unique information in source 1

float

p-value of the unique information in source 1

bool

statistical significance of the unique information in source 2

float

p-value of the unique information in source 2