The Data Class

class idtxl.data.Data(data=None, dim_order='psr', normalise=True, seed=None)[source]

Store data for information dynamics estimation.

Data takes a 1- to 3-dimensional array representing realisations of random variables in dimensions: processes, samples (over time), and replications. If necessary, the provided realisations are reshaped to fit the format expected by IDTxl, which is a 3-dimensional array with axes representing (process index, sample index, replication index). Indicate the actual order of dimensions in the provided array in a three-character string, e.g. ‘spr’ for an array with realisations over (1) samples in time, (2) processes, (3) replications.

Example:

>>> data_mute = Data()              # initialise empty data object
>>> data_mute.generate_mute_data()  # simulate data from MuTE paper
>>>
>>> # Create data objects with data of various sizes
>>> d = np.arange(10000).reshape((2, 1000, 5))  # 2 procs.,
>>> data_1 = Data(d, dim_order='psr')           # 1000 samples, 5 repl.
>>>
>>> d = np.arange(3000).reshape((3, 1000))  # 3 procs.,
>>> data_2 = Data(d, dim_order='ps')        # 1000 samples
>>>
>>> # Overwrite data in existing object with new data
>>> d = np.arange(5000)
>>> data_2.set_data(d, 's')

Note:

Realisations are stored as attribute ‘data’. This can only be set via the ‘set_data()’ method.

Args:
data : numpy array [optional]

1- to 3-dimensional array with raw data

dim_order : string [optional]

order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the data dimensionality, e.g., ‘ps’ for a two-dimensional array of data from several processes over time (default=’psr’)

normalise : bool [optional]

if True, data gets normalised per process (default=True)

seed : int [optional]

can be set to a fixed integer to obtain reproducible results over multiple analysis runs on the same data; otherwise a random seed is used by default
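
For example, a data object can be created with a fixed seed and without per-process normalisation (a minimal sketch; the array values, dimensions, and seed are illustrative):

>>> import numpy as np
>>> from idtxl.data import Data
>>> d = np.arange(3000).reshape((3, 1000))          # 3 processes, 1000 samples
>>> data = Data(d, dim_order='ps', normalise=False, seed=0)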

Attributes:
data : numpy array

realisations, can only be set via ‘set_data’ method

n_processes : int

number of processes

n_replications : int

number of replications

n_samples : int

number of samples in time

normalise : bool

if True, all data gets z-standardised per process

initial_state : array

initial state of the seed for shuffled permutations
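
Continuing the constructor example above, the dimensions of the stored data can be read from these attributes (a short illustration; data_1 holds 2 processes, 1000 samples, and 5 replications, stored in the (process, sample, replication) order described above):

>>> data_1.n_processes, data_1.n_samples, data_1.n_replications
(2, 1000, 5)
>>> data_1.data.shape    # realisations in (process, sample, replication) order
(2, 1000, 5)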

property data

Return data array.

generate_logistic_maps_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]

Generate discrete-time coupled-logistic-maps time series.

Generate data and overwrite the instance’s current data.

The implemented logistic map function is f(x) = 4 * x * (1 - x).

Args:
n_samples : int [optional]

number of samples simulated for each process and replication

n_replications : int [optional]

number of replications

coefficient_matrices : numpy array [optional]

coefficient matrices: numpy array with dimensions (order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the stochastic process. (default = np.array([[[0.5, 0], [0.4, 0.5]]]))

noise_std : float [optional]

standard deviation of uncorrelated Gaussian noise (default = 0.1)
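
For illustration, a minimal sketch of simulating two coupled logistic maps; n_samples, n_replications, and the seed are arbitrary example values, and the coupling matrix equals the documented default:

>>> import numpy as np
>>> from idtxl.data import Data
>>> coefs = np.array([[[0.5, 0.0],
...                    [0.4, 0.5]]])   # shape (order=1, 2 processes, 2 processes)
>>> data = Data(seed=0)
>>> data.generate_logistic_maps_data(n_samples=500, n_replications=5,
...                                  coefficient_matrices=coefs, noise_std=0.1)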

generate_mute_data(n_samples=1000, n_replications=10)[source]

Generate example data for a 5-process network.

Generate example data and overwrite the instance’s current data. The network is used as an example in the paper on the MuTE toolbox (Montalto, PLOS ONE, 2014, eq. 14) and was originally proposed by Baccala & Sameshima (2001). The network consists of five auto-regressive (AR) processes with model order 2 and the following (non-linear) couplings:

  • 0 -> 1, u = 2 (non-linear)

  • 0 -> 2, u = 3

  • 0 -> 3, u = 2 (non-linear)

  • 3 -> 4, u = 1

  • 4 -> 3, u = 1

References:

  • Montalto, A., Faes, L., & Marinazzo, D. (2014) MuTE: A MATLAB toolbox to compare established and novel estimators of the multivariate transfer entropy. PLoS ONE 9(10): e109462. https://doi.org/10.1371/journal.pone.0109462

  • Baccala, L.A. & Sameshima, K. (2001). Partial directed coherence: a new concept in neural structure determination. Biol Cybern 84: 463–474. https://doi.org/10.1007/PL00007990

Args:
n_samples : int

number of samples simulated for each process and replication

n_replications : int

number of replications
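
For example (a minimal sketch; the seed is arbitrary, and the resulting shape follows the (process, sample, replication) storage order described above):

>>> from idtxl.data import Data
>>> data = Data(seed=0)                    # fixed seed for a reproducible simulation
>>> data.generate_mute_data(n_samples=1000, n_replications=10)
>>> data.data.shape                        # 5 processes x 1000 samples x 10 replications
(5, 1000, 10)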

generate_var_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]

Generate discrete-time VAR (vector auto-regressive) time series.

Generate data and overwrite the instance’s current data.

Args:
n_samples : int [optional]

number of samples simulated for each process and replication

n_replications : int [optional]

number of replications

coefficient_matrices : numpy array [optional]

coefficient matrices: numpy array with dimensions (VAR order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the VAR process. (default = np.array([[[0.5, 0], [0.4, 0.5]]]))

noise_std : float [optional]

standard deviation of uncorrelated Gaussian noise (default = 0.1)
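
For illustration, a sketch of a second-order VAR with two processes; providing two coefficient matrices implicitly sets the VAR order to 2 (the coupling values are illustrative):

>>> import numpy as np
>>> from idtxl.data import Data
>>> coefs = np.array([[[0.5, 0.0],     # lag 1
...                    [0.4, 0.5]],
...                   [[0.1, 0.0],     # lag 2
...                    [0.0, 0.1]]])   # shape (order=2, 2 processes, 2 processes)
>>> data = Data(seed=0)
>>> data.generate_var_data(n_samples=1000, n_replications=10,
...                        coefficient_matrices=coefs, noise_std=0.1)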

get_realisations(current_value, idx_list, shuffle=False)[source]

Return realisations for a list of indices.

Return realisations for the indices in the list. Optionally, realisations can be shuffled to create surrogate data for statistical testing. For shuffling, data blocks are permuted over replications while their temporal order stays intact within replications:

Original data:

repl. ind.:   1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4 | 5 5 5 5
sample index: 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4

Shuffled data:

repl. ind.:   3 3 3 3 | 1 1 1 1 | 4 4 4 4 | 2 2 2 2 | 5 5 5 5
sample index: 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4

Args:
current_value : tuple

index of the current value in the current analysis, has to have the form (idx process, idx sample); if current_value == idx, all samples for a process are returned

idx_list : list of tuples

variable indices

shuffle : bool

if True, permute blocks of replications over trials

Returns:
numpy array

realisations with dimensions (no. samples * no. replications) x number of indices

numpy array

replication index for each realisation with dimensions (no. samples * no. replications) x number of indices
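
For illustration, a sketch of get_realisations for two variables relative to a current value (all indices are arbitrary example values):

>>> from idtxl.data import Data
>>> data = Data(seed=0)
>>> data.generate_mute_data(n_samples=100, n_replications=10)
>>> current_value = (0, 5)                # (process index, sample index)
>>> idx_list = [(1, 1), (2, 3)]           # variables as (process, sample) tuples
>>> realisations, repl_idx = data.get_realisations(current_value, idx_list)
>>> # realisations: (no. samples * no. replications) x 2 array of realisations
>>> # repl_idx: replication index of each returned realisation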

get_seed()[source]

Return the initial seed of the data.

get_state()[source]

Return the current state of the random seed.

n_realisations(current_value=None)[source]

Number of realisations over samples and replications.

Args:
current_value : tuple [optional]

reference point for the calculation of the number of realisations (e.g., when using an embedding of length k, we count realisations from the (k+1)-th sample because we lose the first k samples to the embedding); if no current_value is provided, the number of all samples is used

n_realisations_repl()[source]

Number of realisations over replications.

n_realisations_samples(current_value=None)[source]

Number of realisations over samples.

Args:
current_value : tuple [optional]

reference point for the calculation of the number of realisations (e.g., when using an embedding of length k, the current value is at sample k + 1; we thus count realisations from the (k+1)-th sample because we lose the first k samples to the embedding)
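
A short sketch relating the three counting methods (the current value at sample 5 is an arbitrary example):

>>> from idtxl.data import Data
>>> data = Data(seed=0)
>>> data.generate_mute_data(n_samples=100, n_replications=10)
>>> n_all  = data.n_realisations()                  # samples x replications
>>> n_repl = data.n_realisations_repl()             # replications only
>>> n_samp = data.n_realisations_samples((0, 5))    # samples past the current value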

permute_replications(current_value, idx_list)[source]

Return realisations with permuted replications (time stays intact).

Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return realisations for all indices in the list, where an index is expected to have the form (process index, sample index). Realisations are permuted block-wise by permuting the order of replications:

Original data:

repl. ind.:   1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4 | 5 5 5 5
sample index: 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4

Permuted data:

repl. ind.:   3 3 3 3 | 1 1 1 1 | 4 4 4 4 | 2 2 2 2 | 5 5 5 5
sample index: 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4

Args:
current_value : tuple

index of the current_value in the data

idx_list : list of tuples

indices of variables

Returns:
numpy array

permuted realisations with dimensions replications x number of indices

numpy array

replication index for each realisation

Raises:

TypeError if idx_realisations is not a list
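
For illustration, a sketch of permute_replications for two variables (indices are arbitrary example values):

>>> from idtxl.data import Data
>>> data = Data(seed=0)
>>> data.generate_mute_data(n_samples=100, n_replications=10)
>>> surrogates, repl_idx = data.permute_replications(current_value=(0, 5),
...                                                  idx_list=[(1, 1), (2, 3)])
>>> # surrogates: realisations with the order of replications permuted,
>>> # while the temporal order within each replication stays intact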

permute_samples(current_value, idx_list, perm_settings)[source]

Return realisations with permuted samples (repl. stays intact).

Create surrogate data by permuting realisations over samples (time) while keeping the order of replications intact. Surrogates can be created for multiple variables in parallel, where variables are provided as a list of indices. An index is expected to have the form (process index, sample index).

Permuting samples in time is the fall-back option for surrogate data creation. The default method for surrogate data creation is the permutation of replications, while keeping the order of samples in time intact. If the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data, permutation of samples in time is chosen instead.

Different permutation strategies can be chosen to permute realisations in time. Note that if data consists of multiple replications, within each replication, samples are shuffled following the same permutation pattern:

Original data:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8

Circular shift by a random number of samples, e.g. 4 samples:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 5 6 7 8 1 2 3 4 | 5 6 7 8 1 2 3 4 | 5 6 7 8 1 2 3 4

Permute blocks of 3 samples:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 4 5 6 7 8 1 2 3 | 4 5 6 7 8 1 2 3 | 4 5 6 7 8 1 2 3

Permute data locally within a range of 4 samples:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 1 2 4 3 8 5 6 7 | 1 2 4 3 8 5 6 7 | 1 2 4 3 8 5 6 7

Random permutation (the same pattern in each replication):

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 4 2 5 7 1 3 8 6 | 4 2 5 7 1 3 8 6 | 4 2 5 7 1 3 8 6

Args:
current_value : tuple

index of the current_value in the data

idx_list : list of tuples

indices of variables

perm_settings : dict

settings specifying the allowed permutations:

  • perm_type : str permutation type, can be

    • ‘random’: swaps samples at random,

    • ‘circular’: shifts the time series by a random number of samples,

    • ‘block’: swaps blocks of samples, or

    • ‘local’: swaps samples within a given range

  • additional settings depending on the perm_type (n is the number of samples):

    • if perm_type == ‘circular’:

      ‘max_shift’ : int

      the maximum number of samples for shifting (e.g., number of samples / 2)

    • if perm_type == ‘block’:

      ‘block_size’ : int

      no. samples per block (e.g., number of samples / 10)

      ‘perm_range’ : int

      range in which blocks can be swapped (e.g., number of samples / block_size)

    • if perm_type == ‘local’:

      ‘perm_range’ : int

      range in samples over which realisations can be permuted (e.g., number of samples / 10)

Returns:
numpy array

permuted realisations with dimensions replications x number of indices

numpy array

sample index for each realisation

Raises:

TypeError if idx_realisations is not a list

Note:

This permutation scheme is the fall-back option if surrogate data can not be created by shuffling replications because the number of replications is too small to generate the requested number of permutations.
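
For illustration, a sketch of permute_samples using a circular shift; the settings values are arbitrary examples, and the alternative settings dictionaries follow the documented keys above:

>>> from idtxl.data import Data
>>> data = Data(seed=0)
>>> data.generate_mute_data(n_samples=100, n_replications=10)
>>> perm_settings = {'perm_type': 'circular', 'max_shift': 50}
>>> surrogates, sample_idx = data.permute_samples(current_value=(0, 5),
...                                               idx_list=[(1, 1), (2, 3)],
...                                               perm_settings=perm_settings)
>>> # other documented settings, e.g.:
>>> # {'perm_type': 'block', 'block_size': 10, 'perm_range': 10}
>>> # {'perm_type': 'local', 'perm_range': 10}
>>> # {'perm_type': 'random'}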

set_data(data, dim_order)[source]

Overwrite data in an existing Data object.

Args:
data : numpy array

1- to 3-dimensional array of realisations

dim_order : string

order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the number of dimensions in data

slice_permute_replications(process)[source]

Return data slice with permuted replications (time stays intact).

Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return the data slice for the process specified by ‘process’. Realisations are permuted block-wise by permuting the order of replications.

slice_permute_samples(process, perm_settings)[source]

Return slice of data with permuted samples (repl. stays intact).

Create surrogate data by permuting data in a slice over samples (time) while keeping the order of replications intact. Return slice for the entry specified by ‘process’. Realisations are permuted according to the settings specified in perm_settings:

Original data:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8

Circular shift by 2, 6, and 4 samples:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 7 8 1 2 3 4 5 6 | 3 4 5 6 7 8 1 2 | 5 6 7 8 1 2 3 4

Permute blocks of 3 samples:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 4 5 6 7 8 1 2 3 | 1 2 3 7 8 4 5 6 | 7 8 4 5 6 1 2 3

Permute data locally within a range of 4 samples:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 1 2 4 3 8 5 6 7 | 4 1 2 3 5 7 8 6 | 3 1 2 4 8 5 6 7

Random permutation:

repl. ind.:   1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3
sample index: 4 2 5 7 1 3 8 6 | 7 5 3 4 2 1 8 6 | 1 2 4 3 6 8 7 5

Permuting samples is the fall-back option for surrogate creation if the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data.

Args:
process : int

process for which to return data slice

perm_settings : dict

settings specifying the allowed permutations:

  • perm_type : str permutation type, can be

    • ‘circular’: shifts the time series by a random number of samples,

    • ‘block’: swaps blocks of samples,

    • ‘local’: swaps samples within a given range, or

    • ‘random’: swaps samples at random

  • additional settings depending on the perm_type (n is the number of samples):

    • if perm_type == ‘circular’:

      ‘max_shift’ : int

      the maximum number of samples for shifting (default=n/2)

    • if perm_type == ‘block’:

      ‘block_size’ : int

      no. samples per block (default=n/10)

      ‘perm_range’ : int

      range in which blocks can be swapped (default=max)

    • if perm_type == ‘local’:

      ‘perm_range’ : int

      range in samples over which realisations can be permuted (default=n/10)

Returns:
numpy array

data slice for the given process, permuted over samples, with dimensions samples x number of replications

numpy array

index of permuted samples

Note:

This permutation scheme is the fall-back option if the number of replications is too small to allow a sufficient number of permutations for the generation of surrogate data.
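
For illustration, a sketch of slice_permute_samples using block permutation on a single process (the settings values are arbitrary examples):

>>> from idtxl.data import Data
>>> data = Data(seed=0)
>>> data.generate_mute_data(n_samples=100, n_replications=10)
>>> perm_settings = {'perm_type': 'block', 'block_size': 10, 'perm_range': 10}
>>> data_slice, perm_idx = data.slice_permute_samples(process=0,
...                                                   perm_settings=perm_settings)
>>> # data_slice: samples x replications array for process 0, permuted over samples
>>> # perm_idx: index of the permuted samples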