The Data Class¶
- class idtxl.data.Data(data=None, dim_order='psr', normalise=True, seed=None)[source]
Store data for information dynamics estimation.
Data takes a 1- to 3-dimensional array representing realisations of random variables in dimensions: processes, samples (over time), and replications. If necessary, data reshapes provided realisations to fit the format expected by IDTxl, which is a 3-dimensional array with axes representing (process index, sample index, replication index). Indicate the actual order of dimensions in the provided array in a three-character string, e.g. ‘spr’ for an array with realisations over (1) samples in time, (2) processes, (3) replications.
Example:
>>> data_mute = Data() # initialise empty data object >>> data_mute.generate_mute_data() # simulate data from MuTE paper >>> >>> # Create data objects with data of various sizes >>> d = np.arange(10000).reshape((2, 1000, 5)) # 2 procs., >>> data_1 = Data(d, dim_order='psr') # 1000 samples, 5 repl. >>> >>> d = np.arange(3000).reshape((3, 1000)) # 3 procs., >>> data_2 = Data(d, dim_order='ps') # 1000 samples >>> >>> # Overwrite data in existing object with random data >>> d = np.arange(5000) >>> data_2.set_data(data_new, 's')
- Note:
Realisations are stored as attribute ‘data’. This can only be set via the ‘set_data()’ method.
- Args:
- datanumpy array [optional]
1/2/3-dimensional array with raw data
- dim_orderstring [optional]
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as the data dimensionality, e.g., ‘ps’ for a two-dimensional array of data from several processes over time (default=’psr’)
- normalisebool [optional]
if True, data gets normalised per process (default=True)
- seedint [optional]
can be set to a fixed integer to get repetitive results on the same data with multiple runs of analyses. Otherwise a random seed is set as default.
- Attributes:
- datanumpy array
realisations, can only be set via ‘set_data’ method
- n_processesint
number of processes
- n_replicationsint
number of replications
- n_samplesint
number of samples in time
- normalisebool
if true, all data gets z-standardised per process
- initial_statearray
initial state of the seed for shuffled permutations
- property data
Return data array.
- generate_logistic_maps_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]
Generate discrete-time coupled-logistic-maps time series.
Generate data and overwrite the instance’s current data.
The implemented logistic map function is f(x) = 4 * x * (1 - x).
- Args:
- n_samplesint [optional]
number of samples simulated for each process and replication
- n_replicationsint [optional]
number of replications
- coefficient_matricesnumpy array [optional]
coefficient matrices: numpy array with dimensions (order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the stochastic process. (default = np.array([[[0.5, 0], [0.4, 0.5]]]))
- noise_stdfloat [optional]
standard deviation of uncorrelated Gaussian noise (default = 0.1)
- generate_mute_data(n_samples=1000, n_replications=10)[source]
Generate example data for a 5-process network.
Generate example data and overwrite the instance’s current data. The network is used as an example the paper on the MuTE toolbox (Montalto, PLOS ONE, 2014, eq. 14) and was orginally proposed by Baccala & Sameshima (2001). The network consists of five auto-regressive (AR) processes with model orders 2 and the following (non-linear) couplings:
0 -> 1, u = 2 (non-linear) 0 -> 2, u = 3 0 -> 3, u = 2 (non-linear) 3 -> 4, u = 1 4 -> 3, u = 1
References:
Montalto, A., Faes, L., & Marinazzo, D. (2014) MuTE: A MATLAB toolbox to compare established and novel estimators of the multivariate transfer entropy. PLoS ONE 9(10): e109462. https://doi.org/10.1371/journal.pone.0109462
Baccala, L.A. & Sameshima, K. (2001). Partial directed coherence: a new concept in neural structure determination. Biol Cybern 84: 463–474. https://doi.org/10.1007/PL00007990
- Args:
- n_samplesint
number of samples simulated for each process and replication
- n_replicationsint
number of replications
- generate_var_data(n_samples=1000, n_replications=10, coefficient_matrices=array([[[0.5, 0.], [0.4, 0.5]]]), noise_std=0.1)[source]
Generate discrete-time VAR (vector auto-regressive) time series.
Generate data and overwrite the instance’s current data.
- Args:
- n_samplesint [optional]
number of samples simulated for each process and replication
- n_replicationsint [optional]
number of replications
- coefficient_matricesnumpy array [optional]
coefficient matrices: numpy array with dimensions (VAR order, number of processes, number of processes). Each square coefficient matrix corresponds to a lag, starting from lag=1. The total number of provided matrices implicitly determines the order of the VAR process. (default = np.array([[[0.5, 0], [0.4, 0.5]]]))
- noise_stdfloat [optional]
standard deviation of uncorrelated Gaussian noise (default = 0.1)
- get_realisations(current_value, idx_list, shuffle=False)[source]
Return realisations for a list of indices.
Return realisations for indices in list. Optionally, realisations can be shuffled to create surrogate data for statistical testing. For shuffling, data blocks are permuted over replications while their temporal order stays intact within replications:
- Original data:
repl. ind.
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
…
sample index
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
…
- Shuffled data:
repl. ind.
3 3 3 3
1 1 1 1
4 4 4 4
2 2 2 2
5 5 5 5
…
sample index
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
…
- Args:
- idx_list: list of tuples
variable indices
- current_valuetuple
index of the current value in current analysis, has to have the form (idx process, idx sample); if current_value == idx, all samples for a process are returned
- shuffle: bool
if true permute blocks of replications over trials
- Returns:
- numpy array
realisations with dimensions (no. samples * no.replications) x number of indices
- numpy array
replication index for each realisation with dimensions (no. samples * no.replications) x number of indices
- get_seed()[source]
return the initial seed of the data
- get_state()[source]
return the current state of the random seed
- n_realisations(current_value=None)[source]
Number of realisations over samples and replications.
- Args:
- current_valuetuple [optional]
reference point for calculation of number of realisations (e.g. when using an embedding of length k, we count realisations from the k+1th sample because we loose the first k samples to the embedding); if no current_value is provided, the number of all samples is used
- n_realisations_repl()[source]
Number of realisations over replications.
- n_realisations_samples(current_value=None)[source]
Number of realisations over samples.
- Args:
- current_valuetuple [optional]
reference point for calculation of number of realisations (e.g. when using an embedding of length k, the current value is at sample k + 1; we thus count realisations from the k + 1st sample because we loose the first k samples to the embedding)
- permute_replications(current_value, idx_list)[source]
Return realisations with permuted replications (time stays intact).
Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return realisations for all indices in the list, where an index is expected to have the form (process index, sample index). Realisations are permuted block-wise by permuting the order of replications:
- Original data:
repl. ind.
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
…
sample index
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
…
- Permuted data:
repl. ind.
3 3 3 3
1 1 1 1
4 4 4 4
2 2 2 2
5 5 5 5
…
sample index
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
…
- Args:
- current_valuetuple
index of the current_value in the data
- idx_listlist of tuples
indices of variables
- Returns:
- numpy array
permuted realisations with dimensions replications x number of indices
- numpy array
replication index for each realisation
- Raises:
TypeError if idx_realisations is not a list
- permute_samples(current_value, idx_list, perm_settings)[source]
Return realisations with permuted samples (repl. stays intact).
Create surrogate data by permuting realisations over samples (time) while keeping the order of replications intact. Surrogates can be created for multiple variables in parallel, where variables are provided as a list of indices. An index is expected to have the form (process index, sample index).
Permuting samples in time is the fall-back option for surrogate data creation. The default method for surrogate data creation is the permutation of replications, while keeping the order of samples in time intact. If the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data, permutation of samples in time is chosen instead.
Different permutation strategies can be chosen to permute realisations in time. Note that if data consists of multiple replications, within each replication, samples are shuffled following the same permutation pattern:
- Original data:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
…
- Circular shift by a random number of samples, e.g. 4 samples:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4
5 6 7 8 1 2 3 4
…
- Permute blocks of 3 samples:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3
4 5 6 7 8 1 2 3
…
- Permute data locally within a range of 4 samples:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
1 2 4 3 8 5 6 7
1 2 4 3 8 5 6 7
1 2 4 3 8 5 6 7
…
- Random permutation:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
4 2 5 7 1 3 2 6
4 2 5 7 1 3 2 6
4 2 5 7 1 3 2 6
…
- Args:
- current_valuetuple
index of the current_value in the data
- idx_listlist of tuples
indices of variables
- perm_settingsdict
settings specifying the allowed permutations:
perm_type : str permutation type, can be
‘random’: swaps samples at random,
‘circular’: shifts time series by a random number of samples
‘block’: swaps blocks of samples,
‘local’: swaps samples within a given range, or
additional settings depending on the perm_type (n is the number of samples):
if perm_type == ‘circular’:
- ‘max_shift’int
the maximum number of samples for shifting (e.g., number of samples / 2)
if perm_type == ‘block’:
- ‘block_size’int
no. samples per block (e.g., number of samples / 10)
- ‘perm_range’int
range in which blocks can be swapped (e.g., number of samples / block_size)
if perm_type == ‘local’:
- ‘perm_range’int
range in samples over which realisations can be permuted (e.g., number of samples / 10)
- Returns:
- numpy array
permuted realisations with dimensions replications x number of indices
- numpy array
sample index for each realisation
- Raises:
TypeError if idx_realisations is not a list
- Note:
This permutation scheme is the fall-back option if surrogate data can not be created by shuffling replications because the number of replications is too small to generate the requested number of permutations.
- set_data(data, dim_order)[source]
Overwrite data in an existing Data object.
- Args:
- datanumpy array
1- to 3-dimensional array of realisations
- dim_orderstring
order of dimensions, accepts any combination of the characters ‘p’, ‘s’, and ‘r’ for processes, samples, and replications; must have the same length as number of dimensions in data
- slice_permute_replications(process)[source]
Return data slice with permuted replications (time stays intact).
Create surrogate data by permuting realisations over replications while keeping the temporal structure (order of samples) intact. Return realisations for all indices in the list, where an index is expected to have the form (process index, sample index). Realisations are permuted block-wise by permuting the order of replications
- slice_permute_samples(process, perm_settings)[source]
Return slice of data with permuted samples (repl. stays intact).
Create surrogate data by permuting data in a slice over samples (time) while keeping the order of replications intact. Return slice for the entry specified by ‘process’. Realisations are permuted according to the settings specified in perm_settings:
- Original data:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
…
- Circular shift by 2, 6, and 4 samples:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
7 8 1 2 3 4 5 6
3 4 5 6 7 8 1 2
5 6 7 8 1 2 3 4
…
- Permute blocks of 3 samples:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
4 5 6 7 8 1 2 3
1 2 3 7 8 4 5 6
7 8 4 5 6 1 2 3
…
- Permute data locally within a range of 4 samples:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
1 2 4 3 8 5 6 7
4 1 2 3 5 7 8 6
3 1 2 4 8 5 6 7
…
- Random permutation:
repl. ind.
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
…
sample index
4 2 5 7 1 3 2 6
7 5 3 4 2 1 8 5
1 2 4 3 6 8 7 5
…
Permuting samples is the fall-back option for surrogate creation if the number of replications is too small to allow for a sufficient number of permutations for the generation of surrogate data.
- Args:
- processint
process for which to return data slice
- perm_settingsdict
settings specifying the allowed permutations:
perm_type : str permutation type, can be
‘circular’: shifts time series by a random number of samples
‘block’: swaps blocks of samples,
‘local’: swaps samples within a given range, or
‘random’: swaps samples at random,
additional settings depending on the perm_type (n is the number of samples):
if perm_type == ‘circular’:
- ‘max_shift’int
the maximum number of samples for shifting (default=n/2)
if perm_type == ‘block’:
- ‘block_size’int
no. samples per block (default=n/10)
- ‘perm_range’int
range in which blocks can be swapped (default=max)
if perm_type == ‘local’:
- ‘perm_range’int
range in samples over which realisations can be permuted (default=n/10)
- Returns:
- numpy array
data slice with data permuted over samples with dimensions samples x number of replications
- numpy array
index of permuted samples
- Note:
This permutation scheme is the fall-back option if the number of replications is too small to allow a sufficient number of permutations for the generation of surrogate data.