Postprocessing of inferred networks

Significant subgraph mining

class idtxl.postprocessing.SignificantSubgraphMining(resultsA, resultsB, alpha, design, graph_type='directed', data_format='adjacency')[source]

Implementation of significant subgraph mining as described in

Sugiyama M, Lopez FL, Kasenburg N, Borgwardt KM. Significant subgraph mining with multiple testing correction. In: Proceedings of the 2015 SIAMInternational Conference on Data Mining. SIAM; 2015. p. 37–45.

Llinares-Lopez F, Sugiyama M, Papaxanthos L, Borgwardt K. Fast and memory-efficient significant pattern mining via permutation testing. In:Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2015. p. 725–734.

Gutknecht, Wibral (2021): “Significant Subgraph Mining for Neural Network Inference with Multiple Comparisons Correction”. bioRxiv. https://www.biorxiv.org/content/10.1101/2021.11.03.467050v1.full

Attributes:
resultsAlist

List of lists of IDTxl results dicts. One list per subject in Group A and one results dict per target.

resultsBlist

List of lists of IDTxl results dicts. One list per subject in Group B and one results dict per target

coding_listlist

List of all target-source-lag triplets in data set. Used to encode networks as lists of indices.

groupA_networkslist

List of lists of indices representing networks of subjects in Group A

groupB_networkslist

List of lists of indices representing networks of subjects in Group B

graph_typestring

can be either “directed” or “undirected” (undirected is only possible if data_format is “adjacency”)

data_formatstring

can be either “idtxl” or “adjacency”

designstring

Sampling design. Either “within” for within-subject designs (the same group of subjects is measured under two different conditions) or “between” for between-subjects designs (two different groups of subjects are measured under the same condition)

n_Aint

Sample size of group A

n_Bint

Sample size of group B

Nint

Total sample size

alphafloat

Uncorrected significance level

min_p_value_tablelist

List of minimum achievable p-values for all possible numbers of total occurrences of a subgraph (0 to N)

p_value_tablenumpy array

Lookup table of p-values for each combination of occurrences in Group A and total occurrences between 0 and N

min_freqint

Minimum number of occurrences required for testability at level alpha

link_countslist

List of links counts for each link that occurs at least once in the data set (i.e. for each target-source-lag triplet in coding_list)

union_indiceslist

List of indices of target-source-lag triplets in coding_list occuring at east min_freq times. All other triplets and all their supergraphs can be ignored because they are not even testable at level alpha

frequent_graphslist

List of frequent subgraphs in data set occuring at least min_freq times. Initialized empty and filled by calling the enumerate_frequent_subgraphs method

p_valueslist

List of p-values for each frequent subgraph. Initialized empty and filled by calling the enumerate_frequent_subgraphs method

minimum_p_valueslist

List of minimum p-values for each frequent subgraph. Initialized empty and filled by calling the enumerate_frequent_subgraphs method

num_testable_graphsint

Number of subgraphs testable at level alpha. Initialized as 0 and determined by calling the enumerate_significant_subgraphs method.

k_rtint

Tarones correction factor. Initialized as None. Determined by calling the enumerate_significant_subgraphs method

p_values_corrlist

List of corrected p-values. Intialized empty and filled by calling the enumerate_significant_subgraphs method

significant_graphslist

List of tuples of significant subgraphs in data set and their associated corrected p-values. Initialized empty and filled by calling the enumerate_significant_subgraphs method

count_discordants(indices, where='original')[source]

Counts the discordant pairs for a given subgraph represented as a list of indices.

Args:
indiceslist of integers

indices of all links the subgraph to be counted consists of

wherestring

if “original” then the discordants are counted in the original data set. if “perm” the discordants are counted in the permuted data set (for WY procedure)

Returns:
tuple of integers

number of cases in which the subgraph occurred in condition A but not in B, and number of cases in which the subgraph occurred in B but not in A

count_discordants_wylight(indices, k)[source]

Counts discordant pairs for subgraph given by list if indices in k-th permuted data set.

Args:
indiceslist of integers

indices of all links of the subgraph

kinteger

index of permuted data set

Returns:
tuple of integers

number of cases in which the subgraph occurred in condition A but not in B, and number of cases in which the subgraph occurred in B but not in A

count_subgraph(indices, where='original')[source]

Counts the number of occurrences of a subgraph represented by a list of indices

Args:
indiceslist of integers

indices of all links the subgraph to be counted consists of

wherestring

if “original” then the subgraph is counted in the original data set. if “perm” the subgraph is counted in the permuted data set (for WY procedure)

Returns:
tuple of integers

number of occurrences of subgraph in GroupA and in GroupB

count_subgraph_wylight(indices, k)[source]

Counts subgraph occurrences in k-th permuted data set

Args:
indiceslist of integers

indices of all links of the subgraph

kinteger

index of permuted data set

Returns:
tuple of integers

number of occurrences in group A and number of occurrences in group B

decode(indices)[source]

Converts a given list of indices (representing a subgraph) into a list of corresponding target-source-lag triplets using the mapping described in the coding list.

Args:

indices : list of integers

Returns:

List of 3-tuples

decode_adjacency(indices)[source]

Decodes list of indices as adjacency matrix

determine_tarone_factor()[source]

Determines Tarone’s correction factor in case there are at least two testable subgraphs.

Returns:
int

Tarone’s correction factor

encode()[source]

Encodes all subject networks as lists of indices. The ith entry describes the occurrence of the ith target-source-lag triplet in the coding list (self.coding_list).

Returns:
tuple of lists of integers

The first entry of the tuple is a list of integers describing the networks of subjects in Group A. The second entry is a list of integers for Group B.

encode_adjacency()[source]

Encodes all input adjacency matrices as lists of indices

enumerate_frequent_graphs(freq)[source]

Adds all subgraphs occuring at least freq times to self.frequent_graphs The process is carried out recursively using the extend() method. Individual links of the union network are successively extended to build more complex subgraphs. As soon as a subgraph does not occur often enough the extension process can be stopped because all supergraphs can at best occur with the same frequency. The extend() method also saves the minimum and actual p-values of all frequent subgraphs.

Args:
freqint

desired minimum frequency

enumerate_significant_subgraphs(method='Hommel', wy_algorithm='simple_depth_first', verbose=True, num_perm=10000, max_depth=inf)[source]

This is the main function carrying out significant subgraph mining according to the multiple comparisons correction method (and algorithm in the case of Westfall-Young) of choice. It calls the relevant methods depending on the input arguments.

Args:
verbosebool

If True, print summary of results

methodstring

Determines method used for multiple comparisons correction. can be “Tarone”, “Hommel”, or “Westfall-Young”

num_permint

Number of permutations used for Westfall-Young procedure.

wy_algorithmstring

algorithm used for Westfall-Young permutation procedure. Can be either “simple_depth_fist” (evaluates one permuted data set at a time) or “wy_light” for the Westfall-Young light algorithm introduced by Llinares-Lopez et al 2015 (distributes computations across permutations)

max_depthinteger

maximum complexity of subgraphs (number of links) up to which subgraphs are mined.

Returns:
list of tuples

The first entry of each tuple is a list of indices representing the identified significant subgraph. The second entry is the associated (uncorrected) p-value.

extend(to_be_extended, freq)[source]

Recursively extends the input subgraph checking at each recursion step if the current subgraph occurs frequently enough to reach significance at level alpha. If this is not the case, it is not extended any further. If it is, the extend method is called again. Frequent subgraphs are appended to self.frequent_subgraphs.

Args:
to_be_extendedlist

list of indices describing the locations of 1s in the union network. Each such list represent a particular subgraph.

freqint

desired minimum frequency

max_depthint

If specified, only subgraphs with at most max_depth links are considered. For instance, if max_depth = 1 only individual links are tested. The default value is infinity meaning that all possible subgraphs are considered.

Returns:

None

extend_mcnemar(to_be_extended, freq)[source]

Same as extend() method but using McNemar’s test for within subject designs

Args:
to_be_extendedlist

list of indices describing the locations of 1s in the union network. Each such list represent a particular subgraph.

freqint

desired minimum frequency

Returns:

None

extend_wy(to_be_extended)[source]

Determines the smallest observed p-value in permuted version of the data set by recursively extending the input subgraph. At each recursion step the function checks if the current subgraph occurs frequently enough (> self.current_min_freq) to obtain a p-value smaller than the smallest p-value observed so far (self.current_min_p). If this is not the case, it is not extended any further. If it is, the actual p-value is calculated. If this p-value happens to be smaller than current_min_p, then current_min_p and self.current_min_freq are updated, and the extend method is called again. If this p-value happens to be larger than current_min_p, the extend method is called again immediately.

Args:

list of indices of coding list. Each such list represent a particular subgraph.

Returns:

None

extend_wy_light(to_be_extended)[source]

Westfall-Young light extension method. Evaluates all permutations at the same time for each subgraph. The goal is to determine the Westfall-Young corrected level, i.e. the alpha quantile of the permutation distribution of the smallest observed p-value among subgraphs. Recursively, evaluates subgraphs and updates the current estimate of the Westfall-Young corrected level self.wy_level_light.

Args:
indiceslist of integers

indices of all links of the subgraph

Returns:

None

extend_wy_light_mcnemar(to_be_extended)[source]

Westfall-Young light extension method for the within-subjects case using McNemars test. Recursively, evaluates subgraphs and updates the current estimate of the Westfall-Young corrected level self.wy_level_light.

Args:
indiceslist of integers

indices of all links of the subgraph

Returns:

None

extend_wy_mcnemar(to_be_extended)[source]

Same as extend_wy but using McNemars test

Args:

list of indices of coding list. Each such list represent a particular subgraph.

Returns:

None

generate_coding_list()[source]

If data_format = “idtxl”: Creates list of all target-source-lag triplets occuring at least once in the data set. This list is used to encode subject networks as lists including all indices of the coding list such that the corresponding triplet is part of the network.

Returns:
list of 3-tuples

each tuple has the form (target index, source index, lags)

If data_format = “adjacency”: Creates list of all source-target tuples occuring at least once in the data set. This list is used to encode subject networks as lists including all indices of the coding list such that the corresponding tuple is part of the network.

Returns:
list of 2-tuples

each tuple has the form (source index, target index)

generate_min_p_table(design)[source]

Computes list of minimum p_values depending on the total number of occurrences and given the group sample sizes.

Returns:
list

minimum p-values for each number of occurrences between 0 and N

generate_p_table(design)[source]

Computes table of p-values depending on the total number of occurrences, the occurrences in Group A, and given the group sample sizes.

Args:
designstring

sampling design. either “within” or “between”

Returns:
numpy array

p-values for each number of occurrences and occurrences in Group A between 0 and N

westfall_young(num_perm=10000, verbose=True)[source]

Determines significant subgraphs using the Westfall-Young Permutation procedure for multiple comparisons correction. This algorithm computes the permutation distribution of the smallest observed p-value permutation-by-permutation.

Args:
num_permint

Number of permutation used for Westfall-Young procedure.

verbosebool

If True, print summary of results

Returns:

None

westfall_young_light(num_perm=10000, verbose=True)[source]

Determines significant subgraphs using the Westfall-Young light algorithm described in

Llinares-Lopez F, Sugiyama M, Papaxanthos L, Borgwardt K. Fast and memory-efficient significant pattern mining via permutation testing. In:Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2015. p. 725–734.

Args:
num_permint

Number of permutation used for Westfall-Young procedure.

verbosebool

If True, print summary of results

Returns:

None