Postprocessing of inferred networks¶

Significant subgraph mining¶

class idtxl.postprocessing.SignificantSubgraphMining(resultsA, resultsB, alpha, design, graph_type='directed', data_format='adjacency')[source]

Implementation of significant subgraph mining as described in

Sugiyama M, Lopez FL, Kasenburg N, Borgwardt KM. Significant subgraph mining with multiple testing correction. In: Proceedings of the 2015 SIAMInternational Conference on Data Mining. SIAM; 2015. p. 37–45.

Llinares-Lopez F, Sugiyama M, Papaxanthos L, Borgwardt K. Fast and memory-efficient significant pattern mining via permutation testing. In:Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2015. p. 725–734.

Gutknecht, Wibral (2021): “Significant Subgraph Mining for Neural Network Inference with Multiple Comparisons Correction”. bioRxiv. https://www.biorxiv.org/content/10.1101/2021.11.03.467050v1.full

Attributes:

resultsAlist: List of lists of IDTxl results dicts. One list per subject in Group A and one results dict per target.
resultsBlist: List of lists of IDTxl results dicts. One list per subject in Group B and one results dict per target
coding_listlist: List of all target-source-lag triplets in data set. Used to encode networks as lists of indices.
groupA_networkslist: List of lists of indices representing networks of subjects in Group A
groupB_networkslist: List of lists of indices representing networks of subjects in Group B
graph_typestring: can be either “directed” or “undirected” (undirected is only possible if data_format is “adjacency”)
data_formatstring: can be either “idtxl” or “adjacency”
designstring: Sampling design. Either “within” for within-subject designs (the same group of subjects is measured under two different conditions) or “between” for between-subjects designs (two different groups of subjects are measured under the same condition)
n_Aint: Sample size of group A
n_Bint: Sample size of group B
Nint: Total sample size
alphafloat: Uncorrected significance level
min_p_value_tablelist: List of minimum achievable p-values for all possible numbers of total occurrences of a subgraph (0 to N)
p_value_tablenumpy array: Lookup table of p-values for each combination of occurrences in Group A and total occurrences between 0 and N
min_freqint: Minimum number of occurrences required for testability at level alpha
link_countslist: List of links counts for each link that occurs at least once in the data set (i.e. for each target-source-lag triplet in coding_list)
union_indiceslist: List of indices of target-source-lag triplets in coding_list occuring at east min_freq times. All other triplets and all their supergraphs can be ignored because they are not even testable at level alpha
frequent_graphslist: List of frequent subgraphs in data set occuring at least min_freq times. Initialized empty and filled by calling the enumerate_frequent_subgraphs method
p_valueslist: List of p-values for each frequent subgraph. Initialized empty and filled by calling the enumerate_frequent_subgraphs method
minimum_p_valueslist: List of minimum p-values for each frequent subgraph. Initialized empty and filled by calling the enumerate_frequent_subgraphs method
num_testable_graphsint: Number of subgraphs testable at level alpha. Initialized as 0 and determined by calling the enumerate_significant_subgraphs method.
k_rtint: Tarones correction factor. Initialized as None. Determined by calling the enumerate_significant_subgraphs method
p_values_corrlist: List of corrected p-values. Intialized empty and filled by calling the enumerate_significant_subgraphs method
significant_graphslist: List of tuples of significant subgraphs in data set and their associated corrected p-values. Initialized empty and filled by calling the enumerate_significant_subgraphs method

count_discordants(indices, where='original')[source]

Counts the discordant pairs for a given subgraph represented as a list of indices.

Args:

indiceslist of integers: indices of all links the subgraph to be counted consists of
wherestring: if “original” then the discordants are counted in the original data set. if “perm” the discordants are counted in the permuted data set (for WY procedure)

Returns:

tuple of integers: number of cases in which the subgraph occurred in condition A but not in B, and number of cases in which the subgraph occurred in B but not in A

count_discordants_wylight(indices, k)[source]

Counts discordant pairs for subgraph given by list if indices in k-th permuted data set.

Args:

indiceslist of integers: indices of all links of the subgraph
kinteger: index of permuted data set

Returns:

tuple of integers: number of cases in which the subgraph occurred in condition A but not in B, and number of cases in which the subgraph occurred in B but not in A

count_subgraph(indices, where='original')[source]

Counts the number of occurrences of a subgraph represented by a list of indices

Args:

indiceslist of integers: indices of all links the subgraph to be counted consists of
wherestring: if “original” then the subgraph is counted in the original data set. if “perm” the subgraph is counted in the permuted data set (for WY procedure)

Returns:

tuple of integers: number of occurrences of subgraph in GroupA and in GroupB

count_subgraph_wylight(indices, k)[source]

Counts subgraph occurrences in k-th permuted data set

Args:

indiceslist of integers: indices of all links of the subgraph
kinteger: index of permuted data set

Returns:

tuple of integers: number of occurrences in group A and number of occurrences in group B

decode(indices)[source]

Converts a given list of indices (representing a subgraph) into a list of corresponding target-source-lag triplets using the mapping described in the coding list.

Args:: indices : list of integers
Returns:: List of 3-tuples

decode_adjacency(indices)[source]: Decodes list of indices as adjacency matrix

determine_tarone_factor()[source]

Determines Tarone’s correction factor in case there are at least two testable subgraphs.

Returns:

int: Tarone’s correction factor

encode()[source]

Encodes all subject networks as lists of indices. The ith entry describes the occurrence of the ith target-source-lag triplet in the coding list (self.coding_list).

Returns:

tuple of lists of integers: The first entry of the tuple is a list of integers describing the networks of subjects in Group A. The second entry is a list of integers for Group B.

encode_adjacency()[source]: Encodes all input adjacency matrices as lists of indices

enumerate_frequent_graphs(freq)[source]

Adds all subgraphs occuring at least freq times to self.frequent_graphs The process is carried out recursively using the extend() method. Individual links of the union network are successively extended to build more complex subgraphs. As soon as a subgraph does not occur often enough the extension process can be stopped because all supergraphs can at best occur with the same frequency. The extend() method also saves the minimum and actual p-values of all frequent subgraphs.

Args:

freqint: desired minimum frequency

enumerate_significant_subgraphs(method='Hommel', wy_algorithm='simple_depth_first', verbose=True, num_perm=10000, max_depth=inf)[source]

This is the main function carrying out significant subgraph mining according to the multiple comparisons correction method (and algorithm in the case of Westfall-Young) of choice. It calls the relevant methods depending on the input arguments.

Args:

verbosebool: If True, print summary of results
methodstring: Determines method used for multiple comparisons correction. can be “Tarone”, “Hommel”, or “Westfall-Young”
num_permint: Number of permutations used for Westfall-Young procedure.
wy_algorithmstring: algorithm used for Westfall-Young permutation procedure. Can be either “simple_depth_fist” (evaluates one permuted data set at a time) or “wy_light” for the Westfall-Young light algorithm introduced by Llinares-Lopez et al 2015 (distributes computations across permutations)
max_depthinteger: maximum complexity of subgraphs (number of links) up to which subgraphs are mined.

Returns:

list of tuples: The first entry of each tuple is a list of indices representing the identified significant subgraph. The second entry is the associated (uncorrected) p-value.

extend(to_be_extended, freq)[source]

Recursively extends the input subgraph checking at each recursion step if the current subgraph occurs frequently enough to reach significance at level alpha. If this is not the case, it is not extended any further. If it is, the extend method is called again. Frequent subgraphs are appended to self.frequent_subgraphs.

Args:

to_be_extendedlist
list of indices describing the locations of 1s in the union network. Each such list represent a particular subgraph.

freqint: desired minimum frequency
max_depthint: If specified, only subgraphs with at most max_depth links are considered. For instance, if max_depth = 1 only individual links are tested. The default value is infinity meaning that all possible subgraphs are considered.

Returns:

None

extend_mcnemar(to_be_extended, freq)[source]

Same as extend() method but using McNemar’s test for within subject designs

Args:

to_be_extendedlist
list of indices describing the locations of 1s in the union network. Each such list represent a particular subgraph.

freqint: desired minimum frequency

Returns:

None

extend_wy(to_be_extended)[source]

Determines the smallest observed p-value in permuted version of the data set by recursively extending the input subgraph. At each recursion step the function checks if the current subgraph occurs frequently enough (> self.current_min_freq) to obtain a p-value smaller than the smallest p-value observed so far (self.current_min_p). If this is not the case, it is not extended any further. If it is, the actual p-value is calculated. If this p-value happens to be smaller than current_min_p, then current_min_p and self.current_min_freq are updated, and the extend method is called again. If this p-value happens to be larger than current_min_p, the extend method is called again immediately.

Args:: list of indices of coding list. Each such list represent a particular subgraph.
Returns:: None

extend_wy_light(to_be_extended)[source]

Westfall-Young light extension method. Evaluates all permutations at the same time for each subgraph. The goal is to determine the Westfall-Young corrected level, i.e. the alpha quantile of the permutation distribution of the smallest observed p-value among subgraphs. Recursively, evaluates subgraphs and updates the current estimate of the Westfall-Young corrected level self.wy_level_light.

Args:

indiceslist of integers: indices of all links of the subgraph

Returns:

None

extend_wy_light_mcnemar(to_be_extended)[source]

Westfall-Young light extension method for the within-subjects case using McNemars test. Recursively, evaluates subgraphs and updates the current estimate of the Westfall-Young corrected level self.wy_level_light.

Args:

indiceslist of integers: indices of all links of the subgraph

Returns:

None

extend_wy_mcnemar(to_be_extended)[source]

Same as extend_wy but using McNemars test

Args:: list of indices of coding list. Each such list represent a particular subgraph.
Returns:: None

generate_coding_list()[source]

If data_format = “idtxl”: Creates list of all target-source-lag triplets occuring at least once in the data set. This list is used to encode subject networks as lists including all indices of the coding list such that the corresponding triplet is part of the network.

Returns:

list of 3-tuples: each tuple has the form (target index, source index, lags)

If data_format = “adjacency”: Creates list of all source-target tuples occuring at least once in the data set. This list is used to encode subject networks as lists including all indices of the coding list such that the corresponding tuple is part of the network.

Returns:

list of 2-tuples: each tuple has the form (source index, target index)

generate_min_p_table(design)[source]

Computes list of minimum p_values depending on the total number of occurrences and given the group sample sizes.

Returns:

list: minimum p-values for each number of occurrences between 0 and N

generate_p_table(design)[source]

Computes table of p-values depending on the total number of occurrences, the occurrences in Group A, and given the group sample sizes.

Args:

designstring: sampling design. either “within” or “between”

Returns:

numpy array: p-values for each number of occurrences and occurrences in Group A between 0 and N

westfall_young(num_perm=10000, verbose=True)[source]

Determines significant subgraphs using the Westfall-Young Permutation procedure for multiple comparisons correction. This algorithm computes the permutation distribution of the smallest observed p-value permutation-by-permutation.

Args:

num_permint: Number of permutation used for Westfall-Young procedure.
verbosebool: If True, print summary of results

Returns:

None

westfall_young_light(num_perm=10000, verbose=True)[source]

Determines significant subgraphs using the Westfall-Young light algorithm described in

Llinares-Lopez F, Sugiyama M, Papaxanthos L, Borgwardt K. Fast and memory-efficient significant pattern mining via permutation testing. In:Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2015. p. 725–734.

Args:

num_permint: Number of permutation used for Westfall-Young procedure.
verbosebool: If True, print summary of results

Returns:

None

Postprocessing of inferred networks¶

Significant subgraph mining¶

Table of Contents

Previous topic

Next topic

This Page