Postprocessing of inferred networks¶
Significant subgraph mining¶
- class idtxl.postprocessing.SignificantSubgraphMining(resultsA, resultsB, alpha, design, graph_type='directed', data_format='adjacency')[source]
Implementation of significant subgraph mining as described in
Sugiyama M, Lopez FL, Kasenburg N, Borgwardt KM. Significant subgraph mining with multiple testing correction. In: Proceedings of the 2015 SIAMInternational Conference on Data Mining. SIAM; 2015. p. 37–45.
Llinares-Lopez F, Sugiyama M, Papaxanthos L, Borgwardt K. Fast and memory-efficient significant pattern mining via permutation testing. In:Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2015. p. 725–734.
Gutknecht, Wibral (2021): “Significant Subgraph Mining for Neural Network Inference with Multiple Comparisons Correction”. bioRxiv. https://www.biorxiv.org/content/10.1101/2021.11.03.467050v1.full
- Attributes:
- resultsAlist
List of lists of IDTxl results dicts. One list per subject in Group A and one results dict per target.
- resultsBlist
List of lists of IDTxl results dicts. One list per subject in Group B and one results dict per target
- coding_listlist
List of all target-source-lag triplets in data set. Used to encode networks as lists of indices.
- groupA_networkslist
List of lists of indices representing networks of subjects in Group A
- groupB_networkslist
List of lists of indices representing networks of subjects in Group B
- graph_typestring
can be either “directed” or “undirected” (undirected is only possible if data_format is “adjacency”)
- data_formatstring
can be either “idtxl” or “adjacency”
- designstring
Sampling design. Either “within” for within-subject designs (the same group of subjects is measured under two different conditions) or “between” for between-subjects designs (two different groups of subjects are measured under the same condition)
- n_Aint
Sample size of group A
- n_Bint
Sample size of group B
- Nint
Total sample size
- alphafloat
Uncorrected significance level
- min_p_value_tablelist
List of minimum achievable p-values for all possible numbers of total occurrences of a subgraph (0 to N)
- p_value_tablenumpy array
Lookup table of p-values for each combination of occurrences in Group A and total occurrences between 0 and N
- min_freqint
Minimum number of occurrences required for testability at level alpha
- link_countslist
List of links counts for each link that occurs at least once in the data set (i.e. for each target-source-lag triplet in coding_list)
- union_indiceslist
List of indices of target-source-lag triplets in coding_list occuring at east min_freq times. All other triplets and all their supergraphs can be ignored because they are not even testable at level alpha
- frequent_graphslist
List of frequent subgraphs in data set occuring at least min_freq times. Initialized empty and filled by calling the enumerate_frequent_subgraphs method
- p_valueslist
List of p-values for each frequent subgraph. Initialized empty and filled by calling the enumerate_frequent_subgraphs method
- minimum_p_valueslist
List of minimum p-values for each frequent subgraph. Initialized empty and filled by calling the enumerate_frequent_subgraphs method
- num_testable_graphsint
Number of subgraphs testable at level alpha. Initialized as 0 and determined by calling the enumerate_significant_subgraphs method.
- k_rtint
Tarones correction factor. Initialized as None. Determined by calling the enumerate_significant_subgraphs method
- p_values_corrlist
List of corrected p-values. Intialized empty and filled by calling the enumerate_significant_subgraphs method
- significant_graphslist
List of tuples of significant subgraphs in data set and their associated corrected p-values. Initialized empty and filled by calling the enumerate_significant_subgraphs method
- count_discordants(indices, where='original')[source]
Counts the discordant pairs for a given subgraph represented as a list of indices.
- Args:
- indiceslist of integers
indices of all links the subgraph to be counted consists of
- wherestring
if “original” then the discordants are counted in the original data set. if “perm” the discordants are counted in the permuted data set (for WY procedure)
- Returns:
- tuple of integers
number of cases in which the subgraph occurred in condition A but not in B, and number of cases in which the subgraph occurred in B but not in A
- count_discordants_wylight(indices, k)[source]
Counts discordant pairs for subgraph given by list if indices in k-th permuted data set.
- Args:
- indiceslist of integers
indices of all links of the subgraph
- kinteger
index of permuted data set
- Returns:
- tuple of integers
number of cases in which the subgraph occurred in condition A but not in B, and number of cases in which the subgraph occurred in B but not in A
- count_subgraph(indices, where='original')[source]
Counts the number of occurrences of a subgraph represented by a list of indices
- Args:
- indiceslist of integers
indices of all links the subgraph to be counted consists of
- wherestring
if “original” then the subgraph is counted in the original data set. if “perm” the subgraph is counted in the permuted data set (for WY procedure)
- Returns:
- tuple of integers
number of occurrences of subgraph in GroupA and in GroupB
- count_subgraph_wylight(indices, k)[source]
Counts subgraph occurrences in k-th permuted data set
- Args:
- indiceslist of integers
indices of all links of the subgraph
- kinteger
index of permuted data set
- Returns:
- tuple of integers
number of occurrences in group A and number of occurrences in group B
- decode(indices)[source]
Converts a given list of indices (representing a subgraph) into a list of corresponding target-source-lag triplets using the mapping described in the coding list.
- Args:
indices : list of integers
- Returns:
List of 3-tuples
- decode_adjacency(indices)[source]
Decodes list of indices as adjacency matrix
- determine_tarone_factor()[source]
Determines Tarone’s correction factor in case there are at least two testable subgraphs.
- Returns:
- int
Tarone’s correction factor
- encode()[source]
Encodes all subject networks as lists of indices. The ith entry describes the occurrence of the ith target-source-lag triplet in the coding list (self.coding_list).
- Returns:
- tuple of lists of integers
The first entry of the tuple is a list of integers describing the networks of subjects in Group A. The second entry is a list of integers for Group B.
- encode_adjacency()[source]
Encodes all input adjacency matrices as lists of indices
- enumerate_frequent_graphs(freq)[source]
Adds all subgraphs occuring at least freq times to self.frequent_graphs The process is carried out recursively using the extend() method. Individual links of the union network are successively extended to build more complex subgraphs. As soon as a subgraph does not occur often enough the extension process can be stopped because all supergraphs can at best occur with the same frequency. The extend() method also saves the minimum and actual p-values of all frequent subgraphs.
- Args:
- freqint
desired minimum frequency
- enumerate_significant_subgraphs(method='Hommel', wy_algorithm='simple_depth_first', verbose=True, num_perm=10000, max_depth=inf)[source]
This is the main function carrying out significant subgraph mining according to the multiple comparisons correction method (and algorithm in the case of Westfall-Young) of choice. It calls the relevant methods depending on the input arguments.
- Args:
- verbosebool
If True, print summary of results
- methodstring
Determines method used for multiple comparisons correction. can be “Tarone”, “Hommel”, or “Westfall-Young”
- num_permint
Number of permutations used for Westfall-Young procedure.
- wy_algorithmstring
algorithm used for Westfall-Young permutation procedure. Can be either “simple_depth_fist” (evaluates one permuted data set at a time) or “wy_light” for the Westfall-Young light algorithm introduced by Llinares-Lopez et al 2015 (distributes computations across permutations)
- max_depthinteger
maximum complexity of subgraphs (number of links) up to which subgraphs are mined.
- Returns:
- list of tuples
The first entry of each tuple is a list of indices representing the identified significant subgraph. The second entry is the associated (uncorrected) p-value.
- extend(to_be_extended, freq)[source]
Recursively extends the input subgraph checking at each recursion step if the current subgraph occurs frequently enough to reach significance at level alpha. If this is not the case, it is not extended any further. If it is, the extend method is called again. Frequent subgraphs are appended to self.frequent_subgraphs.
- Args:
- to_be_extendedlist
list of indices describing the locations of 1s in the union network. Each such list represent a particular subgraph.
- freqint
desired minimum frequency
- max_depthint
If specified, only subgraphs with at most max_depth links are considered. For instance, if max_depth = 1 only individual links are tested. The default value is infinity meaning that all possible subgraphs are considered.
- Returns:
None
- extend_mcnemar(to_be_extended, freq)[source]
Same as extend() method but using McNemar’s test for within subject designs
- Args:
- to_be_extendedlist
list of indices describing the locations of 1s in the union network. Each such list represent a particular subgraph.
- freqint
desired minimum frequency
- Returns:
None
- extend_wy(to_be_extended)[source]
Determines the smallest observed p-value in permuted version of the data set by recursively extending the input subgraph. At each recursion step the function checks if the current subgraph occurs frequently enough (> self.current_min_freq) to obtain a p-value smaller than the smallest p-value observed so far (self.current_min_p). If this is not the case, it is not extended any further. If it is, the actual p-value is calculated. If this p-value happens to be smaller than current_min_p, then current_min_p and self.current_min_freq are updated, and the extend method is called again. If this p-value happens to be larger than current_min_p, the extend method is called again immediately.
- Args:
list of indices of coding list. Each such list represent a particular subgraph.
- Returns:
None
- extend_wy_light(to_be_extended)[source]
Westfall-Young light extension method. Evaluates all permutations at the same time for each subgraph. The goal is to determine the Westfall-Young corrected level, i.e. the alpha quantile of the permutation distribution of the smallest observed p-value among subgraphs. Recursively, evaluates subgraphs and updates the current estimate of the Westfall-Young corrected level self.wy_level_light.
- Args:
- indiceslist of integers
indices of all links of the subgraph
- Returns:
None
- extend_wy_light_mcnemar(to_be_extended)[source]
Westfall-Young light extension method for the within-subjects case using McNemars test. Recursively, evaluates subgraphs and updates the current estimate of the Westfall-Young corrected level self.wy_level_light.
- Args:
- indiceslist of integers
indices of all links of the subgraph
- Returns:
None
- extend_wy_mcnemar(to_be_extended)[source]
Same as extend_wy but using McNemars test
- Args:
list of indices of coding list. Each such list represent a particular subgraph.
- Returns:
None
- generate_coding_list()[source]
If data_format = “idtxl”: Creates list of all target-source-lag triplets occuring at least once in the data set. This list is used to encode subject networks as lists including all indices of the coding list such that the corresponding triplet is part of the network.
- Returns:
- list of 3-tuples
each tuple has the form (target index, source index, lags)
If data_format = “adjacency”: Creates list of all source-target tuples occuring at least once in the data set. This list is used to encode subject networks as lists including all indices of the coding list such that the corresponding tuple is part of the network.
- Returns:
- list of 2-tuples
each tuple has the form (source index, target index)
- generate_min_p_table(design)[source]
Computes list of minimum p_values depending on the total number of occurrences and given the group sample sizes.
- Returns:
- list
minimum p-values for each number of occurrences between 0 and N
- generate_p_table(design)[source]
Computes table of p-values depending on the total number of occurrences, the occurrences in Group A, and given the group sample sizes.
- Args:
- designstring
sampling design. either “within” or “between”
- Returns:
- numpy array
p-values for each number of occurrences and occurrences in Group A between 0 and N
- westfall_young(num_perm=10000, verbose=True)[source]
Determines significant subgraphs using the Westfall-Young Permutation procedure for multiple comparisons correction. This algorithm computes the permutation distribution of the smallest observed p-value permutation-by-permutation.
- Args:
- num_permint
Number of permutation used for Westfall-Young procedure.
- verbosebool
If True, print summary of results
- Returns:
None
- westfall_young_light(num_perm=10000, verbose=True)[source]
Determines significant subgraphs using the Westfall-Young light algorithm described in
Llinares-Lopez F, Sugiyama M, Papaxanthos L, Borgwardt K. Fast and memory-efficient significant pattern mining via permutation testing. In:Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2015. p. 725–734.
- Args:
- num_permint
Number of permutation used for Westfall-Young procedure.
- verbosebool
If True, print summary of results
- Returns:
None