High-level description
TheSpectralGreedySolver class, inheriting from GreedySolver, implements a phylogenetic tree solver that combines a greedy splitting strategy with a spectral heuristic for optimizing partitions. It aims to construct a more accurate tree by refining the initial greedy splits using spectral methods to maximize the similarity within partitions based on a mutation similarity graph.
Code Structure
TheSpectralGreedySolver class inherits from GreedySolver and overrides the perform_split method. It utilizes functions from graph_utilities for graph construction and optimization. The key logic lies in how it first performs a greedy split based on mutation frequencies and then refines this split using spectral methods on a similarity graph.
References
This code references the following symbols from other files:GreedySolver(fromcassiopeia.solver.GreedySolver)missing_data_methods(fromcassiopeia.solver.missing_data_methods)dissimilarity_functions(fromcassiopeia.solver.dissimilarity_functions)graph_utilities(fromcassiopeia.solver.graph_utilities)solver_utilities(fromcassiopeia.solver.solver_utilities)
Symbols
SpectralGreedySolver
Description
This class implements a phylogenetic tree solver that combines a greedy splitting strategy with a spectral heuristic for optimizing partitions.Inputs
| Name | Type | Description |
|---|---|---|
| missing_data_classifier | Callable | A function to handle missing data during partition. Defaults to assign_missing_average. |
| similarity_function | Optional[Callable] | A function to calculate similarity between samples. Defaults to hamming_similarity_without_missing. |
| threshold | Optional[int] | Minimum similarity threshold for edge inclusion in the similarity graph. Defaults to 0. |
| prior_transformation | str | Transformation method for converting priors to weights. Defaults to “negative_log”. |
Outputs
The class itself doesn’t have direct outputs. It’s meant to be used to solve aCassiopeiaTree object.
Internal Logic
The solver first identifies the most frequent mutation to perform an initial greedy split. Then, it constructs a similarity graph based on the providedsimilarity_function and refines the partition using the spectral_improve_cut function from graph_utilities. This iterative optimization aims to minimize a modified normalized cut, effectively grouping samples with similar mutations.
perform_split
Description
This method overrides the parent class’s method and performs a partition of samples using both greedy and spectral criteria.Inputs
| Name | Type | Description |
|---|---|---|
| character_matrix | pd.DataFrame | Character matrix representing the mutation profiles of samples. |
| samples | List[int] | List of samples to be partitioned. |
| weights | Optional[Dict[int, Dict[int, float]]] | Weights for each (character, state) pair, often derived from priors. |
| missing_state_indicator | int | Character representing missing data in the character matrix. Defaults to -1. |
Outputs
| Name | Type | Description |
|---|---|---|
| improved_left_set | List[str] | List of samples in the left partition after spectral improvement. |
| improved_right_set | List[str] | List of samples in the right partition after spectral improvement. |
Internal Logic
- Identify Splitting Point: The method first identifies the most frequent (character, state) pair based on the input
character_matrixandweights. This pair will be used to perform the initial split. - Initial Split: Samples are divided into
left_setandright_setbased on the presence or absence of the chosen character state. Samples with missing data for this character are placed in themissinglist. - Handle Missing Data: The
missing_data_classifierfunction is applied to impute the missing data and assign themissingsamples to eitherleft_setorright_set. - Construct Similarity Graph: A similarity graph is constructed using the
construct_similarity_graphfunction fromgraph_utilities. This graph represents the similarity between samples based on their mutation profiles. - Spectral Improvement: The
spectral_improve_cutfunction fromgraph_utilitiesis applied to optimize the initial split by minimizing a modified normalized cut on the similarity graph. This step refines the partition by moving samples betweenleft_setandright_setto maximize the similarity within each partition. - Return Improved Partitions: The method returns the optimized
improved_left_setandimproved_right_setlists.
