High-level description
TheMaxCutGreedySolver class, inheriting from GreedySolver, implements a phylogenetic tree reconstruction algorithm that combines a greedy approach with a max-cut optimization. It recursively partitions samples based on the most frequent mutation and then refines the split using a hill-climbing method to maximize the weight of edges cut in a connectivity graph built from shared mutations.
Code Structure
TheMaxCutGreedySolver class overrides the perform_split method of its parent class, GreedySolver. It leverages the compute_mutation_frequencies method from the parent class and utilizes functions like construct_connectivity_graph and max_cut_improve_cut from the graph_utilities module.
References
This code references the following symbols from other files:GreedySolver(from cassiopeia/solver/GreedySolver.py)missing_data_methods(from cassiopeia/solver/missing_data_methods.py)solver_utilities(from cassiopeia/solver/solver_utilities.py)graph_utilities(from cassiopeia/solver/graph_utilities.py)
Symbols
MaxCutGreedySolver
Description
This class implements the MaxCut Greedy algorithm for phylogenetic tree reconstruction. It extends the basic GreedySolver by adding a max-cut optimization step to each split.Inputs
| Name | Type | Description |
|---|---|---|
| missing_data_classifier | Callable | Function to handle missing data during character splits (default: missing_data_methods.assign_missing_average). |
| prior_transformation | str | Transformation function for converting priors to weights (default: "negative_log"). |
Outputs
This class doesn’t directly return any output. It populates aCassiopeiaTree object during the solve method (inherited from GreedySolver).
Internal Logic
The core logic resides in theperform_split method:
- Identify Splitting Character/State: It identifies the most frequent character/state pair, considering potential weights derived from priors.
- Initial Split: Samples are initially partitioned based on the chosen character/state.
- Handle Missing Data: The
missing_data_classifierfunction is used to assign samples with missing data to either side of the partition. - Construct Connectivity Graph: A connectivity graph is built where nodes represent samples and edge weights reflect the similarity based on shared mutations.
- Max-Cut Optimization: The
max_cut_improve_cutfunction refines the initial split by iteratively moving nodes between partitions to maximize the weight of edges cut, effectively grouping samples with similar mutations. - Return Refined Split: The optimized left and right partitions are returned.
perform_split
Description
This method partitions a set of samples based on the most frequent mutation and optimizes the split using the max-cut criterion.Inputs
| Name | Type | Description |
|---|---|---|
| character_matrix | pd.DataFrame | Character matrix containing character states for each sample. |
| samples | List[int] | List of samples to be partitioned. |
| weights | Optional[Dict[int, Dict[int, float]]] | Weights for each (character, state) pair, usually derived from priors. |
| missing_state_indicator | int | Character representing missing data (default: -1). |
Outputs
| Name | Type | Description |
|---|---|---|
| improved_left_set | List[str] | List of samples in the left partition after optimization. |
| improved_right_set | List[str] | List of samples in the right partition after optimization. |
Internal Logic
See the ‘Internal Logic’ section of theMaxCutGreedySolver class description.
Side Effects
Theperform_split method modifies the input left_set and right_set lists in-place during the missing data imputation and max-cut optimization steps.
Dependencies
This code depends on the following external libraries:| Dependency | Purpose |
|---|---|
| pandas | Handling and manipulating character matrices. |
| networkx | Constructing and manipulating graphs for representing relationships between samples. |
| numpy | Numerical operations, particularly in calculating scores and frequencies. |
Error Handling
This code does not implement specific error handling beyond what is inherited from the parent classGreedySolver.