High-level description
TheMaxCutSolver class is a subclass of GreedySolver that implements a graph-based algorithm for inferring phylogenetic trees. It aims to find an optimal partition of samples based on their shared and differing mutations, represented as a connectivity graph, by approximating the max-cut problem.
Code Structure
TheMaxCutSolver class inherits from GreedySolver and overrides the perform_split method. It utilizes the construct_connectivity_graph, evaluate_cut, and max_cut_improve_cut functions from the graph_utilities module to construct and manipulate the connectivity graph.
References
This code references the following symbols from other files:GreedySolver(cassiopeia/solver/GreedySolver.py)graph_utilities(cassiopeia/solver/graph_utilities.py)
Symbols
MaxCutSolver
Description
This class implements the MaxCut algorithm for phylogenetic tree inference. It inherits from theGreedySolver class and provides a concrete implementation for the perform_split method.
Inputs
| Name | Type | Description |
|---|---|---|
| sdimension | Optional[int] | The number of dimensions for the embedding space (default: 3) |
| iterations | Optional[int] | The number of iterations for updating the embeddings (default: 50) |
| prior_transformation | str | The transformation function to apply to priors (default: “negative_log”) |
Outputs
This class doesn’t directly return any output. It modifies the inputCassiopeiaTree object during the solve method.
Internal Logic
Theperform_split method performs the following steps:
- Construct Connectivity Graph: It builds a connectivity graph where nodes represent samples and edge weights represent the difference between the number of triplets separating and grouping the samples based on shared mutations.
- Embed Samples: Randomly embed the samples in a d-dimensional sphere.
- Update Embeddings: Iteratively update the embeddings based on edge weights, pulling together nodes with strong negative connections and pushing apart nodes with positive connections.
- Find Max Cut: Generate random hyperplanes to bisect the d-sphere and select the one that maximizes the cut weight, effectively partitioning the samples.
- Improve Cut: Apply a greedy hill-climbing procedure to further optimize the cut by iteratively moving nodes between partitions to maximize the cut weight.
Performance Considerations
The performance of the algorithm is influenced by thesdimension and iterations parameters. Higher values may lead to better solutions but increase computational cost.
evaluate_cut
Description
This function calculates the weight of a cut in a graph.Inputs
| Name | Type | Description |
|---|---|---|
| cut | List[str] | A list of nodes representing one side of the cut |
| G | nx.DiGraph | The graph on which the cut is evaluated |
Outputs
| Name | Type | Description |
|---|---|---|
| cut_score | float | The total weight of edges crossing the cut |
Internal Logic
The function iterates through each edge in the graph and checks if it crosses the cut (i.e., if one node is in thecut list and the other is not). If an edge crosses the cut, its weight is added to the cut_score.
