High-level description
TheIIDExponentialMLE class implements a Maximum Likelihood Estimator (MLE) for inferring branch lengths of a tree under the assumption of independent and identically distributed (IID) memoryless CRISPR/Cas9 mutations with an exponential waiting time. This estimator assumes a tree depth of 1 and utilizes convex optimization to find the mutation rate and branch lengths that maximize the likelihood of the observed character data.
Code Structure
TheIIDExponentialMLE class inherits from the abstract base class BranchLengthEstimator and implements the estimate_branch_lengths method. This method constructs and solves a convex optimization problem to estimate the branch lengths and mutation rate.
References
This code references the following key symbols:CassiopeiaTree: Used to represent the tree structure and character data.IIDExponentialMLEError: A custom exception class for errors specific to this estimator.
Symbols
IIDExponentialMLE
Description
This class implements the IID Exponential MLE algorithm for estimating branch lengths in a phylogenetic tree. It assumes that mutations occur independently and identically at each site with an exponential waiting time.Inputs
| Name | Type | Description |
|---|---|---|
| minimum_branch_length | float | Minimum allowed branch length (default: 0.01) |
| relative_mutation_rates | Optional[List[float]] | List of relative mutation rates for each character site (default: None, meaning all sites evolve at the same rate) |
| verbose | bool | Whether to print verbose output during optimization (default: False) |
| solver | str | Convex optimization solver to use (default: “SCS”, options: “ECOS”, “SCS”, “MOSEK”) |
Outputs
This class doesn’t directly return any outputs. Instead, it modifies the inputCassiopeiaTree object in place, updating the branch lengths and storing the estimated mutation rate and log-likelihood as attributes.
Internal Logic
- Data Validation: Checks for degenerate cases like no mutations or complete saturation in the character matrix.
- Problem Setup:
- Creates optimization variables for each node’s time.
- Defines constraints:
- Root node has time 0.
- Minimum branch length constraint.
- Ultrametric constraint for all leaves (except the reference leaf).
- Formulates the log-likelihood function based on the IID exponential mutation model.
- Optimization:
- Defines the objective function as maximizing the log-likelihood.
- Constructs and solves the convex optimization problem using the specified solver.
- Result Extraction:
- Extracts the estimated mutation rate from the scaling factor of the reference leaf’s time.
- Retrieves the log-likelihood value.
- Tree Update:
- Computes and sets the branch lengths based on the optimized node times.
- Updates the
CassiopeiaTreeobject with the estimated branch lengths.
Side Effects
- Modifies the input
CassiopeiaTreeobject in place by updating its branch lengths. - Sets the
mutation_rateandlog_likelihoodattributes of theIIDExponentialMLEobject.
Performance Considerations
- The choice of solver can impact the performance of the optimization.
- The
minimum_branch_lengthparameter can influence the convergence and runtime of the solver. - Specifying
relative_mutation_ratescan increase the complexity of the optimization problem.
Dependencies
| Dependency | Purpose |
|---|---|
| cvxpy | Used for defining and solving the convex optimization problem. |
| numpy | Used for numerical operations. |
Error Handling
- Raises
ValueErrorfor invalid input parameters or degenerate character matrices. - Raises
IIDExponentialMLEErrorif the solver fails or produces unexpected results.
