High-level description
This file provides functions for estimating parameters related to lineage tracing, specifically the mutation rate and missing data rates. These functions are used to infer the underlying biological processes from observed data in a CassiopeiaTree.
References
This code references the CassiopeiaTree class from cassiopeia.data.CassiopeiaTree and the ParameterEstimateError and ParameterEstimateWarning exceptions from cassiopeia.mixins.
Symbols
get_proportion_of_missing_data
Description
Calculates the proportion of missing entries in the character matrix of a CassiopeiaTree. The missing state is indicated by the tree.missing_state_indicator attribute.
Inputs
| Name | Type | Description |
|---|---|---|
| tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix. |
| layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
| Name | Type | Description |
|---|---|---|
| missing_proportion | float | The proportion of missing cell/character entries in the character matrix. |
Internal Logic
The function counts the number of entries in the character matrix that are equal to the missing_state_indicator. This count is then divided by the total number of entries in the matrix to calculate the proportion of missing data.
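The counting step above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name proportion_missing, the list-of-lists matrix, and the default indicator of -1 are all assumptions made for the example (the real function reads the indicator from tree.missing_state_indicator).

```python
def proportion_missing(matrix, missing_state_indicator=-1):
    """Return the fraction of entries equal to the missing-state indicator.

    `matrix` is a hypothetical stand-in for the character matrix:
    one inner list per cell, one entry per character.
    """
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("character matrix has no entries")
    missing = sum(
        1 for row in matrix for state in row if state == missing_state_indicator
    )
    return missing / total


# Two cells, three characters each; two of the six entries are missing.
example_matrix = [
    [0, 1, -1],
    [2, -1, 0],
]
missing_proportion = proportion_missing(example_matrix)
```

Here the denominator includes every entry, missing or not, which matches the description of dividing by the total number of entries in the matrix.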
Error Handling
Raises a ParameterEstimateError if no character matrix is found in the tree.
get_proportion_of_mutation
Description
Calculates the proportion of mutated entries in the character matrix of a CassiopeiaTree, excluding missing entries.
Inputs
| Name | Type | Description |
|---|---|---|
| tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix. |
| layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
| Name | Type | Description |
|---|---|---|
| mutation_proportion | float | The proportion of non-missing cell/character entries that are mutated. |
Internal Logic
The function first calculates the number of missing entries. Then, it counts the number of non-missing entries that are not equal to 0 (uncut state). This count is then divided by the total number of non-missing entries to calculate the proportion of mutated entries.
Error Handling
Raises a ParameterEstimateError if no character matrix is found in the tree.
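The internal logic of get_proportion_of_mutation can be illustrated with a short sketch. The helper name proportion_mutated, the list-of-lists matrix, and the default indicator of -1 are assumptions for the example; only the use of 0 as the uncut state comes from the description above.

```python
def proportion_mutated(matrix, missing_state_indicator=-1, uncut_state=0):
    """Return the fraction of non-missing entries that are mutated.

    `matrix` is a hypothetical stand-in for the character matrix:
    one inner list per cell, one entry per character.
    """
    # Drop missing entries first so they do not count toward the denominator.
    observed = [
        state for row in matrix for state in row if state != missing_state_indicator
    ]
    if not observed:
        raise ValueError("character matrix has no non-missing entries")
    mutated = sum(1 for state in observed if state != uncut_state)
    return mutated / len(observed)


# Four non-missing entries (0, 1, 2, 0), two of which are mutated.
example_matrix = [
    [0, 1, -1],
    [2, -1, 0],
]
mutation_proportion = proportion_mutated(example_matrix)
```

Unlike the missing-data proportion, the denominator here excludes missing entries, so the two proportions are not complements of each other.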
estimate_mutation_rate
Description
Estimates the mutation rate from a CassiopeiaTree, considering either a continuous or per-generation model.
Inputs
| Name | Type | Description |
|---|---|---|
| tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix and tree topology. |
| continuous | bool, optional | Whether to calculate a continuous mutation rate (accounting for branch lengths) or a discrete per-generation rate. Defaults to True. |
| assume_root_implicit_branch | bool, optional | Whether to assume an implicit branch leading from the root if it doesn’t exist. Defaults to True. |
| layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
| Name | Type | Description |
|---|---|---|
| mutation_rate | float | The estimated mutation rate. |
Internal Logic
The function first retrieves the proportion of mutated entries from the tree’s parameters or calculates it using get_proportion_of_mutation. Then, it estimates the mutation rate based on the chosen model:
- Discrete (per-generation): The rate is estimated using the formula: mutation_rate = 1 - (1 - mutation_proportion) ^ (1 / mean_depth), where mean_depth is the average depth of the tree.
- Continuous: The rate is estimated using the formula: mutation_rate = -np.log(1 - mutation_proportion) / mean_time, where mean_time is the average time of the tree.
If assume_root_implicit_branch is True and the tree doesn’t have a single leading edge from the root, an implicit branch is added to the calculation of mean_depth or mean_time.
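The two formulas above can be tried on plain numbers. The sketch below assumes a hypothetical helper, rate_from_proportion, that takes the proportion and an already computed mean depth or mean time. How the tree supplies that mean, including any implicit root branch, is left to the caller and is not modeled here.

```python
import math


def rate_from_proportion(proportion, mean_length, continuous=True):
    """Convert an observed proportion into a rate (illustrative helper).

    `mean_length` is the mean depth (discrete model) or the mean time
    (continuous model), supplied by the caller.
    """
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    if mean_length <= 0:
        raise ValueError("mean_length must be positive")
    if continuous:
        if proportion == 1:
            # log(0) is undefined; treat a fully mutated matrix as an unbounded rate.
            return math.inf
        # mutation_rate = -log(1 - mutation_proportion) / mean_time
        return -math.log(1 - proportion) / mean_length
    # mutation_rate = 1 - (1 - mutation_proportion) ^ (1 / mean_depth)
    return 1 - (1 - proportion) ** (1 / mean_length)


per_generation_rate = rate_from_proportion(0.75, mean_length=2, continuous=False)
continuous_rate = rate_from_proportion(0.75, mean_length=2.0, continuous=True)
```

Both formulas invert the same idea: the chance that a site is still unmutated after the mean depth or time is 1 - mutation_proportion, and the rate is whatever per-generation probability or continuous hazard produces that survival.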
Error Handling
Raises a ParameterEstimateError if the mutation_proportion parameter is not between 0 and 1.
estimate_missing_data_rates
Description
Estimates both the stochastic missing probability and the heritable missing rate from a CassiopeiaTree, given one of the two parameters.
Inputs
| Name | Type | Description |
|---|---|---|
| tree | CassiopeiaTree | The CassiopeiaTree object containing the character matrix and tree topology. |
| continuous | bool, optional | Whether to calculate a continuous missing rate (accounting for branch lengths) or a discrete per-generation rate. Defaults to True. |
| assume_root_implicit_branch | bool, optional | Whether to assume an implicit branch leading from the root if it doesn’t exist. Defaults to True. |
| stochastic_missing_probability | float, optional | The stochastic missing probability. If provided, overrides the value on the tree. |
| heritable_missing_rate | float, optional | The heritable missing rate. If provided, overrides the value on the tree. |
| layer | str, optional | The name of the layer in the tree.layers dictionary to use for the character matrix. If None, the default character_matrix is used. |
Outputs
| Name | Type | Description |
|---|---|---|
| stochastic_missing_probability | float | The stochastic missing probability. |
| heritable_missing_rate | float | The heritable missing rate. |
Internal Logic
The function first retrieves the total missing proportion from the tree’s parameters or calculates it using get_proportion_of_missing_data. It then checks if either the stochastic missing probability or the heritable missing rate is provided as an argument or stored in the tree’s parameters. If both or neither are provided, it raises a ParameterEstimateError.
If only one of the parameters is provided, the function estimates the other parameter based on the chosen model (continuous or discrete) and the formula: total_missing_proportion = heritable_proportion + stochastic_proportion - heritable_proportion * stochastic_proportion.
The heritable proportion is calculated using either the average depth or average time of the tree, depending on the chosen model and whether to assume an implicit root branch.
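Rearranging the formula above gives stochastic_proportion = (total - heritable_proportion) / (1 - heritable_proportion), and symmetrically for the heritable proportion. The sketch below works through both directions on plain numbers. It rests on two assumptions that the text does not confirm: the helper name split_missing is hypothetical, and the heritable proportion is assumed to relate to the heritable rate in the same way that the mutation proportion relates to the mutation rate in estimate_mutation_rate.

```python
import math


def split_missing(total, mean_length, stochastic=None, heritable_rate=None,
                  continuous=True):
    """Given one missing-data parameter, solve for the other (illustrative).

    `mean_length` is the mean depth (discrete) or mean time (continuous).
    Returns (stochastic_missing_probability, heritable_missing_rate).
    """
    if (stochastic is None) == (heritable_rate is None):
        raise ValueError("provide exactly one of stochastic or heritable_rate")
    if not 0 <= total < 1:
        raise ValueError("this sketch assumes 0 <= total < 1")

    if stochastic is not None:
        if not 0 <= stochastic < 1:
            raise ValueError("this sketch assumes 0 <= stochastic < 1")
        # total = h + s - h * s, solved for h.
        heritable_proportion = (total - stochastic) / (1 - stochastic)
        if continuous:
            heritable_rate = -math.log(1 - heritable_proportion) / mean_length
        else:
            heritable_rate = 1 - (1 - heritable_proportion) ** (1 / mean_length)
    else:
        # Assumed mapping from rate to proportion (inverse of the formulas above).
        if continuous:
            heritable_proportion = 1 - math.exp(-heritable_rate * mean_length)
        else:
            heritable_proportion = 1 - (1 - heritable_rate) ** mean_length
        # total = h + s - h * s, solved for s.
        stochastic = (total - heritable_proportion) / (1 - heritable_proportion)

    return stochastic, heritable_rate


s, r = split_missing(0.5, mean_length=3.0, stochastic=0.2)
s_back, _ = split_missing(0.5, mean_length=3.0, heritable_rate=r)
```

If the supplied parameter alone explains more missing data than was observed, the solved value comes out negative, which is the situation the warning described under Error Handling is meant to flag.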
Error Handling
- Raises a ParameterEstimateError if:
  - The total_missing_proportion, stochastic_missing_probability, or heritable_missing_rate have invalid values (e.g., outside the range of 0 to 1).
  - Both or neither of stochastic_missing_probability and heritable_missing_rate are provided.
- Raises a ParameterEstimateWarning if the estimated parameter is negative, indicating a potential issue with the provided parameter.
