cassiopeia/tools/autocorrelation.py:
High-level description
This file provides utility functions for computing autocorrelation statistics on trees, specifically the Moran’s I statistic. It is designed to work with CassiopeiaTree objects and can handle both single and multiple variables for autocorrelation analysis.
Symbols
compute_morans_i
Description
Computes Moran’s I statistic for a given CassiopeiaTree. This statistic measures the spatial autocorrelation of numerical data associated with the leaves of the tree.
Inputs
| Name | Type | Description |
|---|---|---|
| tree | CassiopeiaTree | The tree structure on which to compute the statistic |
| meta_columns | Optional[List] | Columns in the tree’s cell_meta object to use for computation |
| X | Optional[pd.DataFrame] | Extra data matrix for computing autocorrelations |
| W | Optional[pd.DataFrame] | Phylogenetic weight matrix |
| inverse_weight_fn | Callable[[Union[int, float]], float] | Function to apply to weights if W is not provided |
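To illustrate the role of inverse_weight_fn, the sketch below applies such a function to every off-diagonal entry of a pairwise leaf-distance matrix to obtain a weight matrix. The helper name weights_from_distances, the distance matrix D, the default 1/d function, and the zero diagonal are assumptions made for illustration; the library's own construction of W may differ.

```python
import pandas as pd


def weights_from_distances(D, inverse_weight_fn=lambda d: 1.0 / d):
    """Hypothetical helper: build a weight matrix from leaf distances.

    D is assumed to be a square DataFrame of pairwise distances whose
    index and columns are leaf names. The diagonal is left at zero so
    that a leaf does not contribute weight to itself (an assumption).
    """
    W = pd.DataFrame(0.0, index=D.index, columns=D.columns)
    for i in D.index:
        for j in D.columns:
            if i != j:
                # Closer leaves get larger weights under an inverse function.
                W.loc[i, j] = inverse_weight_fn(D.loc[i, j])
    return W


# Toy distance matrix for three leaves.
leaves = ["a", "b", "c"]
D = pd.DataFrame(
    [[0.0, 2.0, 4.0],
     [2.0, 0.0, 4.0],
     [4.0, 4.0, 0.0]],
    index=leaves,
    columns=leaves,
)
W = weights_from_distances(D)
```

Passing a different callable, for example `lambda d: 1.0 / d**2`, would make the weights fall off faster with distance.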
Outputs
| Name | Type | Description |
|---|---|---|
| I | Union[float, pd.DataFrame] | Moran’s I statistic(s) |
Internal Logic
- Validates input data and combines meta_columns and X if both are provided.
- Ensures all data is numerical.
- Computes or uses the provided weight matrix W.
- Normalizes W and standardizes X.
- Computes Moran’s I using the formula I = X' * Wn * X, where X' is the transpose of the standardized data matrix X and Wn is the normalized weight matrix.
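The normalization, standardization, and final product described above can be sketched with NumPy as follows. This is a minimal illustration rather than the file's actual code: the function name morans_i_sketch is hypothetical, and it assumes W is normalized to sum to 1 and X is standardized with the population standard deviation, under which X' * Wn * X coincides with the classical Moran's I.

```python
import numpy as np


def morans_i_sketch(X, W):
    """Hypothetical sketch of Moran's I as a quadratic form.

    X: values per leaf, shape (n,) or (n, k) for k variables.
    W: weight matrix of shape (n, n), rows and columns ordered like X.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single variable as one column

    # Normalize the weights so that they sum to 1 (assumed normalization).
    Wn = W / W.sum()

    # Standardize each column: zero mean, unit population variance.
    # A constant column has zero variance and would divide by zero here.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    I = Z.T @ Wn @ Z
    # A single variable yields a scalar; several yield a k-by-k matrix.
    return float(I[0, 0]) if I.shape == (1, 1) else I


# Leaves 0 and 1 are neighbors, as are leaves 2 and 3.
W = np.array(
    [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
clustered = morans_i_sketch([1, 1, -1, -1], W)    # neighbors share values
alternating = morans_i_sketch([1, -1, 1, -1], W)  # neighbors differ
```

In this toy case the clustered values give the maximum positive autocorrelation and the alternating values the maximum negative one, which matches the usual reading of the statistic.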
Dependencies
| Dependency | Purpose |
|---|---|
| numpy | Numerical computations |
| pandas | Data manipulation |
| cassiopeia.data | CassiopeiaTree class and utilities |
| cassiopeia.mixins | Custom error classes |
Error Handling
The function raises AutocorrelationError in the following cases:
- No data is specified for computing autocorrelations.
- The provided X dataframe doesn’t match the tree’s leaves.
- Non-numeric data is found in the specified columns.
- The weight matrix doesn’t match the tree’s leaves.
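The checks listed above might look roughly like the following. This is a simplified sketch under stated assumptions: AutocorrelationError is redefined locally as a stand-in for the class the file takes from cassiopeia.mixins, validate_inputs is a hypothetical helper, and the tree is reduced to a plain list of leaf names with meta_columns already merged into X.

```python
import pandas as pd


class AutocorrelationError(Exception):
    """Local stand-in for the library's error class (assumption)."""


def validate_inputs(leaves, X=None, W=None):
    """Hypothetical validation mirroring the four documented cases."""
    if X is None:
        raise AutocorrelationError("No data specified for autocorrelation.")

    # The data matrix must have exactly one row per leaf.
    if set(X.index) != set(leaves):
        raise AutocorrelationError("X does not match the leaves of the tree.")

    # Every column must hold numeric data.
    non_numeric = [
        c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])
    ]
    if non_numeric:
        raise AutocorrelationError(f"Non-numeric columns: {non_numeric}")

    # A user-supplied weight matrix must be indexed by the same leaves.
    if W is not None and (
        set(W.index) != set(leaves) or set(W.columns) != set(leaves)
    ):
        raise AutocorrelationError("W does not match the leaves of the tree.")


leaves = ["a", "b"]
good_X = pd.DataFrame({"expr": [0.1, 0.9]}, index=leaves)
validate_inputs(leaves, good_X)  # passes silently
```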
