High-level description
The code defines a classSFS that represents a Site Frequency Spectrum (SFS), a fundamental concept in population genetics used to summarize genetic variation within a population. It provides methods to calculate various statistics from the SFS, such as Tajima’s D, Fay and Wu’s H, Zeng’s E, and Ferretti’s L, which are used to infer demographic history and selection pressures.
Code Structure
TheSFS class is the main component of the code. It holds the SFS data (counts) and provides methods to calculate different statistics. The statistics calculation methods (fay_and_wus_H, zengs_E, tajimas_D, ferrettis_L) rely on the SFS data and use helper methods (_calculate_fay_and_wus_H, _calculate_zengs_E, _calculate_tajimas_D, _calculate_ferrettis_L) to perform the actual calculations. The bin method is used to group the SFS data into bins for easier visualization and analysis.
Symbols
Symbol Name
Description
This class represents a Site Frequency Spectrum (SFS) and provides methods for calculating various statistics from it.Inputs
| Name | Type | Description |
|---|---|---|
| counts | numpy.ndarray | An array representing the counts of mutations at different frequencies. |
| T | cassiopeia.data.CassiopeiaTree | Optional. A CassiopeiaTree object representing the phylogenetic tree. |
Outputs
AnSFS object.
Internal Logic
The constructor initializes thecounts and T attributes. It also initializes several attributes to None, which will be calculated lazily when their corresponding methods are called.
Symbol Name
Description
This class method constructs an SFS object from a given phylogenetic tree.Inputs
| Name | Type | Description |
|---|---|---|
| T | cassiopeia.data.CassiopeiaTree | A CassiopeiaTree object representing the phylogenetic tree. |
Outputs
AnSFS object.
Internal Logic
The method iterates through the clades of the tree and counts the number of mutations at each frequency (number of leaves in each clade). It then creates anSFS object with the calculated counts and the input tree.
Symbol Name
Description
This method bins the SFS data into specified bins for easier visualization and analysis.Inputs
| Name | Type | Description |
|---|---|---|
| bins | str or array-like | Specifies the binning strategy. Can be “linear”, “log”, “logit”, or an array-like object defining the bin edges. |
| n_bins | int | Optional. Number of bins to use if bins is “linear” or “log”. |
| min_edge | float | Optional. Minimum edge for binning if bins is “linear” or “log”. |
| max_edge | float | Optional. Maximum edge for binning if bins is “linear” or “log”. |
Outputs
None. The method modifies theSFS object in place, adding attributes related to the binning.
Internal Logic
The method first determines the bin edges and centers based on thebins argument and optional parameters. It then bins the SFS data by counting the number of mutations falling into each bin. The binned data is normalized by the bin size. If a tree is provided, the method also calculates a cut version of the normalized binned data, setting bins beyond the tree size to np.nan.
Symbol Name
Description
This method calculates and returns Fay and Wu’s H statistic from the SFS.Inputs
None.Outputs
| Name | Type | Description |
|---|---|---|
| fay_and_wus_H | float | Fay and Wu’s H statistic. |
Internal Logic
The method first checks if the statistic has been previously calculated and cached. If not, it calls the_calculate_fay_and_wus_H method to calculate it.
Symbol Name
Description
This private method calculates Fay and Wu’s H statistic from the SFS data.Inputs
None.Outputs
None. The method modifies theSFS object in place, setting the _fay_and_wus_H attribute.
Internal Logic
The method calculates the statistic based on the formula:theta_pi - theta_H, where theta_pi and theta_H are calculated from the SFS data.
Symbol Name
Description
This method calculates and returns Zeng’s E statistic from the SFS.Inputs
None.Outputs
| Name | Type | Description |
|---|---|---|
| zengs_E | float | Zeng’s E statistic. |
Internal Logic
The method first checks if the statistic has been previously calculated and cached. If not, it calls the_calculate_zengs_E method to calculate it.
Symbol Name
Description
This private method calculates Zeng’s E statistic from the SFS data.Inputs
None.Outputs
None. The method modifies theSFS object in place, setting the _zengs_E attribute.
Internal Logic
The method calculates the statistic based on the formula:theta_L - theta_W, where theta_L and theta_W are calculated from the SFS data.
Symbol Name
Description
This method calculates and returns Tajima’s D statistic from the SFS.Inputs
None.Outputs
| Name | Type | Description |
|---|---|---|
| tajimas_D | float | Tajima’s D statistic. |
Internal Logic
The method first checks if the statistic has been previously calculated and cached. If not, it calls the_calculate_tajimas_D method to calculate it.
Symbol Name
Description
This private method calculates Tajima’s D statistic from the SFS data.Inputs
None.Outputs
None. The method modifies theSFS object in place, setting the _tajimas_D attribute.
Internal Logic
The method calculates the statistic based on the formula:theta_pi - theta_W, where theta_pi and theta_W are calculated from the SFS data.
Symbol Name
Description
This method calculates and returns Ferretti’s L statistic from the SFS.Inputs
None.Outputs
| Name | Type | Description |
|---|---|---|
| ferrettis_L | float | Ferretti’s L statistic. |
Internal Logic
The method first checks if the statistic has been previously calculated and cached. If not, it calls the_calculate_ferrettis_L method to calculate it.
Symbol Name
Description
This private method calculates Ferretti’s L statistic from the SFS data.Inputs
None.Outputs
None. The method modifies theSFS object in place, setting the _ferrettis_L attribute.
Internal Logic
The method calculates the statistic based on the formula:theta_W - theta_H, where theta_W and theta_H are calculated from the SFS data.
