High-level description
This code defines a function, `map_intbcs`, that processes a molecule table to resolve allele ambiguity for each cellBC/intBC pair. For each pair, it identifies the most frequent allele by read count, with UMI count as the tie-breaker, and then filters out alignments that carry any other allele.
Symbols
map_intbcs
Description
This function takes a molecule table as input and returns the filtered table, in which each cellBC/intBC pair is associated with a single allele. It groups the input table by cellBC, intBC, and allele, then selects, for each cellBC/intBC pair, the allele with the highest read count (using UMI count as a tie-breaker). Alignments with other alleles for the same cellBC/intBC pair are removed.
Inputs
| Name | Type | Description |
|---|---|---|
| molecule_table | pandas.DataFrame | A molecule table containing cellBC, intBC, allele, readCount, and UMI information. |
Outputs
| Name | Type | Description |
|---|---|---|
| mapped_table | pandas.DataFrame | The filtered molecule table, in which each cellBC/intBC pair is associated with a single allele. |
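The output contract (one allele per cellBC/intBC pair) is easy to check directly. The helper below is a minimal sketch for that purpose; `has_single_allele_per_pair` is a hypothetical name and is not part of the documented code. It assumes only the `cellBC`, `intBC`, and `allele` columns listed above.

```python
import pandas as pd


def has_single_allele_per_pair(table: pd.DataFrame) -> bool:
    """Return True if every cellBC/intBC pair carries at most one distinct allele.

    Hypothetical helper for illustration; not part of the documented module.
    """
    # Count distinct alleles within each cellBC/intBC pair.
    alleles_per_pair = table.groupby(["cellBC", "intBC"])["allele"].nunique()
    return bool((alleles_per_pair <= 1).all())


# Toy tables: the first has two alleles for the pair (A, X), the second does not.
ambiguous = pd.DataFrame(
    {"cellBC": ["A", "A"], "intBC": ["X", "X"], "allele": ["a1", "a2"]}
)
resolved = pd.DataFrame(
    {"cellBC": ["A", "A"], "intBC": ["X", "X"], "allele": ["a1", "a1"]}
)
```

Running such a check on the input and on `mapped_table` is one way to confirm that the ambiguity was actually resolved.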
Internal Logic
- Drops rows with missing `intBC` values from the input `molecule_table`.
- Groups the remaining rows by `cellBC`, `intBC`, and `allele`.
- Aggregates the groups by summing `readCount` and counting `UMI`.
- Sorts the aggregated table by `UMI` in descending order and then by `readCount` in descending order, so that `readCount` determines the final ranking and the `UMI` count serves as the tie-breaker.
- Identifies duplicate `cellBC`/`intBC` pairs, keeping only the first occurrence, which is the top-ranked allele for that pair.
- Creates a set of tuples representing the selected `cellBC`, `intBC`, and `allele` combinations.
- Filters `molecule_table` to keep only rows whose (`cellBC`, `intBC`, `allele`) combination exists in the set created in the previous step.
- Logs the number of removed alleles and UMIs during the filtering process.
- Returns the filtered `molecule_table`.
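The steps above can be sketched in pandas roughly as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: the name `map_intbcs_sketch` is hypothetical, the logging step is omitted, and the ranking uses a single multi-key sort (`readCount` first, `UMI` count second) in place of two successive sorts.

```python
import pandas as pd


def map_intbcs_sketch(molecule_table: pd.DataFrame) -> pd.DataFrame:
    """Illustrative sketch of the allele-selection steps; not the library's code."""
    # Drop rows with a missing intBC.
    table = molecule_table.dropna(subset=["intBC"])

    # Sum reads and count UMIs per (cellBC, intBC, allele).
    counts = (
        table.groupby(["cellBC", "intBC", "allele"])
        .agg(readCount=("readCount", "sum"), UMI=("UMI", "count"))
        .reset_index()
    )

    # Rank alleles: readCount first, UMI count as tie-breaker (assumed order).
    ranked = counts.sort_values(["readCount", "UMI"], ascending=False)

    # Keep the top-ranked allele of each cellBC/intBC pair.
    winners = ranked.drop_duplicates(["cellBC", "intBC"])
    mapping = set(
        winners[["cellBC", "intBC", "allele"]].itertuples(index=False, name=None)
    )

    # Keep only molecules whose (cellBC, intBC, allele) combination was selected.
    keys = table[["cellBC", "intBC", "allele"]].apply(tuple, axis=1)
    return table[keys.isin(mapping)]


# Toy molecule table: pair (A, X) has two candidate alleles, one row lacks an intBC.
toy = pd.DataFrame(
    {
        "cellBC": ["A", "A", "A", "B", "B"],
        "intBC": ["X", "X", "X", "Y", None],
        "allele": ["a1", "a1", "a2", "a3", "a4"],
        "readCount": [5, 4, 20, 3, 7],
        "UMI": ["u1", "u2", "u3", "u4", "u5"],
    }
)
result = map_intbcs_sketch(toy)
# Keeps the a2 row for (A, X), since 20 reads beat 9, and the a3 row for (B, Y).
```

Note that in this toy table `a1` has more UMIs than `a2` for the pair (A, X), yet `a2` is selected because read count takes precedence.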
References
- `logger`: Used for logging debug messages. Imported from `cassiopeia.mixins`.
- `utilities.log_molecule_table`: A decorator function used to log statistics of the molecule table before and after the function execution. Imported from `cassiopeia.preprocess`.
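The before/after logging pattern of such a decorator can be illustrated with a small sketch. Everything here is an assumption made for illustration: `log_table_size`, the logger name, the log messages, and the choice of row count as the only statistic are hypothetical and do not reflect what `log_molecule_table` actually records.

```python
import functools
import logging

import pandas as pd

# Hypothetical logger name; the real code uses the logger from cassiopeia.mixins.
logger = logging.getLogger("map_intbcs_sketch")


def log_table_size(func):
    """Hypothetical stand-in for a decorator that logs table statistics."""

    @functools.wraps(func)
    def wrapper(molecule_table: pd.DataFrame, *args, **kwargs):
        # Log the row count before the wrapped function runs.
        logger.debug("%s: %d rows before", func.__name__, len(molecule_table))
        result = func(molecule_table, *args, **kwargs)
        # Log the row count of the table the wrapped function returned.
        logger.debug("%s: %d rows after", func.__name__, len(result))
        return result

    return wrapper


@log_table_size
def drop_missing_intbc(molecule_table: pd.DataFrame) -> pd.DataFrame:
    """Example wrapped function: drop rows with a missing intBC."""
    return molecule_table.dropna(subset=["intBC"])
```

Because the wrapper returns the wrapped function's result unchanged, decorating a processing step this way adds logging without altering its output.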
Side Effects
- Logs debug messages indicating the number of alleles and UMIs removed during the filtering process.
