test/preprocess_tests/filter_molecule_table_test.py file:
High-level description
This file contains unit tests for the filter_molecule_table function in the Cassiopeia preprocessing pipeline. It tests various aspects of the function, including filtering based on UMI and cell barcode counts, handling of doublets, error correction of integration barcodes (intBCs), and allowing for allele conflicts.
Code Structure
The main class TestFilterMolculeTable inherits from unittest.TestCase and contains several test methods. Each test method focuses on a specific aspect of the filter_molecule_table function's behavior.
Symbols
TestFilterMolculeTable
Description
This is the main test class that contains all the unit tests for the filter_molecule_table function.
Internal Logic
- Sets up test data in the setUp method.
- Defines various test methods to check different aspects of the filter_molecule_table function.
- Cleans up temporary directories in the tearDown method.
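The setUp / test / tearDown lifecycle listed above can be sketched with the standard library alone. This is a minimal illustration and not the actual test class: the class name ExampleLifecycleTest, the attribute temp_dir, the file name output.txt, and the helper run_example are all hypothetical.

```python
import io
import os
import shutil
import tempfile
import unittest


class ExampleLifecycleTest(unittest.TestCase):
    """Hypothetical stand-in showing the lifecycle, not the real test class."""

    def setUp(self):
        # Runs before every test method: create a fresh scratch directory.
        self.temp_dir = tempfile.mkdtemp()

    def test_writes_output(self):
        # A test can write into the scratch directory freely.
        path = os.path.join(self.temp_dir, "output.txt")
        with open(path, "w") as handle:
            handle.write("ok")
        self.assertTrue(os.path.exists(path))

    def tearDown(self):
        # Runs after every test method whose setUp succeeded,
        # whether the test itself passed or failed.
        shutil.rmtree(self.temp_dir)


def run_example():
    """Run the example suite quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleLifecycleTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

Because setUp and tearDown run around each test method, every test starts with an empty directory and leaves nothing behind on disk.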
setUp
Description
Initializes test data for use in the test methods.
Internal Logic
- Creates a base case DataFrame (self.base_filter_case) with sample data.
- Creates a DataFrame for testing doublet handling (self.doublets_case).
- Creates a DataFrame for testing intBC error correction (self.intBC_case).
- Sets up a temporary directory for output files.
test_format
Description
Tests if the output DataFrame from filter_molecule_table has the expected columns.
Internal Logic
- Calls filter_molecule_table with the base case data.
- Checks if the resulting DataFrame contains all the expected columns.
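A column check of this kind might look like the sketch below. The helper missing_columns and the column names are assumptions made for illustration; the source does not list the columns the real test expects. With pandas, the first argument would typically be list(df.columns).

```python
def missing_columns(actual_columns, expected_columns):
    """Return the expected column names absent from actual_columns, in order."""
    present = set(actual_columns)
    return [name for name in expected_columns if name not in present]


# Hypothetical column names, used only for illustration.
expected = ["cellBC", "UMI", "intBC", "allele"]
actual = ["cellBC", "UMI", "readCount", "intBC", "allele"]

# Extra columns in the output are tolerated; missing ones are reported.
assert missing_columns(actual, expected) == []
assert missing_columns(["cellBC", "UMI"], expected) == ["intBC", "allele"]
```

Reporting the missing names, rather than a bare True or False, makes a failing format test easier to diagnose.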
test_umi_and_cellbc_filter
Description
Tests the UMI and cell barcode filtering functionality of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with specific filtering parameters.
- Checks if the resulting DataFrame contains only the expected alignments after filtering.
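To make the idea of UMI and cell barcode filtering concrete, here is a toy sketch over plain dictionaries. It is a simplified illustration under stated assumptions, not Cassiopeia's implementation: the function filter_umis_and_cells, its parameter names, and the thresholds are hypothetical.

```python
from collections import Counter


def filter_umis_and_cells(rows, min_reads_per_umi, min_umis_per_cell):
    """Toy filter over dict rows with 'cellBC', 'UMI' and 'readCount' keys.

    Assumption: a simplified illustration, not Cassiopeia's actual logic.
    """
    # Step 1: drop molecules (UMIs) supported by too few reads.
    kept = [row for row in rows if row["readCount"] >= min_reads_per_umi]

    # Step 2: count the surviving UMIs per cell barcode.
    umis_per_cell = Counter(row["cellBC"] for row in kept)

    # Step 3: drop every row from cells left with too few UMIs.
    return [row for row in kept if umis_per_cell[row["cellBC"]] >= min_umis_per_cell]


rows = [
    {"cellBC": "A", "UMI": "u1", "readCount": 10},
    {"cellBC": "A", "UMI": "u2", "readCount": 30},
    {"cellBC": "A", "UMI": "u3", "readCount": 1},
    {"cellBC": "B", "UMI": "u4", "readCount": 20},
]
filtered = filter_umis_and_cells(rows, min_reads_per_umi=5, min_umis_per_cell=2)
# Only u1 and u2 remain: u3 has too few reads, and cell B has a single UMI.
```

A test in this style builds a small table where the surviving rows can be worked out by hand, then asserts that exactly those rows come back.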
test_doublet_and_map
Description
Tests the doublet handling and mapping functionality of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with doublet-specific test data and parameters.
- Verifies if the resulting DataFrame contains the expected alleles after doublet handling.
test_error_correct_intBC
Description
Tests the integration barcode (intBC) error correction functionality of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with intBC-specific test data.
- Checks if the resulting DataFrame contains the expected corrected intBCs.
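One common way to think about barcode error correction is to merge a rare barcode into a more abundant one that differs by only a few characters. The sketch below shows that general idea; it is an assumption for illustration and may differ from the criteria filter_molecule_table actually uses. The functions hamming and correct_barcodes and the max_distance parameter are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcodes(barcodes, max_distance=1):
    """Map each barcode to a more abundant barcode within max_distance, if any.

    Assumption: a toy rule for illustration only; the real function may use
    different distance and abundance criteria.
    """
    counts = Counter(barcodes)
    # Most abundant first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda bc: (-counts[bc], bc))
    mapping = {}
    for i, barcode in enumerate(ranked):
        mapping[barcode] = barcode
        for parent in ranked[:i]:
            # Only merge into a parent that was itself left uncorrected
            # and is strictly more abundant.
            if (mapping[parent] == parent
                    and counts[parent] > counts[barcode]
                    and hamming(parent, barcode) <= max_distance):
                mapping[barcode] = parent
                break
    return [mapping[bc] for bc in barcodes]


observed = ["AAAT", "AAAT", "AAAT", "AAAG", "CCCC"]
corrected = correct_barcodes(observed)
# "AAAG" is one mismatch from the abundant "AAAT" and is merged into it;
# "CCCC" is too far from every other barcode and is left unchanged.
```

A test for error correction then only needs to compare the corrected barcode column against the values expected by construction.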
test_filter_allow_conflicts
Description
Tests the allow_allele_conflicts parameter of filter_molecule_table.
Internal Logic
- Calls filter_molecule_table with allow_allele_conflicts=True.
- Verifies if the resulting DataFrame retains the expected allele conflicts.
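The effect of a flag like allow_allele_conflicts can be illustrated with a toy resolver. This is a sketch under loud assumptions: the helper resolve_alleles and its "keep the most frequent allele" rule are hypothetical, and only the parameter name allow_allele_conflicts comes from the source.

```python
from collections import Counter


def resolve_alleles(rows, allow_allele_conflicts=False):
    """Toy conflict handling over dict rows with 'cellBC', 'intBC', 'allele'.

    Assumption: illustrative only. When conflicts are not allowed, each
    (cellBC, intBC) pair keeps just its most frequent allele.
    """
    if allow_allele_conflicts:
        # Conflicting alleles are left untouched.
        return list(rows)

    counts = Counter((r["cellBC"], r["intBC"], r["allele"]) for r in rows)
    best = {}
    for (cell, intbc, allele), n in counts.items():
        key = (cell, intbc)
        # Keep the allele with the most rows; break ties alphabetically.
        if key not in best or (-n, allele) < (-best[key][1], best[key][0]):
            best[key] = (allele, n)
    return [r for r in rows if best[(r["cellBC"], r["intBC"])][0] == r["allele"]]


rows = [
    {"cellBC": "A", "intBC": "X", "allele": "a1"},
    {"cellBC": "A", "intBC": "X", "allele": "a1"},
    {"cellBC": "A", "intBC": "X", "allele": "a2"},
]
strict = resolve_alleles(rows)
lenient = resolve_alleles(rows, allow_allele_conflicts=True)
```

In this toy data, the strict call keeps only the two a1 rows, while the lenient call returns all three rows, including the conflicting a2 allele.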
tearDown
Description
Cleans up the temporary directory created during the tests.
Internal Logic
Removes the temporary directory and its contents.
Dependencies
- unittest: Python’s built-in unit testing framework
- shutil: For file and directory operations
- tempfile: For creating temporary directories
- numpy: For numerical operations
- pandas: For data manipulation and analysis
- cassiopeia: The main package being tested
This test file exercises the filter_molecule_table function, which is an important part of the Cassiopeia preprocessing pipeline. It covers various edge cases and scenarios that the function might encounter when processing real data.