DELFI

finaletoolkit.frag.delfi(input_file: str, autosomes: str, bins_file: str, reference_file: str, blacklist_file: str | None = None, gap_file: str | GenomeGaps | None = None, output_file: str | None = None, gc_correct: bool = True, remove_nocov: bool = True, merge_bins: bool = True, window_size: int = 5000000, quality_threshold: int = 30, workers: int = 1, verbose: int | bool = False) → DataFrame

Replicates the DELFI methodology of Cristiano et al. (2019).

Parameters:
  • input_file (str) – Path string pointing to a BAM file containing paired-end (PE) fragment reads.

  • autosomes (str) – Path string to a chrom.sizes file containing only autosomal chromosomes.

  • bins_file (str) – Path string to a BED file containing 100 kb bins for the chosen reference genome.

  • reference_file (str) – Path string to a .2bit file for the reference genome.

  • gap_file (str or GenomeGaps) – Specifies the locations of telomeres and centromeres for the reference genome. There are three options:
      - Path string to a BED4+ file where each interval is a centromere or telomere. A BED file can be used only if the fourth field of each entry is labeled “telomere” or “centromere”, respectively.
      - String naming the reference genome used. Options are “b37”, “hg19”, “hg38”, and “GRCh38”.
      - A finaletoolkit.genome.GenomeGaps object holding the gap information for the chosen reference genome.

  • blacklist_file (str) – Path string to BED file containing genome blacklist regions.

  • output_file (str, optional) – Path to output tsv.

  • gc_correct (bool) – Perform GC correction. Default is True.

  • remove_nocov (bool) – Remove two windows described by Cristiano et al. (2019) as low coverage. These windows might not apply to reference genomes other than hg19. Default is True.

  • merge_bins (bool) – Merge the 100 kb bins into 5 Mb bins. Default is True.

  • window_size (int) – Size (in bases) of non-overlapping windows to cover genome. Default is 5000000.

  • workers (int, optional) – Number of worker processes to use. Default is 1.

  • verbose (int or bool, optional) – Determines how many print statements and loading bars appear in stdout. Default is False.
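
A call might look like the following minimal sketch. The keyword names are taken from the signature above; every file path is a hypothetical placeholder, and the `build_delfi_kwargs` and `run_delfi` helpers exist only for illustration and are not part of finaletoolkit.

```python
from pathlib import Path


def build_delfi_kwargs(data_dir: str) -> dict:
    """Assemble keyword arguments for finaletoolkit.frag.delfi.

    All file names below are hypothetical placeholders.
    """
    root = Path(data_dir)
    return {
        "input_file": str(root / "sample.bam"),
        "autosomes": str(root / "hg19.autosomes.chrom.sizes"),
        "bins_file": str(root / "hg19.100kb.bins.bed"),
        "reference_file": str(root / "hg19.2bit"),
        "blacklist_file": str(root / "hg19.blacklist.bed"),
        "gap_file": "hg19",  # one of the named reference genomes listed above
        "output_file": str(root / "sample.delfi.tsv"),
        "gc_correct": True,
        "merge_bins": True,
        "workers": 4,
    }


def run_delfi(data_dir: str):
    """Run DELFI on the placeholder inputs and return the resulting DataFrame."""
    # Imported lazily so the helper above works without finaletoolkit installed.
    from finaletoolkit.frag import delfi

    return delfi(**build_delfi_kwargs(data_dir))
```

Passing `"hg19"` for `gap_file` uses the second of the three options described above; a BED4+ path or a `GenomeGaps` object could be supplied instead.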

finaletoolkit.frag.delfi_gc_correct(windows: DataFrame, alpha: float = 0.75, it: int = 8, verbose: bool = False)

Helper function that takes window data and performs GC adjustment.
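
The page does not state which fitting method this helper uses, so the sketch below substitutes a deliberately simpler technique, median centering within GC strata, just to illustrate what a GC adjustment does to window coverage. It is not finaletoolkit's implementation; the function name `gc_median_correct` and its arguments are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_median_correct(coverage, gc_content, n_digits=2):
    """Center each GC stratum on the global median coverage.

    coverage:   per-window coverage values
    gc_content: per-window GC fraction, same length as coverage
    n_digits:   rounding used to group windows into GC strata
    """
    # Group coverage values by rounded GC fraction.
    strata = defaultdict(list)
    for cov, gc in zip(coverage, gc_content):
        strata[round(gc, n_digits)].append(cov)

    stratum_median = {gc: median(values) for gc, values in strata.items()}
    global_median = median(coverage)

    # Subtract the stratum-specific median, then restore the overall scale.
    return [
        cov - stratum_median[round(gc, n_digits)] + global_median
        for cov, gc in zip(coverage, gc_content)
    ]


corrected = gc_median_correct([10, 12, 20, 22], [0.40, 0.40, 0.50, 0.50])
```

In this toy input the two GC strata have medians 11 and 21, so after correction both strata are centered on the global median of 16 and the GC-driven offset between them disappears.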

finaletoolkit.frag.delfi_merge_bins(hundred_kb_bins: DataFrame, gc_corrected: bool = True, verbose: bool = False) → DataFrame
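
To make the idea of merging 100 kb bins into 5 Mb windows concrete, here is a minimal pure-Python sketch. It is an assumption-laden illustration rather than the library's code: the real function operates on a DataFrame and may treat chromosome arms, gaps, or incomplete windows differently. The name `merge_bins_sketch` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def merge_bins_sketch(bins, window_size=5_000_000):
    """Sum counts of small bins into fixed-size, non-overlapping windows.

    bins: iterable of (chrom, start, stop, count) tuples.
    Returns a sorted list of (chrom, window_start, window_stop, total_count).
    """
    totals = defaultdict(int)
    for chrom, start, stop, count in bins:
        # Assign each bin to the window that contains its start coordinate.
        window_start = (start // window_size) * window_size
        totals[(chrom, window_start)] += count

    return [
        (chrom, w_start, w_start + window_size, total)
        for (chrom, w_start), total in sorted(totals.items())
    ]


example_bins = [
    ("chr1", 0, 100_000, 5),
    ("chr1", 100_000, 200_000, 7),
    ("chr1", 5_000_000, 5_100_000, 3),
    ("chr2", 0, 100_000, 1),
]
merged = merge_bins_sketch(example_bins)
```

Here the first two chr1 bins fall in the same 5 Mb window and are summed, while the third chr1 bin and the chr2 bin each start a separate window.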