finaletoolkit.frag.delfi(input_file: str, chrom_sizes: str, bins_file: str, reference_file: str, blacklist_file: str = None, gap_file: str | GenomeGaps = None, output_file: str = None, no_gc_correct: bool = False, gc_correct: bool | None = None, remove_nocov: bool = True, merge_bins: bool = True, window_size: int = 5000000, quality_threshold: int = 30, workers: int = 1, verbose: int | bool = False) → DataFrame

A function that replicates the methodology of Cristiano et al. (2019).

  • input_file (str) – Path string pointing to a BAM file containing paired-end (PE) fragment reads.

  • chrom_sizes (str) – Path string to a chrom.sizes file containing only autosomal chromosomes.

  • bins_file (str) – Path string to a BED file containing 100kb bins for the reference genome of choice.

  • reference_file (str) – Path string to a .2bit file for the reference genome.

  • gap_file (str or GenomeGaps) – Specifies locations of telomeres and centromeres for the reference genome. There are three options:

      - Path string to a BED4+ file where each interval is a centromere or telomere. A BED file can be used only if the fourth field of each entry corresponding to a telomere or centromere is labeled “telomere” or “centromere”, respectively.
      - String naming the reference genome used. Options are “b37”, “hg19”, “hg38”, and “GRCh38”.
      - A finaletoolkit.genome.GenomeGaps object with gap info associated with the reference genome of choice.

  • blacklist_file (str) – Path string to BED file containing genome blacklist regions.

  • output_file (str, optional) – Path to output tsv.

  • no_gc_correct (bool) – Skip GC correction. Default is False.

  • gc_correct (bool, optional) – Deprecated option to perform GC correction. Use no_gc_correct instead.

  • remove_nocov (bool) – Remove two windows described by Cristiano et al. (2019) as low coverage. These windows might not apply to reference genomes other than hg19. Default is True.

  • merge_bins (bool) – Perform merging from 100kb bins to 5Mb bins. Default is True.

  • window_size (int) – Size (in bases) of non-overlapping windows to cover genome. Default is 5000000.

  • workers (int, optional) – Number of worker processes to use. Default is 1.

  • verbose (int or bool, optional) – Determines how many print statements and loading bars appear in stdout. Default is False.
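The labeling requirement for a BED4+ gap_file can be checked before running an analysis. The following is a minimal sketch and not part of finaletoolkit: the helper name check_gap_bed is hypothetical, the fields are assumed to be tab-separated, and the example coordinates are illustrative only.

```python
from collections import Counter

# Labels that the gap_file description expects in the fourth BED field.
VALID_LABELS = {"telomere", "centromere"}


def check_gap_bed(lines):
    """Count gap labels in BED4+ lines and collect rows without a valid label.

    Hypothetical helper for illustration; it does not call finaletoolkit.
    """
    counts = Counter()
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and common BED header or comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            problems.append((number, "fewer than 4 fields"))
            continue
        label = fields[3]
        if label in VALID_LABELS:
            counts[label] += 1
        else:
            problems.append((number, f"unexpected label {label!r}"))
    return counts, problems


# Illustrative rows only; real coordinates depend on the reference genome.
example = [
    "chr1\t0\t10000\ttelomere",
    "chr1\t121535434\t124535434\tcentromere",
    "chr1\t177417\t227417\tclone",
]
counts, problems = check_gap_bed(example)
print(dict(counts))
print(problems)
```

Rows reported in problems are not necessarily errors; they simply carry no label that the gap_file description mentions, so it may be worth inspecting them before use.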


Returns the results of the DELFI analysis, with column names corresponding to those generated by the original authors’ scripts.

Return type:

pandas DataFrame
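A call to delfi might be assembled as in the sketch below. The keyword names and defaults are taken from the signature above; the helper names build_delfi_kwargs and run_delfi and all file paths are hypothetical placeholders. The import is deferred so that the argument dictionary can be inspected without finaletoolkit installed.

```python
def build_delfi_kwargs(input_file, chrom_sizes, bins_file, reference_file,
                       gap_file="hg19", blacklist_file=None,
                       output_file=None, workers=1):
    """Collect keyword arguments for finaletoolkit.frag.delfi.

    Hypothetical helper; values not exposed here repeat the documented defaults.
    """
    return {
        "input_file": input_file,
        "chrom_sizes": chrom_sizes,
        "bins_file": bins_file,
        "reference_file": reference_file,
        "blacklist_file": blacklist_file,
        "gap_file": gap_file,          # a genome name, a BED4+ path, or GenomeGaps
        "output_file": output_file,
        "no_gc_correct": False,        # keep GC correction enabled
        "remove_nocov": True,          # documented as possibly hg19-specific
        "merge_bins": True,            # merge 100kb bins into 5Mb bins
        "window_size": 5_000_000,
        "quality_threshold": 30,
        "workers": workers,
        "verbose": False,
    }


def run_delfi(kwargs):
    """Run delfi with the given keyword arguments and return its DataFrame."""
    # Imported lazily so this sketch loads even without finaletoolkit.
    from finaletoolkit.frag import delfi
    return delfi(**kwargs)


# Placeholder paths; substitute files that match your reference genome.
kwargs = build_delfi_kwargs(
    "sample.bam",
    "hg19.autosomes.chrom.sizes",
    "hg19.100kb.bed",
    "hg19.2bit",
    blacklist_file="hg19.blacklist.bed",
    output_file="sample.delfi.tsv",
)
# results = run_delfi(kwargs)  # requires finaletoolkit and the files above
```

Because gc_correct is deprecated, the sketch leaves it out and controls GC correction through no_gc_correct only.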

finaletoolkit.frag.delfi_gc_correct(windows: DataFrame, alpha: float = 0.75, it: int = 8, verbose: bool = False)

Helper function that takes window data and performs GC adjustment.

finaletoolkit.frag.delfi_merge_bins(hundred_kb_bins: DataFrame, gc_corrected: bool = True, verbose: bool = False) → DataFrame
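
The general idea of merging 100kb bins into larger non-overlapping windows can be illustrated without the library. The following is a rough sketch under stated assumptions, not the implementation of delfi_merge_bins: it assumes each bin is a (chrom, start, stop, count) tuple with a single hypothetical count column, and it assigns a bin to a window by integer-dividing its start by the window size. The real function operates on a DataFrame and may treat chromosome arms, gaps, and partial windows differently.

```python
from collections import defaultdict


def merge_bins_sketch(bins, window_size=5_000_000):
    """Sum per-bin counts into non-overlapping windows of window_size bases.

    Illustrative only; `bins` is an iterable of (chrom, start, stop, count).
    """
    totals = defaultdict(int)
    for chrom, start, stop, count in bins:
        # Bins are assumed not to straddle window boundaries, so the start
        # coordinate alone decides which window a bin falls into.
        window_index = start // window_size
        totals[(chrom, window_index)] += count

    merged = []
    for (chrom, window_index), count in sorted(totals.items()):
        window_start = window_index * window_size
        merged.append((chrom, window_start, window_start + window_size, count))
    return merged


# Four toy 100kb bins with made-up counts.
bins = [
    ("chr1", 0, 100_000, 12),
    ("chr1", 100_000, 200_000, 8),
    ("chr1", 5_000_000, 5_100_000, 5),
    ("chr2", 0, 100_000, 7),
]
merged = merge_bins_sketch(bins)
for row in merged:
    print(row)
```

With the default window size, the first two chr1 bins fall into the same window and their counts are summed, while the third chr1 bin and the chr2 bin each start a window of their own.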