DELFI

finaletoolkit.frag.delfi(input_file: str, autosomes: str, bins_file: str, reference_file: str, blacklist_file: str = None, gap_file: Union[str, GenomeGaps] = None, output_file: str = None, gc_correct: bool = True, merge_bins: bool = True, window_size: int = 5000000, subsample_coverage: float = 2, quality_threshold: int = 30, workers: int = 1, preprocessing: bool = True, verbose: int | bool = False) → pandas.DataFrame

Replicates the DELFI methodology of Cristiano et al. (2019).
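For orientation, Cristiano et al. (2019) summarize each genomic window by its counts of short (100 to 150 bp) and long (151 to 220 bp) cfDNA fragments and the ratio between them. The sketch below illustrates that tally in plain Python; it is not finaletoolkit's implementation. The function `count_fragments`, the `(chrom, start, end)` tuple input, and the rule of assigning each fragment to the window that contains its midpoint are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass

# Fragment-length ranges from Cristiano et al. (2019); bounds are inclusive.
SHORT_RANGE = (100, 150)
LONG_RANGE = (151, 220)


@dataclass
class WindowCounts:
    short: int = 0
    long: int = 0

    @property
    def ratio(self) -> float:
        # Short/long ratio; NaN when the window has no long fragments.
        return self.short / self.long if self.long else float("nan")


def count_fragments(fragments, window_size):
    """Tally short and long fragments per (chrom, window_index).

    `fragments` yields (chrom, start, end) tuples with half-open coordinates.
    Hypothetical rule: a fragment belongs to the window containing its midpoint.
    Fragments outside both length ranges are ignored.
    """
    counts = {}
    for chrom, start, end in fragments:
        length = end - start
        midpoint = (start + end) // 2
        window = counts.setdefault((chrom, midpoint // window_size), WindowCounts())
        if SHORT_RANGE[0] <= length <= SHORT_RANGE[1]:
            window.short += 1
        elif LONG_RANGE[0] <= length <= LONG_RANGE[1]:
            window.long += 1
    return counts
```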

Parameters

input_file: str: Path string pointing to a BAM file containing paired-end (PE) fragment reads.
autosomes: str: Path string to a .genome file containing only autosomal chromosomes.
bins_file: str: Path string to a BED file containing 100 kb bins for the reference genome of choice. Cristiano et al. uses
reference_file: str: Path string to a .2bit file of the reference genome.
blacklist_file: str, optional: Path string to a BED file containing the genome blacklist.
gap_file: str or GenomeGaps, optional: Path string to a BED4+ file where each interval is a centromere or telomere. A BED file can be used only if the fourth field of each telomere or centromere entry is labeled “telomere” or “centromere”, respectively.
output_file: str, optional: Path to the output TSV file.
window_size: int, optional: Size of the non-overlapping windows used to tile the genome. Default is 5 megabases (5000000).
subsample_coverage: float, optional: The coverage depth at which to subsample input_file. Default is 2.
workers: int, optional: Number of worker processes to use. Default is 1.
preprocessing: bool, optional: Whether to apply the preprocessing described by Cristiano et al. (2019). Default is True.
verbose: int or bool, optional: Controls how many status messages and progress bars are printed to stdout. Default is False.
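The autosomes, gap_file, and window_size parameters together describe how the genome is partitioned. As a rough sketch, assuming .genome lines of the form `chrom<TAB>size` and the BED4+ labeling rule described for gap_file, the hypothetical helpers below read both inputs and tile each chromosome into non-overlapping windows. Dropping every window that overlaps a telomere or centromere, and keeping a shorter final window at the chromosome end, are assumptions made for this sketch, not documented behavior of delfi.

```python
def read_genome(lines):
    """Parse .genome lines ("chrom<TAB>size") into an insertion-ordered dict."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, size = line.split()[:2]
        sizes[chrom] = int(size)
    return sizes


def read_gaps(lines):
    """Keep BED4+ intervals whose fourth field is 'telomere' or 'centromere'."""
    gaps = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3] in ("telomere", "centromere"):
            gaps.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return gaps


def tile_windows(sizes, window_size, gaps=()):
    """Tile chromosomes into half-open windows, dropping any that overlap a gap."""
    windows = []
    for chrom, size in sizes.items():
        for start in range(0, size, window_size):
            end = min(start + window_size, size)  # last window may be shorter
            overlaps = any(
                g_chrom == chrom and g_start < end and start < g_end
                for g_chrom, g_start, g_end, _ in gaps
            )
            if not overlaps:
                windows.append((chrom, start, end))
    return windows
```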

finaletoolkit.frag.delfi_gc_correct(windows: DataFrame, alpha: float = 0.75, it: int = 8, verbose: bool = False)

Helper function that takes window data and performs GC adjustment.
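The algorithm behind delfi_gc_correct is not documented here. The sketch below shows one common form of GC correction in the spirit of the original DELFI analysis: fit a LOESS curve of coverage against GC content, subtract the fitted trend, and add back the median coverage. It assumes that alpha plays the role of the LOESS span (the fraction of windows used in each local fit) and omits the robustness iterations that `it` may control. The names `loess_fit` and `gc_correct` are hypothetical, and the real function operates on a DataFrame rather than plain lists.

```python
import math
from statistics import median


def loess_fit(x, y, frac=0.75):
    """Local linear regression with tricube weights (no robustness iterations).

    For each x[i], fit a weighted line through its ceil(frac * n) nearest
    neighbours and return the fitted value at x[i].
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for xi in x:
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weights; points at or beyond the bandwidth get weight 0.
        w = [max(0.0, 1 - (abs(xj - xi) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)  # >= 1 because xi itself has weight 1
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def gc_correct(coverage, gc, frac=0.75):
    """Remove the GC trend from coverage, then re-center on the median."""
    trend = loess_fit(gc, coverage, frac)
    center = median(coverage)
    return [c - t + center for c, t in zip(coverage, trend)]
```

Because each local fit is linear, coverage that depends exactly linearly on GC is flattened to a constant equal to its median.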

finaletoolkit.frag.delfi_merge_bins(hundred_kb_bins: DataFrame, gc_corrected: bool = True, add_chr: bool = False, verbose: bool = False)
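Judging by its hundred_kb_bins argument and the 5 Mb window_size default of delfi, delfi_merge_bins aggregates 100 kb bins into coarser windows. A minimal sketch of that aggregation, assuming each bin carries short and long fragment counts and is assigned to the window starting at `start // window_size * window_size` on its own chromosome (a hypothetical grouping rule; the library may group bins differently). The name `merge_bins` and the tuple layout are also hypothetical.

```python
from collections import defaultdict


def merge_bins(bins, window_size=5_000_000):
    """Sum per-bin short/long counts into larger windows.

    `bins` yields (chrom, start, end, short, long) tuples, e.g. 100 kb bins.
    Returns {(chrom, window_start): (short, long, short/long)}, sorted by key.
    """
    merged = defaultdict(lambda: [0, 0])
    for chrom, start, _end, short, long in bins:
        key = (chrom, start // window_size * window_size)
        merged[key][0] += short
        merged[key][1] += long
    return {
        key: (short, long, short / long if long else float("nan"))
        for key, (short, long) in sorted(merged.items())
    }
```

Summing the counts and then recomputing the ratio, rather than averaging per-bin ratios, weights each 100 kb bin by how many fragments it contributes.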