DELFI

finaletoolkit.frag.delfi(input_file: str, autosomes: str, bins_file: str, reference_file: str, blacklist_file: str = None, gap_file: Union(str, GenomeGaps) = None, output_file: str = None, gc_correct: bool = True, merge_bins: bool = True, window_size: int = 5000000, subsample_coverage: float = 2, quality_threshold: int = 30, workers: int = 1, preprocessing: bool = True, verbose: int | bool = False) → pandas.DataFrame

A function that replicates the methodology of Cristiano et al. (2019).

Parameters

input_file: str

Path string pointing to a BAM file containing paired-end (PE) fragment reads.

autosomes: str

Path string to a .genome file containing only autosomal chromosomes.
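As an illustration of preparing such a file, the sketch below filters .genome-style text down to autosomes. It assumes the common two-column, tab-separated layout (chromosome name, length) and human chromosome naming (`1`-`22`, with or without a `chr` prefix). `filter_autosomes` is a hypothetical helper, not part of finaletoolkit, and the lengths shown are illustrative.

```python
import re

# Assumption: autosomes are named 1-22, optionally prefixed with "chr".
AUTOSOME = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2])$")


def filter_autosomes(genome_text: str) -> str:
    """Hypothetical helper: keep only autosomal rows of .genome-style text."""
    kept = []
    for line in genome_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name = line.split("\t")[0]
        if AUTOSOME.match(name):
            kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative input: two autosomes, one sex chromosome, one mitochondrial contig.
example = "chr1\t249250621\nchr2\t243199373\nchrX\t155270560\nchrM\t16571\n"
autosomes_only = filter_autosomes(example)
print(autosomes_only, end="")
```

The filtered text could then be written to disk and its path passed as `autosomes`.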

bins_file: str

Path string to a BED file containing 100 kb bins for the reference genome of choice. Cristiano et al uses

reference_file: str

Path string to a .2bit file.

blacklist_file: str, optional

Path string to a BED file containing the genome blacklist.

gap_file: str or GenomeGaps, optional

Path string to a BED4+ file where each interval is a centromere or telomere. A BED file can be used only if the fourth field for each entry corresponding to a telomere or centromere is labeled “telomere” or “centromere”, respectively.
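Because the labels in the fourth field matter, it can help to check a gap file before passing it in. The following is a minimal sketch under stated assumptions: `check_gap_bed` is a hypothetical helper, not part of finaletoolkit; it treats any label other than the exact lowercase strings as a problem, which may be stricter than the library itself; and the coordinates are illustrative.

```python
VALID_LABELS = {"centromere", "telomere"}


def check_gap_bed(bed_text: str) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means labels look usable."""
    problems = []
    for number, line in enumerate(bed_text.splitlines(), start=1):
        # Skip blank lines and common BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            problems.append(f"line {number}: fewer than 4 fields")
        elif fields[3] not in VALID_LABELS:
            problems.append(f"line {number}: unexpected label {fields[3]!r}")
    return problems


good = "chr1\t0\t10000\ttelomere\nchr1\t121535434\t124535434\tcentromere\n"
bad = "chr1\t0\t10000\tTelomere\nchr1\t100\t200\n"

print(check_gap_bed(good))
print(check_gap_bed(bad))
```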

output_file: str, optional

Path to output tsv.

window_size: int, optional

Size of non-overlapping windows to cover genome. Default is 5 megabases.
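To make "non-overlapping windows" concrete, here is a minimal sketch of tiling one chromosome with 0-based, half-open intervals in the BED style. `tile_windows` is a hypothetical helper, and keeping the shorter trailing window is an assumption; the documentation does not say how finaletoolkit treats a partial final window.

```python
from typing import Iterator, Tuple


def tile_windows(chrom_length: int, window_size: int = 5_000_000) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper: yield consecutive, non-overlapping (start, stop) windows."""
    for start in range(0, chrom_length, window_size):
        # Assumption: the last window is truncated at the chromosome end.
        yield start, min(start + window_size, chrom_length)


windows = list(tile_windows(12_000_000))
print(windows)
```

With the default size, a 12 Mb chromosome is covered by two full 5 Mb windows and one 2 Mb remainder.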

subsample_coverage: float, optional

The depth at which to subsample the input BAM file (input_file). Default is 2.

workers: int, optional

Number of worker processes to use. Default is 1.

preprocessing: bool, optional

Cristiano et al. (2019)

verbose: int or bool, optional

Determines how many print statements and loading bars appear in stdout. Default is False.
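Putting the parameters together, a call might look like the sketch below. Every file path is a hypothetical placeholder, and `run_delfi` is an illustrative wrapper rather than part of the library; the import is deferred inside it so the snippet can be loaded without finaletoolkit installed. The keyword names are taken from the signature above.

```python
# All paths are hypothetical placeholders; substitute your own files.
delfi_kwargs = dict(
    input_file="sample.bam",           # paired-end BAM
    autosomes="autosomes.genome",      # autosomal chromosomes only
    bins_file="bins.100kb.bed",        # 100 kb bins
    reference_file="reference.2bit",
    blacklist_file="blacklist.bed",
    gap_file="gaps.bed",               # fourth field: "centromere" or "telomere"
    output_file="sample.delfi.tsv",
    window_size=5_000_000,
    subsample_coverage=2,
    workers=4,
    verbose=True,
)


def run_delfi(**kwargs):
    """Illustrative wrapper: import lazily, then call finaletoolkit.frag.delfi."""
    from finaletoolkit.frag import delfi
    return delfi(**kwargs)


# Uncomment once finaletoolkit is installed and the files exist:
# results = run_delfi(**delfi_kwargs)  # a pandas.DataFrame, per the signature
```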

finaletoolkit.frag.delfi_gc_correct(windows: DataFrame, alpha: float = 0.75, it: int = 8, verbose: bool = False)

Helper function that takes window data and performs GC adjustment.

finaletoolkit.frag.delfi_merge_bins(hundred_kb_bins: DataFrame, gc_corrected: bool = True, add_chr: bool = False, verbose: bool = False)