DELFI
- finaletoolkit.frag.delfi(input_file: str, autosomes: str, bins_file: str, reference_file: str, blacklist_file: str = None, gap_file: Union(str, GenomeGaps) = None, output_file: str = None, gc_correct: bool = True, merge_bins: bool = True, window_size: int = 5000000, subsample_coverage: float = 2, quality_threshold: int = 30, workers: int = 1, preprocessing: bool = True, verbose: int | bool = False) → pandas.DataFrame
A function that replicates the methodology of Cristiano et al. (2019).
- Parameters:
input_file (str) – Path string pointing to a BAM file containing paired-end (PE) fragment reads.
autosomes (str) – Path string to a .genome file containing only autosomal chromosomes.
bins_file (str) – Path string to a BED file containing 100kb bins for reference genome of choice. Cristiano et al uses
reference_file (str) – Path string to .2bit file.
blacklist_file (str) – Path string to BED file containing genome blacklist.
gap_file (str) – Path string to a BED4+ file where each interval is a centromere or telomere. A BED file can be used only if the fourth field of each entry is labeled “telomere” or “centromere”, according to the kind of region the entry covers.
output_file (str, optional) – Path to output tsv.
window_size (int) – Size of non-overlapping windows to cover genome. Default is 5 megabases.
subsample_coverage (float, optional) – The depth to which the BAM file given as input_file is subsampled. Default is 2.
workers (int, optional) – Number of worker processes to use. Default is 1.
preprocessing (bool, optional) – Cristiano et al (2019)
verbose (int or bool, optional) – Determines how many print statements and loading bars appear in stdout. Default is False.
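As a rough illustration of how the parameters above fit together, the following sketch assembles keyword arguments for `finaletoolkit.frag.delfi` and checks that the input files exist before calling it. Every file path is a hypothetical placeholder, and `missing_inputs` and `run_delfi` are helper names invented for this example; only the keyword names come from the signature above.

```python
from pathlib import Path

# Hypothetical file locations; replace them with real paths.
data_dir = Path("data")

# Keyword names follow the documented signature; the values are assumptions.
delfi_kwargs = {
    "input_file": str(data_dir / "sample.bam"),
    "autosomes": str(data_dir / "autosomes.genome"),
    "bins_file": str(data_dir / "bins.100kb.bed"),
    "reference_file": str(data_dir / "reference.2bit"),
    "blacklist_file": str(data_dir / "blacklist.bed"),
    "gap_file": str(data_dir / "gaps.bed"),
    "output_file": str(data_dir / "sample.delfi.tsv"),
    "window_size": 5_000_000,   # documented default (5 megabases)
    "subsample_coverage": 2,    # documented default
    "workers": 4,
    "verbose": True,
}

# Parameters that are expected to point at existing files.
INPUT_PATH_KEYS = (
    "input_file", "autosomes", "bins_file",
    "reference_file", "blacklist_file", "gap_file",
)


def missing_inputs(kwargs):
    """Return the input paths in kwargs that do not exist on disk."""
    return [
        kwargs[key]
        for key in INPUT_PATH_KEYS
        if kwargs.get(key) and not Path(kwargs[key]).exists()
    ]


def run_delfi(kwargs):
    """Call finaletoolkit.frag.delfi with the given keyword arguments."""
    # Imported lazily so the configuration can be inspected
    # even where the package is not installed.
    from finaletoolkit.frag import delfi
    return delfi(**kwargs)


if __name__ == "__main__":
    missing = missing_inputs(delfi_kwargs)
    if missing:
        print("Skipping run; missing input files:", ", ".join(missing))
    else:
        result = run_delfi(delfi_kwargs)  # documented to return a pandas.DataFrame
        print(result.head())
```

Checking paths up front is only a convenience for this sketch; it avoids starting a long run that would fail on a mistyped file name.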
- finaletoolkit.frag.delfi_gc_correct(windows: DataFrame, alpha: float = 0.75, it: int = 8, verbose: bool = False)
Helper function that takes window data and performs GC adjustment.
- finaletoolkit.frag.delfi_merge_bins(hundred_kb_bins: DataFrame, gc_corrected: bool = True, add_chr: bool = False, verbose: bool = False)
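To make the relationship between 100 kb bins and the 5 Mb `window_size` concrete, here is a minimal pandas sketch that sums per-bin counts into non-overlapping windows. It is not FinaleToolkit's implementation of `delfi_merge_bins`: the column names (`contig`, `start`, `stop`, `frag_count`), the toy data, and the `merge_into_windows` helper are all assumptions made for illustration.

```python
import pandas as pd

BIN_SIZE = 100_000       # 100 kb bins, as described for bins_file
WINDOW_SIZE = 5_000_000  # default window_size (5 megabases)

# Toy table of 100 consecutive 100 kb bins on one contig.
# Column names and counts are hypothetical.
bins = pd.DataFrame({
    "contig": ["chr1"] * 100,
    "start": [i * BIN_SIZE for i in range(100)],
    "frag_count": [10] * 100,
})
bins["stop"] = bins["start"] + BIN_SIZE


def merge_into_windows(bins, window_size=WINDOW_SIZE):
    """Sum per-bin counts into non-overlapping windows of window_size bases."""
    # Integer division assigns each bin to the window containing its start.
    window_index = bins["start"] // window_size
    merged = (
        bins.assign(window=window_index)
        .groupby(["contig", "window"], sort=True)
        .agg(
            start=("start", "min"),
            stop=("stop", "max"),
            frag_count=("frag_count", "sum"),
        )
        .reset_index()
        .drop(columns="window")
    )
    return merged


windows = merge_into_windows(bins)
print(windows)
```

With the toy data, the 100 bins span 10 Mb and collapse into two 5 Mb windows of 50 bins each. This sketch ignores blacklisted regions and centromere or telomere gaps, which the parameters of `delfi` suggest the real pipeline accounts for.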